In early 2022, I wrote a very basic Python program to scrape some articles from an Irish website named The Journal. Long story short, all I needed at that time was to capture the content of Covid-related articles as well as their attached user comments, and attempt to train a model on that data. A bit less than six months later, that simple .
Have you ever wondered what makes a language a good fit for a particular space? Its design choices, overall syntax, and to a lesser extent speed and performance are arguably some of the first things you’ll hear when you ask around. I personally think that tooling and the landscape of existing dependencies also play a huge role in the adoption of a given language by a specific community.
Earlier this year, I got involved in a fun little side project at work. As part of our “Build Great Teams” initiative, I was tasked with providing a simple tech newsletter for my co-workers. We didn’t need anything fancy really: just a bi-weekly curation of articles that I could find about data science, programming, and the tech industry in general. As the project grew in scope, I decided to refactor the couple of Python scripts I had written so that I could run the whole pipeline from a Raspberry Pi at home.
I was reading some random conversation threads on Hacker News the other day when I came across an article announcing that Polars had just been ported to the Ruby programming language. Now, unless you have been living under a rock for the past year or so, you probably know that Polars is a data manipulation library that was written entirely in Rust.
Most of the articles I write for this website are inspired by problems that I come across at work. Seeing a large influx of spammy, most likely automatically generated, user content is pretty common. And to be fair, most social media platforms have become quite good at catching that type of content before it can even start causing harm to their user base.
Though a quick Google search confirms that the first attempts at having computers identify and extract spoken language date back to the early 1950s, voice recognition technology has only recently been made accessible to the general public. Like most of my friends, I own a small Alexa device at home. And like them, I never use it. Yet I was very excited when I discovered OpenAI’s Whisper on Hacker News a little while ago.
Foreword: this post is dedicated to my workmate and friend Martin, who recently showed me some pretty cool stuff he has been doing with Sankey charts. Back in the early days of the 2020 pandemic, I got a bit bored at home and started thinking about creating a website. I remember that my first idea was to write a couple of articles dedicated to topic modelling, and see where that would take me.
Wouldn’t sentiment analysis be made easier if we could find a way to show which terms or chunks of terms within a given corpus contribute to the overall sentiment score of the corpus or some of its parts? I recently came across this pretty neat library named SHAP which, amongst many other things, provides some useful tools for explaining sentiment scores.