AUTOMATING PIPELINES WITH AIRFLOW'S TASKGROUP

Earlier this year, I got involved in a fun little side project at work. As part of our “Build Great Teams” initiative, I was tasked with providing a simple tech newsletter for my co-workers. We didn’t need anything fancy really: just a bi-weekly curation of articles that I could find about data science, programming, and the tech industry in general. As the project gained in scope, I decided to refactor the couple of Python scripts I had written so that I could get the whole pipeline to run from a Raspberry Pi at home.

THE POLARS DATAFRAME LIBRARY, BUT FOR RUBY

I was reading some random conversation threads on Hacker News the other day when I came across an article which announced that Polars had just been ported to the Ruby programming language. Now, unless you have been living under a rock for the past year or so, you probably know that Polars is a data manipulation library that was written entirely in Rust.

EXPLORING STRING DISTANCES WITH TYPESCRIPT AND TALISMAN

Most of the articles I write for this website are inspired by problems that I come across at work. Seeing a large influx of spammy, most likely automatically generated user content is pretty common. And to be fair, most social media platforms have become quite good at catching that type of content before it can even start causing harm to their userbase.

OPENAI'S WHISPER IS SO GOOD IT CAN TRANSCRIBE ANY SONG'S LYRICS

Though a quick Google search confirms that the first attempts at having computers identify and extract spoken language date back to the early 1950s, voice recognition technology has only recently been made accessible to the general public. Like most of my friends, I own a small Alexa device at home. And like them, I never use it. Still, I was very excited when I discovered OpenAI’s Whisper on Hacker News a little while ago.

TOPIC MODELLING VISUALISATION WITH ANYCHART.JS

Foreword: this post is dedicated to my workmate and friend Martin, who recently showed me some pretty cool stuff he has been doing with sankey charts. Back in the early days of the 2020 pandemic, I got a bit bored at home and started thinking about creating a website. I remember that the first idea I had was to write a couple of articles dedicated to topic modelling, and see where that would take me.

ARQUERO: A GREAT DATAFRAME TOOLKIT FOR JAVASCRIPT

Most open positions for data related jobs on any popular employment website will likely list Python or R as the languages that applicants must be skilled in. But hey, nobody leaves JavaScript in the corner! Data manipulation packages for the Node ecosystem have grown a lot over the past three or four years, to a point where they have become a credible alternative to using more popular Python or R based libraries such as Pandas or Dplyr.

EXPLAINING SENTIMENT SCORES WITH TRANSFORMERS AND SHAP

Wouldn’t sentiment analysis be made easier if we could find a way to show which terms or chunks of terms within a given corpus contribute to the overall sentiment score of the corpus or some of its parts? I recently came across this pretty neat library named SHAP which amongst many other things provides some useful tools for explaining sentiment scores.

EXPLORING POS TAGS CO-OCCURRENCE WITH WINKNLP AND HIGHCHARTS.JS

I’ve been playing around a lot with NeuralCoref lately, a pipeline extension for spaCy developed by Hugging Face. If you’re interested in coreference resolution, this article from Hugging Face’s Thomas Wolf seems like a great place to start. Are we going to discuss neural coreferencing today? Absolutely not. If you head over to the NeuralCoref GitHub page, your eyes will probably immediately feel drawn towards this very fancy visualisation that maps the semantic relationship between each term within a short sentence:

CREATE A SIMPLE IN-BROWSER SQL PLAYGROUND WITH PYSCRIPT

Finding an online SQL playground that’s both free and user-friendly can be a little bit challenging. Most platforms, such as StrataScratch for instance, restrict what free tier users can do, while others hide the querying interface under layers of ads and pop-ups. That being said, it’s still possible to find a couple of high-quality solutions, and I personally really like Coderpad.

GOING BEYOND THE SENTIMENT SCORE, PART 1: SENTIMENT.JS

A good few years back, I used to work for a bank where part of my daily job was to monitor and evaluate the “happiness score” of our customers across several social media platforms, using a tool called Brandwatch. Amongst many other things, this platform offered its customers the ability to define a set of rules and add a corresponding sentiment tag to each and every mention of their brand or of any of their competitors.