TEXT SUMMARISATION IN TYPESCRIPT WITH TRANSFORMERS.JS
If you’re a long-time follower of this website, you probably know by now how much I’ve been advocating for the use of JavaScript (and TypeScript) as a second language for any data practitioner who might want to broaden their horizons and learn some new and useful skills. I was therefore very excited when Hugging Face recently announced that they would soon be porting their state-of-the-art transformers libraries to the JavaScript ecosystem.
SIMPLIFY WEBSITE SCRAPING WITH TRAFILATURA
In early 2022, I wrote a very basic Python program to scrape some articles from an Irish website named The Journal. Long story short, all I needed at that time was to capture the content of Covid-related articles as well as their attached user comments, and attempt to train a model on that data. A bit less than six months later, that simple .
STARBOARD.GG, AND OTHER NOTEBOOK ENVIRONMENTS FOR NON-PYTHON DATA SCIENCE: PART I
Have you ever wondered what makes a language a good fit for a particular space? Its design choices, overall syntax, and to a lesser extent speed and performance are arguably some of the first things you’ll hear when asking around. I personally think that tooling and the landscape of existing dependencies also play a huge role in the adoption of a given language by a specific community.
AUTOMATING PIPELINES WITH AIRFLOW'S TASKGROUP
Earlier this year, I got involved in a fun little side project at work. As part of our “Build Great Teams” initiative, I was tasked with providing a simple tech newsletter for my co-workers. We didn’t need anything fancy really: just a bi-weekly curation of articles that I could find about data science, programming, and the tech industry in general. As the project gained in scope, I decided to refactor the couple of Python scripts I had written so that I could get the whole pipeline to run from a Raspberry Pi at home.
THE POLARS DATAFRAME LIBRARY, BUT FOR RUBY
I was reading some random conversation threads on Hacker News the other day when I came across an article which announced that Polars had just been ported to the Ruby programming language. Now, unless you have been living under a rock for the past year or so, you probably know that Polars is a data manipulation library that was written entirely in Rust.
EXPLORING STRING DISTANCES WITH TYPESCRIPT AND TALISMAN
Most of the articles I write for this website are inspired by problems that I come across at work. Seeing a large influx of spammy, and most likely automatically generated, user content is pretty common. And to be fair, most social media platforms have become quite good at catching that type of content before it can even start causing harm to their userbase.
OPENAI'S WHISPER IS SO GOOD IT CAN TRANSCRIBE ANY SONG'S LYRICS
Though a quick Google search confirms that the first attempts at having computers identify and extract spoken language date back to the early 1950s, voice recognition technology has only recently been made accessible to the general public. Like most of my friends, I own a small Alexa device at home. And like them, I never use it. Yet I was very excited when I discovered OpenAI’s Whisper on Hacker News a little while ago.
TOPIC MODELLING VISUALISATION WITH ANYCHART.JS
Foreword: this post is dedicated to my workmate and friend Martin, who recently showed me some pretty cool stuff he has been doing with Sankey charts. Back in the early days of the 2020 pandemic, I got a bit bored at home and started thinking about creating a website. I remember that the first idea I had was to write a couple of articles dedicated to topic modelling, and see where that would take me.
ARQUERO: A GREAT DATAFRAME TOOLKIT FOR JAVASCRIPT
Most open positions for data-related jobs on any popular employment website will likely list Python or R as the languages that applicants must be skilled in. But hey, nobody leaves JavaScript in the corner! Data manipulation packages for the Node ecosystem have grown a lot over the past three or four years, to a point where they have become a credible alternative to more popular Python- or R-based libraries such as Pandas or Dplyr.
EXPLAINING SENTIMENT SCORES WITH TRANSFORMERS AND SHAP
Wouldn’t sentiment analysis be made easier if we could find a way to show which terms or chunks of terms within a given corpus contribute to the overall sentiment score of the corpus or some of its parts? I recently came across this pretty neat library named SHAP which amongst many other things provides some useful tools for explaining sentiment scores.