VISUALIZING TEXT DATA HIERARCHY WITH WORD TREES

Over the past few weeks, I have been looking for a quick and effective way of representing the structural differences within a set of similar-looking short sentences. To provide a bit of context, as we approached the end of 2022, my workmates and I got heavily involved in a planning phase for the new year. More specifically, we were asked to write a set of objectives and key results that would help drive a common strategy across our supported programs and pillars over the months to come.

NETWORK GRAPHS PART I: PYTHON AND JAVASCRIPT

A quick note before we start. The purpose of today’s article isn’t to show how network graphs work and discuss their underlying mathematical structure. Instead, we’re going to focus on practical applications and easy-to-reproduce examples, using two of the most popular programming languages of the early 2020s: Python and JavaScript. Typically, a network graph will allow us to visualise the various entities that live within a complex network structure, and see how densely its nodes are connected.

MINIMALIST CSS FRAMEWORKS FOR YOUR DATA SCIENCE PROJECTS

Disclaimer: I am absolutely not a web developer, which is actually why I wanted to write this article! One of the biggest challenges for data practitioners isn’t to explore or process data, but to find effective ways to showcase their work. Of course, we can always build dashboards, share some Jupyter notebooks with our workmates, or paste a few charts and a couple of tables into a Microsoft Word document.

BM25: AN ALTERNATIVE TO TF-IDF IN JAVASCRIPT USING WINKNLP

The JavaScript ecosystem frequently receives a fair amount of criticism for being a bit of a mess, with poorly maintained packages and new front-end or back-end frameworks appearing almost every month. However, if you’re a data science practitioner, the past few years have seen JavaScript slowly rise as a perfectly viable option for a second programming language to learn.

PYSCRIPT, AKA PYTHON IN THE BROWSER

This article is going to be slightly shorter than the ones I usually post on my blog. As PyScript is, after all, still fairly new, I’ll need to get a bit more familiar with what this library has to offer before I can start working on more interesting projects and share them here. Why write an article about PyScript, then, if I’m not too comfortable with this framework yet, you might wonder?

DANFO.JS AND DNOTEBOOK, A PANDAS / JUPYTER COMBO FOR JAVASCRIPT

If I had the money to make a very opinionated John Carmack-like bet, I would probably wager that the future language of choice for data science is going to be none other than JavaScript.

Jeff Atwood’s Law

If you’re a frequent reader of this blog, you are probably pretty familiar with either Python, R, Matlab, or Julia. And yet, as mentioned earlier, I think that the programming language you should really start investing some time into is JavaScript.

BERTOPIC, OR HOW TO COMBINE TRANSFORMERS AND TF-IDF FOR TOPIC MODELLING

If you follow this blog, you are probably aware of my interest in natural language processing, and more specifically in topic modelling. As a matter of fact, some of the first articles that I wrote back in 2020 discussed things like TF-IDF and popular text clustering models. Anyway, if you also happen to share my passion for this niche field, it is quite likely that you have already worked with some or all of the following models:

PANDAS-BOKEH: THE SIMPLICITY OF PANDAS PLOTS, THE INTERACTIVITY OF BOKEH

Sometimes, all we want is to be able to use a framework or a library that we’re not too familiar with, without necessarily spending too much time learning its syntax in depth. Personally, although I have extensively used visualisation packages such as Matplotlib, Seaborn, Plotly, or Altair, I must confess that Bokeh is one of those tools that I have never paid much attention to.

CHOOSING A MODEL FOR TIME SERIES REGRESSION WITH PYCARET

I have always found working with time series extremely interesting. I especially like the fact that it often starts with a single succession of data points, aka a “line”, and that understanding the dynamics behind this “line” usually involves creating adaptations of that initial set of values, such as calculating a moving average or looking for indicators of seasonality. PyCaret is a low-code machine learning framework for the Python programming language.

USING THE COMMAND LINE TO EXTRACT CONTEXTUAL WORDS FROM TEXTUAL DATA

September 2008 was my first encounter with computational linguistics, as it used to be called back then. I was starting the second (and last) year of my MA at the University of Bordeaux, and as I began doing some research for my dissertation, I encountered a problem that I thought a computer would be able to help with. What I was trying to do was to obtain, within a single text corpus, the n preceding and following words for any given word.