CHOOSING A MODEL FOR TIME SERIES REGRESSION WITH PYCARET
I have always found working with time series extremely interesting. I especially like the fact that it often starts with a single succession of data points, aka a “line”, and that understanding the dynamics behind this “line” usually involves creating adaptations of that initial set of values, such as calculating a moving average or looking for indicators of seasonality. PyCaret is a low-code machine learning framework for the Python programming language.
USING THE COMMAND LINE TO EXTRACT CONTEXTUAL WORDS FROM TEXTUAL DATA
September 2008 was my first encounter with computational linguistics, as it used to be called back then. I was starting the second (and last) year of my MA at the University of Bordeaux, and as I began doing some research for my dissertation, I encountered a problem that I thought a computer would be able to help with. What I was trying to do was to obtain, within a single text corpus, the n preceding and following words for any given word.
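To make the task concrete, here is a minimal sketch of the idea in Python rather than on the command line, so it is not the approach the article itself goes on to describe. The function name `context_words`, the lowercasing, and the simple `\w+` tokenisation are all assumptions made for illustration.

```python
import re


def context_words(text, target, n=2):
    """Return a (before, after) pair of word lists for each occurrence of target."""
    # Assumption: a naive tokeniser that lowercases and keeps runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    target = target.lower()
    results = []
    for i, token in enumerate(tokens):
        if token == target:
            before = tokens[max(0, i - n):i]   # up to n words preceding the match
            after = tokens[i + 1:i + 1 + n]    # up to n words following the match
            results.append((before, after))
    return results


sample = "The quick brown fox jumps over the lazy dog"
print(context_words(sample, "fox", n=2))
# [(['quick', 'brown'], ['jumps', 'over'])]
```

Near the start or end of the text, the windows simply come back shorter than n instead of being padded.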
BEAUTIFUL CHARTS WITH VEGALITE.JS
I’m always a bit surprised when I read negative comments about JavaScript. My take on it might not be a very popular one, but I quite like the language it has evolved to be, and its ecosystem isn’t as bad as some people like to picture it. Though there have been some very interesting initiatives over the past few years, such as DataForge or more recently Arquero, JavaScript is still lacking a strong library for querying and manipulating data tables.
BASIC IN-BROWSER TEXT PROCESSING USING COMPROMISE.JS
Though JavaScript might not be as obvious a choice as Python when it comes to Natural Language Processing libraries, its ecosystem actually features some high-performing text processing packages. And this makes perfect sense, as such dependencies are very much needed to build mobile or web-based applications such as chatbots.
Finding the right tool for the job
Over the past few months, I have experimented a bit with the following Node packages:
EVALUATE TOPICS COHERENCE WITH PALMETTO
Over the last couple of years, I have been dabbling a bit with topic modelling, and to this day I still find this niche subset of computational linguistics to be quite fascinating. Some very interesting research papers have been published over the past decade, and recent developments in the field of natural language processing such as transformers, combined with the release of some new libraries, have certainly brought a welcome breath of fresh air to that field.
METHOD CHAINING WITH PANDAS
This article is going to be slightly shorter than what I usually tend to post, but I hope you will enjoy it nonetheless!
I like a good podcast
I was recently looking for some podcasts I could listen to while running, and stumbled upon a series of interesting interviews with Matt Harrison. The name sounded familiar, and I realised that I had purchased a couple of his books and thoroughly enjoyed them.
MAKE YOUR WEBSITES PRETTIER WITH MVP.CSS
A couple of weeks ago, as I was browsing through recent comments on Hacker News, I stumbled upon a conversation around minimalistic web design. As my HTML and CSS skills are quite limited, I thought I might take a look at some of the resources that were being shared there and see how they could benefit my own work.
Make your ugly HTML files prettier
So, what is MVP.css? According to its author, Andy Brewer, it is “a minimalist stylesheet for HTML elements”.
I LOVE YOU, TWEET-PREPROCESSOR
This is going to be a short article, but I really wanted to share a pretty neat library named tweet-preprocessor that I stumbled upon while reading some random stuff on Hacker News.
The cool bit
I have a confession to make: I have never been able to remember anything about regular expressions. To this day, I still struggle to implement even the most basic character filtering routine. I find myself having to go through the same traumatising process each and every time.
SAVE A PANDAS DATAFRAME AS A SQL TABLE USING SQLALCHEMY
There are many reasons why we might want to save tabular data into a SQL database rather than simply outputting Pandas dataframes into .csv files. We might, for instance, have an automated script that runs daily and extracts a certain number of tweets using the Tweepy library. The script could be pulling tweets from different users, or using different hashtags and/or search terms. These results would be best saved into separate tables, gathered within a single database.
POS TAGGING AND NAMED ENTITIES RECOGNITION USING SPACY
I have often found that one of the easiest and most effective ways to approach short textual data like comments or tweets is to try and discover high-level patterns and visualise them. Topic modelling requires a bit of trial and error, while looking for recurring contextual words might be better suited to larger chunks of unstructured data such as blog articles or novels. So, most of the time, I start by performing the following two tasks: