Explore Your Dataframes With PyGWalker

I know quite a few people who will do anything to avoid having to work with a Pandas dataframe. Funnily enough, they’re usually much better programmers than I will ever be. But they’re stuck in a Catch-22 situation: they can’t memorise Pandas’ most basic functionalities because they never use the library, which in turn makes them even more reluctant to manipulate dataframes, as they can’t remember which methods and attributes to use. As a matter of fact, a lot of them much prefer relying on SQL-based data exploration tools like Apache Superset or Meta’s Daiquery notebooks.

Though I personally love working with Pandas dataframes and find them extremely powerful, I must admit that some basic tasks like data exploration can be made much easier through the use of libraries like PyGWalker.

Please note that this article is likely going to be slightly shorter than the ones I have posted recently!

A little bird told me

PyGWalker was brought to the Python community by a China-based company named Kanaries, which has developed a set of very useful exploratory data tools such as:

  • Rath: a graphical interface that is capable of automatically understanding your data and turning it into insights and visualisations
  • Graphic-Walker: a more traditional but very comprehensive dashboard solution, whose interface looks somewhat similar to that of Tableau. Though I really like this open-source tool, it can only ingest CSV and JSON files at the moment and might therefore not be a good candidate for wider adoption yet

The reason why this is important to mention is that most previous attempts at building a graphical interface for Pandas dataframes have been abandoned. From what I can see, libraries like PandasGUI or Tabloo haven’t been updated in two or three years, which is absolutely fine in itself: we’re all very busy with our jobs, and starting a side-project doesn’t mean you have to remain committed to it if you no longer can or want to.

That being said, as an end-user you might want to take this issue into consideration if you plan on basing a long-term project on a particular solution and expect regular updates and bug fixes. Meanwhile, as the folks at Kanaries seem to have started a company through which they intend to monetise and further develop their suite of products, we can hope that PyGWalker as well as their other products might have a slightly longer life cycle than their aforementioned alternatives.

PyGWalker

So what exactly is a graphical interface for Pandas dataframes? Let’s see if we can find anything of interest on PyGWalker’s GitHub page:

“PyGWalker can simplify your Jupyter Notebook data analysis and data visualization workflow, by turning your pandas dataframe (and polars dataframe) into a Tableau-style User Interface for visual exploration.”

Well, that sounds promising! What we need first is some data that we can feed into PyGWalker and start exploring. We’ll be using the yfinance Python library to retrieve some stock information for Amazon, Google, Tesla, and Meta:

import yfinance as yf
import pandas as pd

companies = ["AMZN", "GOOG", "TSLA", "META"]
tickers = yf.Tickers(companies)

def getDataFrame(start_date, end_date):
    # Download daily prices, then reshape the result so that each row
    # holds one (date, ticker) pair instead of one column per ticker.
    dataframe = (
        tickers
        .history(
            start=start_date,
            end=end_date,
            interval="1d"
        )
        .stack(level=1)
        .rename_axis(["Date", "Ticker"])
        .reset_index()
        .iloc[:, :3]  # keep only the first three columns
    )
    return dataframe

df = getDataFrame("2023-06-30", "2023-08-01")
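
If you want to see what the stack / rename_axis / reset_index chain does without hitting the network, here is a minimal sketch on a hand-made frame. The column layout (price field on the first level, ticker on the second) is my assumption about what tickers.history() returns, and the numbers are invented:

```python
import pandas as pd

# Hypothetical stand-in for the wide frame returned by tickers.history():
# columns are (field, ticker) pairs, rows are dates. Values are made up.
dates = pd.to_datetime(["2023-07-03", "2023-07-05"])
columns = pd.MultiIndex.from_product([["Close", "Open"], ["AMZN", "GOOG"]])
wide = pd.DataFrame(
    [[130.0, 120.0, 129.0, 119.0],
     [131.0, 122.0, 130.5, 121.0]],
    index=dates,
    columns=columns,
)

tidy = (
    wide
    .stack(level=1)                   # move the ticker level from columns to rows
    .rename_axis(["Date", "Ticker"])  # name the two index levels
    .reset_index()                    # turn both index levels into columns
    .iloc[:, :3]                      # keep Date, Ticker and the first field
)

print(tidy)
```

Each (date, ticker) pair ends up on its own row, which is the long format that drag-and-drop tools tend to expect.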

If, however, you prefer generating your own synthetic data, you can achieve something similar using the Faker library:

import pandas as pd
from faker import Faker
import random
from datetime import date, timedelta

fake = Faker()
companies = [fake.company() for _ in range(4)]
today = date.today()

def getFakeData(howmany):
    # One row per company per day, going back `howmany` days from today.
    struct = {
        "Date": [],
        "Company": [],
        "Open Price": [],
        "Close Price": []
    }
    for i in range(howmany):
        for c in companies:
            price = random.randint(100, 150)
            struct["Date"].append(str(today - timedelta(i)))
            struct["Company"].append(c)
            struct["Open Price"].append(price)
            # The close price is always 1 to 10 above the open price.
            struct["Close Price"].append(price + random.randint(1, 10))
    return struct

data = getFakeData(200)
df = pd.DataFrame(data)
df.head(20)
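
Note that the fake Date values above are stored as strings, since they go through str(). If you would rather hand over a real datetime column, a small conversion step along these lines should do. This is only a sketch: the company names are hard-coded stand-ins so that it runs without Faker, and I am assuming, without having checked, that a proper datetime dtype is what you want the UI to see.

```python
from datetime import date, timedelta

import pandas as pd

# Hard-coded stand-ins for the Faker company names, so this runs on its own.
companies = ["Acme Ltd", "Globex Inc"]
start = date(2023, 8, 1)

rows = [
    {"Date": str(start - timedelta(days=i)), "Company": c, "Close Price": 100 + i}
    for i in range(3)
    for c in companies
]
df = pd.DataFrame(rows)
print(df.dtypes["Date"])  # a string-like dtype, not a datetime

# Parse the ISO-formatted strings into proper timestamps.
df["Date"] = pd.to_datetime(df["Date"])
print(df.dtypes["Date"])
```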

Regardless of the option you chose, all we have to do now is simply pass our dataframe object into PyGWalker and see what happens:

import pygwalker as pyg
import warnings

# Silence library warnings to keep the notebook output clean.
warnings.filterwarnings("ignore")

walker = pyg.walk(df)

We are greeted with a Tableau-inspired UI that shows us various fields grouped by data type, a comprehensive menu bar at the top, as well as a filters section in the middle. Let’s see what happens when we drag the Date field into the X-Axis field and then the Close column into the Y-Axis field.

As expected, we get a nice line chart that should look somewhat familiar to you if you’ve ever used Vega-Lite (or Altair if you’re coming from Python). That makes sense, as PyGWalker’s visualisations are based on this popular JavaScript charting package.

Please also note that our Close column has been automatically aggregated by sum. Adding up the closing prices of several tickers for each date isn’t very meaningful, so we should change this to mean instead. To get an individual line for each of our tickers, simply drag and drop the Ticker field onto the color field.
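
For readers who are more at home in code than in drag-and-drop, the sum-versus-mean point can be sketched in plain pandas. The prices are invented, and this only mirrors the idea of the aggregation, not whatever PyGWalker runs internally:

```python
import pandas as pd

# Invented closing prices: two tickers over two days.
df = pd.DataFrame({
    "Date": ["2023-07-03", "2023-07-03", "2023-07-05", "2023-07-05"],
    "Ticker": ["AMZN", "GOOG", "AMZN", "GOOG"],
    "Close": [130.0, 120.0, 131.0, 122.0],
})

# Summing per date adds the tickers' prices together, which is rarely meaningful.
close_sum = df.groupby("Date")["Close"].sum()

# The mean per date gives a single, more sensible line.
close_mean = df.groupby("Date")["Close"].mean()

# One column per ticker is roughly what dropping Ticker onto the color field gives you.
per_ticker = df.pivot(index="Date", columns="Ticker", values="Close")

print(close_sum, close_mean, per_ticker, sep="\n\n")
```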

Now to be fair this is basic Tableau stuff, so we should move on to what I think really sets PyGWalker apart from its competitors.

First thing is, we might want to pass some parameters to pygwalker.walk() and reload the application:

import pygwalker as pyg

walker = pyg.walk(
    df,
    use_kernel_calc=True,  # let DuckDB do the computations
    dark="dark"
)

Setting use_kernel_calc to True allows PyGWalker to use DuckDB as the main computing engine for our data, which lets it process larger datasets faster within a local environment.

The menu bar on top lists a few important features that you’ll end up using a lot:

  • Mark Type lets you change the chart type

  • Stack Mode offers various options for bar and line charts, allowing you to stack / center / normalise your data points

  • Export: charts can be saved locally as svg or png files, while your dataset can be exported to csv format

  • Config gives you access to more parameters, including the possibility to pick from a wide range of supported colour palettes

Final thoughts

Though PyGWalker definitely has a lot more to offer, there isn’t much I can add to this article without ending up simply replicating some of the examples that the Kanaries team are sharing on their GitHub page. At this point I think you should really try and play around with PyGWalker to see if its ease-of-use and no-code approach is something that you might enjoy or not.

I also recommend checking out a library named Sweetviz, which has a much less customisable interface but offers more statistical insights into your data.