I know quite a few people who will do anything to avoid having to work with a Pandas dataframe. Funnily enough, they're usually much better programmers than I will ever be. But they're stuck in some sort of Catch-22-like situation: they can't memorise Pandas' most basic functionalities because they never use the library at all, which in turn makes them even more reluctant to try and manipulate dataframes, as they can't remember which methods and attributes to use. As a matter of fact, a lot of them much prefer relying on SQL-based data exploration tools like Apache Superset or Meta's Daiquery notebooks.
Though I personally love working with Pandas dataframes and find them extremely powerful, I must admit that some basic tasks like data exploration can be made much easier through the use of libraries like PyGWalker.
Please note that this article is likely going to be slightly shorter than the ones I have posted recently!
PyGWalker was brought to the Python community by a China-based company named Kanaries, which has developed a set of very useful exploratory data tools.
The reason this is important to mention is that most previous attempts at building a graphical interface for Pandas dataframes have been abandoned. From what I can see, libraries like PandasGUI or Tabloo haven't been updated in two or three years. Which is absolutely fine in itself: we're all very busy with our jobs, and starting a side project doesn't mean you have to remain committed to it if you no longer can or want to.
That being said, as an end-user you might want to take this issue into consideration if you plan on basing a long-term project on a particular solution and expect regular updates and bug fixes. Meanwhile, as the folks at Kanaries seem to have started a company through which they intend to monetise and further develop their suite of products, we can hope that PyGWalker as well as their other products might have a slightly longer life cycle than their aforementioned alternatives.
So what exactly is a graphical interface for Pandas dataframes? Let's see if we can find anything of interest on PyGWalker's GitHub page:
"PyGWalker can simplify your Jupyter Notebook data analysis and data visualization workflow, by turning your pandas dataframe (and polars dataframe) into a Tableau-style User Interface for visual exploration."
Well, that sounds promising! What we need first is some data that we can then feed into PyGWalker and start exploring. We'll be using the yfinance Python library to retrieve some stock price data for Amazon, Google, Tesla, and Meta:
import yfinance as yf
import pandas as pd

# Daily price history for a handful of tickers
companies = ["AMZN", "GOOG", "TSLA", "META"]
tickers = yf.Tickers(companies)

def getDataFrame(start_date, end_date):
    # Fetch daily bars, move the ticker level from the columns into the rows,
    # and keep only the first three columns (Date, Ticker, and a price column)
    dataframe = (
        tickers
        .history(
            start=start_date,
            end=end_date,
            interval="1d"
        )
        .stack(level=1)
        .rename_axis(["Date", "Ticker"])
        .reset_index()
        .iloc[:, :3]
    )
    return dataframe

df = getDataFrame("2023-06-30", "2023-08-01")
If, however, you prefer generating your own synthetic data instead, you can achieve something similar using the Faker library:
import pandas as pd
from faker import Faker
import random
from datetime import date, timedelta

fake = Faker()
companies = [fake.company() for _ in range(4)]
today = date.today()

def getFakeData(howmany):
    # Build one open/close price pair per company and per day,
    # walking backwards from today's date
    struct = {
        "Date": [],
        "Company": [],
        "Open Price": [],
        "Close Price": []
    }
    for i in range(howmany):
        for c in companies:
            price = random.randint(100, 150)
            struct["Date"].append(str(today - timedelta(i)))
            struct["Company"].append(c)
            struct["Open Price"].append(price)
            struct["Close Price"].append(price + random.randint(1, 10))
    return struct

data = getFakeData(200)
df = pd.DataFrame(data)
df.head(20)
Regardless of which option you chose, all we have to do now is pass our dataframe object to PyGWalker and see what happens:
import pygwalker as pyg
import warnings
warnings.filterwarnings("ignore")
walker = pyg.walk(df)
We are greeted with a Tableau-inspired UI that shows the various fields grouped by data type, a comprehensive menu bar at the top, as well as a filters section in the middle. Let's see what happens when we drag the Date series into the X-Axis field, and then the Close column into the Y-Axis field:
As expected, we get a nice line chart that should look somewhat familiar if you've ever used Vega-Lite (or Altair if you're coming from Python). Which makes sense, as PyGWalker's visualisations are built on top of this popular JavaScript charting package.
Please also note that our Close column has been automatically aggregated by sum; we should change this to mean instead. To get an individual line for each of our tickers, simply drag and drop the Ticker series onto the color field:
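For reference, here is roughly what that same chart would look like if written directly in Altair, the Python wrapper around Vega-Lite. This is only a sketch for comparison purposes (it assumes the yfinance dataframe from earlier, with its Date, Ticker, and Close columns), not code that PyGWalker generates for you:

import altair as alt

# A minimal Altair sketch of the chart built in the UI above:
# mean of Close per date, with one coloured line per ticker
chart = (
    alt.Chart(df)
    .mark_line()
    .encode(
        x="Date:T",
        y="mean(Close):Q",
        color="Ticker:N"
    )
)
chart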
Now, to be fair, this is basic Tableau stuff, so we should move on to what I think really sets PyGWalker apart from its competitors.
First, we might want to pass some parameters to pygwalker.walk() and reload the application:
import pygwalker as pyg

walker = pyg.walk(
    df,
    use_kernel_calc=True,
    dark="dark"
)
Setting use_kernel_calc to True allows PyGWalker to use DuckDB as the main computing engine for our data, which lets it process larger datasets faster within a local environment.
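If you've never come across DuckDB, the short sketch below gives a rough idea of what this kind of engine brings to the table: it can run SQL directly against an in-memory Pandas dataframe, without copying the data into a database first. This is only an illustration of the general mechanism, using the yfinance dataframe from earlier, and not PyGWalker's internal code:

import duckdb

# Query the in-memory Pandas dataframe "df" directly with SQL
# (DuckDB resolves the name df from the surrounding Python scope)
result = duckdb.query(
    "SELECT Ticker, AVG(Close) AS avg_close FROM df GROUP BY Ticker"
).to_df()
print(result)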
The menu bar on top lists a few important features that you'll end up using a lot:
- Mark Type lets you change the chart type
- Stack Mode offers various options for bar and line charts, allowing you to stack / center / normalise your data points
- Export: charts can be saved locally as svg or png files, while your dataset can be exported to csv format
- the Config menu gives you access to more parameters, including the possibility to pick from a wide range of supported colour palettes
Though PyGWalker definitely has a lot more to offer, there isn't much I can add to this article without ending up simply replicating some of the examples that the Kanaries team are sharing on their GitHub page. At this point I think you should really try and play around with PyGWalker to see if its ease-of-use and no-code approach is something that you might enjoy or not.
I also recommend checking out a library named Sweetviz, which has a much less customisable interface but offers more statistical insights into your data.
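If you want to give it a quick try, this is roughly what the basic Sweetviz workflow looks like (a minimal sketch, assuming the same df as above): instead of an interactive chart builder, it produces a standalone HTML report full of summary statistics.

import sweetviz as sv

# Profile the dataframe and write a standalone HTML report
report = sv.analyze(df)
report.show_html("sweetviz_report.html")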