An example of what we’ll be discussing in this article
My team recently came across a rather unusual request: build a report that's interactive, easy on the eye, and that can be sent over to multiple stakeholders as a single file.
Now what made the whole thing challenging (and interesting) was the set of constraints that came with this exercise:
In a normal situation, a Business Analyst would have been called in to gather requirements from whoever needs their data visualised. The same Business Analyst would then have passed those requirements on to a team of Data Analysts / Scientists. who would have then built a more or less complex dashboard using either PowerBI or Tableau.
Now for reasons that I won't be going through today, we found ourselves unable to access any of these reporting platforms.
Because of all these constraints, we decided to come up with a different approach: create a single html file that anybody could open in their browser and share across their department. Obviously the same result could have been achieved through a pdf file, but we'll see in a follow-up article why we decided not to go in that direction.
If you've been following my website for a while, you probably know that I'm a big fan of JavaScript. As a matter of fact, I've been advocating for years that anybody working in the field of data should at least learn the basics of web development and get comfortable enough with html and css.
And this is exactly what we'll be trying to show throughout this article. We're going to build a bridge between the most popular data manipulation library for the Python ecosystem and some modern data-friendly frameworks for your browser.
JavaScript has some very powerful data manipulation libraries (make sure you check out Arquero if you want to know more on this topic), but the tool of choice for any modern data practitioner is without a doubt Pandas.
Let's imagine a two-phase scenario where we could use a combination of languages and frameworks:
For phase 1 say we need to perform some advanced cleaning on a dataset, fit a clustering model to it, and output the result of that model to a Pandas DataFrame object
. All that work is usually done in Python and will not be covered in today's article.
We're instead going to focus on phase 2 and see how we can integrate a DataFrame object
into a webpage. We could leverage a visualisation library like ApexCharts and easily build a really fancy-looking dashboard that would beat anything that can be done in Python.
For now I think it makes more sense if we keep things simple and generate some random data using the Faker library:
from faker import Faker
import pandas as pd
import random
from typing import Dict, List, Any
faker = Faker()
def generateData(howmany: int) -> Dict[str,List[Any]]:
x: int = 0
result: Dict[str,List[Any]] = {
"Name": [],
"Job": [],
"Age": [],
"Has children": []
}
while x < howmany:
name: str = faker.name()
job: str = faker.job()
age: int = random.randint(18,66)
has_a_car: str = random.choice([1,0])
result["Name"].append(name)
result["Job"].append(job)
result["Age"].append(age)
result["Has children"].append(has_a_car)
x+=1
return result
fake_data: Dict[str,List[Any]] = generateData(20)
df: pd.DataFrame = pd.DataFrame(fake_data)
df.head()
Alright, we are ready to move on to our next step!
Before we start writing some Python code we should create a file named input.html
and paste the following html5 boilerplate code into it:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" href="https://unpkg.com/mvp.css">
<title>Tabulator</title>
</head>
<body>
<main class="container">
<h1>Data table</h1>
{{placeholder_table}}
</main>
</body>
</html>
As you can see we're using a css framework named MVP that's going to take care of the look and feel of our html page for us. If you want to know more about minimalist css frameworks feel free to check out this article that I wrote a couple of years ago.
You're probably also wondering why we inserted this strange {{placeholder_table}}
line in the middle of our template, but let's first open our brand new input.html
file as a string
in Python and save it to a variable:
def openHTMLTemplate(path_to_file: str) -> str:
with open(path_to_file,"r") as html_file:
blueprint: str = html_file.read()
return blueprint
If we were to print the output of openHTMLTemplate()
what we'd see is the html code we wrote a few seconds ago, saved as a python string. Our next step is to write two new functions:
{{placeholder_table}}
string
with some html tagsstring
to a new .html fileHTML_FILE_INPUT: str = "input.html"
HTML_FILE_OUTPUT: str = "output.html"
def saveToHTML(path_to_file: str, content: str) -> None:
with open(path_to_file, "w") as html_file:
html_file.write(content)
def createTable() -> None:
html_blueprint: str = openHTMLTemplate(HTML_FILE_INPUT)
html_blueprint = (
html_blueprint
.replace("{{placeholder_table}}", "<p>Hi mom!</p>")
)
saveToHTML(HTML_FILE_OUTPUT,html_blueprint)
if __name__ == "__main__":
createTable()
After we've run the above code, we can indeed see a brand new html file named output.html
that should look like this:
Alright, so we've replaced our placeholder with some random html tags. What if we could convert the DataFrame object
that we created earlier into an html table? Luckily enough, we do that using Pandas' to_html()
method and obtain a nicely formatted string
that should do the job:
CSV_FILE: str = "tabulator_data.csv"
HTML_FILE_INPUT: str = "input.html"
HTML_FILE_OUTPUT: str = "output.html"
def dataframeToHTML(csv_file: str) -> str:
df_to_html: str = (
pd
.read_csv(csv_file)
.to_html()
)
return df_to_html
As you've probably already guessed, our next step is to slightly amend the createTable()
function and this time replace {{placeholder_table}}
with our freshly created html table:
def createTable():
html_blueprint: str = openHTMLTemplate(HTML_FILE_INPUT)
df_to_html: str = dataframeToHTML(CSV_FILE)
html_blueprint = (
html_blueprint
.replace("{{placeholder_table}}", str(df_to_html))
)
saveToHTML(HTML_FILE_OUTPUT,html_blueprint)
if __name__ == "__main__":
createTable()
Going for this approach has some advantages. We get a very small sized html file that looks relatively nice and that can be shared with a non-technical audience or added to a presentation. We could also add in some more elaborated css or create a bunch of filters, but we'll keep that for a separate article.
Tabulator is an open-source JavaScript library that allows for the creation of interactive and fully customisable tables in your browser.
It's perfectly suited for visualising the output of SQL queries for instance or any type of tabular data that you can think of. To get started, let's create a new html page named input_tabulator.html
and paste the following boilerplate code into it:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link href="https://unpkg.com/tabulator-tables/dist/css/tabulator.min.css" rel="stylesheet">
<script type="text/javascript" src="https://unpkg.com/tabulator-tables@6.2.5/dist/js/tabulator.min.js"></script>
<title>Tabulator</title>
</head>
<body>
</body>
</html>
You'll notice that we have also added links to the following files:
tabulator.min.css
tabulator.min.js
We should first use the simple example that can be found on the Tabulator website. To do so, let's add in a simple a <div>
element with the id of data-table
as well as a pair a <script></script>
tags:
<div id="data-table"></div>
Within that pair a <script></script>
tags, our next move is to create an object
named tabledata
and paste in the few lines of code found on the Tabulator website:
<script>
const tabledata = [
{id:1, name:"Oli Bob", age:"12", col:"red", dob:""},
{id:2, name:"Mary May", age:"1", col:"blue", dob:"14/05/1982"},
{id:3, name:"Christine Lobowski", age:"42", col:"green", dob:"22/05/1982"},
{id:4, name:"Brendon Philips", age:"125", col:"orange", dob:"01/08/1980"},
{id:5, name:"Margret Marmajuke", age:"16", col:"yellow", dob:"31/01/1999"},
];
const table = new Tabulator("#data-table", {
data:tabledata,
layout:"fitColumns",
autoColumns:true,
}
);
</script>
This is what you should see when opening our newly created html file:
Now where I think Tabulator shines is in the level of customisation it offers straight out of the box. We're going to switch gears and ditch MVP for Bulma. Doing so is going to force us to add some classes to our html tags, which I understand can be a bit daunting if you've never built a website before.
There are two reasons why we want to use a slightly more complex css framework:
Simply replace the boilerplate that we initially added to the input.html
file with the following template:
<!DOCTYPE html>
<html lang="en" data-theme="Nuclear">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bulma@1.0.2/css/bulma.min.css">
<link href="tabulator_bulma.min.css" rel="stylesheet">
<script type="text/javascript" src="https://unpkg.com/tabulator-tables@6.2.5/dist/js/tabulator.min.js"></script>
<title>Tabulator</title>
</head>
<body>
<section class="section">
<div class="notification is-light">
<h1 class="title has-text-weight-bold">Tabulator</h1>
</div>
<div id="example-table"></div>
</section>
{{placeholder_javascript}}
</body>
</html>
This time, our JavaScript code will be contained within a Python string, and added to the html output file at a later stage. There's one thing that we should do first though: use Pandas's .to_json()
method to convert our DataFrame
object
into a data format that exactly matches what Tabulator can ingest. To do so, we need to pass orient="records"
as an argument to that method:
def dataframeToJSON(csv_file: str) -> List[Dict[str, Any]]:
df_to_json: List[Dict[str, Any]] = (
pd
.read_csv(csv_file)
.filter(["Name","Age","Job"])
.sample(20)
.to_json(orient="records")
)
return df_to_json
With this out of the way, we can now declare a string
variable named JAVASCRIPT_CODE
(how creative!) and write a few lines of code that will generate a nicely formatted html table for us:
JAVASCRIPT_CODE: str = """
<script>
const tabledata = {{placeholder_data}};
const table = new Tabulator("#example-table", {
data:tabledata,
layout:"fitColumns",
pagination:"local",
paginationSize:7,
paginationCounter:"rows",
initialSort:[ {column:"Age", dir:"desc"} ],
columnDefaults:{ tooltip:true },
columns:[
{title:"First name", field:"Name"},
{title:"Age", field:"Age"},
{title:"Age (bars)", field:"Age", hozAlign:"left", formatter:"progress", formatterParams:{color:[
"#c1e7ff",
"#94bed9",
"#6996b3",
"#3d708f",
"#004c6d"
]}},
{title:"Job", field:"Job"},
{title:"Has children", field:"Has children", hozAlign:"center", formatter:"tickCross", sorter:"boolean"}
]
}
);
</script>
"""
You will have noticed some slight differences between the previous example and this new script:
autoColumns:true
has been replaced with an array of parameters under the value columns
paginationSize
parameterconst tabledata
will be populated by replacing the {{placeholder_data}}
with our Pandas dataframeSpeaking of which, we need to do one last modification to our createTable()
function and add another replace()
string method:
def createTable():
html_blueprint: str = openHTMLTemplate(HTML_FILE_INPUT)
df_to_json: List[Dict[str, Any]] = dataframeToJSON(CSV_FILE)
html_blueprint = (
html_blueprint
.replace("{{placeholder_javascript}}", JAVASCRIPT_CODE)
.replace("{{placeholder_data}}", df_to_json)
)
saveToHTML(HTML_FILE_OUTPUT,html_blueprint)
if __name__ == "__main__":
createTable()
Let's run that code again and refresh our output.html
page:
Now how awesome is that!
You might wonder why go this route and not create a pdf report instead. We'll answer this question in a few weeks and see how we can integrate Tabulator into an interactive data exploration tool that I think provides more value than any type of static report. We'll be creating some plots, filters, and a couple of other features that a simple pdf file just won't be able to offer.
Thanks for reading and see you in a few weeks!
input.html
:
<!DOCTYPE html>
<html lang="en" data-theme="Nuclear">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bulma@1.0.2/css/bulma.min.css">
<link href="tabulator_bulma.min.css" rel="stylesheet">
<script type="text/javascript" src="https://unpkg.com/tabulator-tables@6.2.5/dist/js/tabulator.min.js"></script>
<title>Tabulator</title>
</head>
<body>
<section class="section">
<div class="notification is-light">
<h1 class="title has-text-weight-bold">Tabulator</h1>
</div>
<div id="example-table"></div>
</section>
{{placeholder_javascript}}
</body>
</html>
script.py
:
import pandas as pd
from typing import List, Dict, Any
CSV_FILE: str = "tabulator_data.csv"
HTML_FILE_INPUT: str = "input.html"
HTML_FILE_OUTPUT: str = "output.html"
JAVASCRIPT_CODE: str = """
<script>
const tabledata = {{placeholder_data}};
const table = new Tabulator("#example-table", {
data:tabledata,
layout:"fitColumns",
pagination:"local",
paginationSize:7,
paginationCounter:"rows",
initialSort:[ {column:"Age", dir:"desc"} ],
columnDefaults:{ tooltip:true },
columns:[
{title:"First name", field:"Name"},
{title:"Age", field:"Age"},
{title:"Age (bars)", field:"Age", hozAlign:"left", formatter:"progress", formatterParams:{color:[
"#c1e7ff",
"#94bed9",
"#6996b3",
"#3d708f",
"#004c6d"
]}},
{title:"Job", field:"Job"},
{title:"Has children", field:"Has children", hozAlign:"center", formatter:"tickCross", sorter:"boolean"}
]
}
);
</script>
"""
def dataframeToJSON(csv_file: str) -> List[Dict[str, Any]]:
df_to_json: List[Dict[str, Any]] = (
pd
.read_csv(csv_file)
.filter(["Name","Age","Job","Has children"])
.sample(20)
.to_json(orient="records")
)
return df_to_json
def openHTMLTemplate(path_to_file: str) -> str:
with open(path_to_file,"r") as html_file:
blueprint: str = html_file.read()
return blueprint
def saveToHTML(path_to_file: str, content: str) -> None:
with open(path_to_file, "w") as html_file:
html_file.write(content)
def createTable():
html_blueprint: str = openHTMLTemplate(HTML_FILE_INPUT)
df_to_json: List[Dict[str, Any]] = dataframeToJSON(CSV_FILE)
html_blueprint = (
html_blueprint
.replace("{{placeholder_javascript}}", JAVASCRIPT_CODE)
.replace("{{placeholder_data}}", df_to_json)
)
saveToHTML(HTML_FILE_OUTPUT,html_blueprint)
if __name__ == "__main__":
createTable()