The article provides guidance on styling Pandas dataframes to resemble Excel tables, including formatting, highlighting cells, and adding embedded bars and gradients.
Abstract
The article "How to Style Your Pandas Dataframe Like in Excel" offers a comprehensive tutorial for enhancing the visual appeal of dataframes in Python. It begins by acknowledging the preference for tables among stakeholders and proceeds to demonstrate various styling techniques using a dataset on NBA playoffs statistics. The author covers essential formatting methods, such as controlling the number of decimal places and adding currency or percentage formats. The article also delves into advanced styling, including highlighting maximum and minimum values, applying gradient styles to improve data comprehension, and embedding color bars to convey additional information about the data. The author emphasizes the importance of clear data presentation and encourages readers to explore the wide range of styling options available in Pandas to effectively communicate their findings to their audience.
Opinions
The author expresses a personal preference for data visualization but recognizes the widespread preference for tabular data presentation among stakeholders.
Highlighting the utility of the .style attribute, the author suggests that it is underutilized despite its powerful capabilities for dataframe styling in Pandas.
The author provides a subjective tip to use pd.set_options('display.max_columns',None) for better visibility of all dataframe columns, indicating a common issue with column display in Pandas.
The use of Pythonic dictionary comprehension for formatting is recommended for conciseness and readability, though the author acknowledges that the previous syntax may be clearer for some users.
The author endorses the use of .set_caption() to add titles to dataframes, enhancing the narrative around the data.
Gradient styles are advocated for their effectiveness in helping viewers quickly grasp the significance of numerical values in a dataframe.
The article concludes with an encouragement for readers to experiment with Pandas styling features, highlighting the ultimate goal of informing the audience effectively.
How to Style Your Pandas Dataframe Like in Excel
Let’s review the basics: formatting, highlighting cells, embedded bars, and gradients.
We will use dataset on NBA playoffs statistics. You can download it on Kaggle here to follow along.
Image by author
The first little tip is to use ‘pd.set_options(‘display.max_columns’,None) to let you see all your dataframe columns.
For our use case today, let’s focus on the points per game statistics for each team. We will use groupby.
Image by author
The important attribute is .style, which will return an object of the class “styler,”. We will use it to apply different formatting and styling methods to our dataframe.
2. Control the formatting
You need to use the .format() method. It allows us to format many different things: adding currency, converting a value in percentage, and the DateTime column format among others. I put Pandas official links at the end of this post.
For example, the rounding for each column is not the same. Let’s say we want to keep 1 decimal:
line 4: we call the .style attribute to start applying formatting and styling
line 5: .format method where we specify for each column to have only one decimal.
Image by author
To write it in a more pythonic way, we can use a dictionary comprehension:
Please feel free to use the previous syntax if it’s clearer to you.
One more interesting formatting option is the .set_caption() method. We can use it to add a title to our dataframe:
Line 10: set_caption() with the title you want.
Image by author
3. Highlighting max and min values
We will use the convenient highlighting_max and highlighting_min methods
Please note that if you want to highlight by row, you need to specify axis=1
4. Apply gradient styles
They make it easier for viewers to comprehend how the table’s columns and rows relate to the numbers. Gradients can be used to show whether a value is big or small, positive or negative, or even good or bad.
Let’s say we apply a gradient background color only to the ‘mean’ column:
line 6: we use the background_gradient method: subset lets us choose to style only the mean column. The chosen colormap is Blues and we specify the minimum and maximum values so gradients are more visible.
We can also apply the gradients to the text with the text_gradient method:
5. Embedded bars
Like in Excel, we can add color bars in the background to provide more visual information on the numbers.
We use the .bar method and apply it to a different column ‘obpm’ (which is the acronym for offensive box/plus minus =metric that estimates a player’s contribution to the team when that player is on the court)
6. Conclusion
In terms of styling, pandas provide a wide range of options. They are often unknown and/or underutilized. Feel free to try them out and explore the different possibilities. The goal is always the same: to inform our audience as effectively as we can.
I hope you enjoyed reading this post! Follow me on Medium for future posts. It motivates me to continue.