This article introduces four alternatives to scatter, bar, and line plots for data visualization: hexbin plots, 2D density plots, dot plots, and waterfall charts.
Abstract
The article begins by acknowledging the widespread use of scatter, bar, and line plots in data visualization but highlights their limitations when dealing with large datasets or multiple categories. It then introduces four alternative plot types: hexbin plots and 2D density plots as alternatives to scatter plots, dot plots as an alternative to bar and line plots, and waterfall charts as an alternative to bar and line plots for visualizing changes over time. Each alternative is explained with examples and use cases, emphasizing their benefits in terms of clarity, interpretability, and visual appeal. The article concludes by encouraging readers to choose the most suitable plot type for their data and purpose.
Opinions
Scatter, bar, and line plots, while popular, are not always the best choice for data visualization.
Hexbin plots and 2D density plots are more effective than scatter plots for visualizing large datasets.
Dot plots are less cluttered and offer better comprehension than bar plots, especially when dealing with many categories.
Waterfall charts are more effective than bar and line plots for visualizing changes over time.
The choice of plot type should be based on the nature of the data and the intended message.
Visualizations should be appealing and easy to interpret.
The article aims to expand readers' data visualization toolkit rather than criticize the use of scatter, bar, and line plots.
Unimpressed With Your Scatter and Bar Plots? Give These Four Classic Alternatives A Try.
Better alternatives to scatter, bar, and line plots.
If you have ever visualized your data (which I am sure you have), the first plot type that came to mind was probably a scatter, bar, or line plot.
As a quick reminder, here is what they look like:
Scatter, Bar, and Line plot illustration (Image by Author)
While these plots cover a wide variety of visualization use cases, I have seen many data scientists reach for them by default, even where they fit poorly.
They are simple and easy to interpret, but they are not the right choice for every use case.
Therefore, in this blog, I will demonstrate a few alternatives to these popular plots. Moreover, I will also explain how these can be more beneficial to use.
Let’s begin 🚀!
#1 Hexbin Plot
Alternative to scatter plot.
Scatter plots are extremely useful for visualizing the relationship between two numerical variables.
But when you have, say, thousands of data points, scatter plots can get too dense to interpret. This is shown below:
Hexbins can be a good choice in such cases. As the name suggests, they bin the area of a chart into hexagonal regions.
Moreover, each region is assigned a color intensity based on the method of aggregation used (the number of points, for instance).
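To make the idea concrete, here is a minimal sketch using matplotlib's `hexbin`. The data is randomly generated for illustration, and the values for `gridsize` and the colormap are arbitrary choices of mine, not settings from the original plots.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 10,000 correlated points, dense enough to overplot.
rng = np.random.default_rng(42)
n_points = 10_000
x = rng.normal(size=n_points)
y = 0.6 * x + rng.normal(scale=0.8, size=n_points)

fig, (ax_scatter, ax_hex) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: the plain scatter plot, where the dense core becomes one blob.
ax_scatter.scatter(x, y, s=5)
ax_scatter.set_title("Scatter plot")

# Right panel: gridsize sets the number of hexagons along the x-axis.
# With no C argument, each hexagon is colored by the number of points in it.
hb = ax_hex.hexbin(x, y, gridsize=30, cmap="viridis")
ax_hex.set_title("Hexbin plot")
fig.colorbar(hb, ax=ax_hex, label="Points per hexagon")

fig.tight_layout()
# Call fig.savefig("hexbin.png") or plt.show() to view the result.
```

To aggregate something other than the point count, `hexbin` also accepts a `C` array of values together with a `reduce_C_function` such as `np.mean`.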
When to use them?
Hexbins are especially useful for understanding the spread of data. They are often considered an elegant alternative to a scatter plot.
Moreover, binning makes it easier to identify data clusters and depict patterns.
#2 2D Density Plot
Another alternative to scatter plot.
As we noticed above, when the number of data points is large, it is very difficult to read the distribution of the data from a scatter plot.
Similar to a hexbin plot, which depicts the density of points, a 2D density plot illustrates the distribution of a set of points in a two-dimensional space.
A contour is created by connecting points of equal density. In other words, every point along a single contour line has the same estimated density.
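The contour idea can be sketched as follows. This is not necessarily how the original figure was produced: I am using a hand-written Gaussian kernel density estimate with a fixed bandwidth of 0.3 (an arbitrary assumption) on randomly generated data, and then letting matplotlib's `contour` connect the grid locations of equal density.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 1,000 points from a 2D normal distribution.
rng = np.random.default_rng(0)
points = rng.normal(loc=[0.0, 0.0], scale=[1.0, 0.5], size=(1000, 2))

def gaussian_kde_grid(points, grid_x, grid_y, bandwidth=0.3):
    """Evaluate a fixed-bandwidth Gaussian KDE on a rectangular grid."""
    gx, gy = np.meshgrid(grid_x, grid_y)      # each has shape (ny, nx)
    dx = gx[..., None] - points[:, 0]         # shape (ny, nx, n_points)
    dy = gy[..., None] - points[:, 1]
    kernel = np.exp(-(dx**2 + dy**2) / (2 * bandwidth**2))
    # Average the kernels and normalize so the density integrates to about 1.
    return kernel.mean(axis=-1) / (2 * np.pi * bandwidth**2)

grid_x = np.linspace(-4, 4, 40)
grid_y = np.linspace(-2, 2, 40)
density = gaussian_kde_grid(points, grid_x, grid_y)

fig, ax = plt.subplots(figsize=(6, 4))
# Each contour line joins grid locations with the same estimated density.
contours = ax.contour(grid_x, grid_y, density, levels=8, cmap="viridis")
ax.set_title("2D density plot (contours of equal density)")
# Call fig.savefig("density.png") or plt.show() to view the result.
```

In practice you would rarely write the estimator yourself. Functions such as `seaborn.kdeplot` or `scipy.stats.gaussian_kde` do the same job and also choose the bandwidth for you.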
When to use them?
As mentioned above, if a scatter plot is hard to interpret, a 2D density plot can be your way to proceed.
They can be especially useful when you want to identify patterns and outliers in the data. Scatter plots, on the other hand, are mainly used to depict the relationship between two numeric variables.
#3 Dot Plot
Alternative to bar and line plot.
Bar plots are extremely useful for visualizing categorical variables against a continuous value.
But when you have many categories to depict, they can get too dense to interpret.
Moreover, in a bar plot with many bars, we’re often not paying attention to the individual bar lengths. Instead, we mostly consider the individual endpoints of each bar that denote the total value.
Consider the following data:
Here, we have dummy population data for two countries (Country A and Country B) for the years 1995 to 2010.
Let’s create a bar plot:
The individual bars take up plenty of space, which makes the graph cluttered.
A dot plot can be a better choice in such cases. Dot plots are like scatter plots, but with one categorical and one continuous axis.
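Here is a rough sketch of such a dot plot in matplotlib. The population numbers are invented (the article's actual dummy values are not reproduced here), and the variable names are my own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical population figures (in millions) for 1995 to 2010.
years = [str(year) for year in range(1995, 2011)]
rng = np.random.default_rng(1)
pop_a = np.linspace(50, 80, len(years)) + rng.normal(0, 1.5, len(years))
pop_b = np.linspace(65, 70, len(years)) + rng.normal(0, 1.5, len(years))

# One row per category on the y-axis, continuous values on the x-axis.
positions = np.arange(len(years))

fig, ax = plt.subplots(figsize=(6, 6))
dots_a = ax.scatter(pop_a, positions, label="Country A")
dots_b = ax.scatter(pop_b, positions, label="Country B")

ax.set_yticks(positions)
ax.set_yticklabels(years)
ax.grid(axis="y", linestyle=":", alpha=0.5)  # faint guide line per category
ax.set_xlabel("Population (millions)")
ax.legend()
# Call fig.savefig("dot_plot.png") or plt.show() to view the result.
```

Each year now takes up a single row with two dots instead of two full-length bars, which is where the reduction in clutter comes from.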
When to use them?
Compared to a bar plot, they are less cluttered and offer better comprehension.
This is especially true in cases where we have many categories and/or multiple categorical columns to depict in a plot.
#4 Waterfall Chart
Alternative to bar and line plot.
If you want to visualize the variation/progress/change in a value over some period, a line (or bar) plot may not always be an apt choice.
Both the line plot and the bar plot depict the actual values in the chart. Thus, sometimes, it can get difficult to visually estimate the scale of incremental changes.
Consider the following data:
Here, we have dummy month-wise data.
We can create a line plot as follows:
And a bar plot as follows:
Although these do depict the data as needed, it is difficult to visually estimate the size of the month-to-month changes.
To address this, you can use a waterfall chart.
To create one, you can use the waterfallcharts library in Python.
Next, compute the difference between each month's value and the previous month's value, and store it in a new column. The final data should look as follows:
The Delta value for the first month is the same as the start value.
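The Delta column described above can be computed with pandas along these lines. The month names and values are made up for illustration, and the `Month` and `Value` column names are my assumptions.

```python
import pandas as pd

# Hypothetical month-wise data.
df = pd.DataFrame({
    "Month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "Value": [100, 120, 90, 140, 135, 160],
})

# diff() gives the change from the previous row; the first row has no
# predecessor, so its Delta is filled with the start value itself.
df["Delta"] = df["Value"].diff().fillna(df["Value"]).astype(int)

print(df)
```

A handy sanity check is that the deltas add up to the final value, since the first delta is the start value and every later delta is a step from it.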
Much better, isn’t it?
Here, the start and final values are represented by the first and last bars. Also, the marginal changes are automatically color-coded, making them easier to interpret.
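If you would rather not depend on a dedicated library, the same kind of chart can be approximated with plain matplotlib. The sketch below is my own construction, not the waterfallcharts implementation: it uses hypothetical deltas, floats each bar on the running total that precedes it, and picks the colors by hand.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical month-wise deltas; the first entry is the start value.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
deltas = np.array([100, 20, -30, 50, -5, 25])

running_total = np.cumsum(deltas)
starts = running_total - deltas  # level from which each bar is drawn

# Append a final bar that shows the total, drawn from zero.
labels = months + ["Total"]
heights = np.append(deltas, running_total[-1])
bottoms = np.append(starts, 0)

# Blue for the start and total, green for increases, red for decreases.
colors = (
    ["tab:blue"]
    + ["tab:green" if delta >= 0 else "tab:red" for delta in deltas[1:]]
    + ["tab:blue"]
)

fig, ax = plt.subplots(figsize=(7, 4))
bars = ax.bar(labels, heights, bottom=bottoms, color=colors)
ax.set_ylabel("Value")
ax.set_title("Waterfall chart (manual matplotlib sketch)")
# Call fig.savefig("waterfall.png") or plt.show() to view the result.
```

Because every bar starts where the previous one ended, the height of a bar is the change itself, which is what makes the increments easy to compare.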
When to use them?
A waterfall chart is most useful when you want to show how individual steps add to or subtract from a running total, for instance how month-by-month changes lead from a starting value to a final one.
Conclusion
Congratulations! You have just added four incredible plots to your data visualization kit.
Before I end this blog, understand that visualizations are always meant to depict the data in an appealing and interpretable way. Thus, choose the one which suits your purpose and is simple to digest visually.
Also, my intention in this blog is not to berate the three most fundamental plot types in data science.
Instead, it is to make you aware of other alternatives, and when to use them.