Thiago Carvalho

Ridge Plots with Python’s Seaborn

A fascinating way of visualizing multiple distributions

A Ridge plot is composed of line charts stacked vertically, with slightly overlapping lines that share the same x-axis.

Ridge plot — Image by the author

Those overlaps reduce the whitespace, often at the cost of precision, and create a compact chart that’s relatively uncommon and great for catching the viewer’s attention.

The names Ridge plot and Ridgeline plot are quite fitting: the charts do look like mountains. They are also known as Joy Plots, mainly because the band Joy Division used this visualization on one of its album covers.

Joy Division Album Cover Replica — Image by the author

KDE Plots and Facet Grids

Now let’s get to the example. We’ll use this dataset from Kaggle, which contains Netflix's original productions and their IMDB scores.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Load the Kaggle dataset (adjust the path to where the CSV is stored)
df = pd.read_csv('../data/netflixoriginals.csv')

# Keep only the six languages we want to compare
languages = ['English', 'Hindi', 'Spanish',
             'French', 'Italian', 'Portuguese']
df_filtered = df[df['Language'].isin(languages)]
df_filtered
Filtered data frame — Image by the author

We’ll need Seaborn’s FacetGrid to create a plot for each category systematically. The class is straightforward to use; we only need to pass the data frame and the name of the field used to group the values.

sns.set_theme(style="white")
g = sns.FacetGrid(df_filtered, row="Language")
g.map_dataframe(sns.kdeplot, x="IMDB Score")
KDE plots — Image by the author

With this default configuration, it’s hard to see and compare all the distributions. One of the main advantages of Ridge plots is that they keep the chart compact while still informative.

Of course, there are many different solutions for this issue: arranging the plots in columns, changing the plot sizes, or using another visualization.

Let’s try making the charts broader and shorter.

sns.set_theme(style="white")
g = sns.FacetGrid(df_filtered, row="Language", aspect=9, height=1.2)
g.map_dataframe(sns.kdeplot, x="IMDB Score")
KDE plots — Image by the author

That solves the problem, but there isn’t anything special about this visualization. The simple fact that Ridge plots are unconventional makes them more attractive.

Kernel Density Estimation plots are not known for representing the data with precision; they’re better suited to simple tasks such as visualizing the modality or central tendency.

That means we want the user to focus on the shape of the data, so the y-ticks aren’t needed here.

sns.set_theme(style="white")
g = sns.FacetGrid(df_filtered, row="Language", aspect=9, height=1.2)
g.map_dataframe(sns.kdeplot, x="IMDB Score")
g.set_titles("")
g.set(yticks=[])
g.despine(left=True)
KDE plots — Image by the author
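
The earlier point about density plots showing shape rather than exact values can be illustrated with a minimal NumPy sketch of a Gaussian kernel density estimate. This is not Seaborn's implementation; the helpers `gaussian_kde_curve` and `count_peaks`, the sample data, and the bandwidth values are all made up for illustration. The same sample can look bimodal or unimodal depending on the bandwidth, which is why the height of the curve should not be read as an exact value.

```python
import numpy as np

def gaussian_kde_curve(samples, grid, bandwidth):
    """Average one Gaussian bump per sample, evaluated at each grid point."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

def count_peaks(curve):
    """Count strict local maxima of a sampled curve."""
    inner = curve[1:-1]
    return int(np.sum((inner > curve[:-2]) & (inner > curve[2:])))

rng = np.random.default_rng(42)
# Made-up scores with two clusters, one around 5 and one around 8.
samples = np.concatenate([rng.normal(5, 0.4, 200), rng.normal(8, 0.4, 200)])
grid = np.linspace(0, 12, 601)

narrow = gaussian_kde_curve(samples, grid, bandwidth=0.3)
wide = gaussian_kde_curve(samples, grid, bandwidth=2.5)

# The wide bandwidth merges the two clusters into a single bump.
print(count_peaks(narrow), count_peaks(wide))
```

In Seaborn, `kdeplot` exposes a comparable knob through its `bw_adjust` parameter, which scales the automatically chosen bandwidth.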

We’re ready to turn our density plots into a Ridge plot.

Ridge Plot

First, we’ll need to make sure the axis background is transparent.

sns.set_theme(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})

Second, we’ll need to fill the area under the lines.

g.map_dataframe(sns.kdeplot, x="IMDB Score", fill=True, alpha=1)

We also need to differentiate the plots once they overlap. We can use different colors for each row or draw another density plot to outline the first.

Finally, we’ll need Matplotlib’s subplots_adjust to control the vertical space between the plots; a negative hspace makes them overlap.

sns.set_theme(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
g = sns.FacetGrid(df_filtered, row="Language", aspect=9, height=1.2)
g.map_dataframe(sns.kdeplot, x="IMDB Score", fill=True, alpha=1)
g.map_dataframe(sns.kdeplot, x="IMDB Score", color='black')
g.fig.subplots_adjust(hspace=-.5)
g.set_titles("")
g.set(yticks=[])
g.despine(left=True)
Ridge plot — Image by the author

We have our Ridge plot!

Now we can customize it as we wish. FacetGrid is great for creating multiple visualizations, and the methods .map and .map_dataframe are especially useful since they let us apply a function to every subplot.

sns.set_theme(style="white", rc={"axes.facecolor": (0, 0, 0, 0), 'axes.linewidth': 2})
palette = sns.color_palette("Set2", 12)
g = sns.FacetGrid(df_filtered, palette=palette, row="Language", hue="Language", aspect=9, height=1.2)
# Filled density per language, then a black outline on top
g.map_dataframe(sns.kdeplot, x="IMDB Score", fill=True, alpha=1)
g.map_dataframe(sns.kdeplot, x="IMDB Score", color='black')

# Write each language name at the left edge of its own subplot
def label(x, color, label):
    ax = plt.gca()
    ax.text(0, .2, label, color='black', fontsize=13,
            ha="left", va="center", transform=ax.transAxes)

g.map(label, "Language")
g.fig.subplots_adjust(hspace=-.5)
g.set_titles("")
g.set(yticks=[], xlabel="IMDB Score")
g.despine(left=True)
plt.suptitle('Netflix Originals - IMDB Scores by Language', y=0.98)
Ridge plot — Image by the author

Conclusions

Overall, Ridge plots are great for focusing on the distribution of the data. They attract the viewer’s attention with an appealing aesthetic, making them an excellent option for introducing the user to the analysis.

The overlap of the lines makes it harder to position y-ticks, which largely limits this visualization to density plots and histograms.

They shine when there are apparent differences in the distributions of the categories but can require some extra polishing since the overlaps may obscure too much information.

Compact Histogram — Image by the author

It’s an exciting graphic with a limited amount of information that viewers can effortlessly consume.

Below you can see the code I used to replicate the Joy Division album cover. The data comes from a gist from Borgar, and it’s an approximation of the actual graphic.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
url = "https://gist.githubusercontent.com/borgar/31c1e476b8e92a11d7e9/raw/0fae97dab6830ecee185a63c1cee0008f6778ff6/pulsar.csv"
df = pd.read_csv(url, header=None)
df = df.stack().reset_index()
df.columns = ['idx', 'x', 'y']
sns.set_theme(rc={"axes.facecolor": (0, 0, 0, 0), 'figure.facecolor':'#000000', 'axes.grid':False})
g = sns.FacetGrid(df, row='idx', aspect=50, height=0.4)
# Draw each row in three passes: a line, a black fill under it, and a white line on top
g.map(sns.lineplot, 'x', 'y', clip_on=False, alpha=1, linewidth=1.5)
g.map(plt.fill_between, 'x', 'y', color='#000000')
g.map(sns.lineplot, 'x', 'y', clip_on=False, color='#ffffff', lw=2)
# Set the subplots to overlap
g.fig.subplots_adjust(hspace=-0.95)
g.set_titles("")
g.set(yticks=[], xticks=[], ylabel="", xlabel="")
g.despine(bottom=True, left=True)
plt.savefig('joy.png', facecolor='#000000')

Thanks for reading my article!

Resources: Seaborn — Ridge Plot Example; Seaborn — FacetGrid; Matplotlib — Subplots Adjust; Eagereyes — Joy Plots
