
PYTHON — Anscombe’s Quartet Revisited
Any sufficiently advanced technology is indistinguishable from magic. — Arthur C. Clarke

Anscombe’s Quartet is a set of four small datasets that have nearly identical summary statistics but look strikingly different when graphed. In this tutorial, we’ll revisit Anscombe’s Quartet and visualize the datasets in Python using plotnine, a library that implements the ggplot grammar of graphics.
To begin, we’ll use the pandas library to work with the datasets.
import pandas as pd
# Anscombe's Quartet: datasets I, II and III share the same x values; dataset IV has its own
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]
# Build one two-column DataFrame per dataset (column name -> list of values)
I = pd.DataFrame({"x": x, "y1": y1})
II = pd.DataFrame({"x": x, "y2": y2})
III = pd.DataFrame({"x": x, "y3": y3})
IV = pd.DataFrame({"x4": x4, "y4": y4})
Next, we’ll use plotnine’s ggplot to visualize the datasets. We’ll import ggplot itself, aes (the aesthetic mapping that assigns DataFrame columns to the x and y axes), and geom_point (the geometric object that draws each row as a point).
from plotnine import ggplot, aes, geom_point
# Visualizing the first dataset
plot1 = ggplot(I, aes(x='x', y='y1')) + geom_point()
print(plot1)
# Visualizing the second dataset
plot2 = ggplot(II, aes(x='x', y='y2')) + geom_point()
print(plot2)
# Visualizing the third dataset
plot3 = ggplot(III, aes(x='x', y='y3')) + geom_point()
print(plot3)
# Visualizing the fourth dataset
plot4 = ggplot(IV, aes(x='x4', y='y4')) + geom_point()
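print(plot4)
Running these snippets shows how different the four scatter plots are, even though the datasets share nearly identical summary statistics (means, variances, correlation and least-squares line). Dataset I follows a noisy linear trend, II traces a smooth curve, III lies almost exactly on a line except for one outlier, and IV stacks every point at x = 8 except a single point at x = 19 that determines the fitted slope.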
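To check the claim that the four datasets share nearly identical statistics, here is a minimal sketch that computes them with pandas. The `quartet` dictionary and the `summarize` helper are illustrative names, not part of the original tutorial; the slope and intercept are the ordinary least-squares fit of y on x, computed as cov(x, y) / var(x).

```python
import pandas as pd

# The four Anscombe datasets (same values as in the tutorial above)
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Hypothetical helper: summary statistics for one (x, y) dataset."""
    df = pd.DataFrame({"x": xs, "y": ys})
    # Least-squares slope of y on x; cov and var both use ddof=1, so the ratio is exact
    slope = df["x"].cov(df["y"]) / df["x"].var()
    intercept = df["y"].mean() - slope * df["x"].mean()
    return {
        "mean_x": df["x"].mean(),
        "var_x": df["x"].var(),  # sample variance (ddof=1)
        "mean_y": df["y"].mean(),
        "var_y": df["y"].var(),
        "corr": df["x"].corr(df["y"]),  # Pearson correlation
        "slope": slope,
        "intercept": intercept,
    }


# One row per dataset, one column per statistic
stats = pd.DataFrame.from_dict(
    {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}, orient="index"
)
print(stats.round(3))
```

Every row should come out close to the same values: a mean of x of 9, a variance of x of 11, a mean of y of about 7.50, a variance of y of roughly 4.12, a correlation of about 0.816, and a fitted line of about y = 3.00 + 0.500x.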
This exercise shows why it pays to plot your data: summary statistics alone can hide patterns such as curvature, outliers and single influential points. With just a few lines of code, plotnine lets you visualize and compare datasets in Python.
In the next lesson, we will delve deeper into the data layer.
That’s it for revisiting Anscombe’s Quartet with Python and ggplot. Happy coding!

