GeoSense

Exploring Data Visualization with Seaborn in Python

Why Seaborn?

Seaborn is a popular Python data visualization library, widely used in data science and machine learning. It is built on top of Matplotlib and supports exploratory data analysis by making informative statistical plots quick to create.

Topics Covered in this Tutorial:

  1. Importing Libraries in Jupyter Notebook
  2. Loading a Dataset
  3. Seaborn Plotting Functions in Python
  • Bar Plot
  • Count Plot
  • Distribution Plot
  • Heatmap
  • Scatter Plot
  • Pair Plot
  • Linear Regression Plot
  • Box Plot

Importing Libraries in Jupyter Notebook

An exploratory data analysis project in Python typically starts with four libraries: NumPy, Pandas, Matplotlib, and Seaborn. Let’s proceed by importing them.
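
The import cell itself is not shown here. A minimal sketch, assuming the conventional aliases: `pd`, `plt`, and `sns` match the prefixes used in the rest of this tutorial, while `np` is an assumed alias, since NumPy is not called directly below.

```python
import numpy as np               # numerical arrays (assumed alias)
import pandas as pd              # DataFrames, used below as pd.read_csv(...)
import matplotlib.pyplot as plt  # figure display, used below as plt.show()
import seaborn as sns            # statistical plots, used below as sns.<function>(...)
```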

Loading a Dataset

This tutorial uses the well-known mtcars dataset. It was extracted from the 1974 Motor Trend US magazine and records fuel consumption and ten aspects of automobile design and performance for 32 cars. The code below reads it from a file named mtcars.csv in the working directory.

mtcars = pd.read_csv('mtcars.csv')
mtcars.head()

Next, call the info() method to display a concise summary of the data frame. The summary lists the index type, each column’s data type and non-null count, and the memory usage.

mtcars.info()

Check the shape of the mtcars dataframe.

mtcars.shape
(32, 12)

Seaborn Plotting Functions

Barplot

A bar plot shows an estimate of the central tendency of a numeric variable (the mean, by default) as the height of each bar. It also draws error bars that indicate the uncertainty around that estimate. Typically, a categorical column goes on the x-axis and a numerical column on the y-axis.

res = sns.barplot(x=mtcars['cyl'], y=mtcars['carb'])
plt.show()

In the preceding plot, the `barplot()` function was called with the `cyl` (cylinders) column on the x-axis and the `carb` (carburetors) column on the y-axis.
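
To make the bar heights concrete, the sketch below computes the per-category mean with pandas, which is the quantity `barplot()` draws by default. The `cars` frame and its values are made up for illustration and are not the real mtcars rows.

```python
import pandas as pd

# Toy rows made up for illustration; not the real mtcars values.
cars = pd.DataFrame({
    "cyl":  [4, 4, 6, 6, 8, 8],
    "carb": [1, 2, 4, 1, 2, 4],
})

# Mean of `carb` within each `cyl` group: one value per bar.
bar_heights = cars.groupby("cyl")["carb"].mean()
print(bar_heights)
```

Each entry of `bar_heights` corresponds to one bar; the error bars in the plot come from a separate uncertainty estimate that this sketch does not reproduce.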

The same bar plot can be generated by passing the column names together with the `data` parameter.

res = sns.barplot(x='cyl', y='carb', data=mtcars)
plt.show()

Seaborn also lets you assign a single color to the bars. In the following bar chart, all bars are red.

res = sns.barplot(x='cyl', y='carb', data=mtcars, color='red')
plt.show()

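The `palette` parameter assigns a different color to each bar. In recent Seaborn versions a palette needs a `hue` variable, so the x variable is also passed as `hue` and the redundant legend is turned off.

res = sns.barplot(x='gear', y='hp', hue='gear', data=mtcars, palette='rocket', legend=False)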
plt.show()

Countplot

The `countplot()` function in the Seaborn library of Python displays the total count of values for each category using bars.

In the following count plot, you can observe the count of vehicles for each category of cylinders.

sns.countplot(x='cyl', hue='cyl', data=mtcars, palette='Set1', legend=False)
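
The numbers behind a count plot can be sketched with pandas alone. The `cars` frame below is a made-up stand-in for mtcars, and `value_counts()` is used here only to show the tally that each bar represents.

```python
import pandas as pd

# Toy rows made up for illustration; not the real mtcars values.
cars = pd.DataFrame({"cyl": [4, 6, 8, 8, 4, 8, 6, 8]})

# Number of rows per category, ordered by category value.
counts = cars["cyl"].value_counts().sort_index()
print(counts)
```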

Passing the column as `y` instead of `x` produces a horizontal count plot: the categories run along the y-axis and the counts along the x-axis.

sns.countplot(y='gear', hue='gear', data=mtcars, palette='rocket', legend=False)

Additionally, you can generate a grouped count plot by employing the `hue` parameter. This parameter allows you to specify the column for color encoding.

In the following count plot, the count of cars for each category of gears is displayed, and the data is grouped based on the number of cylinders.

sns.countplot(x='gear', hue='cyl', data=mtcars, palette='Set1')
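
A grouped count plot is a picture of a two-way frequency table. As a rough sketch, `pd.crosstab` builds that table directly; the rows below are made up for illustration and are not the real mtcars values.

```python
import pandas as pd

# Toy rows made up for illustration; not the real mtcars values.
cars = pd.DataFrame({
    "gear": [3, 3, 4, 4, 4, 5],
    "cyl":  [8, 8, 4, 4, 6, 8],
})

# Rows are gear categories, columns are cylinder categories,
# and each cell is the number of cars with that combination.
grouped = pd.crosstab(cars["gear"], cars["cyl"])
print(grouped)
```

Each row of `grouped` corresponds to one cluster of bars, and each column to one color within the cluster.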

Distribution Plot

Seaborn’s `histplot()` function illustrates the distribution of continuous data. It replaces the older `distplot()` function, which is deprecated in current Seaborn releases.

In this example, you’ll plot the distribution of miles per gallon across the cars. The `mpg` column records the distance a car travels per gallon of fuel.

sns.histplot(mtcars['mpg'], bins=10, color='r', kde=True)
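
The binning step behind a histogram with `bins=10` can be sketched with NumPy. The `mpg` values below are made up for illustration and are not the real mtcars column.

```python
import numpy as np

# Toy mpg values made up for illustration; not the real mtcars column.
mpg = np.array([10.4, 14.3, 15.2, 18.1, 19.2, 21.0, 21.4, 22.8, 30.4, 33.9])

# Split the range [min, max] into 10 equal-width bins and count the values in each.
counts, edges = np.histogram(mpg, bins=10)
print(counts)
print(edges)
```

`counts` holds one entry per bar and `edges` holds the 11 bin boundaries.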

Heatmap

Seaborn’s library provides the capability to visualize matrix-like data through heatmaps. These heatmaps represent the values of variables within a matrix as distinct colors.

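The following example shows a heatmap of the pairwise correlations between the numeric columns of the mtcars dataset. Passing `numeric_only=True` (available in recent pandas versions) leaves the text column out of the correlation.

sns.heatmap(mtcars.corr(numeric_only=True), cbar=True, linewidths=0.5)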
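
The matrix that the heatmap colors can be inspected on its own. The sketch below uses a made-up frame with one text column and, as one explicit way to keep only numeric columns, selects them with `select_dtypes` before calling `corr()`.

```python
import pandas as pd

# Toy rows made up for illustration; not the real mtcars values.
cars = pd.DataFrame({
    "model": ["a", "b", "c", "d"],      # text column, excluded from the correlation
    "wt":    [2.0, 2.5, 3.0, 3.5],
    "mpg":   [30.0, 25.0, 20.0, 15.0],  # falls steadily as wt rises
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = cars.select_dtypes("number").corr()
print(corr)
```

The result is a square, symmetric table with ones on the diagonal, which is the shape `sns.heatmap()` expects.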

Scatterplot

The Seaborn `scatterplot()` function is a valuable tool for visualizing relationships between two continuous variables.

The remaining examples use the Iris flower dataset, whose four continuous measurements suit scatter plots and the other plotting functions that follow.

Let’s proceed by loading the iris dataset.

iris = sns.load_dataset('iris')
iris.head()

The following scatter plot shows the relationship between sepal length and petal length for the iris flowers.

sns.scatterplot(x='sepal_length', y='petal_length', data=iris)

Set the `hue` parameter to `'species'` to color the points by flower species.

In the plot below, the three species of iris form visibly distinct groups based on their sepal length and petal length.

sns.scatterplot(x='sepal_length', y='petal_length', data=iris, hue='species')
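
For a version that runs without downloading anything, the sketch below applies the same call to a small made-up frame. The `flowers` values are illustrative only and are not rows from the iris dataset.

```python
import pandas as pd
import seaborn as sns

# Toy measurements made up for illustration; not rows from the iris dataset.
flowers = pd.DataFrame({
    "sepal_length": [5.0, 5.2, 6.0, 6.3, 6.9, 7.2],
    "petal_length": [1.4, 1.5, 4.2, 4.5, 5.6, 6.0],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# hue maps each distinct value of `species` to its own color and adds a legend.
ax = sns.scatterplot(x="sepal_length", y="petal_length", hue="species", data=flowers)

# The legend entries show which categories received a color.
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```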

Pairplot

Seaborn’s `pairplot()` function draws a matrix of plots showing the relationship between every pair of numeric variables in the dataset.

In the plot below, the plots on the diagonal are histograms showing the distribution of each feature, while the off-diagonal plots are scatter plots of one feature against another.

sns.pairplot(iris)

With the `hue` parameter, the diagonal plots become KDE plots (one curve per species) and the points in the scatter plots are colored by species. This makes it easier to see which features separate each type of flower.

sns.pairplot(iris, hue='species', palette='Set1')

Linear Regression Plot

Seaborn’s `lmplot()` function draws a scatter plot of two continuous variables together with a fitted linear regression line.

In the plot below, you can observe the relationship between petal length and petal width for the iris flowers.

sns.lmplot(x='petal_length', y='petal_width', data=iris)
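
The fitted line is a least-squares straight-line fit. As a rough sketch of that calculation, the example below uses NumPy’s `polyfit` on made-up points. These are not iris measurements, and this is not necessarily the routine Seaborn calls internally.

```python
import numpy as np

# Toy points made up for illustration; they lie exactly on
# petal_width = 0.4 * petal_length - 0.2.
petal_length = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
petal_width = 0.4 * petal_length - 0.2

# Degree-1 least-squares fit: returns the slope first, then the intercept.
slope, intercept = np.polyfit(petal_length, petal_width, deg=1)
print(round(slope, 3), round(intercept, 3))
```

With real, noisy data the points scatter around the line, and the plot adds a shaded band for the uncertainty of the fit.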

By utilizing the `hue` parameter, you can distinguish between each species of flower, and further customize the visualization by setting distinct markers for each species.

sns.lmplot(x='petal_length', y='petal_width', hue='species', data=iris, markers=['o', '*', '^'])

Boxplot

A boxplot, often referred to as a box and whisker plot, provides a visual summary of the distribution of quantitative data. The box spans the first to third quartiles of the data with a line at the median, while the whiskers extend to the rest of the distribution, except for points identified as outliers, which are drawn individually.

In the boxplot displayed below, you can discern the distribution of sepal widths for the three distinct species of iris flowers.

sns.boxplot(x='species', y='sepal_width', data=iris)
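
The quantities a boxplot draws can be computed by hand. The sketch below assumes the default whisker rule of 1.5 times the interquartile range. The `widths` values are made up for illustration and are not iris measurements.

```python
import numpy as np

# Toy sepal widths made up for illustration; not iris measurements.
widths = np.array([2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 6.0])

# The box: first quartile, median, third quartile.
q1, median, q3 = np.percentile(widths, [25, 50, 75])
iqr = q3 - q1

# Points beyond these limits are treated as outliers (default 1.5 * IQR rule).
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = widths[(widths < lower_limit) | (widths > upper_limit)]

print(q1, median, q3)
print(lower_limit, upper_limit)
print(outliers)
```

The whiskers end at the most extreme data points that still fall inside the two limits, so they need not reach the limits themselves.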

Conclusion

Effective data visualization is crucial in exploratory data analysis, and the Seaborn library streamlines this process by offering a range of built-in plotting functions. Throughout this tutorial, we’ve delved into some of these functions, leveraging two datasets — mtcars and iris — to demonstrate their capabilities.

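Get all the code as a Jupyter Notebook on GitHub: https://github.com/tnmthai/gis-medium/blob/8d39ac5665aa6c426b68638db4c6c26dbde30df0/examples/notebooks/seaborn.ipynb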

