Moez Ali


Data Visualization with Pandas: A Comprehensive Guide

Creating Basic Plots with Pandas: Line, Scatter, Bar, Histograms and Box Plots

Generated by Moez Ali using Midjourney

Introduction

Data visualization is the graphical representation of data and information. It is a powerful tool for understanding complex data and communicating insights to others. Data visualization can be used for a variety of purposes, such as identifying trends, patterns, and outliers, and exploring relationships between variables.

Pandas is a popular open-source data analysis library for Python. It provides powerful data structures and data analysis tools, including data visualization capabilities. Pandas visualization is built on top of the matplotlib library, which provides a wide range of customizable plots.

In this article, we will explore the basics of data visualization with pandas. We will start with simple plots and progress to more complex visualizations. We will also cover best practices for creating effective visualizations and customizing pandas plots.

Setting Up Pandas and Data

Before we can start visualizing data with pandas, we need to install pandas and load data into a pandas DataFrame.

Installing Pandas

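If you haven’t installed pandas yet, you can do so with pip, the Python package manager. Pandas draws its plots with matplotlib, which is not installed automatically, and the kernel density plot shown later also needs SciPy. Open a terminal or command prompt and run:

pip install pandas matplotlib scipy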

Importing Libraries

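Once the packages are installed, import pandas in your Python script or notebook. Importing matplotlib.pyplot as well lets you call plt.show() to display a figure when running a plain script; notebooks display plots automatically.

import pandas as pd
import matplotlib.pyplot as plt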

Loading Data

To load data into a pandas DataFrame, we can use the pd.read_csv() function. This function reads a CSV file and creates a DataFrame object.

df = pd.read_csv('https://raw.githubusercontent.com/pycaret/pycaret/master/datasets/diamond.csv')
print(df.head())      # first five rows
print(df.describe())  # summary statistics for the numeric columns

df.head() shows the first five rows, and df.describe() prints statistics such as count, mean, and standard deviation for each numeric column. Together they give a quick sense of the data before we start visualizing it.
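To see what describe() does and does not summarize without downloading anything, here is a small sketch that reads a few hypothetical rows (made-up values in the same column layout as diamond.csv) from an in-memory string. By default describe() covers only numeric columns; passing include="all" adds count, unique, top and freq for text columns such as Cut.

```python
import io

import pandas as pd

# Hypothetical toy rows shaped like diamond.csv (not real data).
csv_text = """Carat Weight,Cut,Price
1.10,Ideal,5169
0.83,Ideal,3470
0.85,Very Good,3183
0.91,Good,4370
"""

# read_csv accepts any file-like object, not only paths and URLs.
df = pd.read_csv(io.StringIO(csv_text))

numeric_summary = df.describe()            # numeric columns only
full_summary = df.describe(include="all")  # also summarizes the text column 'Cut'

print(numeric_summary)
print(full_summary.loc[["count", "unique", "top", "freq"], "Cut"])
```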

Visualization using Pandas plot method

Pandas provides several basic visualization techniques that allow us to quickly visualize our data. In this section, we will cover some of the most commonly used plots in pandas.
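Every plot in this section goes through the same entry point. As a minimal sketch on made-up numbers, the snippet below shows that the kind= keyword and the accessor style (df.plot.line(), df.plot.bar(), and so on) are two spellings of the same call, and that each returns a matplotlib Axes you can keep customizing. Saving to an in-memory buffer with the Agg backend is only a convenience so the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, only for illustration.
df = pd.DataFrame({"a": [1, 3, 2, 4], "b": [2, 2, 3, 5]})

# Two equivalent spellings: the kind= keyword and the plot accessor method.
ax1 = df.plot(kind="line")
ax2 = df.plot.line()

# Each call returns a matplotlib Axes, so regular matplotlib methods apply.
ax1.set_title("Toy data")

# Write the figure to memory; in a script you could call plt.show() instead.
buf = io.BytesIO()
ax1.figure.savefig(buf, format="png")
plt.close("all")
```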

Line Plots

A line plot is a graph that displays data as a series of points connected by lines. We can create a line plot in pandas using the plot() method with the kind parameter set to 'line'. Since 'line' is the default kind, the example below simply calls plot():

# Import the pandas library
import pandas as pd

# Read in the migration.csv data from a URL using pandas
df = pd.read_csv('https://raw.githubusercontent.com/pycaret/pycaret/master/datasets/migration.csv')

# Transpose the DataFrame so that the countries are in the columns
df = df.transpose()

# Set the column names to the values in the first row of the DataFrame
df.columns = df.iloc[0]

# Drop the row with the column names, which is now redundant
df = df.drop(index = 'Country Name')

# Rename the index to 'Year'
df = df.rename_axis('Year')

# Plot the migration data for Canada
df['Canada'].plot()

Output:

Here, we create a line plot of the Canada column against the Year index of our DataFrame.
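The four reshaping steps above turn a table with one row per country into one with one row per year. The sketch below assumes, based only on that code, that migration.csv has a 'Country Name' column followed by one column per year; it uses a tiny made-up frame in that layout. It shows an alternative (not the author's code) that moves the text column into the index before transposing, which keeps the values numeric, whereas transposing the mixed frame directly produces object-dtype columns.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import pandas as pd

# Assumed layout: one row per country, a 'Country Name' column and one
# column per year. All values are made up for illustration.
raw = pd.DataFrame({
    "Country Name": ["Canada", "United States"],
    "2000": [10, 20],
    "2001": [12, 25],
    "2002": [15, 24],
})

# Index by country first, then transpose: years become rows, countries columns.
wide = raw.set_index("Country Name").T.rename_axis("Year")

print(wide)
print(raw.transpose().dtypes.unique())  # transposing the mixed frame gives object dtype

ax = wide["Canada"].plot()  # kind='line' is the default
```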

Scatter Plots

A scatter plot is a graph that displays the relationship between two variables as a series of points. We can create a scatter plot in pandas using the plot() function with the kind parameter set to ‘scatter’:

# Import the pandas library
import pandas as pd

# Read in the diamond.csv data from a URL using pandas
df = pd.read_csv('https://raw.githubusercontent.com/pycaret/pycaret/master/datasets/diamond.csv')

# scatter plot of carat weight with price
df.plot(kind='scatter', x='Carat Weight', y='Price')

Output:

Here, we create a scatter plot of the column Price against the Carat Weight column in our DataFrame.
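With thousands of diamonds, markers in a scatter plot pile on top of each other. A common remedy, sketched here on synthetic data (random carat weights and a noisy, roughly increasing price, not the real dataset), is to make markers semi-transparent with alpha and smaller with s; both are passed through to matplotlib's scatter.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import numpy as np
import pandas as pd

# Synthetic stand-in for diamond.csv (made-up numbers).
rng = np.random.default_rng(0)
carat = rng.uniform(0.5, 2.5, size=500)
price = 4000 * carat + rng.normal(0, 1500, size=500)
df = pd.DataFrame({"Carat Weight": carat, "Price": price})

# alpha < 1 lets dense regions show up darker; s is the marker area in points squared.
ax = df.plot(kind="scatter", x="Carat Weight", y="Price", alpha=0.3, s=10)

# The markers live in a single matplotlib PathCollection.
points = ax.collections[0]
print(points.get_offsets().shape)
```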

Bar Plots

A bar plot is a graph that displays categorical data with rectangular bars. We can create a bar plot in pandas using the plot function with the kind parameter set to bar:

# Import the pandas library
import pandas as pd

# Read in the diamond.csv data from a URL using pandas
df = pd.read_csv('https://raw.githubusercontent.com/pycaret/pycaret/master/datasets/diamond.csv')

# bar plot on counts of diamond by cut type
df['Cut'].value_counts().plot(kind = 'bar')

Output:

Here, we count the diamonds of each Cut type with value_counts() and plot those counts as bars.
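value_counts() sorts categories by frequency, so the bars appear in that order. When the categories have a natural order, one option is to reindex the counts before plotting, as in this sketch; the cut labels and their order are hypothetical examples, not taken from the dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import pandas as pd

# Hypothetical cut labels, for illustration only.
cut = pd.Series(["Ideal", "Good", "Ideal", "Very Good", "Fair", "Ideal", "Good"])

# reindex() imposes the desired order; fill_value=0 keeps absent categories as 0.
order = ["Fair", "Good", "Very Good", "Ideal"]
counts = cut.value_counts().reindex(order, fill_value=0)

ax = counts.plot(kind="bar", rot=0)  # rot=0 keeps the labels horizontal
heights = [bar.get_height() for bar in ax.patches]
print(dict(zip(order, heights)))
```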

Histograms

A histogram is a graph that displays the distribution of a numerical variable. We can create a histogram in pandas using the plot function with the kind parameter set to hist:

# Import the pandas library
import pandas as pd

# Read in the diamond.csv data from a URL using pandas
df = pd.read_csv('https://raw.githubusercontent.com/pycaret/pycaret/master/datasets/diamond.csv')

# histogram of Price
df['Price'].plot(kind = 'hist')

Output:

Here, we create a histogram of the Price column in our DataFrame.
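The shape of a histogram depends on how many bins the value range is split into (pandas uses 10 unless told otherwise). This sketch uses synthetic, right-skewed "prices" drawn from a lognormal distribution, not the real diamond data, and requests 30 bins; the bar heights are the per-bin counts, so they add up to the number of observations.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import numpy as np
import pandas as pd

# Synthetic right-skewed values standing in for Price.
rng = np.random.default_rng(42)
price = pd.Series(rng.lognormal(mean=8.5, sigma=0.6, size=1000), name="Price")

# bins sets the number of equal-width intervals; edgecolor outlines each bar.
ax = price.plot(kind="hist", bins=30, edgecolor="white")

counts = [bar.get_height() for bar in ax.patches]
print(len(counts), sum(counts))
```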

Kernel Density Estimation Plots

Kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable. We can create a KDE plot in pandas using the plot function with the kind parameter set to density:

# Import the pandas library
import pandas as pd

# Read in the diamond.csv data from a URL using pandas
df = pd.read_csv('https://raw.githubusercontent.com/pycaret/pycaret/master/datasets/diamond.csv')

# kernel density estimate of Price
df['Price'].plot(kind = 'density')

Output:
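A Gaussian KDE places a normal "bump" of width h on every observation and averages them: f̂(x) = 1/(n·h) · Σᵢ K((x − xᵢ)/h), where K is the standard normal density. The NumPy sketch below evaluates that formula directly on synthetic data, choosing h with Scott's rule (sample standard deviation times n^(-1/5)), which is the default rule of scipy.stats.gaussian_kde in one dimension. It is meant to show what the density curve represents, not to reproduce pandas' internals line for line.

```python
import numpy as np


def gaussian_kde_scott(samples, grid):
    """Evaluate a 1-D Gaussian KDE of `samples` at each point of `grid`.

    Bandwidth follows Scott's rule: h = std(samples, ddof=1) * n ** (-1/5).
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every sample: shape (len(grid), n).
    z = (grid[:, None] - samples[None, :]) / h
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average the kernels and divide by h so the estimate integrates to 1.
    return kernel.sum(axis=1) / (n * h)


rng = np.random.default_rng(1)
samples = rng.normal(loc=5.0, scale=2.0, size=200)  # synthetic data
grid = np.linspace(-10, 20, 3001)
density = gaussian_kde_scott(samples, grid)

# A simple Riemann sum over the grid approximates the total area under the curve.
dx = grid[1] - grid[0]
print(round(float(density.sum() * dx), 3))
```

In pandas, the amount of smoothing can be adjusted through the bw_method argument, for example df['Price'].plot(kind='density', bw_method=0.3); smaller values give a bumpier curve.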

Box Plots

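A box plot summarizes the distribution of a numerical variable: the box spans the first to the third quartile (the interquartile range, IQR), the line inside it marks the median, the whiskers by default reach the most extreme values within 1.5 × IQR of the box, and points beyond the whiskers are drawn individually as outliers. We can create a box plot in pandas using the plot function with the kind parameter set to box: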

# Import the pandas library
import pandas as pd

# Read in the diamond.csv data from a URL using pandas
df = pd.read_csv('https://raw.githubusercontent.com/pycaret/pycaret/master/datasets/diamond.csv')

# box plot of Price
df['Price'].plot(kind = 'box')

Output:
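To make the parts of the box concrete, this sketch computes them by hand for a small made-up series with one obvious outlier, using pandas' quantile() and the default 1.5 × IQR whisker rule; the box plot drawn at the end should show the same quartiles, whiskers and outlier.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import pandas as pd

# Small made-up series with one outlier.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], name="Price")

# Quartiles with linear interpolation (pandas' default).
q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Whiskers end at the most extreme points still inside the fences.
inside = values[(values >= low_fence) & (values <= high_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = values[(values < low_fence) | (values > high_fence)].tolist()

print(q1, median, q3)             # box edges and median line
print(whisker_low, whisker_high)  # whisker ends
print(outliers)                   # drawn as individual points

ax = values.plot(kind="box")
```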

Area Plots

An area plot is a graph that displays the evolution of numerical values of different variables over time or any other dimension. We can create an area plot in pandas using the plot function with the kind parameter set to area:

# Import the pandas library
import pandas as pd

# Read in the migration.csv data from a URL using pandas
df = pd.read_csv('https://raw.githubusercontent.com/pycaret/pycaret/master/datasets/migration.csv')

# Transpose the DataFrame so that the countries are in the columns
df = df.transpose()

# Set the column names to the values in the first row of the DataFrame
df.columns = df.iloc[0]

# Drop the row with the column names, which is now redundant
df = df.drop(index = 'Country Name')

# Rename the index to 'Year'
df = df.rename_axis('Year')

# Plot the migration data for Canada and USA
df[['Canada', 'United States']].plot(kind = 'area')

Output:
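Note that pandas stacks the areas by default (stacked=True): each series is drawn on top of the previous one, so the upper edge of the last area is the row-wise total rather than that series' own values. The sketch below, using made-up yearly values, checks this on the plotted lines and shows stacked=False, which draws every area from zero and overlays them (alpha keeps both visible).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import numpy as np
import pandas as pd

# Made-up yearly values for two series, for illustration only.
df = pd.DataFrame(
    {"Canada": [3, 4, 5, 4], "United States": [10, 12, 11, 13]},
    index=pd.Index([2000, 2001, 2002, 2003], name="Year"),
)

# Default: stacked areas, so the last line traces the sum of both columns.
ax_stacked = df.plot(kind="area")
top_edge = ax_stacked.get_lines()[-1].get_ydata()

# Unstacked: each area starts at zero and shows only its own values.
ax_overlay = df.plot(kind="area", stacked=False, alpha=0.4)
overlay_top = ax_overlay.get_lines()[-1].get_ydata()

print(top_edge)     # row totals
print(overlay_top)  # raw 'United States' values
```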

Conclusion

In this article, we have learned how to use pandas to create various types of plots and visualizations to explore and analyze data. We have covered the basic visualization techniques available through the plot() method: line plots, scatter plots, bar plots, histograms, kernel density estimation plots, box plots, and area plots.

Pandas provides a powerful and flexible way to create visualizations with just a few lines of code. With pandas, we can easily explore and analyze our data visually and gain insights into the underlying patterns and trends. We hope this article has provided a helpful introduction to data visualization with pandas.

Liked the blog? Connect with Moez Ali

Moez Ali is an innovator and technologist. A data scientist turned product manager dedicated to creating modern and cutting-edge data products and growing vibrant open-source communities around them.

Creator of PyCaret, 100+ publications with 500+ citations, keynote speaker and globally recognized for open-source contributions in Python.

Let’s be friends! Connect with me:

👉 LinkedIn 👉 Twitter 👉 Medium 👉 YouTube

🔥 Check out my brand new personal website: https://www.moez.ai.

To learn more about my open-source work on PyCaret, you can check out this GitHub repo or follow PyCaret’s Official LinkedIn page.

Listen to my talk on Time Series Forecasting with PyCaret in DATA+AI SUMMIT 2022 by Databricks.

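🚀 My most read articles:

Machine Learning in Power BI using PyCaret
Announcing PyCaret 2.0
Time Series Forecasting with PyCaret Regression Module
Multiple Time Series Forecasting with PyCaret
Time Series Anomaly Detection with PyCaret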

Python
Pandas
Programming
Data Visualization
Data Science