Irfan Alghani Khalid


Meet Dtale — A Python Library To Analyze Data Interactively Like Excel

Analyze in detail with Dtale

Photo by Shane Aldendorff on Unsplash

Python is a general-purpose programming language, and data analysis is one of its most common uses. Python can analyze data at a scale that spreadsheet software (e.g. Microsoft Excel and Google Sheets) struggles to handle.

Although libraries like Pandas are already enough for analyzing data, exploring it interactively, as you would in a spreadsheet, is still helpful in some cases. In this article, I will show you how to analyze data interactively using a library called Dtale. Without further ado, let’s get started!

Implementation

Install the library

Before we can use the library, we need to install it with pip. Here is the command:

pip install dtale

The data source

For the data source, we will use the gapminder data as an example. Gapminder provides data such as population, GDP per capita, and life expectancy for countries worldwide. You can download the data from Kaggle.

Screenshot captured by the author.

Let’s open the data. To open it with dtale, you can write the code below:

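import dtale
import pandas as pd

# Load the CSV into a DataFrame (replace the placeholder with your own path)
df = pd.read_csv('your_data_path')
# Start a dtale session for the DataFrame
d = dtale.show(df)
# In a Jupyter notebook, ending the cell with the instance displays the interface
d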

Running this code displays an interface like this:

Data manipulation

So you’ve opened the dataset, but what can you do with it? With dtale, you can manipulate data just as you would with Pandas. Let’s start with filtering. Say we want to keep only the data from the year 2007. Here is the GIF that shows the process:

We can also sort the data by clicking a column and setting its parameters. Let’s say we sort the data by GDP per capita, from highest to lowest. Here is the GIF that shows the process:

Lastly, we can aggregate the data using the library. Let’s say we compute the average life expectancy for each continent. You can see the process in the GIF below:
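For comparison, the three steps above (filter, sort, aggregate) can be sketched in plain Pandas. This is only an illustration: the tiny DataFrame is made up, and its column names (`country`, `continent`, `year`, `lifeExp`, `gdpPercap`) are assumptions modeled on the gapminder data, so adjust them to match your file.

```python
import pandas as pd

# Made-up sample rows; the column names are assumptions modeled on gapminder.
df = pd.DataFrame({
    "country": ["A", "A", "B", "C", "D"],
    "continent": ["Asia", "Asia", "Africa", "Europe", "Asia"],
    "year": [2002, 2007, 2007, 2007, 2007],
    "lifeExp": [70.0, 72.0, 55.0, 80.0, 78.0],
    "gdpPercap": [9000.0, 10000.0, 1500.0, 45000.0, 30000.0],
})

# Filter: keep only the rows from 2007.
df_2007 = df[df["year"] == 2007]

# Sort: highest GDP per capita first.
df_sorted = df_2007.sort_values("gdpPercap", ascending=False)

# Aggregate: average life expectancy per continent.
life_by_continent = df_2007.groupby("continent")["lifeExp"].mean()

print(df_sorted[["country", "gdpPercap"]])
print(life_by_continent)
```

Doing the same steps in the dtale interface saves you from typing these calls, but knowing the Pandas equivalents helps when you want to repeat the analysis in a script.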

Exploratory data analysis

With dtale, you can create different kinds of visualizations. If you want to analyze each column, you can use the library’s ‘Describe’ feature.

To access the feature, hover over the top of the interface and then choose Visualize > Describe, like this:

On this page, you can check and analyze each column. Let’s take a look at the life expectancy column. At the top, you can see tabs that display different visualizations. Below them, you can see information such as unique values, outliers, and differences between values inside the column. Here is a preview of the Describe page:

Now let me explain each tab at the top. The first is the Describe tab, which contains statistical summaries of the chosen column. It also displays a box plot of the column.
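A rough Pandas counterpart of the numbers on the Describe page is sketched below. The sample values are invented, and the 1.5 × IQR rule for flagging outliers is a common convention assumed here for illustration; it is not necessarily the rule dtale applies.

```python
import pandas as pd

# Made-up life expectancy values; 20.0 is planted as an obvious outlier.
life_exp = pd.Series([72.0, 75.0, 78.0, 80.0, 74.0, 76.0, 20.0], name="lifeExp")

# Count, mean, standard deviation, min, quartiles, and max in one call.
summary = life_exp.describe()

# Number of distinct values in the column.
n_unique = life_exp.nunique()

# Outliers by the 1.5 * IQR rule (an assumed convention, see the note above).
q1, q3 = life_exp.quantile(0.25), life_exp.quantile(0.75)
iqr = q3 - q1
outliers = life_exp[(life_exp < q1 - 1.5 * iqr) | (life_exp > q3 + 1.5 * iqr)]

print(summary)
print("unique values:", n_unique)
print("outliers:", outliers.tolist())
```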

The second is the Histogram tab, which draws a histogram of the column. You can tweak the histogram by changing the number of bins or by grouping the data based on a specific column.
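To see what changing the number of bins does, here is a minimal sketch with NumPy. The values are made up, and the code only computes the bin counts rather than drawing the chart.

```python
import numpy as np

# Made-up values standing in for a numeric column.
values = np.array([52.0, 55.0, 61.0, 64.0, 68.0, 70.0, 71.0, 73.0, 79.0, 82.0])

# Same data, two bin settings: fewer bins give a coarser picture.
counts_3, edges_3 = np.histogram(values, bins=3)
counts_6, edges_6 = np.histogram(values, bins=6)

print("3 bins:", counts_3, edges_3)
print("6 bins:", counts_6, edges_6)
```

Both calls count the same ten values; only the width of each bin changes, which is why a histogram can look smooth or jagged depending on this setting.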

The third is the grouping tab, which draws a bar chart that aggregates the column’s values by a categorical column. Here, I aggregate the life expectancy values by continent. We can also change the aggregation method, for example to the mean or the median.

The last tab is the Q-Q plot. This plot tells us about the distribution of the column. You can see a straight line along with the data points. The closer the data points fit the line, the closer the distribution is to normal.
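To make the Q-Q idea concrete, the sketch below pairs each sorted observation with the normal quantile expected at the same rank and measures how well the pairs line up. Both samples are randomly generated, the helper `qq_correlation` is a hypothetical name, and the plotting positions `(i - 0.5) / n` are one common choice assumed here; other tools may use slightly different positions.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
# Synthetic samples: one drawn from a normal distribution, one clearly skewed.
normal_sample = rng.normal(loc=70.0, scale=5.0, size=500)
skewed_sample = rng.exponential(scale=5.0, size=500)

def qq_correlation(sample):
    """Correlation between sorted data and theoretical normal quantiles."""
    n = len(sample)
    # Plotting positions (i - 0.5) / n for i = 1..n (an assumed convention).
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    observed = np.sort(sample)
    # Points that hug a straight line give a correlation close to 1.
    return np.corrcoef(theoretical, observed)[0, 1]

print("normal sample:", qq_correlation(normal_sample))
print("skewed sample:", qq_correlation(skewed_sample))
```

The normal sample should score closer to 1 than the skewed one, which is the numeric version of "the points follow the line".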

Data visualization

Besides analyzing the columns, we can create more visualizations using the library. All you need to do is hover the cursor over the top of the interface and then click Visualize > Chart, like this:

Using this feature, we can create a line chart, a scatter plot, or even a map-based visualization.

To create a visualization, you need to set parameters such as the variables and the aggregation method. Here are screenshots of the visualizations:

From top left (clockwise): Map chart, scatter plot, bar chart, line chart

Missing data analysis

With the Dtale library, you can also analyze missing data by visualizing it. Unlike the previous part, let’s use the Titanic dataset from Kaggle, which you can access at https://www.kaggle.com/c/titanic. Here is the GIF of the missing-data analysis feature:

There are several visualizations that we can make:

  • The matrix is the first visualization. It displays the location of the missing data in each column.
  • The correlation heat map shows how strongly the presence or absence of a value in one column relates to the presence or absence of a value in another.
  • The dendrogram groups the columns by how similar their missing-data patterns are, which reveals relationships beyond the pairwise ones in the heat map.
  • The bar chart displays the number of non-missing values in each column. The higher the bar, the less data is missing.
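The numbers behind three of these views can be approximated in Pandas, as sketched below (the dendrogram, which needs hierarchical clustering, is left out). The small DataFrame is invented and its column names only echo the Titanic data. This is a rough analogue of what the charts show, not the code dtale runs.

```python
import pandas as pd

# Invented rows with gaps; the column names only echo the Titanic dataset.
df = pd.DataFrame({
    "Age": [22.0, None, 26.0, None, 35.0],
    "Cabin": ["C85", None, None, None, "E46"],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05],
})

# Matrix view: True marks a missing cell, row by row.
missing_mask = df.isna()

# Bar chart view: number of non-missing values per column.
present_counts = df.notna().sum()

# Heat map view: correlation between the missingness of two columns.
# Columns without missing values are dropped because their mask is constant.
partial = missing_mask.loc[:, missing_mask.any()]
nullity_corr = partial.astype(int).corr()

print(missing_mask)
print(present_counts)
print(nullity_corr)
```

A correlation near 1 means two columns tend to be missing in the same rows, while a value near -1 means one is usually present when the other is missing.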

Exporting Code

Because this is a Python library, we can convert our processing steps into code. Let’s take the example of aggregating life expectancy by continent. We did this earlier, but now let’s convert it into code. Here is the GIF of the process:

Well done! Now you have learned about the Dtale library. I hope its interactive user interface helps you analyze data more easily.

Thank you for reading my article.

References

[1] GitHub. https://github.com/man-group/dtale

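Another related article you might enjoy: D-Tale: One of the Best Python Libraries You Have Ever Seen (https://towardsdatascience.com/d-tale-one-of-the-best-python-libraries-you-have-ever-seen-c2deecdfd2b)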

Want to Connect?
If you have any questions, you can contact me via LinkedIn.