Himanshu Sharma


Statistical Analysis in Python

Using Skimpy for Statistical Summary of Dataframes

Source: By Author

Statistical analysis of a dataset helps identify trends, patterns, and relationships between the data points of quantitative data. It uncovers patterns that underlie the data and are not visible to the naked eye. A statistical summary generally includes measures such as skewness, quartiles, and the mean.

Pandas provides the describe function for generating a statistical summary, but it only displays measures such as the count, mean, standard deviation, and quartiles, and we need other attributes like shape and dtypes to find out the remaining properties of the dataset. But what if I tell you that we can do all this in a single line of code and generate a summary of the dataset?

Skimpy is an open-source Python library that generates a statistical summary of a dataset and can be used in a Jupyter Notebook as well as in the console.
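For comparison, here is a minimal sketch of the pandas-only route, where each property of the dataset needs its own call. The small DataFrame is made-up sample data (an assumption for illustration, not the seaborn dataset), so the snippet runs without downloading anything.

```python
import pandas as pd

# Hypothetical sample data so the example runs offline.
df = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68],
        "tip": [1.01, 1.66, 3.50, None],  # one missing value on purpose
        "day": pd.Categorical(["Sun", "Sun", "Sat", "Sat"]),
    }
)

print(df.shape)         # number of rows and columns
print(df.dtypes)        # data type of each column
print(df.isna().sum())  # missing values per column
print(df.describe())    # count, mean, std, min, quartiles, max (numeric columns only)
```

By default, describe skips the categorical day column here because the frame also has numeric columns, which is one reason the extra calls are needed.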

In this article, we will explore Skimpy and create some statistical analysis using it.

Let’s get started…

Installing required libraries

We will start by installing Skimpy with pip, using the command given below.

pip install skimpy

Importing required libraries

In this step, we will import all the libraries that are required for creating the statistical analysis and loading the data.

from skimpy import skim, generate_test_data
import seaborn as sns

Creating Statistical Summary

We will start by creating the statistical summary in a Jupyter notebook. The dataset that we will be using here is provided by seaborn under the name tips. Let’s create the statistical summary:

df = sns.load_dataset("tips")
skim(df)

Summary (Source: By Author)

Here we can clearly see the generated analysis, which summarizes every column of the dataset. It includes data types, categories, missing data, etc.
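To make those pieces concrete, the sketch below builds a rough per-column overview with plain pandas. It is not how Skimpy works internally; the helper name column_overview and the sample data are hypothetical, and the table only approximates the kind of information (data types, missing data, distinct values) that the summary reports.

```python
import pandas as pd


def column_overview(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, missing %, distinct values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "missing_pct": (df.isna().mean() * 100).round(2),
            "unique": df.nunique(),  # NaN is not counted as a distinct value
        }
    )


# Hypothetical sample data so the example runs offline.
df = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68],
        "tip": [1.01, 1.66, 3.50, None],
        "day": pd.Categorical(["Sun", "Sun", "Sat", "Sat"]),
    }
)

overview = column_overview(df)
print(overview)
```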

Now let us see how to create this analysis in the console. We can do that by simply running the command given below. Remember to replace the file name with the path to your own CSV file.

skimpy Diabetes.csv

Console (Source: By Author)

Here you can see how easily we can create the statistical summary from both the console and the notebook.
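If you do not have a CSV file at hand, the sketch below writes a small one and then runs the same skimpy command on it. The sample rows and the file name sample.csv are made up for illustration, and the command is only invoked when the skimpy executable is found on the PATH, so the script still runs on a machine where Skimpy is not installed.

```python
import csv
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical sample rows, written to a temporary CSV file.
rows = [
    {"total_bill": 16.99, "tip": 1.01, "day": "Sun"},
    {"total_bill": 10.34, "tip": 1.66, "day": "Sun"},
    {"total_bill": 21.01, "tip": 3.50, "day": "Sat"},
]
csv_path = Path(tempfile.mkdtemp()) / "sample.csv"
with csv_path.open("w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["total_bill", "tip", "day"])
    writer.writeheader()
    writer.writerows(rows)

# Equivalent to typing `skimpy sample.csv` in the console,
# but only attempted if the command is available.
skimpy_cli = shutil.which("skimpy")
if skimpy_cli is None:
    print("skimpy not found on PATH; install it with `pip install skimpy`")
else:
    subprocess.run([skimpy_cli, str(csv_path)], check=False)
```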

Try this with different datasets, create a Statistical Summary, and let me know your comments in the response section.

This article is in collaboration with Piyush Ingale.

Before You Go

Thanks for reading! If you want to get in touch with me, feel free to reach me at [email protected] or through my LinkedIn Profile. You can view my GitHub profile for different data science projects and package tutorials. Also, feel free to explore my profile and read the different articles I have written related to Data Science.
