Alain Saamego



Python Tutorial: Perform Powerful Statistical Analysis on Real-World Data With Ease!

Learn how to perform statistical analysis on real-world datasets!

Photo by Anton on Unsplash

In this tutorial, we will learn how to perform statistical analysis on real-world datasets using Python. We will use the Python package pandas for this tutorial.

pandas is a powerful Python package that provides extensive functionality for data analysis and manipulation. It is built on top of the popular numerical computing library NumPy and features a rich set of functions for working with data.

pandas is particularly well-suited for working with tabular data, such as data from a database or spreadsheet. In this tutorial, we will learn how to use pandas to load, manipulate, and analyze real-world datasets.

We will cover the following topics in this tutorial:

Loading data into pandas

Exploring data with pandas

Cleaning data with pandas

Performing statistical analysis with pandas

Let’s get started!

Loading Data into pandas

Before we can start working with data in pandas, we first need to load it into a pandas object. pandas provides a number of functions for reading data from different sources.

For this tutorial, we will be working with a dataset that contains information on different countries around the world, including their population, GDP, and life expectancy.

In this tutorial, we will use Google Colab to load the dataset.

The dataset is available for free here

We can load this dataset into pandas using the read_csv() function:

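import io
import pandas as pd

# `uploaded` is assumed to be the dict returned by files.upload() from google.colab
data = pd.read_csv(io.BytesIO(uploaded['countries of the world.csv']))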

This code will read the CSV file “countries of the world.csv” and store the data in a pandas DataFrame object called “data”.

A DataFrame is a two-dimensional data structure that contains rows and columns. It is similar to a spreadsheet or a database table.

Once the data is loaded into a DataFrame, we can start manipulating and analyzing it.
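
To try the loading step outside Colab, here is a minimal sketch. The `uploaded` dict below is a hypothetical stand-in for what the Colab upload step provides (a file name mapped to raw bytes), and the file name and the three rows of values are made up for illustration; they are not taken from the real dataset.

```python
import io
import pandas as pd

# Hypothetical stand-in for the Colab upload result: file name -> raw bytes.
# The rows are invented sample values, not real data.
uploaded = {
    "countries.csv": b"Country,Population,GDP\nA,1000,500\nB,2500,\nC,4000,1500\n"
}

# read_csv accepts any file-like object, so the bytes are wrapped in BytesIO.
data = pd.read_csv(io.BytesIO(uploaded["countries.csv"]))

print(data.shape)    # number of rows and columns
print(data.columns)  # column names taken from the header row
```

The empty GDP field for country B is read as a missing value (NaN), which is the kind of entry the cleaning step later deals with.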

Exploring Data with pandas

Once the data is loaded into a DataFrame, we can start exploring it. pandas provides a number of functions for getting information about the data.

We can start by looking at the first few rows of the data:

data.head()

This code will print the first five rows of the data.

Output

We can also look at the last few rows:

data.tail()

This code will print the last five rows of the data.

Output
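
As a small self-contained sketch of head() and tail(), the example below uses a made-up seven-row DataFrame (the names `df`, `first_five`, and `last_two` are illustrative). Both methods default to five rows and accept an optional row count.

```python
import pandas as pd

# Made-up sample data with seven rows.
df = pd.DataFrame({"Country": list("ABCDEFG"), "Population": range(1, 8)})

first_five = df.head()   # default: first 5 rows
last_two = df.tail(2)    # explicit count: last 2 rows

print(first_five)
print(last_two)
```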

We can also get information about the columns in the data:

data.columns

This code will print the names of the columns in the data.

Output

We can also get information about the data types of the columns:

data.dtypes

This code will print the data types of the columns in the data.

Output
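
The sketch below shows both attributes on a tiny made-up DataFrame. Note that `columns` and `dtypes` are attributes, not methods, so they are used without parentheses. The sample values and variable names are assumptions for illustration.

```python
import pandas as pd

# Made-up sample data: one text column, one integer column, one float column.
df = pd.DataFrame(
    {"Country": ["A", "B"], "Population": [1000, 2500], "GDP": [500.0, 1200.5]}
)

column_names = list(df.columns)  # column labels as a plain Python list
dtypes = df.dtypes               # a Series mapping column name -> dtype

print(column_names)
print(dtypes)
```

Checking dtypes early is useful because a numeric column that was read as text will not be included in numeric summaries until it is converted.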

We can also get summary statistics about the data:

data.describe()

This code will print a summary of the numerical columns in the data.

Output
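
To see what describe() returns, here is a minimal sketch on made-up values. The result is itself a DataFrame indexed by statistic name, so individual numbers can be looked up with `.loc`. The variable names are illustrative.

```python
import pandas as pd

# Made-up sample data; the text column is skipped by describe() by default
# because the frame also contains a numeric column.
df = pd.DataFrame({"Country": ["A", "B", "C"], "Population": [1000, 2000, 6000]})

summary = df.describe()
print(summary)

# Individual statistics can be read from the summary by row label.
mean_population = summary.loc["mean", "Population"]
median_population = summary.loc["50%", "Population"]
```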

These are just a few of the functions that pandas provides for exploring data. For more information on these and other functions, see the pandas documentation.

Cleaning Data with pandas

Once we have explored the data, we may need to clean it before we can start performing statistical analysis.

There are a number of different things that we may need to do to clean the data, such as:

Remove invalid or missing values

Convert data types

Encode categorical variables

pandas provides a number of functions for doing these and other tasks.
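
The third task in the list, encoding categorical variables, is not covered further in this tutorial. As one possible approach (an assumption here, not the article's method), `pd.get_dummies` turns a text column into one indicator column per category. The "Region" column and its values below are made up for illustration.

```python
import pandas as pd

# Made-up sample data with a categorical text column.
df = pd.DataFrame(
    {"Country": ["A", "B", "C"], "Region": ["Asia", "Europe", "Asia"]}
)

# One-hot encode "Region": each category becomes its own indicator column.
encoded = pd.get_dummies(df, columns=["Region"])
print(encoded)
```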

For this tutorial, we will focus on two of these functions: dropna() and astype().

The dropna() function can be used to remove rows or columns that contain missing values:

data.dropna()

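This code returns a new DataFrame without the rows that contain missing values. The original DataFrame is left unchanged, so assign the result to a variable if you want to keep it.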

Output
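
The sketch below illustrates dropna() on a made-up DataFrame with one missing value. It shows that the original object is not modified, and that passing `axis=1` drops columns instead of rows. Variable names and values are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up sample data with one missing population value.
df = pd.DataFrame(
    {"Country": ["A", "B", "C"], "Population": [1000.0, np.nan, 4000.0]}
)

rows_dropped = df.dropna()        # drops row "B"; df itself is unchanged
cols_dropped = df.dropna(axis=1)  # drops the "Population" column instead

print(len(df), len(rows_dropped))
print(list(cols_dropped.columns))
```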

The astype() function can be used to convert data types:

data["Population"] = data["Population"].astype(float)

This code will convert the “Population” column to a floating point data type. Column names are case-sensitive, so the name must match the one shown by data.columns exactly.

Output
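
A common reason to call astype() is that numbers were read in as text. The minimal sketch below assumes a made-up column of numeric strings and converts it to floats so that arithmetic works.

```python
import pandas as pd

# Made-up sample data: numbers stored as strings.
df = pd.DataFrame({"Population": ["1000", "2500", "4000"]})

# Convert the column to floating point and assign it back.
df["Population"] = df["Population"].astype(float)

total = df["Population"].sum()
print(df.dtypes)
print(total)
```

If a value cannot be parsed as a number, astype(float) raises an error, so it is worth handling missing or malformed entries before converting.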

Once the data is cleaned, we can start performing statistical analysis.

Performing Statistical Analysis with pandas

pandas provides a number of functions for performing statistical analysis.

We will cover two of these functions in this tutorial:

The mean() function can be used to calculate the mean of a column:

data["Population"].mean()

This code will calculate the mean of the “Population” column.

Output
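
One detail worth knowing is how mean() treats missing values. The sketch below uses a made-up Series to show that missing values are skipped by default, and that `skipna=False` makes the result missing instead.

```python
import numpy as np
import pandas as pd

# Made-up sample values with one missing entry.
population = pd.Series([1000.0, np.nan, 3000.0], name="Population")

mean_default = population.mean()             # NaN is ignored: (1000 + 3000) / 2
mean_strict = population.mean(skipna=False)  # NaN propagates to the result

print(mean_default)
print(mean_strict)
```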

The median() function can be used to calculate the median of a column:

data["Population"].median()

This code will calculate the median of the “Population” column.

Output
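
The median is often reported alongside the mean because it is less sensitive to extreme values. A minimal sketch with made-up numbers, where one value is much larger than the rest:

```python
import pandas as pd

# Made-up sample values with a single large outlier.
population = pd.Series([1, 2, 3, 4, 100])

median_value = population.median()  # middle value of the sorted data
mean_value = population.mean()      # pulled upward by the outlier

print(median_value, mean_value)
```

For skewed columns such as country populations, comparing the two numbers gives a quick sense of how much a few large values dominate the average.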

These are just a few of the statistical functions that pandas provides. For more information on these and other functions, see the pandas documentation.

In this tutorial, we have learned how to perform statistical analysis on real-world datasets using Python. We have used the pandas package to load, manipulate, and analyze data.

We have also covered how to clean data and how to perform statistical analysis with pandas.

This tutorial is just the beginning. There is a lot more that you can do with pandas. For more information, see the pandas documentation.

