Coucou Camille

Pandas Operations in Python — Series

Pandas is built on top of NumPy and provides efficient data structures and data processing tools in Python. It is commonly used together with packages such as matplotlib, seaborn, statsmodels and scikit-learn for data analysis and visualization.

Pandas

Compared to NumPy, pandas mostly deals with tabular data and allows a different data type for each column, whereas a NumPy array requires a uniform data type. It provides two main data structures: the one-dimensional labelled Series and the two-dimensional labelled DataFrame.

Pandas is usually imported under the alias “pd”:

import pandas as pd

This article introduces the common functions for pandas Series:

  • Series Creation
  • Series Values
  • Series Index
  • Access Series by Index
  • Series like a Dictionary
  • Series like an Array
  • Boolean Filter
  • Check for NA
  • Sorting
  • Max & Min

Pandas Data Structure - Series

Series is a one-dimensional labelled array, with an index column and a data column.

Series Creation

Syntax: pd.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

  • data : a dictionary, array or any iterable object
  • index : optional, default being [0, … len(data)-1]
  • dtype : optional, set data type for output Series
  • name : optional string for name of Series
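
The optional dtype and name parameters listed above can be illustrated with a minimal sketch. The variable name prices and the label "price" are made up for this illustration and do not come from the article.

```python
import pandas as pd

# Build a Series from a list, forcing float storage and attaching a name.
prices = pd.Series([1, 2, 3], dtype="float64", name="price")

print(prices.dtype)  # float64
print(prices.name)   # price
```

The name is mostly useful later, for example as the column label when the Series becomes part of a DataFrame.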

Example 1: index not specified

# using list as data
data = pd.Series([1, 2, 3])
data
>>> 
0    1
1    2
2    3
dtype: int64

When using a dictionary for data, the keys will automatically be taken as the index and the values as the data.

# using dictionary as data
data = pd.Series({'x': 1, 'y': 2, 'z': 3})
data
>>>
x    1
y    2
z    3
dtype: int64

Example 2: specified index

# using list as data
data = pd.Series([1, 2, 3], index=list('abc'))
data
>>> 
a    1
b    2
c    3
dtype: int64

When using a dictionary for data, the index argument has no effect if it matches the dictionary keys. Otherwise the Series is re-indexed with the given index values: any label that is not a dictionary key gets NaN, and the dtype becomes float64.

# match
data = pd.Series({'x': 1, 'y': 2, 'z': 3}, index=list('xyz'))
data
>>> 
x    1
y    2
z    3
dtype: int64
# no match
data = pd.Series({'x': 1, 'y': 2, 'z': 3}, index=list('abc'))
data
>>>
a   NaN
b   NaN
c   NaN
dtype: float64
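
The two cases above are the extremes. As a small sketch of the in-between case, here is an index that only partially overlaps the dictionary keys; the names source and partial are illustrative only.

```python
import pandas as pd

source = {'x': 1, 'y': 2, 'z': 3}

# 'x' and 'y' exist in the dictionary, 'a' does not, and 'z' is not requested.
partial = pd.Series(source, index=['x', 'y', 'a'])

print(partial)
# 'x' and 'y' keep their values, 'a' becomes NaN, 'z' is dropped,
# and the dtype is float64 because NaN is a floating-point value.
```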

Series Values

Syntax: series.values returns the values of Series as an array.

data = pd.Series({'x': 1, 'y': 2, 'z': 3})
data.values
>>> array([1, 2, 3], dtype=int64)
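
Besides .values, a Series can also be converted with to_numpy() or tolist(). The sketch below assumes the same data as above; the names arr and as_list are chosen for illustration.

```python
import pandas as pd

data = pd.Series({'x': 1, 'y': 2, 'z': 3})

arr = data.to_numpy()    # NumPy array holding the values
as_list = data.tolist()  # plain Python list holding the values

print(type(arr).__name__)  # ndarray
print(as_list)             # [1, 2, 3]
```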

Series Index

Syntax: series.index returns the index labels as a pandas Index object.

data = pd.Series({'x': 1, 'y': 2, 'z': 3})
data.index
>>> Index(['x', 'y', 'z'], dtype='object')

Access Series by Index

  • Single index value

Syntax: series[index]

data = pd.Series({'x': 1, 'y': 2, 'z': 3})
data["x"]
>>> 1

The program throws a KeyError if index is not in Series:

data = pd.Series({'x': 1, 'y': 2, 'z': 3})
data["a"]
>>> KeyError: 'a'
  • Multiple index values

Syntax: series[index_list]

Pass in the index values as a list:

data1 = pd.Series({'x': 1, 'y': 2, 'z': 3})
data1[["x", "z"]]
>>> 
x    1
z    3
dtype: int64
data2 = pd.Series([1, 2, 3])
data2[[1, 2]]
>>> 
1    2
2    3
dtype: int64
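
Label-based access can also be written explicitly with .loc, and .get() avoids the KeyError shown earlier by returning a default. This is a minimal sketch; the default value -1 is an arbitrary choice for illustration.

```python
import pandas as pd

data = pd.Series({'x': 1, 'y': 2, 'z': 3})

first = data.loc['x']            # single label -> scalar
subset = data.loc[['x', 'z']]    # list of labels -> Series
missing = data.get('a', -1)      # label not present -> default instead of KeyError

print(first)    # 1
print(missing)  # -1
```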

Series like a Dictionary

Like a dictionary, a Series lets you check whether it contains certain elements and update elements by index.

  • Check whether in index
data = pd.Series({'x': 1, 'y': 2, 'z': 3})
'x' in data
>>> True
1 in data
>>> False
  • Check whether in value
1 in data.values
>>> True
  • Update value by index
data = pd.Series({'x': 1, 'y': 2, 'z': 3})
data['x'] = 5
data
>>> 
x    5
y    2
z    3
dtype: int64

If the index was not originally in the Series, a new entry is created for that index and value, just like in a dictionary.

data = pd.Series({'x': 1, 'y': 2, 'z': 3})
data['a'] = 0
data
>>>
x    1
y    2
z    3
a    0
dtype: int64
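
The dictionary analogy extends to iteration. As a sketch, keys(), items() and to_dict() behave much like their dict counterparts; the variable names are illustrative.

```python
import pandas as pd

data = pd.Series({'x': 1, 'y': 2, 'z': 3})

labels = list(data.keys())   # index labels, like dict.keys()
pairs = list(data.items())   # (label, value) pairs, like dict.items()
as_dict = data.to_dict()     # back to a plain dictionary

print(labels)   # ['x', 'y', 'z']
print(as_dict)  # {'x': 1, 'y': 2, 'z': 3}
```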

Series like an Array

Using the same Series for the following operations:

data = pd.Series({'x': 1, 'y': 2, 'z': 3})

It supports operations similar to those of numpy.ndarray:

  • Query Data Type of Series with .dtype
data.dtype
>>> dtype('int64')
  • Shape of Series with .shape
data.shape
>>> (3,)
  • Number of elements in Series with .size (a Series is one-dimensional by definition, so .ndim is always 1)
data.size
>>> 3
  • Selection by integer position instead of index label with .iloc. The position can be a single integer or a list of integers. Passing an index label to iloc throws a TypeError instead.
data.iloc[0]
>>> 1
data.iloc[1]
>>> 2
data.iloc[[0, 2]]
>>> 
x    1
z    3
dtype: int64
data.iloc['x']
>>> TypeError: Cannot index by location index with a non-integer key
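
Position-based access also accepts slices and negative positions, and arithmetic is applied element by element as with a NumPy array. A short sketch on the same data, with illustrative variable names:

```python
import pandas as pd

data = pd.Series({'x': 1, 'y': 2, 'z': 3})

first_two = data.iloc[0:2]  # positions 0 and 1; the end position is excluded
last = data.iloc[-1]        # negative positions count from the end
doubled = data * 2          # element-wise arithmetic, index labels preserved

print(first_two.index.tolist())  # ['x', 'y']
print(last)                      # 3
print(doubled.tolist())          # [2, 4, 6]
```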

Boolean Filter

  • Check whether all elements are true
pd.Series([True, False]).all()
>>> False
pd.Series([True, True]).all()
>>> True
  • Check whether any element is true
pd.Series([True, False]).any()
>>> True

Both can be applied to check if all/any element satisfies certain conditions.

data = pd.Series([1, 2, 3, 4, 5])
data > 3
>>> 
0    False
1    False
2    False
3     True
4     True
dtype: bool
data - 3
>>> 
0   -2
1   -1
2    0
3    1
4    2
dtype: int64

Hence, we could write boolean filters as follows:

Example 1: Check if any element is greater than 3

(data > 3).any()
>>> True

Example 2: Check if all elements are odd

(data % 2 == 1).all()
>>> False
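
A boolean Series can also be used as a mask to select the matching elements, not only to test them with any() or all(). The following is a minimal sketch; the conditions are chosen purely for illustration.

```python
import pandas as pd

data = pd.Series([1, 2, 3, 4, 5])

# Keep only the elements where the condition is True.
greater_than_3 = data[data > 3]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
odd_and_above_1 = data[(data > 1) & (data % 2 == 1)]

print(greater_than_3.tolist())   # [4, 5]
print(odd_and_above_1.tolist())  # [3, 5]
```

The original index labels are kept, so greater_than_3 has index 3 and 4 rather than 0 and 1.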

Check for NA

Sometimes the Series might contain some NA values, such as None or numpy.nan.

  • Check if there are any NaNs
pd.Series([1, 2, 3]).hasnans
>>> False
pd.Series([1, 2, None]).hasnans
>>> True
from numpy import nan  # the numpy.NaN alias was removed in NumPy 2.0
pd.Series([1, 2, nan]).hasnans
>>> True
  • Check if values in Series are NA: returns a boolean Series of the same size
pd.Series([1, 2, None]).isna()
>>> 
0    False
1    False
2     True
dtype: bool
  • Check if values in Series are not NA: returns a boolean Series of the same size
pd.Series([1, 2, None]).notna()
>>> 
0     True
1     True
2    False
dtype: bool
  • Drop NA: returns a new Series with NAs removed
from numpy import nan
pd.Series([1, 2, nan]).dropna()
>>> 
0    1.0
1    2.0
dtype: float64
  • Fill NA: returns a new Series with NAs replaced with specified value
from numpy import nan
pd.Series([1, 2, nan]).fillna(0)
>>> 
0    1.0
1    2.0
2    0.0
dtype: float64
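
The NA helpers combine naturally. As a sketch, the snippet below counts the missing values and fills them with the mean of the remaining ones; filling with the mean is just one possible choice, not something the article prescribes.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, 2, np.nan])

# isna() gives booleans; summing them counts the True entries.
n_missing = int(data.isna().sum())

# mean() skips NaN by default, so the mean here is (1 + 2) / 2 = 1.5.
filled = data.fillna(data.mean())

print(n_missing)        # 1
print(filled.tolist())  # [1.0, 2.0, 1.5]
```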

Sorting

  • Sorting by index: Series.sort_index(ascending=True, ...)
data = pd.Series({'x': 3, 'y': 2, 'z': 1})
data.sort_index()
>>> 
x    3
y    2
z    1
dtype: int64

In descending order:

data.sort_index(ascending=False)
>>>
z    1
y    2
x    3
dtype: int64
  • Sorting by value:
data = pd.Series({'x': 3, 'y': 2, 'z': 1})
data.sort_values()
>>> 
z    1
y    2
x    3
dtype: int64
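
sort_values() accepts the same ascending parameter as sort_index(), and both return a new Series rather than changing the original. A small sketch with made-up data (the name scores is illustrative):

```python
import pandas as pd

scores = pd.Series({'x': 2, 'y': 3, 'z': 1})

ascending = scores.sort_values()                  # smallest value first
descending = scores.sort_values(ascending=False)  # largest value first

print(ascending.index.tolist())   # ['z', 'x', 'y']
print(descending.index.tolist())  # ['y', 'x', 'z']
print(scores.index.tolist())      # ['x', 'y', 'z'] (original is unchanged)
```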

Max & Min

  • Position of greatest value: argmax()
pd.Series([1, 2, 3, 4, 5]).argmax()
>>> 4
  • Position of smallest value: argmin()
pd.Series([1, 2, 3, 4, 5]).argmin()
>>> 0
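
argmax() and argmin() return integer positions. When the Series has non-default labels, idxmax() and idxmin() return the label instead, and max() and min() return the value itself. A minimal sketch with made-up data:

```python
import pandas as pd

data = pd.Series({'x': 3, 'y': 5, 'z': 1})

pos_max = data.argmax()    # integer position of the greatest value
label_max = data.idxmax()  # index label of the greatest value
value_max = data.max()     # the greatest value itself

label_min = data.idxmin()  # index label of the smallest value
value_min = data.min()     # the smallest value itself

print(pos_max, label_max, value_max)  # 1 y 5
print(label_min, value_min)           # z 1
```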

Thanks for reading and stay tuned for more articles on Python!
