Amit Chauhan


20 Most Usable Pandas Shortcut Methods in Python

Useful concepts for data science and machine learning

Photo by Pankaj Patel on Unsplash

This article covers some useful pandas methods for data science and data analytics. Data scientists often need results quickly, and these methods come in handy in most business analysis and data analysis tasks.

Topics to be covered:

1. Memory usage          11. Drop Method 
2. Copy Method           12. Head Method
3. At Method             13. Truncate Method
4. Loc Method            14. Filter Method
5. Clip Method           15. Interpolation Method
6. Correlation Method    16. Isna Method
7. N_largest Method      17. Replace Method
8. N_Smallest Method     18. Argmin and argmax Method
9. Unique Method         19. Compare Method
10. Value_count Method   20. Groupby Method

1. Memory Usage

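The memory_usage() method is useful for inspecting large data sets. It returns the number of bytes used by a series, or by each column of a data frame, including the index by default.

import numpy as np
import pandas as pd

#memory usage of a series
series = pd.Series(range(10))
series.memory_usage()

#output (the exact number can differ between pandas versions):
208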

Now let us look at the memory usage of a data frame with several columns.

dtypes = ['int64', 'float64', 'complex128', 'object', 'bool']
data = {t: np.ones(shape=1000, dtype=int).astype(t) for t in dtypes}

df = pd.DataFrame(data)
df.memory_usage()

#output:
Index           128
int64          8000
float64        8000
complex128    16000
object         8000
bool           1000
dtype: int64
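
As a small sketch of two optional parameters of memory_usage(), the example below leaves out the index with index=False and measures object data fully with deep=True. The two series are hypothetical example data, not taken from the article.

```python
import pandas as pd

# Hypothetical example data: ten integers and ten short strings.
numbers = pd.Series(range(10), dtype="int64")
names = pd.Series(["apple", "kiwi"] * 5, dtype="object")

# index=False leaves out the bytes used by the index: 10 values * 8 bytes.
numbers_bytes = numbers.memory_usage(index=False)

# For object columns the default counts only the references to the objects;
# deep=True also measures the Python strings they point to.
shallow_bytes = names.memory_usage(index=False)
deep_bytes = names.memory_usage(index=False, deep=True)

print(numbers_bytes)
print(shallow_bytes, deep_bytes)
```

For text-heavy data the deep figure is usually the more realistic one, at the cost of a slower computation.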

2. Copy Method

The copy() method copies the data to another variable. Its deep parameter, True by default, controls whether the data and index are copied (deep=True) or only references to them (deep=False).

series = pd.Series([4.0, 6.0, 7.0, 12.0, 15.0], index=["a", "b",
                                                  "c", "d", "e"])

#simple copy
#default is deep=True
series_copy = series.copy()
series_copy

#output:
a     4.0
b     6.0
c     7.0
d    12.0
e    15.0
dtype: float64

#shallow copy (deep=False): the data is not duplicated
shallow_copy = series.copy(deep=False)
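
The following sketch shows what a deep copy guarantees: changing the copy leaves the original untouched. It only demonstrates the deep case, because the behaviour of a shallow copy after a write depends on the pandas version and on whether Copy-on-Write is enabled.

```python
import pandas as pd

series = pd.Series([4.0, 6.0, 7.0, 12.0, 15.0],
                   index=["a", "b", "c", "d", "e"])

# Deep copy (the default): the copy owns its own data and index.
series_copy = series.copy()
series_copy["a"] = 100.0

print(series["a"])       # the original is unchanged
print(series_copy["a"])  # only the copy was modified
```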

3. At Method

The at accessor reads a single value at a given row label and column label in a data frame, or at a given label in a series.

df = pd.DataFrame(np.array(([4,6,9], [11,14,17])),
                  index=['Apple', 'Kiwi'],
                  columns=['mm', 'cm', 'kg'])

df.at['Apple', 'cm']

#output:
6

4. Loc Method

The loc accessor selects rows and columns by label or by a boolean condition, not by integer position.

df = pd.DataFrame(np.array(([4,6,9], [11,14,17])),
                  index=['Apple', 'Kiwi'],
                  columns=['mm', 'cm', 'kg'])

df.loc['Kiwi']

#output (the dtype may show as int64, depending on the platform and NumPy version):
mm    11
cm    14
kg    17
Name: Kiwi, dtype: int32

#select rows with a condition on a column
df.loc[df['cm'] > 6]

#output:
      mm  cm  kg
Kiwi  11  14  17
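
A few more selection patterns with loc, sketched on the same small data frame. The variable names are chosen for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[4, 6, 9], [11, 14, 17]]),
                  index=['Apple', 'Kiwi'],
                  columns=['mm', 'cm', 'kg'])

# A row label plus a list of column labels returns a series.
kiwi_subset = df.loc['Kiwi', ['mm', 'kg']]

# Label slices include both end points, unlike Python list slices.
cm_to_kg = df.loc[:, 'cm':'kg']

# A boolean condition for the rows combined with a column label.
heavy_mm = df.loc[df['kg'] > 10, 'mm']

print(kiwi_subset.tolist())
print(list(cm_to_kg.columns))
print(heavy_mm.to_dict())
```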

5. Clip Method

The clip() method limits the values to a range given by a lower and an upper threshold.

data={'col_0':[-44,0,5,15,10,-3], 'col_1':[0,5,-2,-14,19,24]}

df = pd.DataFrame(data)
df.clip(-5, 8)

#output:
   col_0  col_1
0     -5      0
1      0      5
2      5     -2
3      8     -5
4      8      8
5     -3      8

In the output above, every value lies between the two thresholds. Any value outside the range is replaced by the nearest threshold: values below -5 become -5, and values above 8 become 8.
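
clip() also accepts just one of the two bounds through its lower and upper keyword arguments. A minimal sketch, reusing the same data:

```python
import pandas as pd

data = {'col_0': [-44, 0, 5, 15, 10, -3], 'col_1': [0, 5, -2, -14, 19, 24]}
df = pd.DataFrame(data)

# Only a lower bound: negative values become 0, the rest stay as they are.
non_negative = df.clip(lower=0)

# Only an upper bound: values above 10 become 10.
capped = df.clip(upper=10)

print(non_negative['col_0'].tolist())
print(capped['col_1'].tolist())
```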

6. Correlation Method

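It is used to measure the correlation between features, that is, how strongly one feature depends on another. By default corr() computes the Pearson correlation; the method parameter also accepts 'kendall', 'spearman', or a custom function of two arrays, as in the histogram intersection example below.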

def histogram_intersection(x, y):
    v = np.minimum(x, y).sum().round(decimals=1)
    return v

s1 = pd.Series([.44, .0, .5, .15, .10, .3])
s2 = pd.Series([.0, .5, .2, .14, .19, .24])
s1.corr(s2, method=histogram_intersection)

#Output:
0.7
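
For comparison with the custom function above, here is a sketch of the default Pearson correlation. The three series are made-up data in which one feature rises and one falls exactly linearly with the first, so the coefficients come out at about +1 and -1.

```python
import pandas as pd

hours = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical feature
score = 2 * hours + 1                         # rises linearly with hours
cost = -3 * hours + 20                        # falls linearly with hours

r_up = hours.corr(score)   # default method is 'pearson'
r_down = hours.corr(cost)

print(round(r_up, 6), round(r_down, 6))
```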

7. N_largest Method

The nlargest() method returns the first n rows with the largest values in the given column, sorted in descending order.

data = {'col_0': [-44,0,5, 15, 10, -3], 
        'col_1': [0,5,-2, -14, 19, 24]}

df = pd.DataFrame(data)
df.nlargest(5, 'col_0')

#output:
   col_0  col_1
3     15    -14
4     10     19
2      5     -2
1      0      5
5     -3     24

8. N_Smallest Method

The nsmallest() method returns the first n rows with the smallest values in the given column, sorted in ascending order.

data = {'col_0': [-44,0,5, 15, 10, -3], 
        'col_1': [0,5,-2, -14, 19, 24]}

df = pd.DataFrame(data)
df.nsmallest(5, 'col_0')

#output:
   col_0  col_1
0    -44      0
5     -3     24
1      0      5
2      5     -2
4     10     19
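
As a sketch of how these two methods relate to sorting: for data without ties, nlargest(n, column) gives the same rows as sorting the whole data frame in descending order and taking the first n rows.

```python
import pandas as pd

data = {'col_0': [-44, 0, 5, 15, 10, -3], 'col_1': [0, 5, -2, -14, 19, 24]}
df = pd.DataFrame(data)

top3 = df.nlargest(3, 'col_0')
bottom3 = df.nsmallest(3, 'col_0')

# Same rows as sorting the whole frame and taking the head.
top3_sorted = df.sort_values('col_0', ascending=False).head(3)

print(top3['col_0'].tolist())
print(bottom3['col_0'].tolist())
print(top3.equals(top3_sorted))
```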

9. Unique Method

The unique() method returns the distinct values of a series (a feature/column) in order of first appearance.

series = pd.Series([4.0, 6.0, 7.0, 6.0, 4.0])
series.unique()

#Output:
array([4., 6., 7.])

10. Value_count Method

The value_counts() method counts how many times each distinct value occurs in a series or data frame.

series = pd.Series([4.0, 6.0, 7.0, 6.0, 4.0])
series.value_counts()

#Output (the order of tied counts can vary with the pandas version):
6.0    2
4.0    2
7.0    1
dtype: int64
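
A short sketch of value_counts() with its normalize parameter, which returns fractions instead of absolute counts. The data is the same series as above.

```python
import pandas as pd

series = pd.Series([4.0, 6.0, 7.0, 6.0, 4.0])

counts = series.value_counts()
shares = series.value_counts(normalize=True)  # fractions that sum to 1

print(counts.loc[4.0], counts.loc[7.0])
print(shares.loc[6.0])
```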

11. Drop Method

The drop() method removes unnecessary columns (or rows) from a data frame. It returns a new data frame and leaves the original unchanged.

data = {'Age': [-44,0,5, 15, 10, -3], 
        'Salary': [0,5,-2, -14, 19, 24]}

df2 = pd.DataFrame(data)
df2.drop('Age', axis='columns')

#output:
   Salary
0       0
1       5
2      -2
3     -14
4      19
5      24
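
drop() can also be called with the columns and index keyword arguments, which some readers find clearer than axis. A minimal sketch on the same data:

```python
import pandas as pd

data = {'Age': [-44, 0, 5, 15, 10, -3], 'Salary': [0, 5, -2, -14, 19, 24]}
df2 = pd.DataFrame(data)

without_age = df2.drop(columns=['Age'])  # same as drop('Age', axis='columns')
without_rows = df2.drop(index=[0, 5])    # remove rows by index label

print(list(without_age.columns))
print(list(without_rows.index))
print(list(df2.columns))                 # the original still has both columns
```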

12. Head Method

The head() method shows the first rows of a data frame (five by default).

data = {'Age': [-44,0,5, 15, 10, -3], 
        'Salary': [0,5,-2, -14, 19, 24]}

df2 = pd.DataFrame(data)
df2.head(2)

#output:
   Age  Salary
0  -44       0
1    0       5

13. Truncate Method

The truncate() method keeps only the rows whose index label lies between before and after, both included.

data = {'Age': [-44,0,5, 15, 10, -3], 
        'Salary': [0,5,-2, -14, 19, 24]}

df2 = pd.DataFrame(data)
df2

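#output:
   Age  Salary
0  -44       0
1    0       5
2    5      -2
3   15     -14
4   10      19
5   -3      24

#truncate the data by using index labels as the condition.
df2.truncate(before=2, after=4)

#output:
   Age  Salary
2    5      -2
3   15     -14
4   10      19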

14. Filter Method

The filter() method selects a subset of the columns (or rows) of a data frame by label. It looks only at the labels, not at the values.

df = pd.DataFrame(np.array(([4,6,9], [11,14,17])),
                  index=['Apple', 'Kiwi'],
                  columns=['mm', 'cm', 'kg'])

df

#output:
       mm  cm  kg
Apple   4   6   9
Kiwi   11  14  17

df.filter(items=['cm', 'kg'])

#output:
       cm  kg
Apple   6   9
Kiwi   14  17
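
Besides items, filter() accepts like (substring match) and regex (regular expression match), and axis=0 applies the match to row labels. A sketch on the same data frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[4, 6, 9], [11, 14, 17]]),
                  index=['Apple', 'Kiwi'],
                  columns=['mm', 'cm', 'kg'])

with_m = df.filter(like='m')               # column labels containing 'm'
ends_with_g = df.filter(regex='g$')        # column labels ending in 'g'
kiwi_rows = df.filter(like='Ki', axis=0)   # row labels containing 'Ki'

print(list(with_m.columns))
print(list(ends_with_g.columns))
print(list(kiwi_rows.index))
```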

15. Interpolation Method

The interpolate() method fills NaN values in a series or data frame; by default it uses linear interpolation between the neighbouring values.

series = pd.Series([11, 12, np.nan, 14])
series.interpolate()

#output:
0    11.0
1    12.0
2    13.0
3    14.0
dtype: float64

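#padding (forward fill with the last existing value) with a limit
#note: interpolate(method='pad', limit=2) is deprecated in recent pandas versions; ffill() gives the same result
series = pd.Series([11, np.nan, 12, np.nan, np.nan, np.nan, 14])
series.ffill(limit=2)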

#output:
0    11.0
1    11.0
2    12.0
3    12.0
4    12.0
5     NaN
6    14.0
dtype: float64

With padding, each NaN value is filled with the last existing value before it, and limit=2 means that at most two consecutive NaN values are filled.
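
To make the difference between the filling strategies concrete, the sketch below applies linear interpolation, forward fill and backward fill to one made-up series with a gap of two values.

```python
import numpy as np
import pandas as pd

series = pd.Series([10.0, np.nan, np.nan, 16.0])

linear = series.interpolate()  # evenly spaced values between 10 and 16
forward = series.ffill()       # repeat the last known value
backward = series.bfill()      # use the next known value

print(linear.tolist())
print(forward.tolist())
print(backward.tolist())
```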

16. Isna Method

This method is used to detect the missing values in the data frame or series.

import numpy as np
import pandas as pd

series1 = pd.Series([5, 6, np.nan])
series1.isna()

#output:
0    False
1    False
2     True
dtype: bool
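
A common follow-up, sketched here on a small hypothetical data frame, is to count the missing values per column and to pick out the rows that contain at least one of them.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with some missing entries.
df = pd.DataFrame({'Age': [25, np.nan, 31],
                   'Salary': [np.nan, np.nan, 50]})

missing_per_column = df.isna().sum()            # each True counts as 1
rows_with_missing = df[df.isna().any(axis=1)]   # rows with at least one NaN

print(missing_per_column.to_dict())
print(list(rows_with_missing.index))
```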

17. Replace Method

The replace() method replaces a given value with another value in a series.

series1 = pd.Series(["Amit", "Apple", "Orange", "Banana", "Kiwi"])
series1.replace("Apple", "Avocado")

#output:
0       Amit
1    Avocado
2     Orange
3     Banana
4       Kiwi
dtype: object
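
replace() also accepts a dictionary, so that several values can be replaced in one call. The mapping below is chosen purely for illustration.

```python
import pandas as pd

series1 = pd.Series(["Amit", "Apple", "Orange", "Banana", "Kiwi"])

# A dictionary maps each old value to its new value.
renamed = series1.replace({"Apple": "Mango", "Kiwi": "Lemon"})

# replace() returns a new series; series1 itself is not modified.
print(renamed.tolist())
print(series1.tolist())
```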

18. Argmin and argmax Method

These methods return the integer position (not the label) of the minimum and maximum value in a series.

import pandas as pd

#data 1
s = pd.Series({'Apple': 90.0, 'Orange': 10.0,
               'Banana': 20.0, 'Kiwi': 300.0})

print(s.argmax())
print(s.argmin())

#output:
3
1

#data 2
series = pd.Series([4.0, 6.0, 7.0, 12.0, 15.0])
print(series.argmax())
print(series.argmin())

#output:
4
0
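
When the label is wanted rather than the position, the related methods idxmax() and idxmin() can be used. A sketch on the same fruit series:

```python
import pandas as pd

s = pd.Series({'Apple': 90.0, 'Orange': 10.0,
               'Banana': 20.0, 'Kiwi': 300.0})

largest_label = s.idxmax()     # label of the maximum value
smallest_label = s.idxmin()    # label of the minimum value

# The position from argmax() points at the same entry in the index.
same_label = s.index[s.argmax()]

print(largest_label, smallest_label, same_label)
```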

19. Compare Method

The compare() method compares a series with another series and shows only the positions where the values differ, with the value from each series side by side.

series1 = pd.Series(["Amit", "Apple", "Orange", "Banana", "Kiwi"])
series2 = pd.Series(["Amit", "Orange", "Orange", "Banana", "Lemon"])

series1.compare(series2)

#output:
    self   other
1  Apple  Orange
4   Kiwi   Lemon
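
compare() has two optional flags, keep_shape and keep_equal, that keep all rows and show equal values as well. A minimal sketch with the same two series:

```python
import pandas as pd

series1 = pd.Series(["Amit", "Apple", "Orange", "Banana", "Kiwi"])
series2 = pd.Series(["Amit", "Orange", "Orange", "Banana", "Lemon"])

# Default: only the rows that differ.
diff = series1.compare(series2)

# Keep every row and also show the values that are equal.
full = series1.compare(series2, keep_shape=True, keep_equal=True)

print(list(diff.index))
print(diff.loc[1, 'self'], diff.loc[1, 'other'])
print(full.shape)
```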

20. Groupby Method

The groupby() method groups the rows that share the same value in a column, so that an aggregate such as the mean can be computed per group.

df = pd.read_csv("sample.csv")

df

#output:
   Fruits   Weight
0  Apple      10
1  Orange     15
2  Apple      20
3  Apple      30
4  Orange     25

#groupby operation based on Fruits column (same result as df.groupby("Fruits").mean())
df.groupby(pd.Grouper(key="Fruits")).mean()

#output:
        Weight
Fruits
Apple    20.0
Orange   20.0
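
Since sample.csv is not included with the article, the sketch below builds the same table inline so that it runs on its own, and adds an agg() call to compute several statistics per group at once.

```python
import pandas as pd

# Built inline so the example runs without sample.csv.
df = pd.DataFrame({'Fruits': ['Apple', 'Orange', 'Apple', 'Apple', 'Orange'],
                   'Weight': [10, 15, 20, 30, 25]})

mean_weight = df.groupby('Fruits')['Weight'].mean()

# Several aggregates per group in one call.
summary = df.groupby('Fruits')['Weight'].agg(['count', 'sum', 'max'])

print(mean_weight.to_dict())
print(summary)
```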

I hope you like the article. Reach me on LinkedIn (https://www.linkedin.com/in/data-scientist-95040a1ab/) and Twitter (https://twitter.com/amitprius).

Recommended Articles

1. 8 Active Learning Insights of Python Collection Module
2. NumPy: Linear Algebra on Images
3. Exception Handling Concepts in Python
4. Pandas: Dealing with Categorical Data
5. Hyper-parameters: RandomSeachCV and GridSearchCV in Machine Learning
6. Fully Explained Linear Regression with Python
7. Fully Explained Logistic Regression with Python
8. Data Distribution using Numpy with Python
9. Decision Trees vs. Random Forests in Machine Learning
10. Standardization in Data Preprocessing with Python
