20 Most Usable Pandas Shortcut Methods in Python

Useful concepts for data science and machine learning

This article covers some useful pandas methods for data science and data analytics. Data scientists need to get results quickly, and these methods come in handy in most business analysis and data analysis tasks.

Topics to be covered:

1. Memory usage          11. Drop Method 
2. Copy Method           12. Head Method
3. At Method             13. Truncate Method
4. Loc Method            14. Filter Method
5. Clip Method           15. Interpolation Method
6. Correlation Method    16. Isna Method
7. N_largest Method      17. Replace Method
8. N_Smallest Method     18. Argmin and argmax Method
9. Unique Method         19. Compare Method
10. Value_count Method   20. Groupby Method

1. Memory Usage

This method is very useful for inspecting large data sets. It returns the memory usage, in bytes, of a series or of each column in a data frame.

import numpy as np
import pandas as pd

#memory usage of a series
series = pd.Series(range(10))
series.memory_usage()

#output:
208

Next, look at the memory usage of a data frame with several columns of different types.

dtypes = ['int64', 'float64', 'complex128', 'object', 'bool']
#one column of 1000 ones per dtype
data = {t: np.ones(shape=1000, dtype=int).astype(t) for t in dtypes}

df = pd.DataFrame(data)
df.memory_usage()

#output:
Index           128
int64          8000
float64        8000
complex128    16000
object         8000
bool           1000
dtype: int64
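
By default, memory_usage counts only the references held in an object column, not the Python objects they point to. The sketch below, which uses a hypothetical two-column frame (df_mem is not from the article), shows the deep=True option that also inspects the stored objects. The exact byte counts depend on the pandas version and platform, so none are shown.

```python
import pandas as pd

# Hypothetical example frame: one integer column and one string column.
df_mem = pd.DataFrame({"id": range(1000), "name": ["pandas"] * 1000})

shallow = df_mem.memory_usage()         # default: deep=False
deep = df_mem.memory_usage(deep=True)   # also inspects the stored objects

print(shallow)
print(deep)
print("total bytes (deep):", deep.sum())
```

Numeric columns report the same size either way; only columns that hold Python objects can grow under deep=True.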

2. Copy Method

This method copies the data to another variable. The copy method takes a deep parameter, True (the default) or False, that controls whether the data and index are copied or only referenced.

series = pd.Series([4.0, 6.0, 7.0, 12.0, 15.0], index=["a", "b",
                                                  "c", "d", "e"])

#simple (deep) copy
#default: deep=True
series_copy = series.copy()
series_copy

#output:
a     4.0
b     6.0
c     7.0
d    12.0
e    15.0
dtype: float64

#shallow copy: the data and index are not copied
shallow_copy = series.copy(deep=False)
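
To make the effect of a deep copy concrete, here is a minimal sketch (the variable names are illustrative) that modifies the original after copying. The deep copy keeps its own data. How a shallow copy reacts to the same change depends on whether pandas' Copy-on-Write mode is active, so that case is deliberately left out.

```python
import pandas as pd

original = pd.Series([4.0, 6.0, 7.0], index=["a", "b", "c"])
deep_copy = original.copy()      # deep=True by default

original.loc["a"] = 100.0        # modify the original after copying

print(original["a"])    # 100.0
print(deep_copy["a"])   # 4.0, the deep copy is unaffected
```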

3. At Method

It is used to access a single value at a given row and column label in a data frame, or at a given label in a series.

df = pd.DataFrame(np.array(([4,6,9], [11,14,17])),
                  index=['Apple', 'Kiwi'],
                  columns=['mm', 'cm', 'kg'])

df.at['Apple', 'cm']

#output:
6
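
The at accessor can also assign a single cell, and iat is its integer-position counterpart. A small sketch on the same fruit table (df_at is an illustrative name):

```python
import numpy as np
import pandas as pd

df_at = pd.DataFrame(np.array([[4, 6, 9], [11, 14, 17]]),
                     index=["Apple", "Kiwi"],
                     columns=["mm", "cm", "kg"])

df_at.at["Kiwi", "kg"] = 20      # set one cell by row and column label
print(df_at.at["Kiwi", "kg"])    # 20
print(df_at.iat[0, 1])           # 6 (row position 0, column position 1)
```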

4. Loc Method

This method is used to select rows and columns by label or by a boolean condition.

df = pd.DataFrame(np.array(([4,6,9], [11,14,17])),
                  index=['Apple', 'Kiwi'],
                  columns=['mm', 'cm', 'kg'])

df.loc['Kiwi']

#output:
mm    11
cm    14
kg    17
Name: Kiwi, dtype: int32

#select the rows where a column meets a condition
df.loc[df['cm'] > 6]

#output:
      mm  cm  kg
Kiwi  11  14  17
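
loc also accepts a row selector and a column selector together. The following sketch (df_loc and the result names are illustrative) combines a label, a boolean mask, and label slices. Note that label slices include both endpoints.

```python
import numpy as np
import pandas as pd

df_loc = pd.DataFrame(np.array([[4, 6, 9], [11, 14, 17]]),
                      index=["Apple", "Kiwi"],
                      columns=["mm", "cm", "kg"])

subset = df_loc.loc["Apple", ["mm", "kg"]]      # one row, two columns
heavy = df_loc.loc[df_loc["cm"] > 6, "kg"]      # boolean mask plus one column
block = df_loc.loc["Apple":"Kiwi", "mm":"cm"]   # label slices, endpoints included

print(subset)
print(heavy)
print(block)
```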

5. Clip Method

This method trims the data to the lower and upper threshold values passed to clip.

data={'col_0':[-44,0,5,15,10,-3], 'col_1':[0,5,-2,-14,19,24]}

df = pd.DataFrame(data)
df.clip(-5, 8)

#output:
   col_0  col_1
0     -5      0
1      0      5
2      5     -2
3      8     -5
4      8      8
5     -3      8

As the output shows, every value now lies between the thresholds. Any value outside the range is replaced by the nearest threshold: values below -5 become -5 and values above 8 become 8.
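
Either bound can be given on its own through the lower and upper keywords. A short sketch on a single series (the name values is illustrative):

```python
import pandas as pd

values = pd.Series([-44, 0, 5, 15, 10, -3])

floor_only = values.clip(lower=0)   # raise negatives to 0, leave the rest
cap_only = values.clip(upper=8)     # cap large values at 8, leave the rest

print(floor_only.tolist())   # [0, 0, 5, 15, 10, 0]
print(cap_only.tolist())     # [-44, 0, 5, 8, 8, -3]
```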

6. Correlation Method

It is used to measure the correlation between features, that is, how strongly one feature depends on another. The method parameter accepts 'pearson' (the default), 'kendall', 'spearman', or a callable, as in the example below.

def histogram_intersection(x, y):
    v = np.minimum(x, y).sum().round(decimals=1)
    return v

s1 = pd.Series([.44, .0, .5, .15, .10, .3])
s2 = pd.Series([.0, .5, .2, .14, .19, .24])
s1.corr(s2, method=histogram_intersection)

#Output:
0.7
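
The example above passes a custom callable. For comparison, here is a minimal sketch of the default Pearson correlation on made-up series (x, y, and z are illustrative): a perfectly increasing linear relationship gives a coefficient of about 1, and a perfectly decreasing one gives about -1.

```python
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + 1      # increases linearly with x
z = -x             # decreases linearly with x

r_xy = x.corr(y)   # Pearson by default
r_xz = x.corr(z)

print(round(r_xy, 6), round(r_xz, 6))
```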

7. N_largest Method

This method returns the first n rows with the largest values in the given column, in descending order.

data = {'col_0': [-44,0,5, 15, 10, -3], 
        'col_1': [0,5,-2, -14, 19, 24]}

df = pd.DataFrame(data)
df.nlargest(5, 'col_0')

#output:
   col_0  col_1
3     15    -14
4     10     19
2      5     -2
1      0      5
5     -3     24

8. N_Smallest Method

This method returns the first n rows with the smallest values in the given column, in ascending order.

data = {'col_0': [-44,0,5, 15, 10, -3], 
        'col_1': [0,5,-2, -14, 19, 24]}

df = pd.DataFrame(data)
df.nsmallest(5, 'col_0')

#output:
   col_0  col_1
0    -44      0
5     -3     24
1      0      5
2      5     -2
4     10     19
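
Both methods also exist on a series, where no column name is needed. The sketch below uses a made-up scores series and checks that nlargest matches sorting followed by head.

```python
import pandas as pd

# Illustrative data with no tied values.
scores = pd.Series({"a": 10, "b": 30, "c": 20, "d": 40})

top2 = scores.nlargest(2)        # d (40), then b (30)
bottom2 = scores.nsmallest(2)    # a (10), then c (20)

# Same result as sorting in descending order and taking the first rows.
same = top2.equals(scores.sort_values(ascending=False).head(2))

print(top2.index.tolist(), bottom2.index.tolist(), same)
```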

9. Unique Method

This method returns the unique values of a series (a feature/column), in order of first appearance.

series = pd.Series([4.0, 6.0, 7.0, 6.0, 4.0])
series.unique()

#Output:
array([4., 6., 7.])

10. Value_count Method

This method counts how many times each unique value occurs in a series or data frame.

series = pd.Series([4.0, 6.0, 7.0, 6.0, 4.0])
series.value_counts()

#Output:
6.0    2
4.0    2
7.0    1
dtype: int64
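
value_counts can also report relative frequencies, and nunique gives the number of distinct values. A minimal sketch on the same data (the result names are illustrative):

```python
import pandas as pd

series = pd.Series([4.0, 6.0, 7.0, 6.0, 4.0])

counts = series.value_counts()                 # absolute counts
shares = series.value_counts(normalize=True)   # fractions that sum to 1
distinct = series.nunique()                    # number of distinct values

print(counts.loc[4.0], shares.loc[7.0], distinct)
```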

11. Drop Method

This method is very useful for removing unnecessary columns (or rows) from a data frame.

data = {'Age': [-44,0,5, 15, 10, -3], 
        'Salary': [0,5,-2, -14, 19, 24]}

df2 = pd.DataFrame(data)
df2.drop('Age', axis='columns')

#output:
   Salary
0       0
1       5
2      -2
3     -14
4      19
5      24
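
drop can remove rows as well as columns, and it returns a new data frame rather than changing the original. A short sketch using the columns and index keywords (df_drop is an illustrative name):

```python
import pandas as pd

df_drop = pd.DataFrame({"Age": [-44, 0, 5, 15, 10, -3],
                        "Salary": [0, 5, -2, -14, 19, 24]})

no_age = df_drop.drop(columns=["Age"])     # same as drop('Age', axis='columns')
fewer_rows = df_drop.drop(index=[0, 5])    # remove rows by index label

print(no_age.columns.tolist())      # ['Salary']
print(fewer_rows.index.tolist())    # [1, 2, 3, 4]
print(df_drop.shape)                # (6, 2), the original is unchanged
```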

12. Head Method

This method is used to view the first few rows of the data frame.

data = {'Age': [-44,0,5, 15, 10, -3], 
        'Salary': [0,5,-2, -14, 19, 24]}

df2 = pd.DataFrame(data)
df2.head(2)

#output:
   Age  Salary
0  -44       0
1    0       5

13. Truncate Method

This method keeps only the rows whose index lies between the given before and after values, dropping everything outside that range.

data = {'Age': [-44,0,5, 15, 10, -3], 
        'Salary': [0,5,-2, -14, 19, 24]}

df2 = pd.DataFrame(data)
df2

#output:
   Age  Salary
0  -44       0
1    0       5
2    5      -2
3   15     -14
4   10      19
5   -3      24

#truncate the data: keep only the rows with index 2 through 4
df2.truncate(before=2, after=4)

#output:
   Age  Salary
2    5      -2
3   15     -14
4   10      19
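
On a sorted index, truncate gives the same rows as a label slice with loc, since both include the two endpoints. A minimal sketch to check this (df_t is an illustrative name):

```python
import pandas as pd

df_t = pd.DataFrame({"Age": [-44, 0, 5, 15, 10, -3],
                     "Salary": [0, 5, -2, -14, 19, 24]})

truncated = df_t.truncate(before=2, after=4)
sliced = df_t.loc[2:4]              # label slice, endpoints included

print(truncated.equals(sliced))     # True
```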

14. Filter Method

This method selects a subset of the columns (or rows) of a data frame by label. It filters on the labels, not on the values in the cells.

df = pd.DataFrame(np.array(([4,6,9], [11,14,17])),
                  index=['Apple', 'Kiwi'],
                  columns=['mm', 'cm', 'kg'])

df

#output:
       mm  cm  kg
Apple   4   6   9
Kiwi   11  14  17

df.filter(items=['cm', 'kg'])

#output:
       cm  kg
Apple   6   9
Kiwi   14  17
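
Besides items, filter accepts like (substring match) and regex, and axis=0 applies the match to row labels instead of column labels. A brief sketch on the same table (df_f and the result names are illustrative):

```python
import numpy as np
import pandas as pd

df_f = pd.DataFrame(np.array([[4, 6, 9], [11, 14, 17]]),
                    index=["Apple", "Kiwi"],
                    columns=["mm", "cm", "kg"])

with_m = df_f.filter(like="m")              # columns whose name contains "m"
ends_g = df_f.filter(regex="g$")            # columns whose name ends in "g"
kiwi_rows = df_f.filter(like="Ki", axis=0)  # rows whose label contains "Ki"

print(with_m.columns.tolist())     # ['mm', 'cm']
print(ends_g.columns.tolist())     # ['kg']
print(kiwi_rows.index.tolist())    # ['Kiwi']
```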

15. Interpolation Method

The interpolate method fills NaN values in a series or data frame, using linear interpolation by default.

series = pd.Series([11, 12, np.nan, 14])
series.interpolate()

#output:
0    11.0
1    12.0
2    13.0
3    14.0
dtype: float64

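#fill each gap with the previous valid value (padding), at most `limit` times
#note: newer pandas releases warn that method='pad' is deprecated in interpolate
series = pd.Series([11, np.nan, 12, np.nan, np.nan, np.nan, 14])
series.interpolate(method='pad', limit=2)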

#output:
0    11.0
1    11.0
2    12.0
3    12.0
4    12.0
5     NaN
6    14.0
dtype: float64

With method='pad', each NaN is filled with the last valid value before it, and limit=2 means that at most two consecutive NaN values are filled, which is why index 5 stays NaN.
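
The same padding behaviour is available through ffill (forward fill), which is the form newer pandas releases recommend for this case. A minimal sketch, assuming the same input series as above:

```python
import numpy as np
import pandas as pd

series = pd.Series([11, np.nan, 12, np.nan, np.nan, np.nan, 14])

# Forward-fill each gap, but fill at most two consecutive NaN values.
filled = series.ffill(limit=2)

print(filled.isna().tolist())
# [False, False, False, False, False, True, False]
```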

16. Isna Method

This method is used to detect the missing values in the data frame or series.

import numpy as np
import pandas as pd

series1 = pd.Series([5, 6, np.nan])
series1.isna()

#output:
0    False
1    False
2     True
dtype: bool
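
Because isna returns booleans, summing the result counts the missing values. A short sketch on a made-up data frame (df_na is illustrative):

```python
import numpy as np
import pandas as pd

df_na = pd.DataFrame({"a": [5, 6, np.nan],
                      "b": [np.nan, np.nan, 1.0]})

missing_per_column = df_na.isna().sum()       # True counts as 1
rows_with_missing = df_na.isna().any(axis=1)  # True if any cell in the row is NaN

print(missing_per_column.tolist())   # [1, 2]
print(rows_with_missing.tolist())    # [True, True, True]
```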

17. Replace Method

This method replaces every occurrence of a given value in the series with a new value.

series1 = pd.Series(["Amit", "Apple", "Orange", "Banana", "Kiwi"])
series1.replace("Apple", "Avocado")

#output:
0       Amit
1    Avocado
2     Orange
3     Banana
4       Kiwi
dtype: object
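
Several values can be replaced in one call by passing a dictionary that maps old values to new ones. A minimal sketch (the fruit names are illustrative):

```python
import pandas as pd

fruits = pd.Series(["Apple", "Orange", "Banana", "Kiwi"])

# Each key is replaced by its value; unmatched entries stay as they are.
renamed = fruits.replace({"Apple": "Avocado", "Kiwi": "Lemon"})

print(renamed.tolist())   # ['Avocado', 'Orange', 'Banana', 'Lemon']
```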

18. Argmin and argmax Method

These methods return the integer position (not the index label) of the minimum and maximum values.

import pandas as pd

#data 1
s = pd.Series({'Apple': 90.0, 'Orange': 10.0,
               'Banana': 20.0, 'Kiwi': 300.0})

print(s.argmax())
print(s.argmin())

#output:
3
1

#data 2
series = pd.Series([4.0, 6.0, 7.0, 12.0, 15.0])
print(series.argmax())
print(series.argmin())

#output:
4
0
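
When the index label is wanted instead of the integer position, idxmax and idxmin are the counterparts. A short sketch on the same fruit data:

```python
import pandas as pd

s = pd.Series({"Apple": 90.0, "Orange": 10.0,
               "Banana": 20.0, "Kiwi": 300.0})

print(s.argmax(), s.idxmax())   # 3 Kiwi
print(s.argmin(), s.idxmin())   # 1 Orange
```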

19. Compare Method

This method compares a series with another series and shows only the rows where the values differ.

series1 = pd.Series(["Amit", "Apple", "Orange", "Banana", "Kiwi"])
series2 = pd.Series(["Amit", "Orange", "Orange", "Banana", "Lemon"])

series1.compare(series2)

#output:
    self   other
1  Apple  Orange
4   Kiwi   Lemon
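
By default compare drops the rows that match. Passing keep_shape=True keeps every row and marks the matching ones as NaN, which can help when the result has to line up with the original index. A minimal sketch with the same two series:

```python
import pandas as pd

series1 = pd.Series(["Amit", "Apple", "Orange", "Banana", "Kiwi"])
series2 = pd.Series(["Amit", "Orange", "Orange", "Banana", "Lemon"])

diff = series1.compare(series2)                    # only the differing rows
full = series1.compare(series2, keep_shape=True)   # all rows, NaN where equal

print(diff.index.tolist())   # [1, 4]
print(full.shape)            # (5, 2)
```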

20. Groupby Method

This method groups the rows that share the same value in a column, so that an aggregate can be computed for each group.

df = pd.read_csv("sample.csv")

df

#output:
   Fruits   Weight
0  Apple      10
1  Orange     15
2  Apple      20
3  Apple      30
4  Orange     25

#groupby operation based on Fruits column
df.groupby(pd.Grouper(key="Fruits")).mean()

#output:
        Weight
Fruits
Apple    20.0
Orange   20.0
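
Since sample.csv is not included with the article, the sketch below rebuilds the same table in code (an assumption based on the output shown above) and computes several aggregates per group in one call with agg.

```python
import pandas as pd

# Rebuilt from the printed table; the CSV file itself is not available.
df_g = pd.DataFrame({"Fruits": ["Apple", "Orange", "Apple", "Apple", "Orange"],
                     "Weight": [10, 15, 20, 30, 25]})

summary = df_g.groupby("Fruits")["Weight"].agg(["mean", "sum", "count"])
print(summary)
```

Each row of summary is one fruit, and each column is one of the requested aggregates.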

I hope you liked the article. Reach me on LinkedIn (https://www.linkedin.com/in/data-scientist-95040a1ab/) and Twitter (https://twitter.com/amitprius).