Anmol Tomar



8 Use Cases of Lambda and Apply functions in Pandas

Use Lambda and Apply in your next Data Science Project

Pic Credit: Unsplash

Pandas is one of the most widely used libraries for data manipulation in Python.

I have been using Pandas for more than 5 years now, and it still amazes me how flexible it is: there are many ways to implement the same thing, depending on whether you want your code to be faster or shorter.

Lambda and apply are two such powerful tools that I use a lot. They can make code very crisp by turning multi-line code into one-liners.

Some background on lambda and apply:

1. The lambda keyword is used to create one-line anonymous functions in Python.

### Lambda function to multiply a variable by 2.
lambda x: x*2

2. The apply function is a one-liner alternative to loops in Python.

### Using the 'apply' function to run a function 'func' on each column of the DataFrame (axis=0); pass axis=1 to run it on each row.
DataFrame.apply(func, axis=0)
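To see how the two pieces fit together, here is a minimal sketch on a small made-up DataFrame. The names double, col_totals and row_totals are only illustrative. With axis=0, apply hands each column to the function as a Series; with axis=1, it hands over each row.

```python
import pandas as pd

# A lambda is an anonymous one-line function; this one doubles its input.
double = lambda x: x * 2

# Hypothetical example data, used only for illustration.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# axis=0 (the default): the function is called once per column.
col_totals = df.apply(lambda col: col.sum(), axis=0)

# axis=1: the function is called once per row.
row_totals = df.apply(lambda row: row.sum(), axis=1)

print(double(5))
print(col_totals)
print(row_totals)
```

The column totals are indexed by column name (A, B), while the row totals are indexed by the row labels (0, 1, 2).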

This blog will take you through various practical use cases where you can use apply and lambda functions in pandas, especially if your dataset is not too big.

Let’s quickly get started with the use cases:

USE CASE 1: Adding a numeric value to columns

Let's start with a very basic example: adding a value to the columns of a DataFrame.

import pandas as pd

# Define a lambda function that adds 10 to each element of a DataFrame
add_10 = lambda x: x + 10

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Apply the lambda function to each column of the DataFrame
df.apply(add_10)
# One-line implementation: df.apply(lambda x: x + 10)
Adding Values to Columns (Image by Author)
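As a quick check, the sketch below compares the apply version with plain DataFrame arithmetic. The variable names via_apply and via_arithmetic are illustrative; the point is that for simple element-wise math both routes should give the same result.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# apply passes each column (a Series) to the lambda, which adds 10 to it.
via_apply = df.apply(lambda x: x + 10)

# The same operation written as plain arithmetic on the whole DataFrame.
via_arithmetic = df + 10

print(via_apply)
print(via_apply.equals(via_arithmetic))
```

For a case this simple, the arithmetic form is usually the shorter choice; apply becomes more useful when the per-column logic is more involved.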

USE CASE 2: Max/Min of columns/rows

We can use the ‘apply’ method on a DataFrame to apply a lambda function that can find the maximum or minimum of values within columns or rows of a DataFrame.

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Define a lambda function that calculates the maximum of each row or column of a DataFrame
max_value = lambda x: x.max()

# Apply the lambda function to each row of the DataFrame
df.apply(max_value, axis=1)

# Apply the lambda function to each column of the DataFrame
df.apply(max_value, axis=0)
Apply-Lambda function to each row (Image by Author)
Apply-Lambda function to each column (Image by Author)
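The example above only shows the maximum, so here is a small sketch that stores the row-wise and column-wise results and also computes the minimum. The names row_max, col_max and col_min are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

max_value = lambda x: x.max()
min_value = lambda x: x.min()

# One value per row: the larger of A and B in that row.
row_max = df.apply(max_value, axis=1)

# One value per column: the largest and smallest entry in each column.
col_max = df.apply(max_value, axis=0)
col_min = df.apply(min_value, axis=0)

print(row_max)
print(col_max)
print(col_min)
```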

USE CASE 3: Difference between min and max of each column

We can also use apply with lambda to find the difference between the minimum and maximum values within each column of the DataFrame.

In the example below, for column 'A', the lambda function finds the maximum as 3 and the minimum as 1, giving an output of 2. Similarly, it outputs 2 for column 'B'.

# Define a lambda function that calculates the difference between the maximum and minimum value of each column in a DataFrame
# (named value_range so that it does not shadow Python's built-in range)
value_range = lambda x: x.max() - x.min()

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Apply the lambda function to each column of the DataFrame
df.apply(value_range)
Output of the min and max column values (Image by Author)

USE CASE 4: String Manipulation

Often, when working with string columns, we need to perform data manipulation such as converting strings to lowercase, splitting sentences into words, replacing substrings, etc.

In this example, we will look at one such use case where we will convert string values within the columns of a DataFrame into lowercase.

# Define a lambda function that converts a column to lowercase if it holds strings
# (isinstance(x, object) is always True, so we check the column's dtype instead)
to_lower = lambda x: x.str.lower() if pd.api.types.is_string_dtype(x) else x

# Create a DataFrame
df = pd.DataFrame({'A': ['A', 'B', 'C'], 'B': ['X', 'Y', 'Z']})

# Apply the lambda function to each column of the DataFrame
df.apply(to_lower)
Lowercase using Lambda (Image by Author)
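The other string operations mentioned above, splitting and replacement, follow the same pattern. This is a minimal sketch on made-up text data; the column values and the names replaced and words are illustrative.

```python
import pandas as pd

# Hypothetical text data, used only for illustration.
df = pd.DataFrame({'A': ['Hello World', 'Good Day'],
                   'B': ['Data Science', 'Big Data']})

# Replace a substring in every column (regex=False treats the pattern literally).
replaced = df.apply(lambda x: x.str.replace('Data', 'Info', regex=False))

# Split each sentence into a list of words (splits on whitespace by default).
words = df.apply(lambda x: x.str.split())

print(replaced)
print(words)
```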

USE CASE 5: Normalisation of variable

Data normalization is a very important step before applying machine learning algorithms such as K-means.

In the example below, we use the apply method with a lambda function to normalize the values within the columns of a DataFrame by subtracting the column mean and dividing by the column standard deviation.

# Define a lambda function that normalizes the values of a column
normalize = lambda x: (x - x.mean()) / x.std()

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Apply the lambda function to each column of the DataFrame
df.apply(normalize)
Normalization of columns (Image by Author)
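The sketch below works through the numbers. It also shows min-max scaling, which is an alternative not covered in this post and is included only for comparison. The names standardized and min_max are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Z-score standardization: subtract the column mean, divide by the column std.
# Series.std() uses the sample standard deviation (ddof=1) by default.
standardized = df.apply(lambda x: (x - x.mean()) / x.std())

# Min-max scaling (an alternative, shown for comparison): rescale to [0, 1].
min_max = df.apply(lambda x: (x - x.min()) / (x.max() - x.min()))

print(standardized)
print(min_max)
```

For column 'A' the mean is 2 and the sample standard deviation is 1, so the standardized values are -1, 0 and 1. Column 'B' has the same spread, so it standardizes to the same values.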

USE CASE 6: Replace missing values

Data cleaning is a crucial pre-processing step before performing the data analysis.

Missing value treatment is part of the data cleaning process that can be implemented using apply and lambda functions (as shown below).

# Define a lambda function that replaces missing values in a DataFrame with a specified value
replace_missing = lambda x, value: x.fillna(value)

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, None], 'B': [None, 5, 6]})

# Apply the lambda function to each column of the DataFrame, passing 0 as the fill value
df.apply(replace_missing, args=(0,))
Missing value treatment output (Image by Author)
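A common variation is to fill each column with a value computed from that same column, which is where the per-column lambda helps. The sketch below fills gaps with the column mean; this choice of fill value is an assumption for illustration, not something from the example above.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None], 'B': [None, 5, 6]})

# For each column, replace missing values with that column's own mean.
# Series.mean() skips missing values by default.
filled = df.apply(lambda x: x.fillna(x.mean()))

print(filled)
```

Here the mean of 'A' is 1.5 and the mean of 'B' is 5.5, so those are the values that fill the gaps.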

USE CASE 7: Cumulative Sum

We can also find the cumulative sum of columns of a pandas DataFrame using the apply and lambda function as shown below.

# Define a lambda function that calculates the cumulative sum of each column in a DataFrame
cumsum = lambda x: x.cumsum()

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Apply the lambda function to each column of the DataFrame
df.apply(cumsum)
Cumulative sum (Image by Author)

USE CASE 8: Moving Average

We can also use the apply method with a lambda function to find the moving average of the columns of a DataFrame. This comes in very handy with time series data.

In the following example, we are finding the moving average with a window of 2 (looking at 2 consecutive rows at a time).

# Define a lambda function that calculates the moving average of each column in a DataFrame
moving_average = lambda x: x.rolling(window=2).mean()

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Apply the lambda function to each column of the DataFrame
df.apply(moving_average)
Moving average (Image by Author)
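With a window of 2, the first row has no previous row to average with, so it comes out as NaN. The sketch below shows this, along with the min_periods parameter of rolling, which lets the window produce a value from fewer rows. The name partial is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Window of 2: the first row has only one value available, so it is NaN.
moving_average = df.apply(lambda x: x.rolling(window=2).mean())

# min_periods=1 allows a result as soon as one value is available.
partial = df.apply(lambda x: x.rolling(window=2, min_periods=1).mean())

print(moving_average)
print(partial)
```

Whether a partial window is acceptable depends on the analysis; keeping the NaN makes it explicit that the first value is not a true 2-row average.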

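If the dataset is too big, the best practice is to use vectorized operations to manipulate the pandas DataFrame. Read about vectorization here: Say Goodbye to Loops in Python, and Welcome Vectorization! (https://readmedium.com/say-goodbye-to-loops-in-python-and-welcome-vectorization-e4df66615a52)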

Conclusion

We looked at a variety of use cases where we can use lambda and apply functions while working with pandas. These functions can make our code concise and easy to follow.

When your dataset has millions of rows then you should not use lambda and apply but convert your code to vectorized form for faster execution.
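To get a feel for the difference, here is a rough timing sketch on made-up random data. The 20,000-row size and the names big, with_apply and vectorized are assumptions for illustration, and the actual timings will vary from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 20,000 rows of random numbers.
rng = np.random.default_rng(0)
big = pd.DataFrame({'A': rng.random(20_000), 'B': rng.random(20_000)})

def with_apply():
    # Row-wise apply calls the Python lambda once per row.
    return big.apply(lambda row: row['A'] + row['B'], axis=1)

def vectorized():
    # The vectorized form adds the two columns in a single operation.
    return big['A'] + big['B']

print('apply:     ', timeit.timeit(with_apply, number=1))
print('vectorized:', timeit.timeit(vectorized, number=1))
```

Both functions return the same values; only the way the work is done differs.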

Thank You for reading! I hope you liked it!

