Erdem Isbilen

Summary

The provided content discusses essential Python keywords and their application in the data preparation process, offering examples for beginners.

Abstract

The article "Top Python Keywords That You Must Use in Your Data Preparation Process" outlines the significance of Python keywords in data science. It emphasizes the necessity of data preparation before deriving insights from data and introduces the reserved keywords in Python 3.8, which are integral to various data preparation tasks. The author categorizes these keywords into different groups, such as Import Keywords, Structure Keywords, and Control Flow Keywords, and provides practical examples using common libraries like pandas and numpy. The article also covers the typical stages of data preparation, including data cleaning, feature selection, data transformation, feature engineering, and dimensionality reduction. By demonstrating the use of keywords in defining functions, looping through data structures, and controlling code flow, the article aims to equip beginners with the foundational knowledge to effectively handle data preparation in Python.

Opinions

  • The author believes that using Python's built-in keywords is crucial for efficient data science workflows.
  • Importing external libraries with keywords like import and as is seen as a practical approach to leverage existing solutions and avoid reinventing the wheel.
  • Functions defined with def are considered essential for organizing code into manageable and reusable blocks.
  • The use of for and in keywords for iteration is highlighted as a fundamental technique for processing data structures in Python.
  • The article suggests that conditional logic with if and else keywords is vital for decision-making within data preparation scripts.
  • The author encourages readers to apply the discussed Python keywords in their coding practice to improve their data preparation processes.

Top Python Keywords That You Must Use in Your Data Preparation Process

With easy-to-follow examples for beginners

Photo by Theo Crazzolara on Unsplash

You have data.

You need insights.

Unfortunately, before you get any insight out of data, you need to tackle the data preparation process.

Fortunately, a handful of commonly used Python keywords will help you with your basic data preparation tasks.

In this article, I will explain those keywords and how they are used in the data preparation process, with easy-to-follow examples.

What Is a Keyword

Python keywords are reserved words that cannot be used as a variable name, function name, or any other identifier, because each has a specific purpose and meaning in the Python language.

Python 3.8 has 35 keywords, which are listed below:

False      await      else       import     pass
None       break      except     in         raise
True       class      finally    is         return
and        continue   for        lambda     try
as         def        from       nonlocal   while
assert     del        global     not        with
async      elif       if         or         yield

You don’t have to import keywords, as they are always available, but they are case-sensitive and must be spelled exactly as written above.
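If you want to check this list for the interpreter you are running, the standard library's keyword module can help. The following is a minimal sketch; the candidate names are made up for illustration.

```python
import keyword

# keyword.kwlist holds every reserved word of the running interpreter.
print(len(keyword.kwlist))

# keyword.iskeyword() reports whether a given name is reserved.
# The candidate names below are invented for illustration.
candidates = ["for", "For", "lambda", "dataframe", "None", "none"]
reserved = {name: keyword.iskeyword(name) for name in candidates}
print(reserved)
```

Note that ‘For’ and ‘none’ are not keywords: spelling and capitalisation both matter.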

Python keywords can be categorised as follows:

  • Import Keywords: import, from, as
  • Structure Keywords: def, class, with, pass, lambda
  • Value Keywords: True, False, None
  • Operator Keywords: and, or, not, in, is
  • Control Flow Keywords: if, elif, else
  • Iteration Keywords: for, while, break, continue
  • Returning Keywords: return, yield
  • Exception-Handling Keywords: try, except, raise, finally, assert
  • Asynchronous Programming Keywords: async, await
  • Variable Handling Keywords: del, global, nonlocal

What Is Data Preparation

The data preparation process consists of a set of pre-modelling tasks. These tasks can be categorised as follows:

  • Data Cleaning: Correcting or removing incorrect, corrupted, missing, duplicate, or incomplete data within a dataset.
  • Feature Selection: Defining input variables that are most relevant to the task.
  • Data Transformation: Changing the scale or distribution of data.
  • Feature Engineering: Deriving new variables from available data.
  • Dimensionality Reduction: Reducing the number of input variables in a dataset while keeping as much of the variation as possible.

Which data preparation tasks you need depends on your data and on the algorithms you will use for modelling.
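To make these categories more concrete, here is a small sketch with pandas. The data, column names, and the choice of median filling and min-max scaling are assumptions made for illustration, not a recommended recipe. Dimensionality reduction is left out to keep the example short.

```python
import pandas as pd

# Toy data, invented for illustration only.
raw = pd.DataFrame({
    "price": [100.0, 100.0, None, 300.0],
    "nights": [2, 2, 1, 3],
})

# Data cleaning: drop exact duplicate rows, then fill missing prices with the median.
clean = raw.drop_duplicates().reset_index(drop=True)
clean["price"] = clean["price"].fillna(clean["price"].median())

# Data transformation: min-max scale price to the range [0, 1].
price_range = clean["price"].max() - clean["price"].min()
clean["price_scaled"] = (clean["price"] - clean["price"].min()) / price_range

# Feature engineering: derive a new variable from existing ones.
clean["total_cost"] = clean["price"] * clean["nights"]

# Feature selection: keep only the columns intended for modelling.
features = clean[["price_scaled", "total_cost"]]
print(features)
```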

Top Python Keywords Used in Data Preparation Tasks

‘import’ and ‘as’

To avoid reinventing the wheel in your data science projects, you will rely on modules and libraries written by others. To use those libraries, you bring them into your code with the import keywords ‘import’, ‘from’, and ‘as’.

import pandas as pd
import numpy as np

In the code above, the pandas and numpy libraries are imported. We will be using these modules later in our code. The ‘as’ keyword gives the module an alias. This is especially helpful when using modules with long names, or when you need to avoid name clashes in the namespace.
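The ‘from’ keyword is not used in the snippet above, so here is a small sketch of all three import keywords together. It uses only standard library modules, and the sample values are invented for illustration.

```python
import statistics as stats              # import a module under a shorter alias
from collections import Counter         # import a single name from a module
from math import sqrt as square_root    # import a single name and rename it

values = [4, 9, 9, 16]                  # invented sample data

average = stats.mean(values)
most_common = Counter(values).most_common(1)
root = square_root(values[0])

print(average)      # 9.5
print(most_common)  # [(9, 2)]
print(root)         # 2.0
```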

‘def’

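‘def’ is used to define a Python function. Functions are heavily used in data science projects. They help us to convert our large code blocks into logical and manageable pieces.

Let’s create a function that prints the count of missing items in each dataframe column. The definition starts with the ‘def’ keyword, followed by the function name and its parameters.

def missing_item_count(df):
  ...  # the body is completed in the ‘for’ and ‘in’ section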
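As a complete, self-contained illustration of ‘def’, here is a minimal sketch of a function that works on a plain Python list rather than a dataframe. The function name count_missing and the sample values are hypothetical and chosen only for this example.

```python
def count_missing(values):
    """Return how many entries in values are None."""
    return values.count(None)


# Invented sample: two of the four host names are missing.
host_names = ["Ann", None, "Bob", None]

missing_hosts = count_missing(host_names)
print(missing_hosts)  # 2
```

Once defined, the function can be reused on any other list without repeating the counting logic.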

‘for’ and ‘in’

It is common practice to loop through the items in a dataframe, or in a data object such as a dictionary or a list. The ‘for’ and ‘in’ pair is a perfect match for such tasks. Below you can see how we walk through the columns of the ‘airbnb’ dataframe with the ‘for’ keyword.

Our loop starts with the ‘for’ keyword, followed by the variable ‘col’, which receives each element of the data container in turn. Then comes the ‘in’ keyword, and finally ‘df.columns’, which is the data container itself.

airbnb_url = 'https://raw.githubusercontent.com/ManarOmar/New-York-Airbnb-2019/master/AB_NYC_2019.csv'
airbnb = pd.read_csv(airbnb_url)

def missing_item_count(df):
  # Visit every column name and count the missing (NaN) values in that column
  for col in df.columns:
    missing_item_count = df[col].isna().sum()
    print(f'Column {col} has {missing_item_count} missing items')

missing_item_count(airbnb)

Output (first four lines):
Column id has 0 missing items
Column name has 16 missing items
Column host_id has 0 missing items
Column host_name has 21 missing items

Inside the for loop, each item of the df.columns object is assigned in turn to the ‘col’ variable, so we can use it to select and inspect one column at a time.
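The same ‘for’ and ‘in’ pattern works on dictionaries and lists. The sketch below uses an invented record, not a row from the real dataset, to show a few common loop forms.

```python
# Invented record for illustration; not taken from the real dataset.
listing = {"id": 1, "name": None, "host_id": 10, "host_name": "Ann"}

# Iterating over a dictionary yields its keys.
keys = []
for key in listing:
    keys.append(key)

# .items() yields (key, value) pairs, so both are available in the loop body.
missing_keys = []
for key, value in listing.items():
    if value is None:
        missing_keys.append(key)

# enumerate() adds a running index when looping over a list.
for position, key in enumerate(missing_keys):
    print(position, key)  # 0 name
```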

‘if’ and ‘else’

The control flow keywords ‘if’ and ‘else’ are used for decision making. Which code block is executed depends on the value of the test expression.

def missing_item_count(df):
  for col in df.columns:
    missing_item_count = df[col].isna().sum()

    # A non-zero count is truthy, so this branch runs only when items are missing
    if missing_item_count:
      print(f'Column {col} has {missing_item_count} missing items')
    else:
      print(f'Column {col} has ZERO missing item')

missing_item_count(airbnb)

Output (first four lines):
Column id has ZERO missing item
Column name has 16 missing items
Column host_id has ZERO missing item
Column host_name has 21 missing items

In the above code, if the missing_item_count variable is truthy (any non-zero integer in our case), the block under ‘if’ prints the column name and the missing_item_count value.

If the missing_item_count variable is falsy (a zero integer in our case), the code block under the ‘else’ keyword is executed instead.

This is how you can control your code flow with the ‘if’ and ‘else’ keywords.
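When there are more than two cases, ‘elif’ adds further tests between ‘if’ and ‘else’. The following sketch is an assumption-based extension of the idea above: the function name describe_missing and the 50% threshold are hypothetical, chosen only to show the three branches.

```python
def describe_missing(column, missing_count, total_rows):
    """Return a short message about the missing items in one column."""
    if not missing_count:
        # 0 is falsy, so `not missing_count` is True only for a zero count.
        return f"Column {column} has ZERO missing items"
    elif missing_count / total_rows > 0.5:
        # Hypothetical threshold: more than half of the rows are missing.
        return f"Column {column} is mostly empty ({missing_count} of {total_rows})"
    else:
        return f"Column {column} has {missing_count} missing items"


# Invented counts for illustration.
messages = [
    describe_missing("id", 0, 100),
    describe_missing("name", 16, 100),
    describe_missing("notes", 80, 100),
]
for message in messages:
    print(message)
```

Only the first branch whose test is true is executed; the remaining branches are skipped.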

Key Takeaways and Conclusion

  • Python keywords are reserved words that have specific meanings and purposes. You don’t have to import keywords into your code, as they are always available.
  • ‘def’ is used to define a Python function. Functions are heavily used in data science projects. They help us to convert our large code blocks into logical and manageable pieces.
  • The control flow keywords ‘if’ and ‘else’ are used for decision making. Which code block is executed depends on the value of the test expression.
  • It is common practice to loop through the items in a dataframe, or in a data object such as a dictionary or a list. The ‘for’ and ‘in’ pair is a perfect match for such tasks.

I hope you have found the article useful and that you will start using the above keywords in your own code.
