Top Python Keywords That You Must Use in Your Data Preparation Process
With easy-to-follow examples for beginners
You have data.
You need insights.
Unfortunately, before you can get any insight out of your data, you need to work through the data preparation process.
A handful of commonly used Python keywords can help you with basic data preparation tasks.
In this article, I will explain those keywords and how they are used in data preparation, with easy-to-follow examples.
What Is a Keyword
Python keywords are reserved words that cannot be used as a variable name, function name, or any other identifier, because they have a specific purpose and meaning in the Python language.
Python 3.8 has 35 keywords, which are listed below:
False await else import pass
None break except in raise
True class finally is return
and continue for lambda try
as def from nonlocal while
assert del global not with
async elif if or yield
You don’t have to import keywords, as they are always available, but they must be spelled exactly as written above.
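If you want to check the list yourself, the standard library's keyword module exposes it. The following is a minimal sketch. The exact count depends on the Python version you run, so it is printed rather than assumed, and the helper name is_valid_variable_name is made up for this illustration.

```python
import keyword

# kwlist holds every reserved word of the running interpreter.
print(len(keyword.kwlist))


def is_valid_variable_name(name):
    """Return True if `name` can be used as an identifier (hypothetical helper)."""
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_valid_variable_name("price"))   # a normal name -> True
print(is_valid_variable_name("lambda"))  # a reserved word -> False
```

A check like this can be handy when column names from a file are turned into variable or attribute names.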
Python keywords can be categorised as follows:
- Import Keywords: import, from, as
- Structure Keywords: def, class, with, pass, lambda
- Value Keywords: True, False, None
- Operator Keywords: and, or, not, in, is
- Control Flow Keywords: if, elif, else
- Iteration Keywords: for, while, break, continue
- Returning Keywords: return, yield
- Exception-Handling Keywords: try, except, raise, finally, assert
- Asynchronous Programming Keywords: async, await
- Variable Handling Keywords: del, global, nonlocal
What Is Data Preparation
The data preparation process consists of a set of pre-modelling tasks. These tasks can be categorised as follows:
- Data Cleaning: Correcting or removing incorrect, corrupted, missing, duplicate, or incomplete data within a dataset.
- Feature Selection: Defining input variables that are most relevant to the task.
- Data Transformation: Changing the scale or distribution of data.
- Feature Engineering: Deriving new variables from available data.
- Dimensionality Reduction: Reducing the number of input variables in a dataset while keeping as much of the variation as possible.
Which specific data preparation tasks you need depends on the data and on the algorithms that will be used for modelling.
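To make the first four categories more concrete, here is a minimal sketch using pandas. The toy data, the column names, and the choice of min-max scaling are all assumptions made for this illustration. Dimensionality reduction is left out because it usually needs an additional library.

```python
import pandas as pd

# Hypothetical toy data, made up for this illustration.
raw = pd.DataFrame({
    "price": [100.0, 250.0, None, 250.0],
    "nights": [2, 5, 3, 5],
    "host": ["ann", "bob", "cem", "bob"],
})

# Data cleaning: drop rows with a missing price, then drop exact duplicates.
clean = raw.dropna(subset=["price"]).drop_duplicates().reset_index(drop=True)

# Feature selection: keep only the columns we consider relevant.
selected = clean[["price", "nights"]].copy()

# Data transformation: rescale price to the 0-1 range (min-max scaling).
price_min, price_max = selected["price"].min(), selected["price"].max()
selected["price_scaled"] = (selected["price"] - price_min) / (price_max - price_min)

# Feature engineering: derive a new variable from existing ones.
selected["price_per_night"] = selected["price"] / selected["nights"]

print(selected)
```

In a real project each of these steps would be driven by the data and the model, as noted above. The point here is only to show what each category looks like in code.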
Top Python Keywords Used in Data Preparation Tasks
‘import’ and ‘as’
To avoid reinventing the wheel in your data science projects, you will rely on modules and libraries written by others. To use those libraries, you import them into your code with the import keywords ‘import’, ‘from’, and ‘as’.
import pandas as pd
import numpy as np
In the code above, the pandas and numpy libraries are imported. We will be using these modules later in our code. The ‘as’ keyword gives each module a shorter alias (pd and np). This is especially helpful when using modules with long names, or when you need to keep namespaces separate.
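The imports above use ‘import’ and ‘as’ but not ‘from’. As a minimal sketch of all three forms, here is an example that uses standard library modules only. The choice of statistics and math, the alias round_down, and the sample numbers are mine, picked so that the code runs without installing anything.

```python
import statistics                      # plain import: use statistics.<name>
import statistics as stats             # the same module under a shorter alias
from math import sqrt                  # import a single name from a module
from math import floor as round_down   # a single name, with an alias

prices = [149, 225, 150, 89]  # hypothetical sample values

print(statistics.mean(prices))  # -> 153.25
print(stats.median(prices))     # -> 149.5
print(sqrt(16))                 # -> 4.0
print(round_down(3.7))          # -> 3
```

Importing a single name with ‘from’ keeps calls short, while importing the whole module keeps it obvious where each function comes from.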
‘def’
‘def’ is used to define a Python function. Functions are heavily used in data science projects. They help us break large code blocks into logical and manageable pieces.
Let’s create a function that prints the count of missing items in each column of a dataframe. We start with the function signature.
def missing_item_count(df):
    ...  # the body is written in the next section
‘for’ and ‘in’
It is common practice to loop through the items in a dataframe, or in a container such as a dictionary or a list. The ‘for’ and ‘in’ pair is a perfect match for such tasks. Below, we use the ‘for’ keyword to go through the columns of the ‘airbnb’ dataframe.
Our loop starts with the ‘for’ keyword, followed by the variable ‘col’, which is assigned each element of the data container in turn, and then the ‘in’ keyword. After ‘in’ comes ‘df.columns’, which is the data container itself.
airbnb_url = 'https://raw.githubusercontent.com/ManarOmar/New-York-Airbnb-2019/master/AB_NYC_2019.csv'
airbnb = pd.read_csv(airbnb_url)

def missing_item_count(df):
    # Visit every column label of the dataframe, one at a time.
    for col in df.columns:
        # isna() marks missing cells as True; sum() counts them.
        missing_item_count = df[col].isna().sum()
        print(f'Column {col} has {missing_item_count} missing items')

missing_item_count(airbnb)
Output:
Column id has 0 missing items
Column name has 16 missing items
Column host_id has 0 missing items
Column host_name has 21 missing items
Inside the for loop, we iterate over the items in the df.columns object, and each one is assigned to the ‘col’ variable in turn.
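The same ‘for’ and ‘in’ pattern works on lists and dictionaries. Here is a short sketch with hand-typed values. The names columns, missing_counts, and columns_with_missing are chosen for this illustration only. It also shows that ‘in’ on its own acts as a membership test.

```python
# Hypothetical values, typed in by hand for this illustration.
columns = ["id", "name", "host_id"]
missing_counts = {"id": 0, "name": 16, "host_id": 0}

# Looping over a list gives one element per iteration.
for col in columns:
    print(col)

# Looping over dict.items() gives (key, value) pairs.
for col, count in missing_counts.items():
    print(f"{col}: {count}")

# Collect the columns that have at least one missing item.
columns_with_missing = []
for col, count in missing_counts.items():
    if count > 0:
        columns_with_missing.append(col)

print(columns_with_missing)  # -> ['name']

# Outside a loop header, 'in' is a membership test.
print("name" in missing_counts)  # -> True
```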
‘if’ and ‘else’
The ‘if’ and ‘else’ control flow keywords are used for decision making. A code block is executed depending on whether the test expression evaluates as true or false.
def missing_item_count(df):
    for col in df.columns:
        missing_item_count = df[col].isna().sum()
        # A count of 0 is treated as false; any other count is treated as true.
        if missing_item_count:
            print(f'Column {col} has {missing_item_count} missing items')
        else:
            print(f'Column {col} has ZERO missing item')

missing_item_count(airbnb)
Output:
Column id has ZERO missing item
Column name has 16 missing items
Column host_id has ZERO missing item
Column host_name has 21 missing items
In the above code, if the missing_item_count variable evaluates as true (in our case, if it is a non-zero integer), the code prints the column name and the missing_item_count value.
If the missing_item_count variable evaluates as false (if it is zero), then the code block under the ‘else’ keyword is executed.
This is how you can control your code flow with the ‘if’ and ‘else’ keywords.
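As explained above, Python treats the integer 0 as false and any other integer as true. Here is a minimal sketch of that rule, which also uses ‘elif’ to add a third branch. The function name describe_missing and the threshold of 20 are arbitrary choices for this illustration.

```python
def describe_missing(count, many=20):
    """Return a short description of a missing-item count (hypothetical helper)."""
    if not count:          # 0 is treated as false, so this branch handles zero
        return "ZERO missing item"
    elif count < many:     # checked only when the first test failed
        return f"{count} missing items"
    else:                  # runs when none of the tests above succeeded
        return f"{count} missing items (many)"


print(bool(0), bool(16))     # -> False True
print(describe_missing(0))   # -> ZERO missing item
print(describe_missing(16))  # -> 16 missing items
print(describe_missing(21))  # -> 21 missing items (many)
```

Returning a string instead of printing it makes the function easier to reuse and to test.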
Key Takeaways and Conclusion
- Python keywords are reserved words that have specific meanings and purposes. You don’t have to import keywords into your code, as they are always available.
- ‘def’ is used to define a Python function. Functions are heavily used in data science projects. They help us break large code blocks into logical and manageable pieces.
- The ‘if’ and ‘else’ control flow keywords are used for decision making. A code block is executed depending on whether the test expression evaluates as true or false.
- It is common practice to loop through the items in a dataframe, or in a container such as a dictionary or a list. The ‘for’ and ‘in’ pair is a perfect match for such tasks.
I hope you have found the article useful and that you will start using the above keywords in your own code.





