Python Fundamentals


20 Essential Python Code Snippets for Data Scientists

Upgrade your Python skills

Photo from Pexels

Python is the go-to language for data scientists, thanks to its versatility and rich ecosystem of libraries. In this article, we’ll explore 20 important Python code snippets every data scientist should have in their toolkit. These snippets cover a wide range of data manipulation and analysis tasks.

1. Importing Libraries:

Always start by importing the necessary libraries for your project.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

2. Reading Data:

Load data from various sources, such as CSV, Excel, or SQL databases.

# From CSV
data = pd.read_csv('data.csv')

# From Excel
data = pd.read_excel('data.xlsx')

# From SQL
import sqlite3
conn = sqlite3.connect('database.db')
data = pd.read_sql_query('SELECT * FROM table_name', conn)

3. Data Inspection:

Quickly check the first few rows and basic statistics of your data.
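
A minimal sketch of what this inspection step might look like. The small DataFrame is made up for illustration and stands in for whatever `data` you loaded; the column names are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the `data` DataFrame loaded in the previous step
data = pd.DataFrame({
    "age": [25, 32, 47, 51, 38],
    "income": [40000, 52000, 81000, 90000, 61000],
})

print(data.head(3))     # First three rows (the default is five)
print(data.shape)       # (number of rows, number of columns)
data.info()             # Column dtypes and non-null counts
print(data.describe())  # Count, mean, std, min, quartiles, and max per numeric column
```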


4. Handling Missing Values:

Deal with missing data using pandas.

clean = data.dropna()    # Remove rows with missing values (returns a new DataFrame)
filled = data.fillna(0)  # Fill missing values with a specific value (0 as an example)
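
Before dropping or filling anything, it usually helps to count how much is missing. The sketch below uses a made-up DataFrame (the `score` and `city` columns are assumptions) and fills each column with a value suited to its type; the median and the "unknown" label are illustrative choices, not the only reasonable ones.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in both a numeric and a text column
data = pd.DataFrame({
    "score": [1.0, np.nan, 3.0, np.nan, 5.0],
    "city": ["A", "B", None, "B", "A"],
})

# Count missing values per column
missing_per_column = data.isna().sum()
print(missing_per_column)

# Fill each column separately, leaving the original DataFrame untouched
filled = data.copy()
filled["score"] = filled["score"].fillna(filled["score"].median())
filled["city"] = filled["city"].fillna("unknown")

# Alternatively, keep only the rows that have no missing values at all
complete_rows = data.dropna()
```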

5. Data Selection:

Select specific columns or rows from your DataFrame.
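
A short sketch of the common selection patterns, using a made-up DataFrame whose column names are assumptions. Note that `iloc` slices by position and excludes the end, while `loc` slices by label and includes it.

```python
import pandas as pd

# Hypothetical data for illustration
data = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "age": [28, 35, 41],
    "city": ["Oslo", "Rome", "Lima"],
})

ages = data["age"]                         # Single column, returned as a Series
subset = data[["name", "city"]]            # Several columns, returned as a DataFrame
first_two = data.iloc[0:2]                 # Rows by position (end is excluded)
by_label = data.loc[0:1, ["name", "age"]]  # Rows by label (end is included) plus chosen columns
```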


6. Data Filtering:

Filter data based on conditions.

data[data['column'] > 50]
data[(data['column1'] > 30) & (data['column2'] < 10)]

7. Grouping and Aggregation:

Aggregate data using group-by operations.
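
As a rough illustration, the sketch below groups a made-up DataFrame by a `category` column (an assumed name) and computes one statistic, then several at once with named aggregation.

```python
import pandas as pd

# Hypothetical data for illustration
data = pd.DataFrame({
    "category": ["a", "b", "a", "b", "a"],
    "value": [10, 20, 30, 40, 50],
})

# One statistic per group
mean_by_category = data.groupby("category")["value"].mean()

# Several statistics per group, each with its own output column name
summary = data.groupby("category").agg(
    total=("value", "sum"),
    average=("value", "mean"),
    rows=("value", "count"),
)
print(summary)
```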


8. Data Visualization:

Create plots and charts for data exploration.

plt.hist(data['column'], bins=20)  # Distribution of a single column
plt.show()
sns.scatterplot(x='x', y='y', data=data)  # Relationship between two columns
plt.show()

9. Data Sampling:

Take random samples from your dataset.

sample = data.sample(n=100)

10. Pivot Tables:

Create pivot tables for summarizing data.

pd.pivot_table(data, values='value', index='category', columns='date', aggfunc='sum')

11. Merging Data:

Combine data from multiple sources. With axis=0, pd.concat stacks DataFrames that share the same columns on top of each other.

merged_data = pd.concat([data1, data2], axis=0)
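
When the two sources share a key column rather than the same set of columns, a join with `pd.merge` is the usual alternative to stacking. The sketch below assumes two made-up tables linked by a `customer_id` column and contrasts an inner join with a left join.

```python
import pandas as pd

# Hypothetical tables that share a key column
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3, 4], "amount": [20, 35, 50, 15]})

# Inner join: keep only customer_ids present in both tables
inner = pd.merge(customers, orders, on="customer_id", how="inner")

# Left join: keep every customer; those without orders get NaN in `amount`
left = pd.merge(customers, orders, on="customer_id", how="left")
print(left)
```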

12. Data Transformation:

Apply functions to data columns.

data['new_column'] = data['old_column'].apply(lambda x: x * 2)
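
For simple arithmetic like this, operating on the whole column at once gives the same result as `apply` and is generally faster on large data. The sketch below shows both forms side by side, plus `map` for dictionary lookups; the data and the size labels are made up.

```python
import pandas as pd

# Hypothetical data for illustration
data = pd.DataFrame({"old_column": [1, 2, 3]})

# Element-by-element with apply
data["doubled_apply"] = data["old_column"].apply(lambda x: x * 2)

# Same result with vectorized arithmetic on the whole column
data["doubled_vectorized"] = data["old_column"] * 2

# Value-to-value lookups can use map with a dictionary
data["size"] = data["old_column"].map({1: "small", 2: "medium", 3: "large"})
```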

13. Date and Time Operations:

Manipulate date and time data.

data['date_column'] = pd.to_datetime(data['date_column'])
data['month'] = data['date_column'].dt.month
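
Real date columns often contain entries that cannot be parsed. One hedged way to handle that is `errors="coerce"`, which turns unparseable values into `NaT` instead of raising. The data below is made up for illustration.

```python
import pandas as pd

# Hypothetical column with one unparseable entry
data = pd.DataFrame({"date_column": ["2023-01-15", "2023-02-20", "not a date"]})

# Invalid entries become NaT rather than raising an error
data["date_column"] = pd.to_datetime(data["date_column"], errors="coerce")

# Components are available through the .dt accessor (NaN/NaT where parsing failed)
data["month"] = data["date_column"].dt.month
data["weekday"] = data["date_column"].dt.day_name()
print(data)
```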

14. Machine Learning with Scikit-Learn:

Train and evaluate machine learning models.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

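# X is the feature matrix and y the target, both defined from your data beforehand
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LinearRegression()
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # R^2 on the held-out test set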
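
To make the evaluation step concrete, here is a self-contained sketch on synthetic data. The data-generating line (slope 3, intercept 2, a little noise) and the fixed seeds are assumptions chosen only so the example runs on its own and gives repeatable results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y is roughly 3*x + 2 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=200)

# random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on data the model has not seen during training
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.3f}, R^2: {r2:.3f}")
```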

15. Saving Data:

Save your processed data to a file.

data.to_csv('processed_data.csv', index=False)

16. Handling Outliers:

Detect and deal with outliers in your data.

Q1 = data['column'].quantile(0.25)
Q3 = data['column'].quantile(0.75)
IQR = Q3 - Q1
data = data[(data['column'] >= Q1 - 1.5 * IQR) & (data['column'] <= Q3 + 1.5 * IQR)]

17. Text Processing:

Perform text processing tasks.

text = "This is a sample text."
words = text.split()
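
Splitting on whitespace leaves punctuation attached to words ("text." is not the same token as "text"). A slightly fuller sketch, using only the standard library and a made-up sentence, lowercases the text, strips punctuation, and counts word frequencies.

```python
import string
from collections import Counter

# Example sentence chosen for illustration
text = "This is a sample text. This text is short."

# Lowercase and remove ASCII punctuation before splitting
cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
words = cleaned.split()

# Count how often each word occurs
counts = Counter(words)
print(counts)
```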

18. Statistical Tests:

Conduct statistical tests for hypothesis testing.

from scipy.stats import ttest_ind
result = ttest_ind(data['group1'], data['group2'])

19. Regular Expressions:

Use regex for advanced text pattern matching.

import re

matches = re.findall(r'\b\d+\b', text)
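
The pattern `\b\d+\b` matches runs of digits that stand alone as whole words. The sketch below applies it to a made-up string that actually contains numbers; the digit in "A7" is skipped because there is no word boundary between the letter and the digit.

```python
import re

# Example string chosen for illustration
text = "Order 66 shipped 3 items on day 12, ref A7."

# Standalone whole numbers only; the 7 in "A7" is not matched
matches = re.findall(r"\b\d+\b", text)
numbers = [int(m) for m in matches]
print(numbers)  # → [66, 3, 12]
```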

20. Error Handling:

Handle exceptions to ensure smooth code execution.

try:
    # Code that may raise an exception
    result = 1 / 0
except Exception as e:
    print(f"An error occurred: {e}")
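
Catching the broad `Exception` class is a reasonable last resort, but naming the specific exceptions you expect usually makes failures easier to diagnose. The helper below, `safe_divide`, is a hypothetical function written for illustration; returning `None` on failure is one possible convention, not the only one.

```python
def safe_divide(numerator, denominator):
    """Divide two numbers, returning None if the division cannot be done."""
    try:
        result = numerator / denominator
    except ZeroDivisionError:
        print("Cannot divide by zero.")
        return None
    except TypeError as e:
        print(f"Both arguments must be numbers: {e}")
        return None
    else:
        # Runs only when no exception was raised
        return result


print(safe_divide(6, 3))
print(safe_divide(1, 0))
print(safe_divide("a", 2))
```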


These 20 essential Python code snippets will save you time and effort while working on various data science tasks. Whether you’re cleaning data, exploring it, or building machine learning models, having these tools at your disposal is invaluable. Learning and mastering these snippets will make you a more efficient and effective data scientist.
