20 Essential Python Code Snippets for Data Scientists
Upgrade your Python skills
Python is the go-to language for data scientists, thanks to its versatility and rich ecosystem of libraries. In this article, we’ll explore 20 important Python code snippets every data scientist should have in their toolkit. These snippets cover a wide range of data manipulation and analysis tasks.
1. Importing Libraries:
Always start by importing the necessary libraries for your project.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
2. Reading Data:
Load data from various sources, such as CSV, Excel, or SQL databases.
# From CSV
data = pd.read_csv('data.csv')
# From Excel
data = pd.read_excel('data.xlsx')
# From SQL
import sqlite3
conn = sqlite3.connect('database.db')
data = pd.read_sql_query('SELECT * FROM table_name', conn)
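To try the SQL route without an existing database file, the sketch below builds a throwaway in-memory SQLite database. The table contents and the `value > ?` filter are made up for illustration; the `?` placeholder with `params` is one way to pass values into a query without string formatting.

```python
import sqlite3

import pandas as pd

# In-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")

# Hypothetical table used only for this example
seed = pd.DataFrame({"id": [1, 2, 3], "value": [10, 60, 90]})
seed.to_sql("table_name", conn, index=False)

# Parameterized query: the ? placeholder is filled from params
data = pd.read_sql_query(
    "SELECT id, value FROM table_name WHERE value > ?",
    conn,
    params=(50,),
)
conn.close()
```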
3. Data Inspection:
Quickly check the first few rows and basic statistics of your data.
data.head()
data.describe()
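Beyond the first rows and summary statistics, a few other attributes are often checked at this stage. The following is a small sketch on a made-up DataFrame (the column names are assumptions, not part of any real dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing price
df = pd.DataFrame({"price": [9.5, 12.0, np.nan, 7.25], "label": ["a", "b", "a", "a"]})

n_rows, n_cols = df.shape                  # dimensions of the table
column_types = df.dtypes                   # data type of each column
missing_per_column = df.isna().sum()       # count of missing values per column
label_counts = df["label"].value_counts()  # frequency of each category
full_summary = df.describe(include="all")  # statistics for numeric and text columns
```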
4. Handling Missing Values:
Deal with missing data using pandas.
data.dropna() # Returns a copy without the rows that contain missing values
data.fillna(value) # Returns a copy with missing values replaced by a specific value
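As a rough illustration of both approaches, here is a sketch on a small made-up DataFrame. Filling with the column median and the `"unknown"` placeholder are example choices, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in both columns
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Oslo", "Lima", None, "Pune"],
})

# Count missing values per column before deciding how to handle them
missing_counts = df.isna().sum()

# dropna() returns a new DataFrame; df itself is left unchanged
complete_rows = df.dropna()

# Fill numeric gaps with the column median and text gaps with a placeholder
filled = df.fillna({"age": df["age"].median(), "city": "unknown"})
```

Both calls return new objects, so assign the result back (for example `df = df.dropna()`) if you want to keep it.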
5. Data Selection:
Select specific columns or rows from your DataFrame.
data['column_name']
data.loc[data['condition']] # 'condition' must be a column of True/False values
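The sketch below shows a few common selection patterns side by side, using a made-up DataFrame whose column names are purely illustrative:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "score": [88, 72, 95],
    "passed": [True, False, True],
})

scores = df["score"]                # single column -> Series
subset = df[["name", "score"]]      # list of columns -> DataFrame
passed_rows = df.loc[df["passed"]]  # boolean column used as a row mask
first_two = df.iloc[:2]             # rows by integer position
high_names = df.loc[df["score"] > 80, "name"]  # rows by condition, one column
```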
6. Data Filtering:
Filter data based on conditions.
data[data['column'] > 50]
data[(data['column1'] > 30) & (data['column2'] < 10)]
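A few more filtering forms that tend to come up, sketched on invented data: membership tests with `isin`, "or" with `|`, negation with `~`, and the string-based `query` method.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Pune", "Oslo"],
    "sales": [120, 45, 80, 30],
})

in_cities = df[df["city"].isin(["Oslo", "Pune"])]           # membership test
either = df[(df["sales"] > 100) | (df["city"] == "Lima")]   # logical OR
not_oslo = df[~(df["city"] == "Oslo")]                      # negation
mid_range = df.query("sales >= 50 and sales <= 100")        # expression as a string
```

Each condition needs its own parentheses when combined with `&` or `|`, because those operators bind more tightly than comparisons.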
7. Grouping and Aggregation:
Aggregate data using group-by operations.
data.groupby('category')['value'].mean()
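When one statistic is not enough, `agg` can compute several at once. This is a minimal sketch with made-up data; the output column names `avg_value` and `max_value` are arbitrary labels chosen for the example.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "category": ["a", "b", "a", "b", "a"],
    "value": [10, 20, 30, 40, 50],
})

# Several aggregations of one column
summary = df.groupby("category")["value"].agg(["mean", "sum", "count"])

# Named aggregation: output_name=(input_column, function)
named = df.groupby("category").agg(
    avg_value=("value", "mean"),
    max_value=("value", "max"),
)
```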
8. Data Visualization:
Create plots and charts for data exploration.
plt.hist(data['column'], bins=20)
plt.show() # Needed when running as a script; notebooks render automatically
sns.scatterplot(x='x', y='y', data=data)
plt.show()
9. Data Sampling:
Take random samples from your dataset.
sample = data.sample(n=100)
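Sampling is random by default, so results change between runs. The sketch below, on invented data, shows how `random_state` makes a sample reproducible, how `frac` samples a proportion, and how a per-group sample can keep group proportions (this assumes pandas 1.1 or later, where `groupby(...).sample` is available).

```python
import pandas as pd

# Hypothetical toy data: 80 rows in group "a", 20 in group "b"
df = pd.DataFrame({"id": range(100), "group": ["a"] * 80 + ["b"] * 20})

fixed = df.sample(n=10, random_state=42)   # reproducible sample
again = df.sample(n=10, random_state=42)   # same seed, same rows
tenth = df.sample(frac=0.1, random_state=0)  # 10% of the rows

# Sample 25% within each group so group proportions are preserved
stratified = df.groupby("group").sample(frac=0.25, random_state=0)
```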
10. Pivot Tables:
Create pivot tables for summarizing data.
pd.pivot_table(data, values='value', index='category', columns='date', aggfunc='sum')
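To see what the resulting table looks like, here is a small sketch on made-up data. The `fill_value=0` argument is an optional extra that replaces empty cells with zero.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "a"],
    "date": ["2024-01", "2024-02", "2024-01", "2024-02", "2024-01"],
    "value": [1, 2, 3, 4, 5],
})

# One row per category, one column per date, cells hold the summed value
table = pd.pivot_table(
    df,
    values="value",
    index="category",
    columns="date",
    aggfunc="sum",
    fill_value=0,
)
```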
11. Merging Data:
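Combine data from multiple sources. Note that pd.concat with axis=0 stacks DataFrames on top of each other; to join DataFrames on a shared key column, use pd.merge instead.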
merged_data = pd.concat([data1, data2], axis=0)
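For a key-based join, `pd.merge` is the usual tool. The sketch below uses two invented tables (`customers` and `orders` are hypothetical names) to contrast an inner join, a left join, and row-wise concatenation.

```python
import pandas as pd

# Hypothetical toy tables sharing a customer_id key
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3, 4], "amount": [50, 20, 70, 10]})

# Inner join: only keys present in both tables
inner = pd.merge(customers, orders, on="customer_id", how="inner")

# Left join: every customer, with NaN where no order exists
left = pd.merge(customers, orders, on="customer_id", how="left")

# Row-wise stacking; ignore_index renumbers the rows from 0
stacked = pd.concat([customers, customers], axis=0, ignore_index=True)
```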
12. Data Transformation:
Apply functions to data columns.
data['new_column'] = data['old_column'].apply(lambda x: x * 2)
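For simple arithmetic, a vectorized expression usually does the same job as `apply` with less overhead. The following sketch on made-up data also shows `np.where` for a conditional column and `map` for a lookup; the labels are arbitrary examples.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({"old_column": [1, 2, 3, 4]})

# Vectorized equivalent of .apply(lambda x: x * 2)
df["doubled"] = df["old_column"] * 2

# Conditional column: "large" where the value exceeds 2, else "small"
df["size"] = np.where(df["old_column"] > 2, "large", "small")

# Dictionary lookup; values without a mapping become NaN
df["label"] = df["old_column"].map({1: "one", 2: "two"})
```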
13. Date and Time Operations:
Manipulate date and time data.
data['date_column'] = pd.to_datetime(data['date_column'])
data['month'] = data['date_column'].dt.month
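Real date columns often contain values that cannot be parsed. A minimal sketch, assuming you prefer unparseable entries to become missing values rather than raise an error, uses `errors="coerce"`; the data is invented for the example.

```python
import pandas as pd

# Hypothetical toy data with one invalid entry
df = pd.DataFrame({"date_column": ["2024-01-15", "2024-02-29", "not a date"]})

# Invalid strings become NaT (the datetime equivalent of NaN)
df["parsed"] = pd.to_datetime(df["date_column"], errors="coerce")

df["year"] = df["parsed"].dt.year
df["weekday"] = df["parsed"].dt.day_name()

# Difference between two datetimes is a Timedelta; .dt.days extracts whole days
df["days_since_start"] = (df["parsed"] - df["parsed"].min()).dt.days
```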
14. Machine Learning with Scikit-Learn:
Train and evaluate machine learning models.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
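# X is the feature matrix and y the target; both must be defined beforehand
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
print(model.score(X_test, y_test)) # R^2 on the held-out test set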
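To run the whole train-and-evaluate loop end to end, the sketch below generates synthetic data from a known line (slope 3, intercept 2, both chosen arbitrarily for this example) and checks the fit on the held-out split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3x + 2 plus Gaussian noise (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=200)

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# Evaluate on data the model has not seen
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
```

Fixing `random_state` keeps the split identical across runs, which makes scores comparable while you experiment.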
15. Saving Data:
Save your processed data to a file.
data.to_csv('processed_data.csv', index=False)
16. Handling Outliers:
Detect and deal with outliers in your data.
Q1 = data['column'].quantile(0.25)
Q3 = data['column'].quantile(0.75)
IQR = Q3 - Q1
data = data[(data['column'] >= Q1 - 1.5 * IQR) & (data['column'] <= Q3 + 1.5 * IQR)]
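The same IQR rule can be wrapped in a small helper so it is easy to reuse across columns. This is a sketch: `iqr_bounds` is a hypothetical helper name, the data is invented, and clipping is shown as one alternative to dropping rows.

```python
import pandas as pd


def iqr_bounds(series, k=1.5):
    """Return (lower, upper) limits using the k * IQR rule."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical toy data with one extreme value
df = pd.DataFrame({"column": [10, 12, 11, 13, 12, 11, 100]})

lower, upper = iqr_bounds(df["column"])

# Flag values outside the limits instead of deleting them right away
is_outlier = ~df["column"].between(lower, upper)

filtered = df[~is_outlier]                   # option 1: drop the outliers
clipped = df["column"].clip(lower, upper)    # option 2: cap them at the limits
```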
17. Text Processing:
Perform text processing tasks.
text = "This is a sample text."
words = text.split()
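Splitting on whitespace leaves punctuation attached to words and treats "This" and "this" as different tokens. A minimal sketch of a slightly more careful pipeline, using only the standard library and an invented sentence:

```python
import string
from collections import Counter

text = "This is a sample text. This text is short."

# Lowercase, then delete every punctuation character
cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))

tokens = cleaned.split()            # whitespace tokenization
counts = Counter(tokens)            # word frequencies
top_two = counts.most_common(2)     # most frequent words first
```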
18. Statistical Tests:
Conduct statistical tests for hypothesis testing.
from scipy.stats import ttest_ind
result = ttest_ind(data['group1'], data['group2'])
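Here is a self-contained sketch of the same test on synthetic groups. The means, spread, and the 0.05 threshold are illustrative assumptions; `equal_var=False` selects Welch's variant, which does not assume equal variances.

```python
import numpy as np
from scipy.stats import ttest_ind

# Two synthetic groups whose true means differ by 5 (illustrative assumption)
rng = np.random.default_rng(0)
group1 = rng.normal(loc=50, scale=5, size=100)
group2 = rng.normal(loc=55, scale=5, size=100)

# Welch's t-test for independent samples
result = ttest_ind(group1, group2, equal_var=False)

# Compare the p-value with a chosen significance level
significant = result.pvalue < 0.05
```

The result exposes the test statistic as `result.statistic` and the p-value as `result.pvalue`.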
19. Regular Expressions:
Use regex for advanced text pattern matching.
import re
matches = re.findall(r'\b\d+\b', text) # All standalone runs of digits in text
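To see the digit pattern in action, the sketch below applies it to a made-up sentence and adds a date pattern with named groups. The sentence and the `YYYY-MM-DD` format are assumptions chosen for the example.

```python
import re

text = "Order 42 shipped on 2024-03-15; order 7 is pending."

# Every standalone run of digits (date parts are matched separately)
numbers = re.findall(r"\b\d+\b", text)

# Whole dates in YYYY-MM-DD form
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)

# Named groups make the pieces of a match easy to read
match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", text)
year = match.group("year") if match else None
```

`re.search` returns `None` when nothing matches, so check the result before calling `.group()`.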
20. Error Handling:
Handle exceptions to ensure smooth code execution.
try:
    # Code that may raise an exception
    result = 1 / 0
except Exception as e:
    print(f"An error occurred: {e}")
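Catching specific exception types usually gives clearer behavior than a blanket `except Exception`. As a sketch, the hypothetical helper `load_csv_or_empty` below handles two failure modes of `pd.read_csv` and falls back to an empty DataFrame; the fallback is a design choice for this example, not a general rule.

```python
import pandas as pd


def load_csv_or_empty(path):
    """Return the CSV at `path` as a DataFrame, or an empty DataFrame on failure."""
    try:
        return pd.read_csv(path)
    except FileNotFoundError:
        print(f"File not found: {path}")
    except pd.errors.EmptyDataError:
        print(f"File is empty: {path}")
    # Reached only if one of the handled errors occurred
    return pd.DataFrame()


# The file name is deliberately one that should not exist
result = load_csv_or_empty("this_file_does_not_exist.csv")
```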
Conclusion
These 20 essential Python code snippets will save you time and effort while working on various data science tasks. Whether you’re cleaning data, exploring it, or building machine learning models, having these tools at your disposal is invaluable. Learning and mastering these snippets will make you a more efficient and effective data scientist.