Naina Chaturvedi

Summary

The provided content outlines Day 23 of a 30-day data analytics series, focusing on understanding various charts/plots, implementing linear regression, performing data profiling, and exploring correlation coefficients, with practical examples and code implementations using Python and libraries like pandas, seaborn, and scikit-learn.

Abstract

Day 23 of the "30 days of Data Analytics with Projects Series" delves into the intricacies of data visualization by explaining different types of charts and plots, such as count plots, bar plots, dist plots, box plots, violin plots, joint plots, pie charts, line plots, point plots, heatmaps, cat plots, KDE plots, and LM plots, among others. The article emphasizes the importance of linear regression as a statistical technique for modeling relationships between variables and provides a step-by-step implementation using Python. Data profiling is discussed as a method for summarizing and understanding dataset characteristics, including data types, missing values, and statistical summaries. The concept of correlation coefficients is explored in depth, covering Spearman's ρ, Pearson's r, Kendall's τ, Cramér's V, and Phik (φk), with explanations of their applications and interpretations. The content also includes a comprehensive guide to generating profile reports and interpreting correlation coefficients, complete with visual examples and code snippets for hands-on learning.

Opinions

  • The author advocates for the use of seaborn and matplotlib for data visualization, highlighting their effectiveness in presenting data clearly and intuitively.
  • There is a strong emphasis on the practical application of statistical techniques, encouraging readers to engage with actual code implementations and real-world datasets.
  • The article suggests that understanding the underlying distribution and relationship between variables is crucial for effective data analysis.
  • The author provides a subjective recommendation to subscribe to a YouTube channel for additional learning resources, indicating a belief in the value of supplementary educational content.
  • There is an opinion that mastering the interpretation of various charts and plots is essential for data analysts to communicate insights effectively.
  • The content implies that a thorough knowledge of correlation coefficients is vital for data scientists to assess the strength and direction of relationships between variables.

Project 9— Day 23 of 30 days of Data Analytics with Projects Series

Welcome back peep. Hope all’s well. This is Day 23 of 30 days of data analytics where we will be implementing a project covering —

Know your charts/plots

A detailed study of what each chart represents, implementation details and which chart to use and when.

Linear Regression

Data Profiling

Correlation Coefficients

Spearman’s ρ

Pearson’s r

Kendall’s τ

Cramér’s V (φc)

Phik (φk)

Let’s cover some of the most important concepts in brief —

  • Linear Regression: is a statistical technique used to model the relationship between one or more independent variables (also called predictors or features) and a dependent variable. Linear regression assumes that the relationship between the variables is linear and can be used to make predictions about the value of the dependent variable based on the values of the independent variables.
  • Data Profiling: is the process of analyzing and summarizing the main characteristics of a dataset. This can include reviewing the data types, number of records, missing values, and other statistical summaries. It is an important step in understanding the data and identifying potential issues before building models.
  • Correlation Coefficients: are measures of the strength and direction of the relationship between two variables. The different types are:
  • Spearman’s ρ: a non-parametric, rank-based measure of the monotonic correlation between two variables. It is used when the data is ordinal, or when the relationship is monotonic but not necessarily linear.
  • Pearson’s r: a measure of the linear correlation between two variables. This measure is used when the data is interval or ratio.
  • Kendall’s τ: a non-parametric measure of ordinal association based on concordant and discordant pairs. Like Spearman’s ρ, it is used when the data is ordinal.
  • Cramér’s V (φc): a measure of association between two categorical variables. It is used when the variables are nominal; it ranges from 0 to 1 and carries no direction.
  • Phik (φk): a measure of association that works across categorical, ordinal and interval variables and can capture non-linear dependency.

The signed coefficients (ρ, r and τ) can be used to determine whether two variables are positively or negatively correlated and how strong the correlation is. Cramér’s V and Phik report only the strength of the association.

Example Code Implementation —

import pandas as pd
import numpy as np
from scipy import stats
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
import phik  # third-party package; importing it adds DataFrame.phik_matrix()

# Load the data first so that every later step can use it
data = pd.read_csv('your_dataset.csv')

# Data Profiling
print("Data Types:")
print(data.dtypes)
print("Number of Records:", len(data))
print("Missing Values:")
print(data.isnull().sum())
print("Statistical Summaries:")
print(data.describe())

# Linear Regression
# Assume we have two independent variables 'X1' and 'X2' and a dependent variable 'y'
X = data[['X1', 'X2']]
y = data['y']

# Create a linear regression model
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)

# Predict the dependent variable
y_pred = model.predict(X)

# Correlation Coefficients
spearman_corr, _ = stats.spearmanr(data['Variable1'], data['Variable2'])
pearson_corr, _ = stats.pearsonr(data['Variable1'], data['Variable2'])
kendall_corr, _ = stats.kendalltau(data['Variable1'], data['Variable2'])

# Cramér's V is derived from the chi-squared statistic of the contingency table
contingency = pd.crosstab(data['CategoricalVariable1'], data['CategoricalVariable2'])
chi2 = stats.chi2_contingency(contingency, correction=False)[0]
n = contingency.to_numpy().sum()
cramer_corr = np.sqrt(chi2 / (n * (min(contingency.shape) - 1)))

# Phik comes from the phik package (pandas' corr() has no 'phik' method)
phik_corr = data.phik_matrix()

print("Spearman's ρ:", spearman_corr)
print("Pearson's r:", pearson_corr)
print("Kendall's τ:", kendall_corr)
print("Cramér's V (φc):", cramer_corr)
print("Phik (φk):")
print(phik_corr)

# Visualization of Correlation
sns.scatterplot(x='Variable1', y='Variable2', data=data)
plt.title('Variable1 vs Variable2')
plt.show()

In linear regression, it is important to check the correlation among the independent variables as well as between each of them and the dependent variable. High correlation among the independent variables leads to multicollinearity, which makes the estimated coefficients unstable and hard to interpret.
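One way to spot multicollinearity is to look at the correlation matrix of the predictors and at the variance inflation factor (VIF) of each one. The sketch below is only an illustration: the data is synthetic, the column names X1, X2 and X3 are made up, and the VIF is computed with plain NumPy from the inverse of the correlation matrix rather than with a dedicated library.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): X2 is almost a copy of X1,
# X3 is unrelated noise.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
features = pd.DataFrame({"X1": x1, "X2": x2, "X3": x3})

# Pairwise Pearson correlation among the independent variables
corr = features.corr()
print(corr.round(2))

# Variance inflation factor: VIF_j = 1 / (1 - R_j^2), which equals the j-th
# diagonal entry of the inverse of the predictors' correlation matrix
vif = pd.Series(np.diag(np.linalg.inv(corr.to_numpy())), index=features.columns)
print(vif.round(1))
```

A VIF close to 1 means a predictor is nearly independent of the others. A commonly quoted rule of thumb treats values above roughly 5 to 10 as a sign of trouble, which is what X1 and X2 show here.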

Snippet —

What’s covered in 30 days of Data Analytics Series till now —

Day 1 : Data Analytics basics and kickstart of Data analytics with projects series

Day 2: Business Understanding — Data Driven Decision Making, Descriptive Analysis, Predictive Analysis, Diagnostic Analysis, Prescriptive Analysis

Day 3 : Data Analytics Ecosystem — Data Life Cycle, Data Analysis complete process ( most important things)

Day 4 : Probability, Conditional Probability, Binomial Distribution, Probability Density Function, Sampling Distribution

Day 5 : Statistics

Day 6 : Basic and Advanced SQL

Day 7 : Data Collection, Data Cleaning and Python

Day 8 : Pandas and Numpy

Day 9 : Data Manipulation

Day 10 : Data Visualization — Part 1

Day 11 : Project 1 : Data Visualization — Part 2

Day 12 : Data Visualization — Part 3

Day 13: Tableau — Part 1

Day 14: Tableau — Part 2

Day 15: Tableau — Part 3

Tableau Project

Day 16 : Data Analysis Project 2

Day 17 : Data Analysis Project 3

Day 18: Data Analysis Project 4

Day 19: Data Analysis Project 5

Day 20 : Data Analysis Project 6

Categorical and Numerical Features

Missing Value Analysis

Fill the missing Values

Unique Value Analysis

Univariate Analysis

Bivariate Analysis

Multivariate Analysis

Correlation Analysis

Day 21 : Data Analysis Project 7

Data Profiling

Feature Engineering

GroupBy Features

Categorical and Numerical Features

Missing Value Analysis

Fill the missing Values

Unique Value Analysis

Univariate Analysis

Bivariate Analysis

Multivariate Analysis

Correlation Analysis

Day 22 : Data analysis Project 8

Linear Regression

Data Profiling

Feature Engineering

Sort Values

Categorical and Numerical Features

Missing Value Analysis

Unique Value Analysis

Univariate Analysis

Bivariate Analysis

Multivariate Analysis

Correlation Analysis

Correlation Coefficients

Day 23 : Data Analysis Project 9

Know your chart/plots

Linear Regression

Data Profiling

Correlation Coefficients

Take Complete Hands On Tableau Course : Link

All the projects, data structures, algorithms, system design, Data Science and ML, Data Engineering, MLOps and Deep Learning videos will be published on our youtube channel ( just launched).

Subscribe today!

Tech Newsletter —

If you are interested, you can join my newsletter through which I send tech interview tips, techniques, patterns, hacks — Software Development, ML, Data Science, Startups and Technology projects to more than 30K readers. You can subscribe to Tech Brew :

In the last post we covered Data Visualization and in this post we will cover a project.

(Note : Zoom all the images)

Import Necessary Libraries

import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from matplotlib.colors import rgb2hex
import matplotlib.cm as cm
import matplotlib.colors 
from collections import Counter
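# Build a list of 13 hex colours from the 'twilight' colormap
# (plt.get_cmap is used because newer Matplotlib releases removed cm.get_cmap)
cmap2 = plt.get_cmap('twilight', 13)
colors1 = [rgb2hex(cmap2(i)) for i in range(cmap2.N)]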

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Set style
sns.set(style='whitegrid')

Load the Data

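We are using the video games, Pokemon and iris datasets in this project. The snippets reuse df for whichever of the first two datasets the comment at the top of each snippet names, and iris_data is assumed to hold the iris dataset.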

df= pd.read_csv('Path to file/vgsales.csv', low_memory = False)

Get information about your data

df.info()

Output —

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Rank          16598 non-null  int64  
 1   Name          16598 non-null  object 
 2   Platform      16598 non-null  object 
 3   Year          16327 non-null  float64
 4   Genre         16598 non-null  object 
 5   Publisher     16540 non-null  object 
 6   NA_Sales      16598 non-null  float64
 7   EU_Sales      16598 non-null  float64
 8   JP_Sales      16598 non-null  float64
 9   Other_Sales   16598 non-null  float64
 10  Global_Sales  16598 non-null  float64

Know your charts/plots

To present your data, there are four basic presentation types :

Composition : To show part-to-whole relationship of the data variables

Distribution : To show the spread of the data values

Relationship : To establish relationship between the different data variables

Comparison : To compare one value with the other ( i.e two or more data variables)

Count Plot

Count Plot shows the number of occurrences of an item based on a certain type of category.

Implementation —

#Genre Count ( Video Games Data Set)

plt.figure(figsize=(10,8))
sns.countplot(x='Genre', data=df, palette='mako', order=df['Genre'].value_counts().index)
plt.xlabel('Genre')
plt.xticks(rotation=60)
plt.ylabel('Count')
# No plt.legend() here: the plot has no labelled artists to list
plt.title('Genre Count')

plt.show()

Output —

Bar plot

Bar plot is used to show categorical data as bars whose lengths are proportional to the values they represent. In seaborn, each bar shows a point estimate (by default the mean, a measure of central tendency) for its category.

Implementation —

# Global Sales by Genre ( video game dataset)

gg_df = df.groupby(by=['Genre'])['Global_Sales'].sum()
gg_y = gg_df.reset_index()

plt.figure(figsize=(25,18))
# Horizontal bars: sales run along the x-axis, genres along the y-axis
sns.barplot(y='Genre', x='Global_Sales', data=gg_y, palette='mako', orient='h')
plt.xlabel('Global Sales')
plt.ylabel('Genre')
plt.title('Global Sales by Genre')

plt.show()

Output —

Distplot

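Distplot is used to show the univariate distribution of data. Recent seaborn releases deprecate distplot in favour of histplot, so the implementation uses histplot with a KDE curve overlaid.

Implementation —

# Attack Distribution ( Pokemon dataset)

plt.figure(figsize=(12,10))
# histplot with kde=True draws the histogram and the density curve together
sns.histplot(x=df['Attack'], bins=10, color='darkcyan', kde=True)
plt.title('Attack Distribution')
plt.xlabel('Attack')
plt.ylabel('Frequency')
plt.xticks(rotation=45)

plt.show()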

Output —

Boxplot

Boxplot is used to give a statistical summary of the features being plotted. The top edge of the box is the third quartile, the line inside the box is the median, and the bottom edge is the first quartile. The height of the box is called the interquartile range (IQR). The lines extending from the box (the whiskers) reach the largest and smallest values that are not outliers; when there are no outliers, these are the maximum and minimum values of the feature. The black dots on the plot represent the outlier values, which by default are the points lying more than 1.5 times the IQR beyond the box.

Pic credits : Onlinemathlearning
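To make the pieces of a box plot concrete, here is a minimal sketch that computes them by hand with NumPy. The numbers are made up for illustration, and the 1.5 × IQR rule is the common default for flagging outliers, not the only possible choice.

```python
import numpy as np

# Made-up sample (an assumption for illustration); 95 sits far from the rest
values = np.array([12, 15, 17, 18, 20, 21, 22, 24, 25, 27, 95])

# Box: first quartile, median and third quartile
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Whiskers reach the most extreme points within 1.5 * IQR of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = values[(values >= lower_fence) & (values <= upper_fence)]
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("Q1, median, Q3:", q1, median, q3)
print("IQR:", iqr)
print("Whiskers:", inside.min(), inside.max())
print("Outliers:", outliers)
```

Everything outside the two fences is drawn as an individual dot, and the whiskers stop at the last point that is still inside them.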

Implementation —

#Generation vs Speed ( Pokemon dataset)

plt.figure(figsize=(20,10))
sns.boxplot(x="Generation", y="Speed", hue='Legendary', data=df, palette='mako')
plt.title('Generation vs Speed  by Legendary Status')


plt.show()

Output —

Violin Plot

Violin Plot is used to visualize the distribution of data and its probability density. It’s a combination of a Box Plot and a Density Plot that is rotated and placed on each side, to show the distribution shape of the data. The thick black bar in the centre represents the interquartile range, the thin black line extending from it represents the rest of the distribution excluding outliers (the whiskers of the box plot), and the white dot is the median.

Implementation —

#Total by Generation ( Pokemon dataset)

plt.figure(figsize=(20,20))
plt.title('Total by Generation')
sns.violinplot(x = "Generation", y = "Total",data = df,palette='mako')


plt.show()

Output —

Joint Plot

Joint Plot is used to quickly visualize and analyze the relationship between two variables and describe their individual distributions on the same plot. You can draw a plot of two variables with bivariate and univariate graphs. You can replace the scatterplots and histograms with density estimates and regression.

Implementation —

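#Attack Vs Defense ( Pokemon dataset)

# jointplot creates its own figure, so size it with height= and set the title on the returned grid
g = sns.jointplot(x="Attack", y="Defense", data=df, kind="hex", color='darkcyan', height=10)
g.fig.suptitle('Attack vs Defense', y=1.02)

plt.show()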

Output —

Another implementation —

#Sepal Length vs Sepal Width ( Iris dataset)

# jointplot creates its own figure, so size it with height= and set the title on the returned grid
g = sns.jointplot(x="sepal_length", y="sepal_width", data=iris_data, color="blue", kind="reg", height=10)
g.fig.suptitle('Sepal Length vs Sepal Width', y=1.02)

plt.show()

Output —

Pie chart

Pie chart is used to show numerical proportion of the categorical features in the data.

Implementation —

#   Genre Percentage

plt.figure(figsize=(25,12))
p_r = df['Genre'].value_counts().head(10)
plt.pie(x=p_r,labels=p_r.index,colors=colors1,autopct='%.0f%%',explode=[0.07 for i in p_r.index],startangle=180,wedgeprops={'linewidth':1,'edgecolor':'black'},shadow=True)
plt.title('Genre percentage ')
plt.legend(loc='upper right',title='Genre')


plt.show()

Output —

Line Plot

Line plot connects data points with line segments to show how values change from one point to the next, and it makes it easy to compare several series on the same axes.

Implementation —

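# Different Sales by Platform ( Video Game Dataset)
# Select the sales columns before averaging so text columns are never passed to mean()
pg = df.groupby('Platform')[['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']].mean()

# DataFrame.plot opens its own figure, so the size is passed here
pg.plot.line(color=colors1[:4], figsize=(20,10))
plt.title('Different Sales by Platform')
plt.legend(loc='upper right')

plt.show()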

Output —

Point Plot

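Point plot is used to show point estimates and their confidence intervals. Each point marks the estimate (by default the mean) of a numerical variable for one category, and lines join the points so that changes between categories are easy to compare.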

Implementation —

# Global Sales by Year via different Platform
# catplot creates its own figure; height and aspect control its size
# errorbar=None needs seaborn 0.12 or newer; on older releases use ci=None
sns.catplot(x="Year", y="Global_Sales", kind="point", data=df[(df.Year > 2007) & (df.Year < 2018)], hue="Platform",
            palette='mako', errorbar=None, height=10, aspect=10.6/8.23)
plt.title('Sales by Year via different Platform')

plt.show()

Output —

Heat Map

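Heat map is used to show a matrix of values as a grid of coloured cells. It is most often used in correlation analysis: applied to a correlation matrix, a high positive or negative value shows that two features are strongly correlated. The implementation here applies it to total sales by genre and region.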

Implementation —

# Total Sales by Different Genre ( Video Games Dataset)
g_platform = df[['Genre', 'NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']]
g_compare = g_platform.groupby(by=['Genre']).sum()

# Heat map of regional sales per genre
# (for a correlation heat map, pass a correlation matrix such as df.corr() instead)
plt.figure(figsize=(15,10))
sns.heatmap(g_compare, annot=True, fmt=".2f", cmap='mako')

plt.xticks(fontsize=15)
plt.yticks(fontsize=15)

plt.show()

Output —

Cat plot

Cat plot is used to work with categorical data. One can use catplot to represent categorical data using scatter plot as well as point to show distribution of the data observations.

Implementation —

# NA Sales by Year via different Platform ( Video games dataset)
# catplot creates its own figure; height and aspect control its size
# errorbar=None needs seaborn 0.12 or newer; on older releases use ci=None
sns.catplot(x="Year", y="NA_Sales", kind="bar", data=df[(df.Year > 2009) & (df.Year < 2014)], hue="Platform",
            palette='mako', errorbar=None, edgecolor=None, height=10, aspect=10.6/8.23)
plt.title('Sales by Year via different Platform')

plt.show()

Output —

KDE plot

KDE plot is used to fit and plot a univariate or bivariate kernel density estimate. A Kernel Density Estimation (KDE) chart shows the distribution of data points, i.e. it projects the probability density of a continuous variable in a more interpretable format.
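To see what a kernel density estimate actually is, the sketch below builds one by hand and compares it with SciPy's gaussian_kde. The sample is synthetic, and the bandwidth is taken from SciPy's default rule (Scott's rule) so that both curves can be compared; seaborn's own defaults may differ slightly.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic sample (an assumption for illustration)
rng = np.random.default_rng(1)
sample = rng.normal(loc=50, scale=10, size=300)
grid = np.linspace(sample.min() - 30, sample.max() + 30, 400)

# Reference estimate from SciPy; its bandwidth is factor * sample standard deviation
kde = gaussian_kde(sample)
bandwidth = kde.factor * sample.std(ddof=1)

# Manual estimate: place a Gaussian bump on every observation and average them
z = (grid[:, None] - sample[None, :]) / bandwidth
manual = np.exp(-0.5 * z**2).sum(axis=1) / (len(sample) * bandwidth * np.sqrt(2 * np.pi))

# A density should integrate to 1 (trapezoid rule over the grid)
density = kde(grid)
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid))

print("Max difference between manual and SciPy:", np.abs(manual - density).max())
print("Area under the curve:", round(area, 4))
```

A smaller bandwidth gives a bumpier curve and a larger one gives a smoother curve, which is the main knob to keep in mind when reading a KDE plot.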

Implementation —

# Game Release by Year ( Video Games Dataset)

plt.figure(figsize=(25,18))
# fill=True replaces the deprecated shade=True; a single curve takes color, not palette
sns.kdeplot(x=df['Year'], label='Year', fill=True, color='darkcyan')
plt.xlabel('Year')
plt.xticks(rotation = 60)
plt.legend()
plt.title('Game Release by Year')

plt.show()

Output —

Another implementation —

# Sp.Attack distribution by Type 1  ( Pokemon data set)

plt.figure(figsize=(25,12))
# Pass the DataFrame once and refer to columns by name
sns.kdeplot(data=df, x="Sp. Atk", hue="Type 1", fill=True, linewidth=1, palette='mako')
plt.axvline(df['Sp. Atk'].mean(), c='black', ls='--')
plt.title("Sp. Atk distribution by Type 1 ")

plt.show()

Output —

LMplot

LMplot is used to fit regression models across conditional subsets of a dataset.

Implementation —

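# Attack Vs Defense by Legendary Status( Pokemon data set)

# lmplot creates its own figure; height and aspect control its size
# fit_reg=False draws only the scatter points; set it to True to add the regression lines
sns.lmplot(x='Attack', y='Defense', hue='Legendary', markers=['+', 'D'], fit_reg=False, data=df,
           palette='mako', height=8, aspect=2)
plt.title('Attack Vs Defense by Legendary Status')

plt.show()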

Output —

Swarm Plot

Swarm plot is used whenever you want to draw a categorical scatterplot with non-overlapping points. It gives a better representation of the distribution of values, but it does not scale well to large numbers of observations. The style of the plot is called a “beeswarm”.

Implementation —

#Species vs Petal Width ( iris dataset)

plt.figure(figsize=(10,8))
sns.swarmplot(x="species", y="petal_width", data=iris_data, color=".25")

plt.title('Species vs Petal Width')

plt.show()

Output —

Pair Plot

Pair plot is used to show how every feature pairs with every other feature. Each panel plots one variable against another for the same data rows.

Implementation —

# Pairplot using Iris Dataset

# pairplot creates its own figure; use height= to control the size of each panel
sns.pairplot(data=iris_data, kind="scatter", hue="species", dropna=True, palette="winter")

plt.show()

Output —

Strip Plot

Strip plot is used when you want to show all observations along with some representation of the underlying distribution.

Implementation —

# order must match the labels stored in the species column
fig = sns.stripplot(x="species", y="petal_width", data=iris_data, hue="species",
                    order=["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
                    jitter=True, edgecolor="black", linewidth=1, size=6, orient='v',
                    palette="Set2")

Output —

Facet grid

Facet grid is used as a Multi-plot grid for plotting conditional relationships.

Implementation —

#Iris Dataset

sns.FacetGrid(iris_data,hue="species",height=5)\
             .map(sns.kdeplot,"sepal_length")\
             .add_legend()

Output —

To summarize —

1. When you want to show distribution, then use —

  • For a single variable with few data points — Use column histogram/count plot
  • For a single variable with many data points — Use Histogram
  • For two variables — Use Scatter Chart

2. When you want to show composition, then use —

  • For static composition, to show share of total — Use pie chart

3. When you want to show relationship, then use —

  • When you want to show relationship between two variables — Use scatter chart
  • When you want to show relationship between three variables — Use Bubble Chart

4. When you want to show Comparison, then use —

  • When you want to show comparison, and have many items for few categories — Use bar chart
  • When you want to show comparison, and have few items for few categories — Use Column Chart
  • When you want to show comparison, and have many periods over time for cyclical data — Use Area Chart
  • When you want to show comparison, and have many periods over time for non-cyclical data — Use Line Chart

Data Profiling

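Data profiling is used to generate profile reports from the input data. The report here comes from the pandas-profiling package (published as ydata-profiling in newer releases), which adds profile_report() to DataFrames.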

The statistics include

Descriptive Statistics and Quantile Statistics.

Descriptive stats — Standard deviation, Kurtosis, mean, skewness, variance etc

Quantile Statistics — Min-max, percentiles, median, IQR etc
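The statistics listed above can also be computed one by one with pandas, which helps when reading a profile report. This is a minimal sketch on a made-up series of sales figures; the variable names are only for illustration.

```python
import pandas as pd

# Made-up values (an assumption for illustration)
sales = pd.Series([1.2, 0.4, 3.5, 0.9, 2.1, 0.3, 8.0, 1.1], name="sales")

# Descriptive statistics
descriptive = {
    "mean": sales.mean(),
    "std": sales.std(),
    "variance": sales.var(),
    "skewness": sales.skew(),
    "kurtosis": sales.kurt(),
}

# Quantile statistics
quantile = {
    "min": sales.min(),
    "max": sales.max(),
    "median": sales.median(),
    "p5": sales.quantile(0.05),
    "p95": sales.quantile(0.95),
    "iqr": sales.quantile(0.75) - sales.quantile(0.25),
}

print(pd.Series(descriptive).round(3))
print(pd.Series(quantile).round(3))
```

The positive skewness here reflects the single large value (8.0) pulling the right tail out, which is the kind of pattern a profile report flags automatically.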

import pandas_profiling  # adds df.profile_report(); newer releases ship as ydata_profiling
df.profile_report()

Output —


Linear Regression

It’s a technique to estimate the relationship between two quantitative variables. It is used when you want to establish:

  1. Strength of the relationship — How strong the relationship is between two variables
  2. The value of the dependent variable at a certain value of the independent variable.

y = B0 + B1x + e

where,

y is the predicted value of the dependent variable for any given value of the independent variable x

B0 is the intercept and B1 is the regression coefficient (the slope)

x is the independent variable

e is the error of the estimate

It works on the assumption that the relationship between the independent and dependent variable is linear: the line of best fit through the data points is a straight line as shown in the diagram.
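For a single predictor, the intercept and the regression coefficient have a simple closed form: B1 is the covariance of x and y divided by the variance of x, and B0 is the mean of y minus B1 times the mean of x. The sketch below checks this on synthetic data; the true values 3 and 2 and the noise level are assumptions chosen only for the illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 3 + 2x + noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 100, size=200)
y = 3.0 + 2.0 * x + rng.normal(scale=5.0, size=200)

# Ordinary least squares for one predictor
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # slope
b0 = y.mean() - b1 * x.mean()                         # intercept

# Fitted values, residuals (the estimate of e) and R^2
y_hat = b0 + b1 * x
residuals = y - y_hat
r_squared = 1 - np.sum(residuals**2) / np.sum((y - y.mean())**2)

print("B0:", round(b0, 3), "B1:", round(b1, 3), "R^2:", round(r_squared, 4))
```

The estimates land close to the values used to generate the data, and scikit-learn's LinearRegression should return the same line for a single feature.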

# Pokemon dataset

reg_X = df[["Attack"]]   # single predictor, kept 2-D for scikit-learn
reg_y = df["Total"]

# Hold out part of the data so the fit can be checked on unseen rows
X_train, X_test, y_train, y_test = train_test_split(reg_X, reg_y, random_state=0)
lr = LinearRegression().fit(X_train, y_train)
print("R^2 on the test split:", lr.score(X_test, y_test))

# Linear Regression

plt.figure(figsize=(20,10))
plt.scatter(reg_X["Attack"], reg_y, marker='D', s=30, alpha=0.9, color='blue')
# lr.predict applies intercept + coefficient * Attack for every row
plt.plot(reg_X["Attack"], lr.predict(reg_X), 'black')

ax = plt.gca()
ax.xaxis.grid(True, alpha=0.4)
ax.yaxis.grid(True, alpha=0.4)

plt.title('Linear Regression')
plt.xlabel('Attack')
plt.ylabel('Total')

plt.show()

Output —

Correlation Coefficients

It’s the measure of the strength and direction of the relationship between two variables.

Pic credits : cumath

Spearman’s ρ

The Spearman’s rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson’s r. Its value lies between -1 and +1, -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
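The recipe above translates almost word for word into code: rank both variables, then divide the covariance of the ranks by the product of their standard deviations. The sketch uses a made-up exponential relationship to show the difference from Pearson’s r.

```python
import numpy as np
from scipy import stats

# Made-up data (an assumption for illustration): monotonic but strongly non-linear
x = np.arange(1, 11, dtype=float)
y = np.exp(x)

# Spearman's rho by hand: covariance of the ranks over the product of their standard deviations
rank_x = stats.rankdata(x)
rank_y = stats.rankdata(y)
rho_manual = np.cov(rank_x, rank_y)[0, 1] / (rank_x.std(ddof=1) * rank_y.std(ddof=1))

# Library versions for comparison
rho_scipy = stats.spearmanr(x, y)[0]
r_pearson = stats.pearsonr(x, y)[0]

print("Spearman (manual):", rho_manual)
print("Spearman (scipy): ", rho_scipy)
print("Pearson:          ", r_pearson)
```

Because y always increases with x, ρ reaches 1 while Pearson’s r stays well below 1, since the relationship is not a straight line.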

Pearson’s r

The Pearson’s correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
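Here is a small sketch of that calculation, together with a check of the invariance property. The data is synthetic and pearson_r is a helper written for this illustration, not a library function.

```python
import numpy as np

# Synthetic data (an assumption for illustration)
rng = np.random.default_rng(7)
x = rng.normal(size=100)
y = 0.6 * x + rng.normal(scale=0.8, size=100)

def pearson_r(a, b):
    """Covariance of a and b divided by the product of their standard deviations."""
    return np.cov(a, b, ddof=1)[0, 1] / (a.std(ddof=1) * b.std(ddof=1))

r = pearson_r(x, y)

# Shifting and rescaling by positive factors leaves r unchanged
r_shifted = pearson_r(10 * x + 3, 0.5 * y - 7)

# Multiplying one variable by a negative factor only flips the sign
r_flipped = pearson_r(-x, y)

print("r:", round(r, 4))
print("r after shift and rescale:", round(r_shifted, 4))
print("r after flipping x:", round(r_flipped, 4))
```

This is why changing units (say, sales in millions instead of thousands) does not change the correlation.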

Kendall’s τ

Similarly to Spearman’s rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
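Counting the pairs by hand makes the definition easier to follow. The sketch below uses a tiny made-up sample without ties; with ties, SciPy's default variant (tau-b) adjusts the denominator, so the two numbers would no longer match exactly.

```python
from itertools import combinations
from scipy import stats

# Made-up data without ties (an assumption for illustration)
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

concordant = 0
discordant = 0
for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
    sign = (x1 - x2) * (y1 - y2)
    if sign > 0:        # both variables move in the same direction
        concordant += 1
    elif sign < 0:      # the variables move in opposite directions
        discordant += 1

n_pairs = len(x) * (len(x) - 1) // 2
tau_manual = (concordant - discordant) / n_pairs
tau_scipy = stats.kendalltau(x, y)[0]

print("Concordant:", concordant, "Discordant:", discordant)
print("tau (manual):", tau_manual)
print("tau (scipy): ", tau_scipy)
```

Out of the 15 possible pairs, 12 are concordant and 3 are discordant, giving τ = (12 - 3) / 15 = 0.6.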

Cramér’s V (φc)

Cramér’s V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér’s V have been proved to be biased, even for large samples.
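Cramér’s V is usually computed from the chi-squared statistic of the contingency table: V = sqrt(χ² / (n × (min(rows, columns) - 1))). The sketch below is the plain, uncorrected estimator; cramers_v is a helper written for this illustration, and the category labels are made up.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(a, b):
    """Uncorrected Cramér's V for two categorical sequences."""
    table = pd.crosstab(pd.Series(a), pd.Series(b))
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.to_numpy().sum()
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k))

# Perfect association: each platform always comes with the same label
platform = ["PC", "PS4", "Wii"] * 20
label = ["strategy", "action", "sports"] * 20
v_perfect = cramers_v(platform, label)

# Independence: every combination of the two variables is equally frequent
group = ["A", "A", "B", "B"] * 10
flag = ["x", "y", "x", "y"] * 10
v_independent = cramers_v(group, flag)

print("Perfect association:", v_perfect)
print("Independent:", v_independent)
```

Real data falls somewhere between these two extremes, and because the plain estimator is biased, bias-corrected variants are sometimes preferred.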

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.

That’s it for now.

Find Day 24 Below: Data Analysis : Project 10

Let me know if you have questions in the comment section below. Subscribe/ Follow, Like/Clap as it would encourage me to write more in my free time

Stay Tuned!!

