Complete Data Preprocessing and Data Visualization with Projects — Mega Compilation Part 2
Connect the dots…

Hi all. This post covers two of the most important steps in data science: data preprocessing (manipulation, cleaning and augmentation) and data visualization.
Let's get started.
Real-world data is messy (it contains null values, noise, missing values, etc.), so in most cases we need to clean, format and preprocess the data before we start building an ML model.
Some of the other best Series —
How to solve any System Design Question ( approach that you can take)?
100 days : Your Data Science and Machine Learning Degree Series with projects
Complete Data Visualization and Pre-processing Series with projects
Data preprocessing is one of the first and most crucial steps: it is the process of preparing raw data so that it is suitable for an ML model, which improves the model's accuracy and efficiency.
Pre-requisites for data pre-processing —
Day 0 : Python
Everything you need to know in Python (with some cool projects) is covered in the post below:
Where to find Day 0 post :
Day 1 : Hands on Pandas — part 1
In this post we covered Pandas part 1 in depth with code implementation. Pandas is an open-source Python package for data manipulation, analysis and ML tasks.
Where to find Day 1 post :
Day 2: Hands on Pandas — part 2
In this post we covered Pandas part 2 in depth with code implementation. Topics like indexing, filtering, transformation, merging, hierarchical indexing, etc. are covered.
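To give a feel for these topics, here is a minimal sketch of filtering, merging and hierarchical indexing. The `orders` and `customers` tables and all their values are made up for illustration; they are not taken from the Day 2 post.

```python
import pandas as pd

# Hypothetical data, invented for this sketch only.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer": ["ann", "bob", "ann", "cy"],
    "amount": [20.0, 35.5, 12.0, 50.0],
})
customers = pd.DataFrame({
    "customer": ["ann", "bob", "cy"],
    "city": ["Pune", "Delhi", "Pune"],
})

# Filtering: keep rows where a boolean condition holds.
large = orders[orders["amount"] > 15]

# Merging: attach each customer's city to their orders.
merged = orders.merge(customers, on="customer", how="left")

# Hierarchical indexing: grouping by two keys yields a two-level index.
totals = merged.groupby(["city", "customer"])["amount"].sum()

# Selecting on the outer level returns every customer in that city.
pune_total = totals.loc["Pune"].sum()
```

Selecting with a full tuple, such as `totals.loc[("Pune", "ann")]`, returns a single value, while selecting on the outer level alone returns a smaller Series.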
Where to find Day 2 post :
Day 3 : Advanced Pandas Techniques for Data Scientists
In this post we covered how to bin and split data, along with the mean and interpolation methods, etc.
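As a rough illustration, binning and the mean and interpolation methods might look like the sketch below. It assumes the mean and interpolation methods are used to fill missing values; the series, bin edges and labels are hypothetical.

```python
import numpy as np
import pandas as pd

# Binning: map numeric values onto labelled intervals.
# The bin edges and labels are assumptions chosen for this example.
ages = pd.Series([5, 17, 30, 45, 70])
age_groups = pd.cut(ages, bins=[0, 18, 40, 100],
                    labels=["child", "adult", "senior"])

# A series with gaps, invented for illustration.
readings = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

# Mean method: replace every gap with the mean of the observed values.
mean_filled = readings.fillna(readings.mean())

# Interpolation method: fill each gap linearly from its neighbours.
interpolated = readings.interpolate()
```

Mean filling gives every gap the same value, whereas interpolation follows the local trend, which usually suits ordered data such as time series better.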
Where to find Day 3 post :
Day 4: Numpy with Code Implementation
In this post we covered NumPy part 1, with a focus on flattening arrays, concatenation, broadcasting, etc. in detail. NumPy is a Python library for scientific computing that works with multidimensional array objects and is used to handle large amounts of data. Its main data structure is the array: a grid of values indexed by a tuple of nonnegative integers.
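The three operations named above can be sketched in a few lines. The array values are arbitrary and chosen only to keep the results easy to check.

```python
import numpy as np

# A 2 x 3 grid of values.
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Indexing with a tuple of nonnegative integers: row 1, column 2.
element = grid[1, 2]

# Flattening: collapse the grid into a one-dimensional copy.
flat = grid.flatten()

# Concatenation: stack two grids along the row axis.
stacked = np.concatenate([grid, grid], axis=0)

# Broadcasting: the length-3 vector is applied to every row of the grid.
row_offsets = np.array([10, 20, 30])
shifted = grid + row_offsets
```

Broadcasting works here because the trailing dimension of `grid` (3) matches the length of `row_offsets`, so NumPy can reuse the vector for each row without copying it.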
Where to find Day 4 post :
Day 5: One Simple Trick to Scrape Tabular Data using Python
Data scraping is the process of importing information from a website into a spreadsheet or local file on your system and it’s one of the most efficient ways to get data from the web. Many of you must be familiar with the Cheerio library or Python with Beautiful Soup to scrape the data. In this article, I’m going to teach one simple trick to scrape tabular data using Python and Pandas with just four lines of code.
Where to find Day 5 post :
Day 6 : Hands On Data preprocessing — Part 1
In this post we learned and implemented hands-on data preprocessing in depth (part 1): preparing raw data so that it is suitable for an ML model.
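A minimal sketch of what such preparation can look like is shown below. The table, the median and "unknown" fill choices, and the min-max scaling are all assumptions made for this example, not steps taken from the Day 6 post.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a duplicate row and missing values.
raw = pd.DataFrame({
    "age": [25, np.nan, 40, 25],
    "city": ["Pune", "Delhi", None, "Pune"],
    "income": [30000.0, 52000.0, 61000.0, 30000.0],
})

# Step 1: drop exact duplicate rows.
clean = raw.drop_duplicates().copy()

# Step 2: fill missing values (median for numbers, a placeholder for text).
clean["age"] = clean["age"].fillna(clean["age"].median())
clean["city"] = clean["city"].fillna("unknown")

# Step 3: rescale a numeric column to the range [0, 1] (min-max scaling).
income_min = clean["income"].min()
income_range = clean["income"].max() - income_min
clean["income_scaled"] = (clean["income"] - income_min) / income_range
```

The right fill strategy depends on the dataset; the median is used here only because it is less sensitive to outliers than the mean.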
Where to find Day 6 post :
Day 7 : Hands on Data Preprocessing in depth — Part 2
In this post we learned and implemented hands-on data preprocessing in depth (part 2). Topics like data cleaning, data augmentation, transformation, channel shift, etc. are covered in detail.
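Channel shift is an image augmentation, and one common reading of it can be sketched with plain NumPy as below. The `channel_shift` helper, its `max_shift` parameter and the dummy image are hypothetical; library implementations may differ in the details.

```python
import numpy as np

def channel_shift(image, max_shift, rng):
    """Add one random offset per color channel, then clip to [0, 255]."""
    # One offset per channel, broadcast over height and width.
    offsets = rng.uniform(-max_shift, max_shift,
                          size=(1, 1, image.shape[-1]))
    shifted = image.astype(np.float32) + offsets
    return np.clip(shifted, 0, 255).astype(np.uint8)

# A dummy 4 x 4 RGB image of mid-gray pixels, invented for illustration.
rng = np.random.default_rng(0)
image = np.full((4, 4, 3), 128, dtype=np.uint8)
augmented = channel_shift(image, max_shift=30, rng=rng)
```

Each call produces a slightly different color cast, so the same training image can be shown to a model in several variations.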
Where to find Day 7 post :
Day 8 : Read And Process Large Datasets Within Seconds
Handle billions of rows in seconds.
Where to find Day 8 post :
Day 9 : Data Visualization
Data visualization is an incredibly important step: it helps you understand how the data is distributed with respect to time, lets you visualize your hypotheses about the data, conveys important information through charts so that leaders can make business decisions, and lets you examine missing values and outliers in the data.
Start here —
Project — Kaggle’s annual Machine Learning and Data Science Survey ( Part 1 )
In this post we covered some of the most important concepts (data cleaning, preprocessing, EDA, etc.) through a project.
The data (Kaggle's annual Machine Learning and Data Science Survey) has 42+ questions and 25,973 responses. This post covers how to approach a problem and gives a very elementary view of how to analyze your data.
Where to find Day 9 post :
Day 10: Project — Detailed Crypto Analysis
In this post we covered a detailed crypto analysis to build a basic intuition. Part 2 covers how we can build a model to predict the prices.
Where to find Day 10 post :
Day 11: Project — Detailed Analysis of the Netflix Content.
In this post we covered detailed Analysis of the Netflix Content.
Where to find Day 11 post :
Day 12 : Data visualization and Clustering — Part 1
In this post we covered Data visualization and Clustering — Part 1 in detail.
Where to find Day 12 post :
Day 13: Data visualization and Clustering — Part 2
In this post we covered Data visualization and Clustering — Part 2 in detail.
Where to find Day 13 post :
Day 14: Data visualization and Clustering — Part 3
In this post we covered Data visualization and Clustering — Part 3 in detail.
Where to find Day 14 post :
Day 15 : How To Choose Right Data Visualization Charts For Your Data?
Where to find Day 15 post :
Day 16 : The Top 5 Datasets Released by Google
Data collection is the first step to building great Machine Learning models. One of the factors that determines how accurate your model can be is the quantity and quality of your data. It can be a fun exercise to sift through dozens of datasets to find the perfect one, but sometimes it can also be frustrating.
Where to find Day 16 post :
Day 17 : Data Science Tips and Techniques —
23 Data Science Techniques You Should Know!
Efficient Code and Optimization techniques for Python
11 Jupyter Notebook Techniques You Should Know
Best Resources for Data Science and Machine Learning (full list)
For Python Projects —
For complete 60 days of Data Science and ML : Day 1 — Day 60 : Quick Recap of 60 days of Data Science and ML
Follow for more updates. Stay tuned and keep coding! Some of the links are affiliate links.
For other projects, check out:
Build Machine Learning Pipelines (With Code)
Recurrent Neural Network with Keras
Clustering Geolocation Data in Python using DBSCAN and K-Means
Facial Expression Recognition using Keras
Hyperparameter Tuning with Keras Tuner
Custom Layers in Keras





