Complete Data Preprocessing and Data Visualization with Projects — Mega Compilation Part 2

Connect the dots…

Hi all. This post covers two of the most important steps in data science: data preprocessing (manipulation, cleaning, and augmentation) and data visualization.

Let’s get started.

Real-world data is messy (it contains null values, noise, missing values, etc.), and in most cases we need to clean, format, and preprocess the data before we start building an ML model.

Some of the other best series:

30 Days of Natural Language Processing ( NLP) Series

How to solve any System Design Question ( approach that you can take)?

Complete System Design Case Studies Series

30 days of Data Engineering with projects Series

60 days of Data Science and ML Series with projects

100 days : Your Data Science and Machine Learning Degree Series with projects

23 Data Science Techniques You Should Know

Tech Interview Series — Curated List of coding questions

Complete System Design with most popular Questions Series

Complete Data Visualization and Pre-processing Series with projects

Complete Python Series with Projects

Complete Advanced Python Series with Projects

Kaggle Best Notebooks that will teach you the most

Complete Developers Guide to Git

All the Data Science and Machine Learning Resources

210 Machine Learning Projects

30 days of Machine Learning Ops

Tech Newsletter —

If you are interested, you can join my newsletter through which I send tech interview tips, techniques, patterns, hacks — Software Development, ML, Data Science, Startups and Technology projects to more than 30K readers. You can subscribe to Tech Brew :

Ignito

Data Science, ML, AI and more… Click to read Ignito, by Naina Chaturvedi, a Substack publication. Launched 7 months…

naina0405.substack.com

Data preprocessing is one of the first and most crucial steps: it is the process in which we prepare the raw data and make it suitable for an ML model, to increase the model’s accuracy and efficiency.
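As a rough illustration of what "preparing the raw data" can look like, here is a minimal sketch using pandas. The toy table, the column names, and the choice of median filling and min-max scaling are assumptions made for this example, not steps taken from the linked posts.

```python
import pandas as pd

# Hypothetical toy data: one duplicate row and two missing values.
raw = pd.DataFrame({
    "age": [25, 32, None, 32, 41],
    "income": [40000, 52000, 61000, 52000, None],
})

# 1. Drop exact duplicate rows.
clean = raw.drop_duplicates().reset_index(drop=True)

# 2. Fill missing values with each column's median.
clean = clean.fillna(clean.median())

# 3. Min-max scale every column to the [0, 1] range.
scaled = (clean - clean.min()) / (clean.max() - clean.min())

print(scaled)
```

Which steps you actually need depends on the data and the model; this only shows the general shape of a cleaning pass.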

Prerequisites for data preprocessing:

Day 0 : Python

Everything you need to know in Python (with some cool projects) is covered in the post below:

Where to find Day 0 post :

Complete Python And Projects — Mega Compilation

Everything that you need to know in Python with Projects…

medium.com

Day 1 : Hands on Pandas — part 1

In this post we covered Pandas part 1 in depth with code implementation. Pandas is an open-source Python package for data manipulation, analysis, and ML tasks.

Where to find Day 1 post :

Day 9–60 days of Data Science and Machine Learning

Hands on Pandas part 1 in depth…

medium.datadriveninvestor.com

Highly Recommended Data Science and Machine Learning Courses that you MUST take ( with certificate) —

Complete Data Scientist

Complete Data Analyst

Complete Data Engineering

Complete Machine Learning Engineer

Complete Deep Learning

Complete Natural Language Processing

Complete Self Driving Car Engineer

Find best data science and data engineering courses here

Find best Machine Learning and Deep Learning courses here

Day 2: Hands on Pandas — part 2

In this post we covered Pandas part 2 in depth with code implementation. Topics like indexing, filtering, transformation, merging, and hierarchical indexing are covered.
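To give a feel for these topics, here is a small sketch that touches each one in turn. The `orders` and `customers` tables and all their values are hypothetical, invented only for illustration.

```python
import pandas as pd

# Hypothetical tables used only for illustration.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer": ["ana", "bob", "ana", "cy"],
    "amount": [20.0, 35.5, 12.0, 50.0],
})
customers = pd.DataFrame({
    "customer": ["ana", "bob", "cy"],
    "city": ["Lima", "Oslo", "Lima"],
})

# Indexing: label-based selection with .loc after setting an index.
by_id = orders.set_index("order_id")
second_amount = by_id.loc[2, "amount"]

# Filtering: a boolean mask keeps only the rows that match.
large = orders[orders["amount"] > 15]

# Transformation: derive a new column from an existing one.
orders["amount_with_tax"] = orders["amount"] * 1.1

# Merging: a left join attaches each customer's city.
merged = orders.merge(customers, on="customer", how="left")

# Hierarchical indexing: grouping on two keys yields a MultiIndex.
totals = merged.groupby(["city", "customer"])["amount"].sum()
lima_ana = totals.loc[("Lima", "ana")]
```

Here `totals` is a Series indexed by (city, customer) pairs, so a tuple passed to `.loc` picks out a single group.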

Where to find Day 2 post :

Day 10–60 days of Data Science and Machine Learning

Hands on Pandas part 2 in depth…

medium.datadriveninvestor.com

Day 3 : Advanced Pandas Techniques for Data Scientists

In this post we covered how to bin and split data, the mean and interpolation methods, and more.
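The sketch below shows binning with `pd.cut` and, assuming the mean and interpolation methods refer to filling missing values, one way each could look. The ages, bin edges, and labels are made up for the example.

```python
import pandas as pd

# Hypothetical ages with two missing entries.
ages = pd.Series([22.0, None, 35.0, 47.0, None, 63.0])

# Mean method: replace every gap with the mean of the known values.
mean_filled = ages.fillna(ages.mean())

# Interpolation method: estimate each gap linearly from its neighbours.
interpolated = ages.interpolate(method="linear")

# Binning: pd.cut assigns each value to a labelled interval.
groups = pd.cut(
    interpolated,
    bins=[0, 30, 50, 100],
    labels=["young", "middle", "senior"],
)
```

Mean filling gives every gap the same value, while interpolation follows the neighbouring points, which usually suits ordered data better.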

Where to find Day 3 post :

5 Cool Advanced Pandas Techniques for Data Scientists

Use these techniques …

medium.datadriveninvestor.com

Day 4: Numpy with Code Implementation

In this post we covered NumPy part 1 in detail, with a focus on flattening arrays, concatenation, broadcasting, etc. NumPy is a Python library for scientific computing that works with multidimensional array objects and is used to handle large amounts of data. Its main data structure is the array: a grid of values indexed by a tuple of nonnegative integers.
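A short sketch of the operations named above, using small arrays invented for the example:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])  # shape (2, 3)

# Indexing: a tuple of nonnegative integers selects one element.
element = a[1, 2]  # row 1, column 2

# Flattening: collapse the 2-D grid into a 1-D copy.
flat = a.flatten()

# Concatenation: stack arrays along an existing axis.
b = np.array([[7, 8, 9]])
stacked = np.concatenate([a, b], axis=0)  # shape (3, 3)

# Broadcasting: the shape-(3,) row is stretched across both rows of a.
offsets = np.array([10, 20, 30])
shifted = a + offsets
```

Broadcasting works here because the trailing dimension of `a` (3) matches the length of `offsets`, so no explicit loop or copy is needed.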

Where to find Day 4 post :

Day 11–60 days of Data Science and Machine Learning

Hands on Numpy part 1 in depth…

medium.datadriveninvestor.com

Day 5: One Simple Trick to Scrape Tabular Data using Python

Data scraping is the process of importing information from a website into a spreadsheet or local file on your system, and it’s one of the most efficient ways to get data from the web. Many of you will be familiar with scraping data using the Cheerio library or Python with Beautiful Soup. In this article, I’m going to teach one simple trick to scrape tabular data using Python and Pandas with just four lines of code.

Where to find Day 5 post :

One Simple Trick to Scrape Tabular Data using Python

With just 4 lines of code…

ai.plainenglish.io

Day 6 : Hands On Data preprocessing — Part 1

In this post we learned and implemented hands-on data preprocessing in depth (Part 1): preparing raw data so that it is suitable for an ML model.

Where to find Day 6 post :

Day 12–60 days of Data Science and Machine Learning

Hands on Data Pre-processing in depth — Part 1

medium.datadriveninvestor.com

Day 7 : Hands on Data Preprocessing in depth — Part 2

In this post we learned and implemented hands-on data preprocessing in depth (Part 2). Topics like data cleaning, data augmentation, transformation, and channel shift are covered in detail.
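As one example of image augmentation, a channel shift nudges the color intensities of an image. The sketch below is a plain NumPy version written for illustration: the function name `channel_shift`, its parameters, and the choice of one random offset per channel are assumptions, and not necessarily the variant used in the linked post.

```python
import numpy as np

def channel_shift(image, max_shift, rng):
    """Add one random offset per color channel, then clip to [0, 255]."""
    n_channels = image.shape[-1]
    # One offset per channel, drawn uniformly from [-max_shift, max_shift].
    offsets = rng.uniform(-max_shift, max_shift, size=n_channels)
    # Broadcasting applies each offset across its whole channel.
    shifted = image.astype(np.float32) + offsets
    return np.clip(shifted, 0, 255).astype(np.uint8)

rng = np.random.default_rng(seed=0)
# Hypothetical 4x4 RGB image filled with mid-gray pixels.
image = np.full((4, 4, 3), 128, dtype=np.uint8)
augmented = channel_shift(image, max_shift=30.0, rng=rng)
```

Calling the function repeatedly with the same generator yields differently tinted copies of the image, which is the point of augmentation: more varied training examples from the same source data.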

Where to find Day 7 post :

Day 13–60 days of Data Science and Machine Learning

Hands on Data Preprocessing in depth — Part 2…

medium.datadriveninvestor.com

Day 8 : Read And Process Large Datasets Within Seconds

Handle billions of rows in seconds.

Where to find Day 8 post :

Read And Process Large Datasets Within Seconds — Part 1

Handle billion of rows in seconds…

medium.datadriveninvestor.com

Day 9 : Data Visualization

Data visualization is an incredibly important step: it helps you understand how the data is distributed over time, lets you visualize your hypotheses about the data, conveys important information through charts so that leaders can make business decisions, and lets you examine missing values and outliers in the data.
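A minimal sketch of two of these uses, assuming matplotlib as the plotting library (the linked posts may use something else). The daily sales figures are invented: they contain one missing day and one obvious outlier so both show up in the charts.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily sales with one gap and one obvious outlier.
dates = pd.date_range("2024-01-01", periods=10, freq="D")
sales = pd.Series(
    [100, 102, 98, np.nan, 101, 99, 250, 103, 97, 100], index=dates
)

fig, (ax_time, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Line chart: values over time; the break in the line marks the missing day.
ax_time.plot(sales.index, sales.to_numpy(), marker="o")
ax_time.set_title("Sales over time")
ax_time.set_xlabel("Date")
ax_time.set_ylabel("Sales")
ax_time.tick_params(axis="x", rotation=45)

# Box plot: points beyond the whiskers are candidate outliers.
ax_box.boxplot(sales.dropna().to_numpy())
ax_box.set_title("Sales distribution")

fig.tight_layout()

missing_days = int(sales.isna().sum())
```

To view the result, call `fig.savefig(...)` with a file name of your choice, or drop the `matplotlib.use("Agg")` line and call `plt.show()` in an interactive session.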

Start here —

Project — Kaggle’s annual Machine Learning and Data Science Survey ( Part 1 )

In this post we implemented a project that covers some of the most important concepts: data cleaning, preprocessing, EDA, etc.

This dataset (Kaggle’s annual Machine Learning and Data Science Survey) has 42+ questions and 25,973 responses. In this post we cover how to approach a problem and take a very elementary look at how to analyze your data.

Where to find Day 9 post :

Day 17–60 days of Data Science and Machine Learning

Project implementation : Part 1 ..

medium.datadriveninvestor.com

Day 19 — 60 days of Data Science and Machine Learning

Project Implementation ( Part 2)…

medium.datadriveninvestor.com

Day 10: Project — Detailed Crypto Analysis

In this post we covered a detailed crypto analysis to build basic intuition; part 2 covers how we can build a model to predict prices.

Where to find Day 10 post :

Day 20–60 days of Data Science and Machine Learning

Project Implementation : Crypto Analysis

medium.com

Day 11: Project — Detailed Analysis of the Netflix Content.

In this post we covered detailed Analysis of the Netflix Content.

Where to find Day 11 post :

Day 21 : 60 days of Data Science and Machine Learning Series

Project Implementation…

medium.datadriveninvestor.com

Day 12 : Data visualization and Clustering — Part 1

In this post we covered Data visualization and Clustering — Part 1 in detail.

Where to find Day 12 post :

Day 28 : 60 days of Data Science and Machine Learning Series

ML Clustering Project 2 ( Part 1)..

medium.com

Day 13: Data visualization and Clustering — Part 2

In this post we covered Data visualization and Clustering — Part 2 in detail.

Where to find Day 13 post :

Day 29 : 60 days of Data Science and Machine Learning Series

ML clustering Project 2 ( Part 2)..

medium.com

Day 14: Data visualization and Clustering — Part 3

In this post we covered Data visualization and Clustering — Part 3 in detail.

Where to find Day 14 post :

Day 30: 60 days of Data Science and Machine Learning Series

ML clustering Project 2 ( part 3)..

medium.com

Day 15 : How To Choose Right Data Visualization Charts For Your Data?

Where to find Day 15 post :

How To Choose Right Data Visualization Charts For Your Data?

A crash course on practical Data Visualization …( Part 1)

medium.com

Day 16 : The Top 5 Datasets Released by Google

Data collection is the first step to building great machine learning models. How accurate your model can be is dictated in part by the quantity and quality of your data. While it can be a fun exercise to sift through dozens of datasets to find the perfect one, it can sometimes also be frustrating.

Where to find Day 16 post :

The Top 5 Datasets Released by Google

Gold Standards…

medium.datadriveninvestor.com

Day 17 : Data Science Tips and Techniques —

23 Data Science Techniques You Should Know!

Save your precious time by using these hacks

ai.plainenglish.io

Efficient Code and Optimization techniques for Python

With Implementation…

medium.datadriveninvestor.com

11 Jupyter Notebook Techniques You Should Know

Become a Jupyter pro

ai.plainenglish.io

Best Resources for Data Science and Machine Learning (full list)

Become a rockstar Data Science or ML engineer …

medium.datadriveninvestor.com

For Python Projects —

Complete Python And Projects — Mega Compilation

Everything that you need to know in Python with Projects…

medium.com

Analyzing Video using Python, OpenCV and NumPy

With Code Implementation…

medium.datadriveninvestor.com

For the complete 60 days of Data Science and ML series, see:

Day 1 — Day 60 : Quick Recap of 60 days of Data Science and ML

Connect the ML dots…

medium.com

Follow for more updates. Stay tuned and keep coding! Some of the links are affiliate links.

For other projects, see:

Build Machine Learning Pipelines( With Code)

Build Machine Learning Pipelines( With Code) — Part 1

Complete implementation…

medium.datadriveninvestor.com

Recurrent Neural Network with Keras

Project Implementation and cheatsheet…

medium.datadriveninvestor.com

Clustering Geolocation Data in Python using DBSCAN and K-Means

Project Implementation…

medium.datadriveninvestor.com

Facial Expression Recognition using Keras

Project Implementation…

medium.datadriveninvestor.com

Hyperparameter Tuning with Keras Tuner

Project Implementation….

medium.datadriveninvestor.com

Custom Layers in Keras

Code implementation …

medium.datadriveninvestor.com