My List of the Best Kaggle Notebooks, Topic-wise (Data Science and Machine Learning): Part 1
Part 1: Notebooks from which you will learn the most
Welcome back, peeps! Today I’m opening Kaggle’s Pandora’s box: my list of the best Kaggle notebooks, topic by topic, for Data Science and Machine Learning (Part 1).
I have been participating in Kaggle competitions in my free time for the past 4.5 years, and it has been an incredible learning experience. As much as I loved writing my own solutions to the problems on the platform, I also went through some of the top notebooks thoroughly to find the gems hidden in them. Thanks to the amazing Kaggle community (especially the star notebooks), I have learned a great deal and applied those learnings at my job.
Some of the other best Series —
30 days of Data Structures and Algorithms and System Design Simplified
100 days : Your Data Science and Machine Learning Degree Series with projects
Complete Data Visualization and Pre-processing Series with projects
Project Videos:
Videos covering all of the following will be published on our YouTube channel (just launched): projects, data structures, SQL, algorithms, system design, Data Science and ML, Data Analytics, and Data Engineering; implemented projects in Data Science and ML, Data Engineering, Deep Learning, Machine Learning Ops, Time Series Analysis and Forecasting, Applied Machine Learning, TensorFlow and Keras, PyTorch, Scikit-learn, Big Data, Cloud Machine Learning, Neural Networks, OpenCV, Data Analytics, Data Visualization, Data Mining, and Natural Language Processing; complete ML research papers summarized; and the series on MLOps and Deep Learning, Applied Machine Learning with Projects, PyTorch with Projects, TensorFlow and Keras with Projects, Scikit-learn with Projects, Time Series Analysis and Forecasting with Projects, and ML System Design Case Studies.
Subscribe today!
Tech Newsletter —
If you are interested, you can join my newsletter through which I send tech interview tips, techniques, patterns, hacks — Software Development, ML, Data Science, Startups and Technology projects to more than 30K readers. You can subscribe to Tech Brew :
In this post, I’ll share the best notebooks on Kaggle (according to me) from which you can learn the most and greatly speed up your learning in data science and ML.
Part 2 of this series — My list of Kaggle Best Notebooks — Topic wise ( Data Science and Machine Learning) — Part 2
Disclaimer: This is my own list, which I’m sharing so that people getting started in Data Science and ML don’t fall down the rabbit hole of overwhelming information out there. Remember that learning is a three-step process: first, decide what you want to learn; second, decide where to learn it from; third, implement what you learned.
Let’s dive in!
Web Scraping
- https://www.kaggle.com/code/daniboy370/tutorial-web-scraping
- https://www.kaggle.com/code/dierickx3/kaggle-web-scraping-via-headless-firefox-selenium
- https://www.kaggle.com/code/digvijaysinhgohil/web-scraping-using-python
Python
Ensembling in Python
Pandas
- Star Notebook : https://www.kaggle.com/code/kashnitsky/topic-1-exploratory-data-analysis-with-pandas
- Star Notebook : https://www.kaggle.com/code/prashant111/comprehensive-data-analysis-with-pandas
- https://www.kaggle.com/code/sohier/tutorial-accessing-data-with-pandas
- https://www.kaggle.com/code/kashnitsky/a1-demo-pandas-and-uci-adult-dataset
- https://www.kaggle.com/code/ash316/learn-pandas-with-pokemons
- https://www.kaggle.com/code/frtgnn/simple-profiling-eda-using-pandas-profiling
- https://www.kaggle.com/code/corazzon/how-to-use-pandas-filter-in-survey-eda
- https://www.kaggle.com/code/shivan118/pandas-100-tricks
Data Exploration
- Star Notebook : https://www.kaggle.com/code/sudalairajkumar/simple-exploration-notebook-zillow-prize
- https://www.kaggle.com/code/pmarcelino/comprehensive-data-exploration-with-python/notebook
Data pre-processing
- Star Notebook : https://www.kaggle.com/code/ldfreeman3/a-data-science-framework-to-achieve-99-accuracy/notebook
- Star Notebook : https://www.kaggle.com/code/nkitgupta/advance-data-preprocessing
- Star Notebook : https://www.kaggle.com/code/agrawaladitya/step-by-step-data-preprocessing-eda
- https://www.kaggle.com/code/gzuidhof/full-preprocessing-tutorial
- https://www.kaggle.com/code/sudalairajkumar/getting-started-with-text-preprocessing
- https://www.kaggle.com/code/nz0722/simple-eda-text-preprocessing-jigsaw
- https://www.kaggle.com/code/smasar/tutorial-preprocessing-processing-evaluation
- https://www.kaggle.com/code/vikassingh1996/extensive-data-preprocessing-and-modeling
Text Preprocessing
- Star Notebook : https://www.kaggle.com/code/sudalairajkumar/getting-started-with-text-preprocessing
- https://www.kaggle.com/code/shashanksai/text-preprocessing-using-python
- https://www.kaggle.com/code/theoviel/improve-your-score-with-some-text-preprocessing
- https://www.kaggle.com/code/l3nnys/useful-text-preprocessing-on-the-datasets
- https://www.kaggle.com/code/balatmak/text-preprocessing-steps-and-universal-pipeline
- https://www.kaggle.com/code/srinivasav22/text-preprocessing-and-advanced-functions
- https://www.kaggle.com/code/awadhi123/text-preprocessing-using-nltk
Data Visualizations
- https://www.kaggle.com/code/andresionek/how-to-create-award-winning-data-visualizations/notebook
- https://www.kaggle.com/code/willcanniford/chocolate-bar-ratings-extensive-eda/report
- https://www.kaggle.com/code/ash316/eda-to-prediction-dietanic
- https://www.kaggle.com/code/deffro/eda-is-fun
- https://www.kaggle.com/code/gpreda/santander-eda-and-prediction
Interactive Visualizations
- Star Notebook : https://www.kaggle.com/code/tavoosi/tutorial-interactive-data-visualizations
- Star Notebook : https://www.kaggle.com/code/maheshdadhich/strength-of-visualization-python-visuals-tutorial
- https://www.kaggle.com/code/erikbruin/airbnb-the-amsterdam-story-with-interactive-maps
- https://www.kaggle.com/code/subinium/kaggle-2020-visualization-analysis
- https://www.kaggle.com/code/pranav84/kiva-loans-eda-part-1-interactive-visualizations/report
How to deal with Imbalanced Datasets
- https://www.kaggle.com/code/janiobachmann/credit-fraud-dealing-with-imbalanced-datasets/notebook
- https://www.kaggle.com/code/rafjaa/resampling-strategies-for-imbalanced-datasets
- https://www.kaggle.com/code/souravsaha1605/comprehensive-guide-on-imbalanced-data-handling
- https://www.kaggle.com/code/shahules/tackling-class-imbalance
- https://www.kaggle.com/code/suyashlakhani/credit-card-fraud-handling-imbalanced-dataset-98
Tabular Data
- Star Notebook : https://www.kaggle.com/code/vbmokin/data-science-for-tabular-data-advanced-techniques
- https://www.kaggle.com/code/manabendrarout/tabular-data-preparation-basic-eda-and-baseline
- https://www.kaggle.com/code/vbmokin/50-tips-data-science-tabular-data-for-beginner
- https://www.kaggle.com/code/vbmokin/50-advanced-tips-data-science-for-tabular-data
- https://www.kaggle.com/code/parulpandey/explainable-boosting-machines-for-tabular-data
Mathematical & Statistical Skills
- Star Notebook : https://www.kaggle.com/code/carlolepelaars/statistics-tutorial
- Star Notebook : https://www.kaggle.com/code/kanncaa1/statistical-learning-tutorial-for-beginners
- https://www.kaggle.com/code/upadorprofzs/statistical-analysis-descriptive-statistics-br
- https://www.kaggle.com/code/yashvi/practical-statistics-1-descriptive-statistics
Feature Engineering
- Star Notebook : https://www.kaggle.com/code/codename007/home-credit-complete-eda-feature-importance
- Star Notebook : https://www.kaggle.com/code/artgor/eda-feature-engineering-and-everything
- Star Notebook : https://www.kaggle.com/code/kashnitsky/topic-6-feature-engineering-and-feature-selection/notebook
- https://www.kaggle.com/code/dlarionov/feature-engineering-xgboost
- https://www.kaggle.com/code/eikedehling/feature-engineering/notebook
- https://www.kaggle.com/code/gunesevitan/titanic-advanced-feature-engineering-tutorial
- https://www.kaggle.com/code/willkoehrsen/introduction-to-manual-feature-engineering
- https://www.kaggle.com/code/willkoehrsen/automated-feature-engineering-basics
- https://www.kaggle.com/code/rejasupotaro/effective-feature-engineering
Modelling
- Star Notebook : https://www.kaggle.com/code/odins0n/spaceship-titanic-eda-27-different-models
- Star Notebook : https://www.kaggle.com/code/dansbecker/how-models-work
- https://www.kaggle.com/code/kanncaa1/feature-selection-and-data-visualization
- https://www.kaggle.com/code/artgor/eda-and-models
Model Performance
Hyper Parameter Tuning
- Star Notebook : https://www.kaggle.com/code/willkoehrsen/intro-to-model-tuning-grid-and-random-search
- https://www.kaggle.com/code/ldfreeman3/a-data-science-framework-to-achieve-99-accuracy
- https://www.kaggle.com/code/prashant111/a-guide-on-xgboost-hyperparameters-tuning
XGBoost & LightGBM & Catboost
- https://www.kaggle.com/code/kaanboke/xgboost-lightgbm-catboost-imbalanced-data
- https://www.kaggle.com/code/dansbecker/xgboost
- https://www.kaggle.com/code/eliotbarr/stacking-test-sklearn-xgboost-catboost-lightgbm
Sklearn and ML Pipeline
- Star Notebook : https://www.kaggle.com/code/kanncaa1/machine-learning-tutorial-for-beginners
- https://www.kaggle.com/code/armandsauzay/sklearn-pipelines-made-easy
- https://www.kaggle.com/code/ialimustufa/titanic-beginner-s-guide-with-sklearn
- https://www.kaggle.com/code/neviadomski/how-to-get-to-top-25-with-simple-model-sklearn
- https://www.kaggle.com/code/baghern/a-deep-dive-into-sklearn-pipelines
- https://www.kaggle.com/code/sermakarevich/sklearn-pipelines-tutorial
- https://www.kaggle.com/code/residentmario/automated-feature-selection-with-sklearn
- https://www.kaggle.com/code/qitvision/a-complete-ml-pipeline-fast-ai
- https://www.kaggle.com/code/poonaml/titanic-survival-prediction-end-to-end-ml-pipeline
- https://www.kaggle.com/code/huanvo/lyft-complete-train-and-prediction-pipeline
- https://www.kaggle.com/code/pouryaayria/a-complete-ml-pipeline-tutorial-acu-86
- Star Notebook : https://www.kaggle.com/code/dansbecker/pipelines
Naive Bayes
- Star Notebook : https://www.kaggle.com/code/prashant111/naive-bayes-classifier-in-python
- https://www.kaggle.com/code/blackblitz/gaussian-naive-bayes
- https://www.kaggle.com/code/julian3833/jigsaw-incredibly-simple-naive-bayes-0-768
- https://www.kaggle.com/code/startupsci/titanic-data-science-solutions
- https://www.kaggle.com/code/akshaysharma001/naive-bayes-with-hyperpameter-tuning
Binary Classification
- Star Notebook : https://www.kaggle.com/code/rnmehta5/pima-indian-diabetes-binary-classification
- https://www.kaggle.com/code/tanetboss/beginner-binary-classification-for-nice-movie
- https://www.kaggle.com/code/jashsheth5/binary-classification-with-sklearn-and-keras-95
Linear Regression
Logistic Regression
- https://www.kaggle.com/code/kanncaa1/logistic-regression-implementation
- https://www.kaggle.com/code/faressayah/logistic-regression-data-preprocessing
Decision Trees
- https://www.kaggle.com/code/kashnitsky/topic-3-decision-trees-and-knn
- https://www.kaggle.com/code/kashnitsky/a3-demo-decision-trees-solution
- https://www.kaggle.com/code/faressayah/decision-trees-random-forest-for-beginners
- https://www.kaggle.com/code/gauravduttakiit/hyperparameter-tuning-in-decision-trees
- https://www.kaggle.com/code/prashant111/decision-tree-classifier-tutorial
Clustering
- Star Notebook : https://www.kaggle.com/code/kashnitsky/topic-7-unsupervised-learning-pca-and-clustering/notebook
- Star Notebook : https://www.kaggle.com/code/fazilbtopal/popular-unsupervised-clustering-algorithms
- Star Notebook : https://www.kaggle.com/code/maksimeren/covid-19-literature-clustering
- https://www.kaggle.com/code/kushal1996/customer-segmentation-k-means-analysis
- https://www.kaggle.com/code/karnikakapoor/customer-segmentation-clustering
- https://www.kaggle.com/code/hellbuoy/online-retail-k-means-hierarchical-clustering
- https://www.kaggle.com/code/prashant111/k-means-clustering-with-python
- https://www.kaggle.com/code/sabanasimbutt/clustering-visualization-of-clusters-using-pca
Gradient Boosting
- https://www.kaggle.com/code/kashnitsky/topic-10-gradient-boosting/notebook
- https://www.kaggle.com/code/ambrosm/tpsmay22-gradient-boosting-quickstart
- https://www.kaggle.com/code/grroverpr/gradient-boosting-simplified
K-Nearest Neighbors
- Star Notebook : https://www.kaggle.com/code/kashnitsky/topic-3-decision-trees-and-knn/notebook
- https://www.kaggle.com/code/prashant111/knn-classifier-tutorial
- https://www.kaggle.com/code/cdeotte/mnist-perfect-100-using-knn
- https://www.kaggle.com/code/mgabrielkerr/visualizing-knn-svm-and-xgboost-on-iris-dataset
- Star Notebook : https://www.kaggle.com/code/shrutimechlearn/step-by-step-diabetes-classification-knn-detailed
Support Vector Machines
- Star Notebook : https://www.kaggle.com/code/nirajvermafcb/support-vector-machine-detail-analysis
- https://www.kaggle.com/code/faressayah/support-vector-machine-pca-tutorial-for-beginner
- https://www.kaggle.com/code/arshid/support-vector-machine-on-iris-flower-dataset
- https://www.kaggle.com/code/codeblogger/step-by-step-support-vector-machine-svm
Competitions Notebook :
Part 2 : Coming Soon!
30 days of Data Analytics Series —
Day 1 : Data Analytics basics and kickstart of Data analytics with projects series
Day 3 : Data Analytics Ecosystem — Data Life Cycle, Data Analysis complete process ( most important things)
Day 5 : Statistics
Day 6 : Basic and Advanced SQL
Day 8 : Pandas and Numpy
Day 9 : Data Manipulation
Day 10 : Data Visualization — Part 1
Day 11 : Project 1 : Data Visualization — Part 2
Day 12 : Data Visualization — Part 3
Day 13: Tableau — Part 1
Day 14: Tableau — Part 2
Day 15: Tableau — Part 3
Day 16 : Data Analysis Project 2
Day 17 : Data Analysis Project 3
Day 18: Data Analysis Project 4
Day 20 : Data Analysis Project 6
Day 21 : Data Analysis Project 7
Take Complete Hands On Tableau Course : Link
Quick Recap — Most Important Projects, Data Science, Machine Learning, Programming Tricks and Techniques
Writing Efficient and Optimized Python Code
Writing Efficient Python Code — Part 2
Big Query SQL and Linux
Happy learning and Kaggling :)
Follow for more updates and stay tuned. And of course, let me end this post with a quote by Steve Jobs ;)
“Your work is going to fill a large part of your life, and the only way to be truly satisfied is to do what you believe is great work. And the only way to do great work is to love what you do. If you haven’t found it yet, keep looking. Don’t settle. As with all matters of the heart, you’ll know when you find it.”






