How to Get Started on Kaggle in 2022 (Even If You Are Terrified)
Your road to GrandMaster

Wow, this place is fantastic. Look at all these Kagglers. So much to learn. The competitions sound like immense fun, and even Google seems to host them here. I swear I am gonna crush it.
Let me check out some of these gold notebooks. Whoa, it is written by a grandmaster. THAT sounds luxurious. The guy seems to know so damn much. I've never seen a data scientist so cool. Let's see his profile — yep, he is a rocket scientist.
What am I thinking? I can't compete with that. I should probably go before I embarrass myself. Yeah, that's right. I will study up for a couple more years on Coursera and come back and show 'em.
…
That pretty much sums up how I felt when I first set foot on Kaggle, and I am sure it is the same for many others: that dreadful sinking sensation in your stomach when you realize you are just not good enough to play with the big guys, and the wondering whether you will ever be like them.
I certainly wouldn't have achieved a Master rank if I had kept sitting back, wondering miserably. I just had to take the first step and figure out the rest down the road.
This first step is the most important one for beginners. Get it wrong, and you are back to that impostor syndrome that tells you that there is no way in hell you will be able to compete with these pros.
Today, I will show you how to take this initial step in stages that set you up for success on Kaggle.
Step 1: The tedious setup I must mention
Once you create your account and become a novice (the lowest rank on Kaggle), you are mere inches away from becoming an official Kaggler.
All you have to do is submit to a competition and earn your contributor badge. Almost 40% of all Kaggle users are contributors, and an even larger proportion are novices.
If you are a beginner, there is no shame in taking a few of Kaggle's official free courses to get your skills up to scratch. I recommend taking the Intro to ML course first, as it explains the basics of ML and holds your hand while you submit to a competition for the first time.
Step 2: Be part of the community first.
Many people go about joining Kaggle the wrong way. As everyone brands Kaggle as a competitions platform, most feel mounting pressure to enter a competition straight away to become a "Kaggler."
Nothing could be further from the truth.
Feeling that you have to join a competition right away is a massive trap, and you will just end up miserable if you haven't developed the right skills yet because Kaggle is brutally competitive. Even the smallest challenges attract many skilled specialists.
So what is the solution?
Be part of the community first. Feel that you are a member of this group so that you eliminate your impostor syndrome and can say to yourself, "I am a Kaggler."
Well, how do you do that?
You start reading notebooks. A lot. By everyone. I suggest picking a competition you are interested in (see the following sections for suggestions) and ordering the list of notebooks by the number of upvotes.

You scroll down and open the notebooks with the fewest upvotes. Often, these are written by beginners who need just as much support and feedback as you do.
Once you start communicating with the authors in the comments section (there is a very high chance they respond to every comment), you build a connection with them. They will probably follow you back once you comment on a few of their notebooks.
Now, in every notebook, there are a few to-dos.
First, leave at least one comment. It can be about anything: the way the code is written (clean, follows best practices), how the author explains their thought process, the quality of the data visualizations, or any other aspect of the notebook.
You can also suggest improvements here and there. If the author knows more than you, say that too and tell them you learned something new.

If you don't know much about what the author is talking about, fork the notebook and start tinkering with every code cell. Read the docs and tutorials on the unknown functions until you understand what each line is doing.
Once you repeat this process for 10–20 notebooks, you will realize you have learned much more than you would from a course in the same span of time.
And the best part is that you are no longer a nobody. You know and follow many Kagglers, and they probably remember you as well if you left good feedback and showed appreciation for their work.
Step 3: Understand Kaggle's progression system and its hotness system.
You have to look at this page on the Kaggle docs to understand how to reach each rank:

There are four categories, and each one requires a different level of effort and skill to reach the grandmaster title. GM titles in competitions and datasets are the hardest to earn.
You might not be in Kaggle for ranks, which is perfectly fine. But if you want to build good connections and expand your network, earning a title is a surefire way to gain credibility.
The most respectable titles are masters and GMs in either competitions or notebooks (in my opinion, at least). Still, any GM title is always enough to earn respect and admiration from many.
Kaggle's hotness system is similar to LinkedIn's (but without the complications of second and third-degree connections).
If you produce content, it will often show up on your followers’ feed to match their preferences. If they interact with your work, this will similarly display on their followers' feeds. So, more followers mean wider reach.
The rank of a notebook within a competition also changes, based not on its number of upvotes but on how many people comment on it.
Step 4: Choose a competition
Now, you are ready. Everything has been leading up to this — choosing and entering a competition.
Even if you are a beginner, I DON'T recommend these Getting-started competitions:
Almost every beginner starts on Kaggle with these, so they are very saturated. Besides, there is a lot of plagiarism because people simply copy the top solutions of previous winners to move up the leaderboard.
Don't be surprised if you find notebooks with thousands of upvotes — these competitions have been there since the dawn of Kaggle, and people have had time to explore their datasets in every way imaginable.
Instead, start with the monthly Tabular Playground Series (TPS) competitions, as I did. They are a teensy bit harder than getting-started comps but are much more fun and have a larger capacity to teach you something new.
Because of their beginner-level nature, they attract very few grandmasters but are still hard to win (I have yet to do it).
The ultimate 4-week workflow to succeed in a competition
First, let's define "success in a competition."
For me, I am pretty happy if my score is within 1% of the first-place submission. For example, in the September TPS, which I was very active in, I scored 0.81708 ROC AUC while first place scored 0.81775. That is a difference of just 0.00067, and I still landed in 332nd place.
Most of the time, you can land within reach of the top score even with simple models and a bit of hyperparameter tuning. But to reach the top, you have to do a lot of tedious work and experimentation.
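As a quick check of that arithmetic, here is a tiny sketch that uses only the two scores quoted above. The variable names are mine, purely for illustration.

```python
# Scores quoted in the paragraph above (ROC AUC, September TPS).
my_score = 0.81708
top_score = 0.81775

absolute_gap = top_score - my_score
relative_gap_pct = 100 * absolute_gap / top_score

print(f"absolute gap: {absolute_gap:.5f}")      # absolute gap: 0.00067
print(f"relative gap: {relative_gap_pct:.3f}%")  # relative gap: 0.082%
```

So the gap is well inside the 1% band, and yet hundreds of submissions fit into it.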
That is what's required to win, at least in TPS competitions.
The featured and code competitions (with prize money) are different. They have runtime and resource limits. For example, some don't accept solutions that train for more than 8 hours, which keeps them practical in case the host wants to productionize some of them in the future.
Whatever the type of competition, there are a few steps you can take to improve your chances of success. I'll outline them for TPS competitions, but you can always stretch the ideas to fit the duration of the competition.
So, to get good results in TPS or any Kaggle competition, here is how you should allocate your time:
Week 1: Exploratory Data Analysis (EDA). Its importance is undeniable, and it will be integral to how you come up with solutions later.
You should pay special attention to the features in the data and how you can normalize them (different distributions call for different transformations). EDA also gives you ideas for feature engineering, another essential part of your success.
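To make the normalization point concrete, here is a minimal sketch on synthetic data. The two features, the `skewness` helper, and the choice of scikit-learn's `StandardScaler` and `PowerTransformer` are my own assumptions for illustration; a real dataset may call for something else entirely.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer, StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-ins: one roughly normal feature, one right-skewed feature.
normal_feature = rng.normal(loc=50, scale=10, size=(1000, 1))
skewed_feature = rng.lognormal(mean=0, sigma=1, size=(1000, 1))


def skewness(x):
    """Sample skewness; values near 0 mean a roughly symmetric distribution."""
    x = x.ravel()
    return float(np.mean(((x - x.mean()) / x.std()) ** 3))


# A roughly normal feature usually only needs centering and scaling.
scaled = StandardScaler().fit_transform(normal_feature)
# A heavily skewed feature may benefit from a power transform first.
unskewed = PowerTransformer(method="yeo-johnson").fit_transform(skewed_feature)

print(f"skew before: {skewness(skewed_feature):.2f}, after: {skewness(unskewed):.2f}")
```

Plotting histograms before and after is the more common way to do this during EDA; the skewness number is just a compact way to see the effect.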
In week 1, you should also develop a validation strategy. Set up a baseline score with a model (XGBoost is popular) and see if you can improve it using other methods.
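A minimal sketch of what a validation strategy plus a baseline might look like. The data is synthetic, and I use scikit-learn's `GradientBoostingClassifier` as a stand-in so the example depends on a single library; if you have XGBoost installed, its classifier could be swapped in.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the competition's training data.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8, random_state=0
)

# Fix the folds once and reuse them for every later experiment,
# so that scores stay comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

baseline = GradientBoostingClassifier(random_state=0)
scores = cross_val_score(baseline, X, y, cv=cv, scoring="roc_auc")

print(f"baseline ROC AUC: {scores.mean():.4f} +/- {scores.std():.4f}")
```

Every later change (a new model, a new feature, a tuned parameter) can then be judged against this one number on the same folds.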
Week 2: Model selection. Now, you try out different models with default hyperparameters to see which one improves the most on your baseline. Candidates can be tree-based models like XGBoost or linear models like logistic or linear regression.
I also recommend going through some rarer models in Sklearn based on your intuition and the knowledge you got from EDA. You may find a surprise or two that could give you even higher scores than tree-based models.
You must judge these models based on their training and test scores through cross-validation (the keywords are linked to relevant tutorials).
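Here is a rough sketch of such a comparison, again on synthetic data. The three candidates are arbitrary picks of mine with default hyperparameters, not a recommended shortlist.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8, random_state=0
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Illustrative candidates, all with default hyperparameters.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "gaussian_nb": GaussianNB(),
}

results = {}
for name, model in candidates.items():
    cv_out = cross_validate(
        model, X, y, cv=cv, scoring="roc_auc", return_train_score=True
    )
    # Store (mean training score, mean held-out score) for each model.
    results[name] = (cv_out["train_score"].mean(), cv_out["test_score"].mean())

# Print the candidates from best to worst held-out score.
for name, (train_auc, valid_auc) in sorted(
    results.items(), key=lambda item: item[1][1], reverse=True
):
    print(f"{name:20s} train={train_auc:.4f} valid={valid_auc:.4f}")
```

A large gap between the training and held-out scores is a hint that a model is overfitting, which is why both numbers are worth looking at.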
Week 3: Feature engineering. Now, we are onto the secret sauce all the GMs use in their path to the top. Feature engineering is not something you learn in a course, and there is a lot that can be said about it, but in brief:
Feature engineering is about shaping and transforming datasets so that the models can learn as much information from them as possible.
FE is especially important in tabular and time series competitions.
Every time you change the data (add a new column, modify an existing one, etc.), you should run your best model(s) you found in week 2 to see if this change improves your score.
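The loop described above can be sketched like this. The toy target is deliberately built from the product of two columns, and the logistic regression is only a stand-in for whatever model you picked in week 2; both are assumptions of mine.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
# Hypothetical target that depends on the product of the two columns.
y = (X[:, 0] * X[:, 1] + 0.3 * rng.normal(size=2000) > 0).astype(int)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = LogisticRegression()


def cv_auc(features):
    """Mean cross-validated ROC AUC on the fixed folds."""
    return cross_val_score(model, features, y, cv=cv, scoring="roc_auc").mean()


before = cv_auc(X)
# Candidate feature: the interaction of the two original columns.
X_new = np.column_stack([X, X[:, 0] * X[:, 1]])
after = cv_auc(X_new)

print(f"before: {before:.4f}  after: {after:.4f}")
keep_feature = after > before
```

Real features rarely help this dramatically, so it is worth checking that an improvement is larger than the fold-to-fold noise before keeping it.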
Week 4: Hyperparameter tuning. Now that you've added new, juicy features, it is time to squeeze every bit of performance out of your candidate models.
You will need a good tuning framework. I am especially fond of Optuna, and so are many other Kagglers.
Once you have your best model(s) tuned, you can generate predictions and pack them up for a submission. Or you can combine multiple models as an ensemble to boost your score even more.
Ensemble solutions are mostly impractical in the real world as they are very costly, but you will see that Kaggle is full of them. In the top half of the leaderboard, people often use ensembles. You can learn more about them here.
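The simplest kind of ensemble is a blend that averages the predicted probabilities of several models. Below is a minimal sketch on synthetic data; the two models and the equal weights are arbitrary choices of mine.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8, random_state=0
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = [
    LogisticRegression(max_iter=1000),
    GradientBoostingClassifier(random_state=0),
]
# Positive-class probabilities from each fitted model.
probs = [m.fit(X_train, y_train).predict_proba(X_valid)[:, 1] for m in models]

# Simple blend: unweighted mean of the predicted probabilities.
blend = sum(probs) / len(probs)

for m, p in zip(models, probs):
    print(type(m).__name__, round(roc_auc_score(y_valid, p), 4))
print("blend", round(roc_auc_score(y_valid, blend), 4))
```

A blend is not guaranteed to beat its best member, so compare it against each single model on held-out data before submitting.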
And finally, all these steps are iterative and can change based on your needs. You might go back to any one of them anytime and stretch or shorten the time you need to complete each step.
Step 5: Start publishing notebooks — the backbone of your success (optional)
If one thing is guaranteed to get your name out, it is publishing high-quality tutorial notebooks.
Even though there are many highly-skilled scientists who crush it in competitions, you will find that there aren't many who can explain their work to others in a crystal-clear way.
That's why I had such an easy time getting my Master title in notebooks:

I only had to create 22 notebooks to get 11 gold medals (4 short of GrandMaster), a much higher gold/notebook ratio than most. Before I published, I had been writing data science tutorials for quite a while, so I had plenty of practice writing technical content.
In fact, there is nothing but laziness and other commitments that are stopping me from getting my first-ever GM title:

You will also find that the best and most popular GMs are the ones who win competitions and then explain their solutions in a professional yet accessible way.
Writing notebooks forces you to write clean code and do research on the topics you want to explain to others:

The community gobbles down good notebooks — for any of my typical gold notebooks, I receive 20–40 "thank you" and feedback comments as opposed to barely a few on my articles on Medium, even though thousands usually read them.
Conclusion
Kaggle Grandmaster titles definitely carry prestige. They look flashy on your LinkedIn title and spice your resume right up.
But the best thing about Kaggle is always its community. Not only is it the most prominent online data science platform, but I have yet to see programmers in other fields form such an amazing community with so many skilled specialists. Regardless of your rank, being part of it expands your knowledge and network much more than courses and books ever will.
