5 Steps to Become a Data Scientist
Data science is a broad field with several subdivisions: data preparation and exploration; data representation and transformation; data visualization and presentation; predictive analytics; machine learning; and more. For beginners, learning the fundamentals of data science can be daunting, especially without proper guidance on what training is required, which courses to take, and in what order. Before discussing the steps to becoming a data scientist, let’s look at the skills that every data scientist should have in their toolbox.
Essential Skills That Every Data Scientist Should Have
According to “The Most in Demand Skills for Data Scientists” (Towards Data Science), the five technology skills most often mentioned in data science job listings are:
- Python
- R
- SQL
- Hadoop
- Spark
Becoming a data scientist also requires general skills in the following:
- Mathematical Analysis and Linear Algebra
- Machine Learning
- Statistics and Probability
- Computer Science
- Communication
- Data Wrangling/Preparation, Data Presentation/Visualization
I started learning data science about a year ago. It was quite challenging from the beginning, so let me share the approach that worked for me. I will discuss five steps that have helped me throughout my journey as a data scientist.
Steps for Becoming a Data Scientist
Step 1: Do not be in a rush
If you have not read “Teach Yourself Programming in Ten Years” by Peter Norvig (Director of Research at Google), I encourage you to do so. Here is a link to the article: http://norvig.com/21-days.html
The point here is that you don’t need ten years to learn the basics of data science, but learning data science in a rush is certainly not helpful. It takes time, effort, energy, patience and commitment to become a data scientist.
Step 2: Take Courses from DataCamp, Coursera, EdX, or other Platforms
DataCamp (https://www.datacamp.com/courses) is a good website where you can learn many different skills, from basic programming concepts to advanced topics such as data science and machine learning. However, I think DataCamp moves too quickly, and its material is therefore too superficial. DataCamp courses are crash courses with little depth, and most of the assessment questions are easy and unchallenging. If you are interested in an academic approach to learning data science, I would recommend the following courses, which will give you a very solid foundation (the academic approach requires an enormous commitment of time and dedication, but it is worthwhile):
(i) Professional Certificate in Data Science (HarvardX, through edX): https://www.edx.org/professional...
Includes the following courses, all taught using R (you can audit courses for free or purchase a verified certificate):
- Data Science: R Basics;
- Data Science: Visualization;
- Data Science: Probability;
- Data Science: Inference and Modeling;
- Data Science: Productivity Tools;
- Data Science: Wrangling;
- Data Science: Linear Regression;
- Data Science: Machine Learning;
- Data Science: Capstone
(ii) Analytics: Essential Tools and Methods (Georgia TechX, through edX): https://www.edx.org/micromasters...
Includes the following courses, all taught using R, Python, and SQL (you can audit for free or purchase a verified certificate):
- Introduction to Analytics Modeling;
- Introduction to Computing for Data Analysis;
- Data Analytics for Business.
(iii) Applied Data Science with Python Specialization (the University of Michigan, through Coursera): https://www.coursera.org/special...
Includes the following courses, all taught using Python (you can audit most courses for free; some require the purchase of a verified certificate):
- Introduction to Data Science in Python;
- Applied Plotting, Charting & Data Representation in Python;
- Applied Machine Learning in Python;
- Applied Text Mining in Python;
- Applied Social Network Analysis in Python.
Step 3: Learning from a Textbook
Learning from a textbook provides more refined, in-depth knowledge than you get from online courses. One book that provides a great introduction to data science and machine learning is “Python Machine Learning” by Sebastian Raschka. The author explains fundamental concepts in machine learning in a way that is very easy to follow, and the code is included, so you can use it to practice and build your own models. I have personally found this book very useful in my journey as a data scientist, and I would recommend it to any data science aspirant. All you need to understand it is basic linear algebra and programming skills. There are also many other excellent data science textbooks, such as “Python for Data Analysis” by Wes McKinney, “Applied Predictive Modeling” by Kuhn & Johnson, and “Data Mining: Practical Machine Learning Tools and Techniques” by Ian H. Witten, Eibe Frank & Mark A. Hall.
Step 4: Network with other Data Science Aspirants
In my personal experience, I have learnt a lot by teaming up with other data science aspirants for weekly group conversations on various topics in data science and machine learning. Network with other data science aspirants, share your code on GitHub, and showcase your skills on LinkedIn. This will help you learn a lot of new concepts and tools in a short period of time. You also get exposed to new ways of doing things, as well as to new algorithms and technologies.
Step 5: Apply Knowledge to Real Data Science Problems
Keep in mind that online courses alone will not make you a data scientist. After establishing a strong foundation in data science, you may seek an internship or participate in Kaggle competitions where you get to work on real data science projects.
Remember that you may be very good at handling data and building machine learning models, but for a data scientist, real-world application is what matters. Every predictive model must produce meaningful and interpretable results for real-life situations. A predictive model must be validated against reality in order to be considered meaningful and useful. Human input and experience are therefore always necessary and beneficial for making sense of the results produced by algorithms.
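To make the idea of validating a model against data it has not seen a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the data is synthetic, the model is a simple least-squares line, and the helper names (fit_line, mean_absolute_error, holdout_validate) are hypothetical, not taken from any course or book mentioned above.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def mean_absolute_error(model, xs, ys):
    """Average absolute gap between predictions and observed values."""
    slope, intercept = model
    return mean(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))


def holdout_validate(xs, ys, test_fraction=0.25, seed=0):
    """Fit on a random training subset and score on the held-out rest."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    rng.shuffle(indices)
    n_test = max(1, int(len(xs) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    train_x = [xs[i] for i in train_idx]
    train_y = [ys[i] for i in train_idx]
    test_x = [xs[i] for i in test_idx]
    test_y = [ys[i] for i in test_idx]

    model = fit_line(train_x, train_y)
    train_err = mean_absolute_error(model, train_x, train_y)
    test_err = mean_absolute_error(model, test_x, test_y)
    return model, train_err, test_err


# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
noise = random.Random(42)
xs = [float(i) for i in range(40)]
ys = [2.0 * x + 1.0 + noise.gauss(0, 0.5) for x in xs]

model, train_err, test_err = holdout_validate(xs, ys)
print("slope, intercept:", model)
print("training error:", train_err)
print("held-out error:", test_err)
```

If the held-out error is much larger than the training error, the model has probably memorized the training data rather than learned something that carries over to new cases. On a real project you would also compare the predictions with what domain experts expect, which is where human input comes in.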
In summary, we have discussed five important steps to becoming a data scientist. The journey might be different for different individuals based on their backgrounds, but the steps above are the approach that worked for me.
Thanks for reading!