Create your own Data Science degree online for free
Learn the skills needed for a career change!
For those of you who want to skill up in data science, fortunately you can do this from home.
Degrees can be expensive. However, you can master the necessary skills online without the cost and hassle of a traditional degree. Online education platforms like edX, Coursera and Udemy have brought a wealth of knowledge to your fingertips at very little cost. There is a catch, though: you do need to persevere and practice, practice, practice…
To be a good data scientist you need a solid understanding of three key areas:
- A solid base in programming: including a database language (SQL) and a scripting language (Python)
- A solid base in statistics: including linear modelling, Bayesian methods and applied machine learning
- A sound understanding of the field you’re applying your “data science” to. This means having a good grasp of the underlying dynamics that drive the variables in the field you work in, rather than just churning data in and out… garbage in, garbage out. In other words, have respect for your data.
After a good understanding of programming and statistics, you can then delve into interdisciplinary subjects. I call these “advanced electives”. Some of them are listed here:
- Information retrieval and web search
- Big data management
- Cloud computing
- Natural language processing
- Neural networks and deep learning
Okay.
So I’m going to divide our “little data science degree” into three “semesters”.
- Semester One: Programming
- Semester Two: Statistics
- Semester Three: Advanced Electives
You can mix and match subjects in semesters one and two, but save the advanced electives for after you have mastered programming and statistics.
Semester One: Programming
You need working fluency in at least one database language and two scripting languages. Here, I’m recommending SQL, Python and R. Python is a general-purpose language that will serve you well throughout your data science career. R is more geared to statistical analysis. There are more than 4,000 statistical packages available in R, which means that whatever analysis you want to do, there’s something out there that lets you implement it quickly.
1A Learning SQL (with UC Davis)
Let’s start with SQL. You can’t be a data scientist without knowing how to run and use a simple relational database.
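To give a flavour of what running a simple relational database involves, here is a minimal sketch using Python's built-in sqlite3 module. The table name `enrolments`, its columns and the sample rows are made up for illustration; they are not taken from any of the courses.

```python
import sqlite3


def average_score_by_course(rows):
    """Load (student, course, score) rows into an in-memory table and
    return the average score per course, highest average first."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing touches disk
    try:
        conn.execute(
            "CREATE TABLE enrolments (student TEXT, course TEXT, score REAL)"
        )
        # Parameterised inserts keep the data separate from the SQL text.
        conn.executemany("INSERT INTO enrolments VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT course, AVG(score) AS avg_score "
            "FROM enrolments "
            "GROUP BY course "
            "ORDER BY avg_score DESC"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample data.
sample_rows = [
    ("ana", "SQL", 80.0),
    ("ben", "SQL", 90.0),
    ("ana", "Python", 70.0),
]
print(average_score_by_course(sample_rows))
```

The `GROUP BY` and `AVG` pattern shown here is the kind of query you will write constantly, so it is worth getting comfortable with it early.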
Coursera and UC Davis offer a specialization in SQL basics.
The course below then shows how SQL is applied to big data.
If you’re keen, you could also study NoSQL as an extension, to see how it contrasts with SQL. The course below on edX provides a good introduction that compares the two.
1B Learning Python (with Michigan)
The University of Michigan has a four-course specialization for Python. This is highly recommended. I’ve also attached a link to their web-book that you can go through in your own time.
1C Learning R (with Johns Hopkins)
Finally, we come to learning R. R is especially useful for statistics. Johns Hopkins University has paired up with Coursera, offering two specializations in R.
A basic overview for data scientists:
A more advanced course to become a power user — where you learn to build your own packages:
This ends our first semester, where hopefully we have gained some confidence in databases and scripting. Next we will explore statistics for a data scientist.
Semester Two: Statistics
A poor “data scientist” simply runs data through various black boxes and analyses results. To be a good data scientist, you need to understand the analysis you perform. This means a solid grasp of statistics.
2A Introduction to Statistics (skip if needed)
For those who do not have a strong background in statistics, the following offers a very gentle introduction to the topic.
2B Statistics with Python (with Michigan)
Armed with some basic stats and Python skills, we can now tackle this specialization, where we learn about linear models, GLMs and logistic regression.
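As a taste of the linear models mentioned above, here is a minimal sketch of simple least-squares regression in plain Python. It is my own illustration rather than material from the specialization, and the function name `fit_line` and the hours/scores data are made up. In practice you would normally reach for a statistics library, but writing it out once helps you see what the black box is doing.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied versus quiz score.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(hours, scores)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
```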
2C Statistics with R (with Duke)
After you complete the “Statistics with Python” course, you can move on to this one. You should be able to ease into this one, for you already possess some working knowledge of statistical inference (from Statistics with Python) and R (from last semester). The Duke course repeats linear modelling we did with Michigan, but using R. Furthermore, we are introduced to Bayesian statistics.
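If Bayesian statistics is new to you, the core idea is updating a prior belief with data. Below is a small sketch of the textbook Beta-Binomial update, written in Python to keep the examples in one language. It is not taken from the Duke course; the helper names `update_beta` and `beta_mean`, the uniform prior and the counts are assumptions chosen for illustration.

```python
def update_beta(alpha, beta, successes, failures):
    """Return the posterior Beta parameters after observing binomial data.

    With a Beta(alpha, beta) prior on a success probability, the posterior
    is Beta(alpha + successes, beta + failures).
    """
    if min(alpha, beta) <= 0 or min(successes, failures) < 0:
        raise ValueError("alpha and beta must be positive, counts non-negative")
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical example: uniform prior Beta(1, 1), then 7 successes and 3 failures.
prior = (1.0, 1.0)
posterior = update_beta(*prior, successes=7, failures=3)
print("posterior parameters:", posterior)
print("posterior mean:", round(beta_mean(*posterior), 3))
```

Notice that the posterior mean sits between the prior mean of 0.5 and the raw success rate of 0.7, which is the usual behaviour when a prior is combined with a small sample.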
Now that we have developed a solid base in programming and statistics, and have used statistics in both R and Python, we are ready to tackle some more advanced topics in data science.
Semester Three: Advanced Electives
Pick two to three of the below, depending on your interest.
Data Mining and Web Search
UIUC offers a great data mining specialization course with Coursera, which covers information retrieval, text mining and web search.
Machine Learning
A reasonably gentle introduction to Machine Learning
Big Data
Natural Language Processing
(this one comes highly recommended)
Cloud Computing
And there you have it: a three-semester self-taught data science program!
If you want to see my top 5 tips for effective online learning, or my review of UIUC’s online master’s degree, see below: