Cassius

Summary

The website outlines a self-guided, online curriculum to create a Data Science degree equivalent, emphasizing the importance of programming, statistics, and advanced electives.

Abstract

The provided content details a comprehensive approach to self-education in Data Science, proposing a three-semester program structure. The first semester focuses on programming skills, including SQL, Python, and R. The second semester delves into statistical knowledge, covering foundational statistics and its application using Python and R. The final semester introduces advanced topics such as data mining, machine learning, big data, natural language processing, and cloud computing. The curriculum is designed to be accessible through online platforms like edX, Coursera, and Udemy, and it stresses the necessity of practical application and continuous learning.

Opinions

  • The author believes that traditional degrees in data science can be costly and that online education platforms offer a viable and cost-effective alternative.
  • Practical experience is deemed crucial, with the author emphasizing the need for perseverance and consistent practice in programming and statistics.
  • A solid understanding of the field one is applying data science to is considered essential to avoid "garbage in, garbage out" scenarios.
  • The author suggests that a good data scientist should be able to critically analyze and understand the statistical methods they use, rather than just running data through algorithms blindly.
  • Interdisciplinary subjects, referred to as "advanced electives," are presented as important for delving deeper into data science applications after mastering core programming and statistical skills.
  • The author provides specific course recommendations from universities like UC Davis, Michigan, and Johns Hopkins, available on Coursera and edX, indicating a preference for these resources.
  • Self-motivation and effective online learning strategies are implied to be key to success in a self-taught data science program.

Create your own Data Science degree online for free

Learn the skills needed for a career change!

Photo by Markus Spiske on Unsplash

If you want to skill up in data science, you can fortunately do it from home.

Degrees can be costly. However, you can master the necessary skills online without the cost and hassle of a traditional degree. Internet education platforms like edX, Coursera and Udemy have brought a wealth of knowledge to your fingertips at very little cost. However, there is a catch: you do need to persevere and practice, practice, practice…

To be a good data scientist you need a solid understanding in 3 key areas:

  1. A solid base in programming: including a database language (SQL) and a scripting language (Python)
  2. A solid base in statistics: including linear modelling, Bayesian methods and applied machine learning
  3. A sound understanding of the field you’re applying your “data science” to. This means having a good grasp of the underlying dynamics that drive the variables in the field in which you are employed, rather than simply churning data in and out… garbage in, garbage out. In other words, have respect for your data.

After a good understanding of programming and statistics, you can then delve into interdisciplinary subjects. I call these “advanced electives”. Some of them are listed here:

  • Information retrieval and web search
  • Big data management
  • Cloud computing
  • Natural language processing
  • Neural networks and deep learning

Okay.

So I’m going to divide our “little data science degree” into three “semesters”.

  1. Semester One: Programming
  2. Semester Two: Statistics
  3. Semester Three: Advanced Electives

You can mix and match subjects in semesters one and two, but save the advanced electives for after you have mastered programming and statistics.

Semester One: Programming

You need working fluency in at least one database language and two scripting languages. Here, I’m recommending SQL, Python and R. Python is a general-purpose language that will serve you well in your data science career. R is more geared to statistical analysis. There are more than 4,000 statistical packages available in R, which means whatever analysis you want to do … there’s something out there that allows you to implement it speedily.

1A Learning SQL (with UC Davis)

Let’s start with SQL. You can’t be a data scientist without knowing how to run and use a simple relational database.
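
To make “a simple relational database” concrete, here is a minimal sketch that uses Python’s built-in sqlite3 module, so nothing needs to be installed. The orders table, its columns, and the top_customers helper are made up for illustration; they are not taken from any of the courses mentioned here.

```python
import sqlite3


def top_customers(rows, min_total):
    """Load (customer, amount) rows into an in-memory table and return
    the customers whose summed orders reach min_total, largest first."""
    conn = sqlite3.connect(":memory:")  # throwaway database held in RAM
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        # Parameterized inserts: never build SQL by pasting values into strings.
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        cursor = conn.execute(
            """
            SELECT customer, SUM(amount) AS total
            FROM orders
            GROUP BY customer          -- one output row per customer
            HAVING SUM(amount) >= ?    -- filter applied after aggregation
            ORDER BY total DESC
            """,
            (min_total,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample data, purely for demonstration.
sample_orders = [("ana", 20.0), ("ben", 5.0), ("ana", 15.0), ("cy", 40.0)]
print(top_customers(sample_orders, 30.0))
```

The query touches the ideas an introductory SQL course usually covers first: creating a table, inserting rows, aggregating with GROUP BY, and filtering the aggregated result with HAVING.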

Coursera and UC Davis offer a specialization in SQL basics.

Furthermore, we see how SQL is being applied in big data with the course below.

If you’re keen, you could also study NoSQL as an extension, as a contrast to SQL. The course below on edX provides a good introduction that compares and contrasts SQL with NoSQL.

1B Learning Python (with Michigan)

The University of Michigan has a 4-course specialization for Python. This is highly recommended. I’ve also attached a link to their web-book, which you can go through in your own time.

1C Learning R (with Johns Hopkins)

Finally, we come to learning R. R is especially useful for statistics. Johns Hopkins University has paired up with Coursera, offering two specializations in R.

A basic overview for data scientists:

A more advanced course to become a power user — where you learn to build your own packages:

This ends our first semester, where hopefully we have gained some confidence in databases and scripting. Next we will explore statistics for a data scientist.

Semester Two: Statistics

A poor “data scientist” simply runs data through various black boxes and analyses results. To be a good data scientist, you need to understand the analysis you perform. This means a solid grasp of statistics.

2A Introduction to Statistics (skip if needed)

For those who do not have a strong background in statistics, the following offers a very gentle introduction to the topic.

2B Statistics with Python (with Michigan)

Armed with some basic stats and Python skills, we can now tackle this specialization, where we learn about linear models, GLMs and logistic regression.
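
As a taste of what “understanding the analysis” means for the simplest linear model, the sketch below fits a straight line by ordinary least squares using nothing but the textbook formulas. The fit_line helper and the sample numbers are hypothetical; in practice you would likely reach for a statistics library, but writing it out once shows what the black box is doing.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    intercept = mean_y - slope * mean_x
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean; zero means every x is identical.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    # Co-movement of x and y around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```

Note the two guard clauses: feeding the function mismatched or degenerate data raises an error instead of quietly returning nonsense, which is the “garbage in, garbage out” point in miniature.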

2C Statistics with R (with Duke)

After you complete the “Statistics with Python” course, you can move on to this one. You should be able to ease into this one, for you already possess some working knowledge of statistical inference (from Statistics with Python) and R (from last semester). The Duke course repeats linear modelling we did with Michigan, but using R. Furthermore, we are introduced to Bayesian statistics.
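
For a first feel of Bayesian statistics, here is a minimal sketch of the classic Beta-Binomial update: start with a Beta prior on an unknown success rate, observe some successes and failures, and get a Beta posterior back. The function names and the example counts are my own illustration and are not taken from the Duke course.

```python
def update_beta(prior_alpha, prior_beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior combined with binomial
    data gives a Beta(alpha + successes, beta + failures) posterior."""
    if min(prior_alpha, prior_beta) <= 0:
        raise ValueError("Beta parameters must be positive")
    if min(successes, failures) < 0:
        raise ValueError("counts cannot be negative")
    return prior_alpha + successes, prior_beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical example: a flat Beta(1, 1) prior, then 7 successes and 3 failures.
post_alpha, post_beta = update_beta(1.0, 1.0, successes=7, failures=3)
print(post_alpha, post_beta, beta_mean(post_alpha, post_beta))
```

The posterior mean sits between the prior mean (0.5) and the raw observed rate (0.7), which is the basic intuition behind Bayesian updating: the data pull your belief away from the prior, and more data pull harder.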

Now that we have developed a solid base in programming and statistics, and have used statistics in both R and Python, we are ready to tackle some more advanced topics in data science.

Semester Three: Advanced Electives

Pick two or three of the topics below, depending on your interests.

Data Mining and Web Search

UIUC offers a great data mining specialization course with Coursera, which covers information retrieval, text mining and web search.

Machine Learning

A reasonably gentle introduction to Machine Learning

Big Data

Natural Language Processing

(this one comes highly recommended)

Cloud Computing

And there you have it: a three-semester, self-taught data science program!

If you want to see my top 5 tips for effective online learning, or my review of UIUC’s online master’s degree, see below:
