3 Data Science Projects That Got Me 12 Interviews. And 1 That Got Me in Trouble.

3 work samples that got my foot in the door, and 1 that almost got me tossed out.

Big Trouble graffiti logo. Photo by Nikhil Mitra on Unsplash

Analyzing My Data Science Portfolio Projects

Like many graduates, I did not have a job lined up after earning my data science master’s degree; the reason is because I didn’t apply anywhere.

For three months.

I graduated in spring time but didn’t begin my job search, in earnest, until summer.

Either fear or foresight compelled me to believe I wasn’t ready, and that emotion drove me to invest the hours I wasn’t working an hourly job at a local hotel into crafting a marketable, desirable data science portfolio.

When I receive comments and LinkedIn messages with questions related to breaking into the data industry, my first tip is always to create a GitHub-hosted portfolio that you can use as a professional calling card.

Bonus points if you create engaging documentation or share that content on a platform like Medium or LinkedIn to connect with potential hiring managers.

While I’ve written about the importance of having a side project, even as a working professional, and shared my communication tips for presenting data projects, I realized that I’ve never shared the contents of my own portfolio.

My hope is that, for those of you wondering where to start when it comes to displaying your work, you’ll see real examples of what kinds of projects catch a recruiter or interviewer’s eye and why.

Before I share my work and process, I have to acknowledge that while I had no formal tech background, I did have the following traits recruiters and managers found desirable (based on feedback, not my egoism, I swear):

A master’s degree in data science
Domain knowledge (I purposely applied for roles within the media, education and hospitality industries, all of which I’ve worked in before)
Python/SQL/BI development experience (course work, personal projects)
Communication skills (a huge plus, even for junior engineers or developers)

Note: Needless to say, your results may vary.

Project 1: Udemy Course Data Analysis Dashboard

Update: Since original publication, I’ve created a walkthrough of this project so you can understand not only the code, but the process behind its conception and execution.

Creating The Dashboard That Got Me A Data Analyst Job Offer

A walkthrough of the Udemy dashboard that got me a job offer from one of the biggest names in academic publishing.

medium.com

Summary: A static dashboard that combined Google Search data and a dataset from Kaggle to draw insights about course attendance and content demand on the Udemy course hosting platform.

Tech Used:

SQL
Tableau
Python

Datasets:

Udemy dataset from Kaggle
Google Trends data

Key insights shared:

Nearly a quarter of Udemy’s courses cost less than $20, which is $20 less than the average price of competitor Edx
Over the past 5 years Udemy has received nearly 2x the Search traffic of Edx
9 out of 10 of Udemy’s most subscribed to courses are in web development

Why this project works:

Domain relevancy (I was applying to an education company)
Demonstrated data visualization best practices
Showed ability to conduct competitive analysis

Project 2: The K-Word

Summary: After years of headlines involving various ‘Karens’, I decided to investigate trends related to the popularity of the name ‘Karen’ over the past 100 years and create visualizations to turn into a data-driven story.

The K-Word: The Rise and fall of the name “Karen” from 1920 — present

Earlier this summer, the Social Security Administration revealed that negative press associated with the name “Karen”…

zachl-quinn.medium.com

Tech used:

Python
BigQuery (database)
BigQuery (SQL)

Datasets:

Social Security baby name data (BigQuery)
Google Trends

Key insights shared:

Karen hit its peak in 1957, with an average of 725 Karens born
Karen has a positive correlation with searches for ‘racist’, ‘nasty’ and ‘manager’ and a negative correlation with searches for ‘kindness’
Spain, Iraq and Iran are the non-U.S. countries that searched for the ‘Karen’ insult the most

Why this project works:

Novelty
Narrative flow
Use of multiple data sources
Clear visualization

Pardon the interruption: For more Python, SQL and cloud computing walkthroughs, follow Pipeline: Your Data Engineering Resource.

To receive my latest writing, you can follow me as well.

Project 3: Churn Prediction in Massive Open Online Course Enrollment

Summary: Using multivariable logistic regression, I created 4 models to determine the likelihood for churn by MOOC students, a group known for their well-documented lack of commitment to educational products.

GitHub portfolio screenshot. Data science project by the author.

Tech used:

Python
GitHub

Datasets:

Edx data (Kaggle)

Key insights shared:

Despite a 90% accuracy, the model precision was considerably lower (high 60%)
Best predictors of likelihood to churn included: Percentage of course complete, percentage of course audited and whether or not the individual held a bachelor’s degree
Reduction of features created a more precise model, with accuracy and precision metrics hovering around 95%

Why this project works:

Tackles a pervasive business problem: Churn
Demonstrates proficiency in building ML models
Shows ability to interpret and present model results

And The Project That Got Me in Trouble…

Tragedy Porn

Summary: In 2018 5 adult film stars died within 3 months. Though most of the deaths were underreported, the suicide of porn star August Ames, 23, brought renewed attention to an industry that has experienced a surge in performers dying before age 30 due to suicide, overdoses and violence.

The controversial dashboard. Data, visualization and writing by the author.

Tragedy Porn

In 2018 5 adult film stars died within 3 months. Though most of the deaths were underreported, the suicide of porn star…

zachl-quinn.medium.com

Tech Used:

Python
SQL (SQL Lite)
Tableau

Datasets:

Custom dataset built by combining data scraped from Wikipedia with Kaggle data

Key insights shared:

In 2018 5 popular adult actresses died within 3 months of each other
In the last 5 years, 10 performers have died before age 26
12 performers total died in 2018, marking a 25-year high

Why this project works:

Demonstrated an ability to create a custom, aggregated dataset (a super important skill for any data job)
Featured data scraped from the web (a very marketable skill)
Displayed knowledge of visualization best practices
Tells a clear, easy-to-follow data story that engages an audience

Why It Got Me in Trouble

(By Now, You Can Probably Guess)

I presented this project during a panel interview with senior data analysts.

In the moment they were engaged and asked questions about how I gathered, manipulated and displayed my data.

By all accounts, the interview went well.

Then the recruiter called.

While she was happy to inform me I was advancing to the next interview round, she suggested that I maybe not present a dashboard called ‘Tragedy Porn’ to the company’s executives in the final interview round.

She shared that my interviewers suggested that, while the analysis was solid, the content was a bit controversial (to say the least).

I come from the world of journalism and media where very few topics are off the table, so this was a learning experience for me.

In the end, I did get and then decline an offer from the company.

Overall it was a fine experience with a valuable lesson for any candidate: Know your audience and target your content accordingly.

Recap

Data science portfolios are not to be taken or to be assembled lightly.

Recruiters, interviewers and your future bosses can absolutely distinguish between a thrown-together repo of code vs. a true, well-honed portfolio.

While your goal shouldn’t be to replicate the work I’ve shared here, there are certainly lessons to be learned:

Make sure your chosen projects focus on analyzing or attempting to solve real business problems (bonus points if these include problems your target organization faces).
You’ll more than likely be laboring over these projects for many unpaid hours. Pick topics and data that interest you.
Interpreting and presenting results is an overlooked but highly valuable skill. Learn how to confidently present and answer follow-up questions.
Do your homework and try to use technology that mirrors what your target organization uses. Many job descriptions I encountered featured heavy Python and SQL needs, hence the focus.
Know your audience and choose topics accordingly.

Hopefully, if you’re seeking to enter this field, you are passionate about data (or at least interested to the extent that you can do the job without getting too bored).

In addition to sharing your portfolio, don’t be afraid to share the story behind your process with your interviewers.

Managers want candidates that are passionate about data and solving business problems. Show them evidence that these descriptions apply to you.

For more data science skill building, check out my latest work:

Even SQL Newbies Can Materialize SQL Views In Under 10 Minutes

Learn one of the most useful but intimidating SQL/data engineering skills in just a few minutes.

medium.com

Create a job-worthy data portfolio. Learn how with my free project guide.