avatarZach Quinn

Summary

The author shares their experience of creating a data science portfolio that helped them secure interviews and job offers, along with a project that caused controversy.

Abstract

The author, a data science graduate, discusses the importance of having a portfolio to showcase skills and projects. They share three projects from their portfolio that helped them secure interviews: a Udemy course data analysis dashboard, an analysis of the name 'Karen,' and a churn prediction model for massive open online courses. The author also shares a controversial project on adult film star deaths that got them in trouble during an interview. The author emphasizes the importance of knowing your audience and choosing appropriate topics for your portfolio.

Bullet points

  • The author shares their data science portfolio that helped them secure interviews and job offers.
  • The first project is a Udemy course data analysis dashboard, which demonstrated domain relevancy, data visualization best practices, and the ability to conduct competitive analysis.
  • The second project is an analysis of the name 'Karen,' which showcased novelty, narrative flow, use of multiple data sources, and clear visualization.
  • The third project is a churn prediction model for massive open online courses, which tackled a pervasive business problem, demonstrated proficiency in building ML models, and showed the ability to interpret and present model results.
  • The author also shares a controversial project on adult film star deaths that got them in trouble during an interview.
  • The author emphasizes the importance of knowing your audience and choosing appropriate topics for your portfolio.

3 Data Science Projects That Got Me 12 Interviews. And 1 That Got Me in Trouble.

3 work samples that got my foot in the door, and 1 that almost got me tossed out.

Big Trouble graffiti logo. Photo by Nikhil Mitra on Unsplash

Analyzing My Data Science Portfolio Projects

Like many graduates, I did not have a job lined up after earning my data science master’s degree; the reason is because I didn’t apply anywhere.

For three months.

I graduated in spring time but didn’t begin my job search, in earnest, until summer.

Either fear or foresight compelled me to believe I wasn’t ready, and that emotion drove me to invest the hours I wasn’t working an hourly job at a local hotel into crafting a marketable, desirable data science portfolio.

When I receive comments and LinkedIn messages with questions related to breaking into the data industry, my first tip is always to create a GitHub-hosted portfolio that you can use as a professional calling card.

Bonus points if you create engaging documentation or share that content on a platform like Medium or LinkedIn to connect with potential hiring managers.

While I’ve written about the importance of having a side project, even as a working professional, and shared my communication tips for presenting data projects, I realized that I’ve never shared the contents of my own portfolio.

My hope is that, for those of you wondering where to start when it comes to displaying your work, you’ll see real examples of what kinds of projects catch a recruiter or interviewer’s eye and why.

Before I share my work and process, I have to acknowledge that while I had no formal tech background, I did have the following traits recruiters and managers found desirable (based on feedback, not my egoism, I swear):

  • A master’s degree in data science
  • Domain knowledge (I purposely applied for roles within the media, education and hospitality industries, all of which I’ve worked in before)
  • Python/SQL/BI development experience (course work, personal projects)
  • Communication skills (a huge plus, even for junior engineers or developers)

Note: Needless to say, your results may vary.

Project 1: Udemy Course Data Analysis Dashboard

Update: Since original publication, I’ve created a walkthrough of this project so you can understand not only the code, but the process behind its conception and execution.

Summary: A static dashboard that combined Google Search data and a dataset from Kaggle to draw insights about course attendance and content demand on the Udemy course hosting platform.

Udemy dashboard by the author.

Tech Used:

  • SQL
  • Tableau
  • Python

Datasets:

  • Udemy dataset from Kaggle
  • Google Trends data

Key insights shared:

  • Nearly a quarter of Udemy’s courses cost less than $20, which is $20 less than the average price of competitor Edx
  • Over the past 5 years Udemy has received nearly 2x the Search traffic of Edx
  • 9 out of 10 of Udemy’s most subscribed to courses are in web development

Why this project works:

  • Domain relevancy (I was applying to an education company)
  • Demonstrated data visualization best practices
  • Showed ability to conduct competitive analysis

Project 2: The K-Word

Summary: After years of headlines involving various ‘Karens’, I decided to investigate trends related to the popularity of the name ‘Karen’ over the past 100 years and create visualizations to turn into a data-driven story.

Tech used:

  • Python
  • BigQuery (database)
  • BigQuery (SQL)

Datasets:

  • Social Security baby name data (BigQuery)
  • Google Trends

Key insights shared:

  • Karen hit its peak in 1957, with an average of 725 Karens born
  • Karen has a positive correlation with searches for ‘racist’, ‘nasty’ and ‘manager’ and a negative correlation with searches for ‘kindness’
  • Spain, Iraq and Iran are the non-U.S. countries that searched for the ‘Karen’ insult the most

Why this project works:

  • Novelty
  • Narrative flow
  • Use of multiple data sources
  • Clear visualization

Pardon the interruption: For more Python, SQL and cloud computing walkthroughs, follow Pipeline: Your Data Engineering Resource.

To receive my latest writing, you can follow me as well.

Project 3: Churn Prediction in Massive Open Online Course Enrollment

Summary: Using multivariable logistic regression, I created 4 models to determine the likelihood for churn by MOOC students, a group known for their well-documented lack of commitment to educational products.

GitHub portfolio screenshot. Data science project by the author.

Tech used:

  • Python
  • GitHub

Datasets:

  • Edx data (Kaggle)

Key insights shared:

  • Despite a 90% accuracy, the model precision was considerably lower (high 60%)
  • Best predictors of likelihood to churn included: Percentage of course complete, percentage of course audited and whether or not the individual held a bachelor’s degree
  • Reduction of features created a more precise model, with accuracy and precision metrics hovering around 95%

Why this project works:

  • Tackles a pervasive business problem: Churn
  • Demonstrates proficiency in building ML models
  • Shows ability to interpret and present model results

And The Project That Got Me in Trouble…

Tragedy Porn

Summary: In 2018 5 adult film stars died within 3 months. Though most of the deaths were underreported, the suicide of porn star August Ames, 23, brought renewed attention to an industry that has experienced a surge in performers dying before age 30 due to suicide, overdoses and violence.

The controversial dashboard. Data, visualization and writing by the author.

Tech Used:

  • Python
  • SQL (SQL Lite)
  • Tableau

Datasets:

  • Custom dataset built by combining data scraped from Wikipedia with Kaggle data

Key insights shared:

  • In 2018 5 popular adult actresses died within 3 months of each other
  • In the last 5 years, 10 performers have died before age 26
  • 12 performers total died in 2018, marking a 25-year high

Why this project works:

  • Demonstrated an ability to create a custom, aggregated dataset (a super important skill for any data job)
  • Featured data scraped from the web (a very marketable skill)
  • Displayed knowledge of visualization best practices
  • Tells a clear, easy-to-follow data story that engages an audience

Why It Got Me in Trouble

(By Now, You Can Probably Guess)

I presented this project during a panel interview with senior data analysts.

In the moment they were engaged and asked questions about how I gathered, manipulated and displayed my data.

By all accounts, the interview went well.

Then the recruiter called.

While she was happy to inform me I was advancing to the next interview round, she suggested that I maybe not present a dashboard called ‘Tragedy Porn’ to the company’s executives in the final interview round.

She shared that my interviewers suggested that, while the analysis was solid, the content was a bit controversial (to say the least).

I come from the world of journalism and media where very few topics are off the table, so this was a learning experience for me.

In the end, I did get and then decline an offer from the company.

Overall it was a fine experience with a valuable lesson for any candidate: Know your audience and target your content accordingly.

Recap

Data science portfolios are not to be taken or to be assembled lightly.

Recruiters, interviewers and your future bosses can absolutely distinguish between a thrown-together repo of code vs. a true, well-honed portfolio.

While your goal shouldn’t be to replicate the work I’ve shared here, there are certainly lessons to be learned:

  • Make sure your chosen projects focus on analyzing or attempting to solve real business problems (bonus points if these include problems your target organization faces).
  • You’ll more than likely be laboring over these projects for many unpaid hours. Pick topics and data that interest you.
  • Interpreting and presenting results is an overlooked but highly valuable skill. Learn how to confidently present and answer follow-up questions.
  • Do your homework and try to use technology that mirrors what your target organization uses. Many job descriptions I encountered featured heavy Python and SQL needs, hence the focus.
  • Know your audience and choose topics accordingly.

Hopefully, if you’re seeking to enter this field, you are passionate about data (or at least interested to the extent that you can do the job without getting too bored).

In addition to sharing your portfolio, don’t be afraid to share the story behind your process with your interviewers.

Managers want candidates that are passionate about data and solving business problems. Show them evidence that these descriptions apply to you.

For more data science skill building, check out my latest work:

Create a job-worthy data portfolio. Learn how with my free project guide.

Data Science
Job Hunting
Learning To Code
Data Analysis
Data Engineering
Recommended from ReadMedium