Natassha Selvaraj

Summary

The article outlines five unique data science portfolio projects that demonstrate the author's skills and passion for data science, setting them apart in the job market.

Abstract

The author emphasizes the importance of showcasing distinctive data science projects in one's portfolio to stand out in the competitive job market. They share personal experience about how their unique projects, such as analyzing female representation in Hollywood, creating an app to distinguish real from fake faces, and performing sentiment analysis on a YouTube feud, helped them land their first data science internship and continue to attract attention. The projects not only demonstrate technical skills like data cleaning, machine learning, and model deployment but also tell compelling stories that capture the attention of recruiters and the public. The author encourages aspiring data scientists to apply their learning in real-world projects and share them online to improve their skills and increase their visibility in the industry.

Opinions

  • Common data science projects like those using the Titanic or Boston House Pricing datasets can harm a portfolio by making the candidate appear inexperienced.
  • Projects should be engaging and tell a story to make a resume stand out.
  • Non-technical recruiters appreciate interactive applications or data-driven stories over raw code.
  • Personal projects are crucial for demonstrating practical skills and passion for data science.
  • Building and sharing projects online can lead to job offers, partnerships, and freelance opportunities.
  • Overcoming challenges while working on projects is a significant source of learning and skill improvement.

5 of My Best Data Science Portfolio Projects

The best data science projects I’ve ever built

Photo by Bram Naus on Unsplash

To enter the data science industry as a beginner, you need to prove that you have the skills necessary to do the job.

If you don’t have any formal data science qualification, the best way to do this is by building data science projects.

When you showcase these projects on your portfolio, it gives a potential employer confidence that you’d be able to do the job.

However, not all data science projects are good to have on your portfolio. Some projects are just too common and simple — like machine learning on the Titanic Dataset, or linear regression on the Boston House Pricing dataset.

Showcasing projects like these actually does more harm than good to your portfolio. They give the impression that you are a novice who has done no more than a beginner-level data science course and cannot demonstrate skills beyond that.

So what kind of projects should you be showcasing on your portfolio?

Showcase projects that tell a story. Build something you are passionate about.

People love stories.

Recruiters receive hundreds of resumes for a single job listing.

To stand out, you need to showcase something that would capture their attention.

If your recruiter is a non-technical person, they aren’t going to understand a bunch of code sitting in your GitHub repository.

You need to build something they can interact with, or tell a story they can relate to with the help of data.

This will make your application stand out amongst the hundreds of other resumes in the pile.

It also shows that you have passion for what you do.

I have worked on countless data science projects in the past.

I started some of them when I was still a novice at programming. To be honest, most of them were awful.

I look back at some of my projects and realize that my coding practices were terrible. I could have done so many things better.

However, these were the projects that helped me land my first data science internship. These are also the projects I still get messages and questions about. Because people remember them. They are unique, and they tell a story.

Project 1: An analysis of Female Representation in Hollywood

Photo by Christina @ wocintechchat.com on Unsplash

I came up with the idea for this project when watching a show called Jane the Virgin.

There was an episode in the show that addressed gender disparity in the media.

There is a test called the Bechdel Test that measures female representation in works of fiction. A movie only passes the Bechdel Test if the following criteria are met:

  • The movie needs to have at least two women in it
  • Who speak to each other
  • About something other than a man

I thought it would be interesting to perform an analysis to answer the following questions about Hollywood movies:

  • Do movies with female directors pass the Bechdel Test more often?
  • Does the genre of a movie have an impact on whether it passes the Bechdel Test?
  • Has female representation in Hollywood improved over time?
  • Are movies that passed the Bechdel Test rated higher than movies that didn’t?

This was my first real data analysis project. I set out to answer all these questions with the help of multiple datasets available on the Internet.

I showcased skills like data cleaning, manipulation, analysis, and visualization.
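
A minimal sketch of the kind of grouping these questions call for is shown here. It is not the project's actual code: the sample rows and the field names (`director_gender`, `passes_bechdel`) are hypothetical, and the standard library is used only to keep the example self-contained. A pandas `groupby` would do the same job on a real dataset.

```python
from collections import defaultdict

# Hypothetical sample rows, invented for illustration only.
movies = [
    {"title": "A", "year": 1995, "director_gender": "F", "passes_bechdel": True},
    {"title": "B", "year": 1998, "director_gender": "M", "passes_bechdel": False},
    {"title": "C", "year": 2012, "director_gender": "M", "passes_bechdel": True},
    {"title": "D", "year": 2015, "director_gender": "F", "passes_bechdel": True},
    {"title": "E", "year": 2016, "director_gender": "M", "passes_bechdel": False},
]


def pass_rate_by(rows, key):
    """Return {group: share of movies in that group that pass the test}."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        group = key(row)
        total[group] += 1
        passed[group] += int(row["passes_bechdel"])
    return {group: passed[group] / total[group] for group in total}


# Pass rate by director gender, and by decade to look at change over time.
by_director = pass_rate_by(movies, lambda r: r["director_gender"])
by_decade = pass_rate_by(movies, lambda r: r["year"] // 10 * 10)
print(by_director)
print(by_decade)
```

The same helper works for genre or rating bands by changing the `key` function.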

Once I was done with the project, I wrote an article about it and posted all the code and results online.

Project 2: An app to distinguish between real and fake faces

Image by author

I was fascinated by the ability of AI applications to create fake faces.

In this project, I collected a dataset of both real and fake faces. Then, I created a quiz that allowed you to identify if a face was fake or real.

You can take my quiz for as long as you like. Every time you refresh the page or click “next,” an image will be randomly selected and presented to you. All you need to do is guess whether it is an AI-generated image or a real image.
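
The core of a quiz like this can be sketched in a few lines. This is an illustrative sketch, not the app's real code: the file names, the `FACES` catalogue, and both function names are assumptions, and the Flask routing around them is left out.

```python
import random

# Hypothetical image catalogue: file name -> whether the face is AI-generated.
FACES = {
    "real_001.jpg": False,
    "real_002.jpg": False,
    "fake_001.jpg": True,
    "fake_002.jpg": True,
}


def next_question(rng=random):
    """Pick one image at random, as on each page refresh or "next" click."""
    return rng.choice(sorted(FACES))


def check_guess(image_name, guessed_fake):
    """Return True when the player's guess matches the stored label."""
    return FACES[image_name] == guessed_fake


# A seeded generator keeps the example repeatable.
image = next_question(random.Random(0))
print(image, check_guess(image, guessed_fake=True))
```

In a web app, `next_question` would typically sit behind the route that serves a new image, and `check_guess` behind the one that receives the player's answer.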

Skills demonstrated: JavaScript, HTML, CSS, Flask, Python

Project 3: Sentiment analysis of a YouTube feud

Photo by Pablo Rebolledo on Unsplash

Ever since I was 12, I have enjoyed watching YouTube videos and drama channels.

A huge controversy recently came to light involving two popular YouTubers — James Charles and Tati Westbrook.

Both influencers found themselves in the middle of a very public dispute, one that caused them to lose millions of followers and brand deals.

I thought it would be interesting to do an analysis of both influencers in order to better understand public opinion on them.

I scraped data from Twitter and YouTube and built a sentiment analysis model to understand public opinion on this feud. I wanted to see whose side people were on, and if people’s opinion on this controversy had changed over time.
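
To give a feel for how opinion can be tracked over time, here is a toy sketch. It assumes a simple word-list scorer as a stand-in for the sentiment model, which the article does not describe. The lexicon, the comments, and the `creator_a` / `creator_b` labels are all invented for illustration.

```python
from collections import defaultdict

# Toy lexicon and comments, invented for illustration only.
POSITIVE = {"love", "support", "great", "honest"}
NEGATIVE = {"hate", "fake", "rude", "boring"}

comments = [
    {"week": 1, "about": "creator_a", "text": "So rude and so fake"},
    {"week": 1, "about": "creator_b", "text": "I support her, great video"},
    {"week": 2, "about": "creator_a", "text": "I love this, very honest!"},
]


def score(text):
    """Positive minus negative word count; above 0 reads as positive."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def mean_sentiment(rows):
    """Average score per (person, week), to show how opinion shifts."""
    scores = defaultdict(list)
    for row in rows:
        scores[(row["about"], row["week"])].append(score(row["text"]))
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}


trend = mean_sentiment(comments)
print(trend)
```

A trained classifier or an off-the-shelf sentiment library would replace `score` in a real analysis. The grouping by person and time period stays the same.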

Skills demonstrated: data scraping, API usage, Python, sentiment analysis, data visualization

Project 4: Celebrity image recognition model

Image by author

Ever wondered who your celebrity look-alike was?

All you need to do is upload an image of yourself and click on the predict button.

The deep learning model will provide you with its prediction of the celebrity you most resemble.

Skills demonstrated: JavaScript, HTML, CSS, Flask, Python, model deployment, Keras

Project 5: Customer segmentation with Python

Image by Clay Banks on Unsplash

This project is the only one in this list that has a business application.

I used a dataset from Kaggle and built a K-Means clustering model to identify different consumer segments.

This is a pretty popular dataset for unsupervised learning, and many people have used it to build segmentation models.

To differentiate my analysis from the rest, I analyzed the different segments that were built at the end. I created consumer profiles based on segment behaviour, and came up with separate marketing strategies for customers in each segment.
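
The idea of clustering first and then profiling each segment can be sketched as follows. This is a plain-Python K-Means written only to keep the example self-contained. The customer data, the two features, and the strategy rule are hypothetical, and in practice a library implementation such as scikit-learn's `KMeans` on scaled features would be the usual choice.

```python
import random

# Hypothetical customers: (annual spend, purchases per year). Invented data.
customers = [
    (200, 2), (220, 3), (180, 1),
    (1500, 20), (1600, 25), (1450, 22),
]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-Means: assign points to the nearest centre, then re-average."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(p, centres[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster ends up empty
                centres[c] = tuple(sum(dim) / len(members)
                                   for dim in zip(*members))
    return labels, centres


labels, centres = kmeans(customers, k=2)

# Profile each segment by its average behaviour, then attach a strategy.
# The threshold and the strategy names are made up for illustration.
for segment, (avg_spend, avg_orders) in enumerate(centres):
    size = labels.count(segment)
    strategy = "loyalty rewards" if avg_spend > 1000 else "discount offers"
    print(f"segment {segment}: n={size}, avg spend={avg_spend:.0f}, "
          f"avg orders={avg_orders:.1f} -> {strategy}")
```

The profiling loop at the end is the part that turns raw cluster labels into something a marketing team can act on.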

Skills demonstrated: Python, K-Means clustering, PCA, cluster interpretation, data analysis

Conclusion

There are over 4 million people enrolled in Andrew Ng’s machine learning course on Coursera.

Every aspiring data scientist will have one of these introductory level machine learning courses on their resume.

However, these courses don’t mean anything if you aren’t able to apply the skills you learnt in real life.

They also aren’t sufficient to demonstrate to a recruiter that you have the skills necessary to do the job.

To set yourself apart from the crowd, you need to build something that stands out. Build an application that an end-user can play around with, or write a blog post around your project.

Creating these projects and sharing them online has helped me reach people from all around the world.

Thanks to these projects, I have received many job offers, partnership opportunities, and freelance offers.

Working on these projects has also helped me improve my data science and programming skills.

Every time I come up with a project idea, I write down the different steps involved in building the project. Then, I draw a diagram of what I want the end product to look like.

I get stuck a lot along the way, but I learn a lot from overcoming problems. In fact, most of my programming knowledge comes from these personal projects I’ve worked on.

That’s all for this article, thanks for reading!

Create with the heart, build with the mind — Criss Jami
