There’s No Such Thing as a Data Scientist
The Inconsistent Definitions of Data Science and More Descriptive Titles

What do you really do?
There’s a memorable scene in Office Space where consultants determining employee productivity start by asking, “What would you say… you do here?”
That scene and the “What I Do” images are funny because we empathize with the struggle to describe our jobs. It’s not funny, however, when the same misunderstanding happens during a job search. Job seekers need to understand what a posting actually means, and prospective employers need to understand our skills and abilities. We’ve all seen job postings with the same title but totally different descriptions.
How can the same title mean such vastly different things from one company to another?
This phenomenon is increasingly common in data science. The discipline has risen dramatically in popularity over the past few years, and while the number of data science jobs has grown, clarity around the role has declined. This post draws on Indeed’s large volume of behavioral data to describe trends in the field and to propose more specific definitions for data science roles.
The growing popularity of data science
Job postings matching “data scientist” rose from 0.03% of all postings to about 0.15% (+400%) over a four-year span.
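The +400% figure is relative growth in the share of postings, not a change in percentage points. Worked out from the two shares quoted above:

```latex
\frac{0.15\% - 0.03\%}{0.03\%} = \frac{0.12}{0.03} = 4 \;\Rightarrow\; +400\%
```

In other words, the share roughly quintupled, while the absolute increase was about 0.12 percentage points.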
Even earlier, in 2012, a much-ballyhooed Harvard Business Review article called data scientist the “Sexiest Job of the 21st Century.” If the title alone isn’t enough, maybe folks are interested for monetary reasons: according to Indeed’s salary data, a data scientist makes an average of $130k per year.
OK. Got it. Data science has taken off like discounted Nutella in a European supermarket. Along with this rise, more specific roles have emerged within the discipline. Our colleague Trey Causey wrote about the convergence of product managers and data scientists in “Rise of the Data Product Manager.”
Many of us at Indeed also felt that the title “data scientist” had recently become a catchall for many different sets of responsibilities. We wanted to dig deeper and test our intuition. Could we find natural delineations of roles within the job market? Could we use data to understand the differences hidden behind this one title and classify them more clearly and consistently?
Spoiler Alert: We can.
Overlapping careers in data science
For this analysis of job titles, we looked at all site visitors who entered the search query “data scientist” on Indeed during January 2018. Next, we looked at the other searches these same users performed. We built a users-by-searches matrix and its transpose, a searches-by-users matrix. Multiplying these two matrices gives a searches-by-searches matrix showing how often any pair of search terms co-occurred across users:
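A minimal Python sketch of this step, assuming a binary users-by-searches matrix M (1 if a user ran a search at least once; the original may have used raw counts). Entry (i, j) of MᵀM then counts the users who ran both searches i and j, and the diagonal counts users per search. The function name `cooccurrence` and the sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def cooccurrence(user_searches: Dict[str, Set[str]]) -> Tuple[List[str], List[List[int]]]:
    """Return (terms, C) where C[i][j] counts users who searched both terms[i] and terms[j].

    C is the matrix product M^T M, where M is the binary users-by-searches matrix.
    """
    terms = sorted({t for searches in user_searches.values() for t in searches})
    index = {t: i for i, t in enumerate(terms)}
    users = sorted(user_searches)

    # M: one row per user, one column per search term (1 = user ran that search)
    m = [[0] * len(terms) for _ in users]
    for u, user in enumerate(users):
        for t in user_searches[user]:
            m[u][index[t]] = 1

    # C = M^T M: entry (i, j) sums M[u][i] * M[u][j] over all users u
    n = len(terms)
    c = [[sum(row[i] * row[j] for row in m) for j in range(n)] for i in range(n)]
    return terms, c


# Hypothetical toy data: three users, each of whom searched "data scientist"
searches = {
    "u1": {"data scientist", "machine learning", "python"},
    "u2": {"data scientist", "statistician", "python"},
    "u3": {"data scientist", "machine learning"},
}
terms, c = cooccurrence(searches)
```

In the toy data every user searched “data scientist”, so its diagonal entry equals the number of users and it co-occurs with everything, which is why a query shared by every user tells the clustering nothing.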

Next, we removed “data scientist” from the data, since every user in the sample had searched for it. We used the R package “igraph” for the clustering and visualization. According to the igraph documentation, the function we used “implements the fast greedy modularity optimization algorithm for finding community structure.” While researching this algorithm, which comes from Clauset, Newman, and Moore’s paper “Finding community structure in very large networks,” we learned that it was designed to find communities quickly in large networks with sparse regions. Hmm, that sounds exactly like the data we are using!
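To make the idea concrete, here is a minimal Python sketch of greedy modularity clustering, assuming a weighted, undirected graph whose edge weights are the co-occurrence counts. It is a simplified stand-in, not igraph’s implementation: it rescans every pair of communities at each step instead of using the heap structures from the paper, and it stops as soon as no merge raises modularity rather than building the full merge tree. The helper names (`from_edges`, `without`, `greedy_modularity`) are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

# Weighted, undirected graph: graph[u][v] == graph[v][u] == co-occurrence weight
Graph = Dict[str, Dict[str, float]]


def from_edges(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric adjacency map from (u, v, weight) triples."""
    graph: Graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def without(graph: Graph, term: str) -> Graph:
    """Drop one vertex (e.g. the query every user shared) and all its edges."""
    return {u: {v: w for v, w in nbrs.items() if v != term}
            for u, nbrs in graph.items() if u != term}


def between_weight(graph: Graph, ci: Set[str], cj: Set[str]) -> float:
    """Total weight of edges running between communities ci and cj."""
    return sum(graph[u].get(v, 0.0) for u in ci for v in cj)


def greedy_modularity(graph: Graph) -> List[Set[str]]:
    """Repeatedly merge the connected pair of communities with the largest modularity gain."""
    two_m = sum(w for nbrs in graph.values() for w in nbrs.values())  # each edge seen twice
    communities: Dict[int, Set[str]] = {i: {v} for i, v in enumerate(graph)}
    if two_m == 0:
        return list(communities.values())
    # a[i]: fraction of all edge ends attached to community i
    a = {i: sum(graph[v].values()) / two_m for i, v in enumerate(graph)}

    while True:
        best_gain, best_pair = 0.0, None
        for i, j in combinations(list(communities), 2):
            w_ij = between_weight(graph, communities[i], communities[j])
            if w_ij == 0:
                continue  # merging unconnected communities never raises modularity
            gain = 2 * (w_ij / two_m - a[i] * a[j])  # Delta Q = 2 (e_ij - a_i a_j)
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            break  # no remaining merge increases modularity
        i, j = best_pair
        communities[i] |= communities.pop(j)
        a[i] += a.pop(j)

    return sorted(communities.values(), key=lambda c: (-len(c), sorted(c)))
```

For example, two triangles joined by a single edge, plus a hub vertex linked to every other vertex (standing in for “data scientist”), split back into the two triangles once the hub is removed with `without`.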
Here’s the obligatory equation for how this works; you’ll have to read that paper to understand what it means.
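The quantity being maximized is modularity, Q. Here A_uv is the edge weight between queries u and v, k_u is the weighted degree of u, m is the total edge weight, and c_u is the community containing u. Each greedy step merges the pair of connected communities i and j with the largest gain ΔQ_ij, where e_ij is half the fraction of edge weight joining communities i and j, and a_i is the fraction of edge ends attached to community i:

```latex
Q = \frac{1}{2m} \sum_{u,v} \left[ A_{uv} - \frac{k_u k_v}{2m} \right] \delta(c_u, c_v),
\qquad
\Delta Q_{ij} = 2\left( e_{ij} - a_i a_j \right),

e_{ij} = \frac{1}{2m} \sum_{u \in i,\; v \in j} A_{uv},
\qquad
a_i = \frac{1}{2m} \sum_{u \in i} k_u
```

Merging two communities with no edges between them gives e_ij = 0 and therefore ΔQ_ij ≤ 0, which is why only connected pairs need to be considered.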
Next, we wrote a function with a pruning parameter that sets the minimum number of vertices a cluster must have. This parameter is best set by guess and check, because higher values don’t necessarily produce more groups, and lower values don’t necessarily produce fewer. We tried values from 3 to 20 and checked whether the resulting groups made sense: we didn’t care about very small clusters, and we wanted the queries in each cluster to fit together. More on this later.
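The post doesn’t show the pruning function, so the following Python is a hypothetical reading of it: after clustering, discard any cluster with fewer than `min_size` queries, then sweep thresholds from 3 to 20 so the surviving groups can be checked by eye. `prune`, `sweep`, and the toy clusters are illustrative only.

```python
from typing import Dict, List, Set


def prune(clusters: List[Set[str]], min_size: int) -> List[Set[str]]:
    """Keep only clusters with at least min_size queries (hypothetical pruning rule)."""
    return [c for c in clusters if len(c) >= min_size]


def sweep(clusters: List[Set[str]], lo: int = 3, hi: int = 20) -> Dict[int, int]:
    """Map each threshold in [lo, hi] to the number of clusters that survive it."""
    return {k: len(prune(clusters, k)) for k in range(lo, hi + 1)}


# Hypothetical clusters of sizes 8, 6, 5, 3 and 2
clusters = [set("abcdefgh"), set("ijklmn"), set("opqrs"), set("tuv"), set("wx")]
counts = sweep(clusters)
```

Because this version filters after clustering, raising the threshold can only remove groups; a function that pruned vertices and then re-clustered could behave less predictably, which is consistent with the guess-and-check advice above.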
With five as the pruning threshold, four clusters formed. We labeled them “business intelligence”, “statistician”, “machine learning engineer”, and “natural scientist”.
Here are the queries that make up each group:







