Sarah Floris

Summary

The article analyzes trends among data professionals based on Stack Overflow surveys and question tags from 2017 to 2021, identifying the most popular languages, databases, and platforms to focus on in the future.

Abstract

The future of data professionals is examined through an analysis of Stack Overflow data, including surveys from 2017 to 2021 and question tags. The study, which filtered responses to include only data-related roles, reveals Python and TypeScript as rising stars in programming languages, with Python being the most desired language to learn. In the realm of databases, MongoDB and PostgreSQL show increasing popularity, while Microsoft SQL Server, MySQL, and SQLite are on the decline. For platforms, AWS, Google Cloud Platform, and Microsoft Azure are the frontrunners, with growing interest as evidenced by Stack Overflow question tags. The article suggests that data professionals should prioritize learning these trending technologies to stay relevant in their field.

Opinions

  • The author believes that Python's popularity is significant and recommends it as a primary language for data professionals to learn.
  • TypeScript is also highlighted as a valuable language to learn in 2022, based on increasing interest in Stack Overflow questions.
  • MongoDB and PostgreSQL are considered important databases for future learning due to their upward trends in both survey data and question tags on Stack Overflow.
  • AWS, Google Cloud Platform, and Microsoft Azure are seen as the most relevant cloud platforms for data professionals, with a clear recommendation to focus on these technologies for career development.
  • The author expresses surprise at the relatively low percentage of respondents identifying as data professionals, peaking at 24.98% in 2019.
  • There is an implication that some databases (Microsoft SQL Server, MySQL, and SQLite) and programming languages (C# and Java) may be less in demand going forward, based on the observed trends.

The Future of Data Professionals According to Stack Overflow

Platforms, databases, and languages you will most likely see in the future according to data developers

Stack Overflow’s icon⁴

Introduction

Stack Overflow is an awesome developer community. There is no denying it. And the best part is that we all use it one way or another. I even found some of the Python syntax there that I used to transform the data and create the plots for today's article. And all of that data is open to the public.

How can we use this data to figure out what to learn? Are there some databases, platforms, or languages that should be avoided? I will be analyzing both the survey data from 2017 to 2021 and the question tags on Stack Overflow. From these two sources, I want to draw some hypotheses about where we are going in the data realm. What languages will we be using? What databases? How about platforms?

Interested in seeing how I got the data and these plots? Take a look at my Kaggle notebook here.

Datasets

Surveys

The primary dataset was Stack Overflow Datasets 2011 to Present²; I combined it with this year's Stack Overflow survey results, found here.¹ Much of my initial code was based on work that Kasaraneni had already done.³

Since 2017, Stack Overflow's surveys have had 338,489 respondents in total. I removed those whose developer type was empty or null, leaving 283,272 respondents. These respondents are spread across the years as shown below, with 2019 having the most.

2019 had the most respondents. Chart created by Author.
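As a rough illustration of that cleaning step, here is a minimal sketch in plain Python. It is not the code from the notebook: the rows are toy data, and the field names `year` and `dev_type` are placeholders I made up rather than the survey's real column names.

```python
from collections import Counter

# Toy rows standing in for survey responses. "year" and "dev_type" are
# hypothetical field names chosen for this sketch.
responses = [
    {"year": 2019, "dev_type": "Data scientist or machine learning specialist"},
    {"year": 2019, "dev_type": None},
    {"year": 2020, "dev_type": ""},
    {"year": 2021, "dev_type": "Developer, back-end;Database administrator"},
]


def drop_missing_dev_type(rows):
    """Keep only rows whose developer type is a non-empty string."""
    return [
        row for row in rows
        if isinstance(row["dev_type"], str) and row["dev_type"].strip()
    ]


kept = drop_missing_dev_type(responses)

# Number of remaining respondents in each survey year.
respondents_per_year = Counter(row["year"] for row in kept)
```

On the real data, the same two steps would take the 338,489 raw respondents down to the 283,272 used for the rest of the analysis.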

I kept only the respondents whose developer type contained either 'Data' or 'data'; the remaining developer types were database administrator, data scientist or machine learning specialist, data engineer, and data or business analyst. I was actually surprised to see that data professionals made up at most 24.98% of respondents, a peak reached in 2019.

Out of the 5 years, 2019 had the most data respondents. Chart created by Author.
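The substring filter and the per-year percentage can be sketched as follows. Again, this is an illustration under assumptions rather than the notebook's code: the rows are invented, and `year` and `dev_type` are hypothetical field names.

```python
from collections import Counter

# Invented example rows; one respondent can list several roles in one string.
rows = [
    {"year": 2019, "dev_type": "Data engineer;Developer, back-end"},
    {"year": 2019, "dev_type": "Developer, front-end"},
    {"year": 2019, "dev_type": "Developer, full-stack"},
    {"year": 2019, "dev_type": "DevOps specialist"},
    {"year": 2020, "dev_type": "Database administrator"},
    {"year": 2020, "dev_type": "Developer, mobile"},
]


def is_data_role(dev_type):
    """True when the developer type mentions 'Data' or 'data'."""
    return "Data" in dev_type or "data" in dev_type


def data_share_by_year(rows):
    """Percentage of respondents per year who hold a data role."""
    totals = Counter()
    data_roles = Counter()
    for row in rows:
        totals[row["year"]] += 1
        if is_data_role(row["dev_type"]):
            data_roles[row["year"]] += 1
    return {year: 100 * data_roles[year] / totals[year] for year in totals}


shares = data_share_by_year(rows)
```

Note that a plain substring match also catches "Database administrator", which is why that role shows up among the data professionals.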

In addition to filtering out non-data professionals, I removed respondents who did not have any professional experience, leaving me with 64,703 respondents. These respondents are spread out over the 5 years, with the largest share, 30.53%, in 2019.

Created by Author
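A hedged sketch of the experience filter and the per-year share follows. What counts as "no professional experience" is my assumption here: I treat a missing, blank, or "NA" answer as no experience and any other answer as some experience. The field name `years_pro` and the rows are made up for the example.

```python
from collections import Counter

# Invented rows; "years_pro" is a hypothetical field holding the answer to
# a "years of professional coding" style question.
rows = [
    {"year": 2019, "years_pro": "3"},
    {"year": 2019, "years_pro": "Less than 1 year"},
    {"year": 2020, "years_pro": None},
    {"year": 2020, "years_pro": "10"},
    {"year": 2021, "years_pro": "NA"},
    {"year": 2021, "years_pro": "7"},
]


def has_professional_experience(value):
    """Assumption: missing, blank, or 'NA' answers mean no experience."""
    if value is None:
        return False
    text = str(value).strip()
    return text != "" and text.upper() != "NA"


experienced = [r for r in rows if has_professional_experience(r["years_pro"])]

# Share of the filtered respondents that falls in each year, in percent.
counts = Counter(r["year"] for r in experienced)
total = sum(counts.values())
share_by_year = {
    year: round(100 * n / total, 2) for year, n in sorted(counts.items())
}
```

A stricter rule, for example requiring at least one full year, would only change `has_professional_experience`.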

Questions

The other dataset I want to incorporate is the number of questions asked on the Stack Overflow website. These questions have tags, which I will narrow down to the technologies that came out of the survey results. Stack Overflow's trends page covers the last 12 years; I cannot filter it by date on the page itself, but I want to focus only on the last 7 years.
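If the trend data were exported as rows, restricting it to the last 7 years and to the tags of interest could look like the sketch below. The row layout (`year`, `tag`, `share`), the tag names, and the numbers are all assumptions made for illustration; the trends page itself does not hand out data in this shape.

```python
# Hypothetical export: one row per year and tag, where "share" is the
# percentage of that year's questions carrying the tag. Values are invented.
trend_rows = [
    {"year": year, "tag": tag, "share": 1.0}
    for year in range(2010, 2022)
    for tag in ("tag_a", "tag_b", "tag_c")
]


def last_n_years(rows, n=7):
    """Keep rows from the n most recent years present in the data."""
    latest = max(row["year"] for row in rows)
    return [row for row in rows if row["year"] > latest - n]


def only_tags(rows, wanted):
    """Keep rows whose tag is in the wanted set (e.g. the survey's top 10)."""
    wanted = set(wanted)
    return [row for row in rows if row["tag"] in wanted]


recent = only_tags(last_n_years(trend_rows, n=7), {"tag_a", "tag_b"})
years_kept = sorted({row["year"] for row in recent})
```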

Analysis

Languages

The most desired languages to learn were Python, SQL, JavaScript, HTML/CSS, TypeScript, Bash/Shell, C#, Go, Rust, and Java, listed in order of popularity.

Created by Author. Green means top 10.
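A ranking like this can be produced by splitting each respondent's answer and counting. The sketch below assumes each answer is a single semicolon-separated string, and the sample answers are invented; it is not taken from the notebook.

```python
from collections import Counter


def top_desired(answers, n=10):
    """Rank items by how many respondents listed them.

    Each answer is assumed to be a semicolon-separated string such as
    "Python;SQL". Empty or missing answers are skipped.
    """
    counts = Counter()
    for answer in answers:
        if not answer:
            continue
        # A set makes sure one respondent counts each language only once.
        counts.update({item.strip() for item in answer.split(";") if item.strip()})
    return counts.most_common(n)


# Invented sample answers.
sample = ["Python;SQL", "Python;TypeScript", "SQL;Python", None, ""]
ranking = top_desired(sample, n=2)
```

The same function works unchanged for the database and platform questions, since those answers have the same shape.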

I compared these languages with the number of times the same languages are tagged in questions on Stack Overflow.

Stack Overflow Language Trends⁵

Questions tagged Python and TypeScript have increased, meaning those languages are being asked about more frequently, while questions tagged C# and Java have decreased. The other tags could still go either way.
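I read these trends off the chart by eye. One way to make the call less subjective would be to fit a straight line to each tag's yearly share and look at the sign of the slope. The sketch below does that with a least-squares slope; the series, the tag names, and the `tolerance` threshold are all invented for illustration.

```python
def slope(points):
    """Least-squares slope of a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in points)
    denominator = sum((x - mean_x) ** 2 for x, _ in points)
    return numerator / denominator


def classify(points, tolerance=0.05):
    """Label a series 'up', 'down', or 'flat' by its fitted slope.

    tolerance is an arbitrary cut-off in share points per year.
    """
    s = slope(points)
    if s > tolerance:
        return "up"
    if s < -tolerance:
        return "down"
    return "flat"


# Invented (year, share of questions in %) series for three made-up tags.
series = {
    "rising_tag": [(2015, 8.0), (2016, 9.0), (2017, 10.0)],
    "falling_tag": [(2015, 10.0), (2016, 9.0), (2017, 8.0)],
    "steady_tag": [(2015, 1.0), (2016, 1.0), (2017, 1.0)],
}
labels = {tag: classify(points) for tag, points in series.items()}
```

Tags that land in the "flat" bucket correspond to the ones I describe as able to go either way.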

Based on this data, we should stick with Python, and if you are interested in TypeScript, 2022 is a good time to learn it.

Databases

Similarly, the survey of data professionals showed that the top 10 databases were PostgreSQL, MySQL, MongoDB, Redis, SQLite, Microsoft SQL Server, Elasticsearch, MariaDB, Firebase, and DynamoDB.

Created by Author. Green means top 10.

I compared these databases with the number of times the same databases are tagged in questions on Stack Overflow.

Stack Overflow Database Trends⁵

Microsoft SQL Server, MySQL, and SQLite are trending downwards. On the other hand, MongoDB and PostgreSQL are trending upwards. The others could go either way.

In 2022, we should learn how to pull data from MongoDB and PostgreSQL and understand their architecture.

Platforms

Similarly, the survey of data professionals showed that the top platforms are AWS, Google Cloud Platform, Microsoft Azure, Heroku, and IBM Cloud or Watson.

Created by Author.

I compared these platforms with the number of times the same platforms are tagged in questions on Stack Overflow.

Stack Overflow Platform Trends⁵

Questions tagged Amazon Web Services, Microsoft Azure, and Google Cloud Platform have increased over the past 7 years. The other cloud providers are not as desirable, but could still go either way.

Learning any of the top 3 cloud providers will be useful in 2022.

There you have it, folks: the suggested languages, platforms, and databases to learn, according to the data professionals who answered the Stack Overflow survey and the questions asked on Stack Overflow.

Interested in seeing how I got the data and these plots? Take a look at my Kaggle notebook here.

Resources
1. https://insights.stackoverflow.com/survey/2021#developer-profile-developer-roles
2. https://www.kaggle.com/chaitanyakck/stackoverflow-datasets-2011-to-present
3. https://www.kaggle.com/chaitanyakck/eda-on-stack-overflow-survey-results-2017-2020
4. https://stackoverflow.design/brand/logo/
5. https://insights.stackoverflow.com/trends