The article analyzes job market trends for UK data professionals, focusing on skills, salaries, and demand across different data-related roles based on data collected from Reed.co.uk.
Abstract
The article presents a comprehensive analysis of over 9000 job specifications for data professionals in the UK, utilizing data scraped from Reed.co.uk over a four-month period. It examines the demand for various data roles, such as data analysts, data scientists, data engineers, and machine learning engineers, revealing that analyst roles are most in demand and that data engineers are sought after more than data scientists. The study also explores salary trends, indicating that machine learning engineers command the highest median salary, followed closely by data engineers. The article delves into the required skills for these roles, highlighting the prevalence of Python, SQL, and cloud technologies like Azure and AWS. It also touches on the potential impact of economic conditions on job vacancies and the urgency of job applications based on the lifespan of job listings. The author plans to continue data collection to monitor changes in the job market.
Opinions
The author suggests that the data job market in the UK is dynamic but still robust despite economic uncertainties like inflation and recession.
There is an emphasis on the importance of Python and SQL across data roles, with a notable preference for Azure in data engineering and AWS in data science and machine learning roles.
The article implies that job titles in the data field can be misleading, as they often encompass a range of responsibilities and required skills.
The author provides a disclaimer against web scraping, positioning the article purely for educational purposes.
There is a subtle encouragement for readers to consider Medium membership through the author's referral link, hinting at the value of the content provided.
The author hints at the potential for future analysis, including the exploration of correlations between skills and salaries, and the consideration of application counts and job locations.
Jobs in Data: What the Data Tells Us About Skills And Salaries
A brief analysis of over 9000 job specs for UK data professionals
It’s an interesting time for job hunting. The Great Resignation, spawned in that brief post-pandemic period of soaring demand and personal re-evaluation, has tumbled into a murky mess of inflation and recession. So, it seems like a good time to analyze the job market — are companies still hiring, how much are they paying, and what are they looking for?
Reed.co.uk is one of the biggest job boards in the UK. In a previous article, I used the Reed API to programmatically access job listings, and in another article, I explored the use of AWS Batch to harvest this data regularly. Those Batch jobs have now been running for five months, and in this article, we explore the resulting data.
Disclaimer: This article is only for educational purposes. We do not encourage anyone to scrape websites, especially those web properties that may have terms and conditions against such actions.
A quick look at the data
I designed a job search to fetch every job listing in the UK with one of the following keywords in its title: data analyst, data scientist, data engineer, or machine learning. These are the four job families which we analyze in this article.
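To make the grouping concrete, here is a minimal sketch of how a listing title could be assigned to one of the four job families. The keywords come from the search described above; the family labels and the function name `job_family` are my own illustrative choices, not necessarily what the original pipeline used.

```python
from typing import Optional

# Search keywords from the article mapped to illustrative family labels.
FAMILY_KEYWORDS = {
    "data analyst": "analyst",
    "data scientist": "data scientist",
    "data engineer": "data engineer",
    "machine learning": "ml engineer",
}


def job_family(title: str) -> Optional[str]:
    """Return the first family whose keyword appears in the title, else None."""
    # Lower-case and collapse repeated whitespace before matching.
    normalized = " ".join(title.lower().split())
    for keyword, family in FAMILY_KEYWORDS.items():
        if keyword in normalized:
            return family
    return None
```

A title that happens to contain two keywords would land in whichever family is checked first, so the ordering of the dictionary is a design choice in this sketch.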
Which jobs are in high demand?
Across the four months of data collection, the job listings were distributed amongst the four job families as follows:
Analyst roles are comfortably in the lead, and interestingly there are 5 data engineer roles for every 2 data scientist roles.
Digging into the roles within each job family, the data scientist and data engineer families both have a similar seniority distribution: around 4% of each family are junior/graduate roles, 30% are regular roles, 20% are senior roles, and Lead or Principal titles make up 15% of data science roles and 9% of engineering roles. This is interesting because data departments are usually pretty flat, but at least 35% of data science vacancies here require some sort of seniority. Meanwhile, the ML engineer roles are skewed more towards the vanilla title (without any explicit seniority), presumably because these roles are more focused on technical specialism.
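One way to bucket titles by seniority is sketched below. It is an assumption about how such a split could work rather than the exact rules behind the charts: the marker words, the set of "vanilla" titles, and the name `seniority_bucket` are all hypothetical. Anything that still has extra words once the seniority marker is removed falls into "other".

```python
import re

# Checked from most to least senior, so "Senior Lead ..." counts as lead/principal.
SENIORITY_MARKERS = [
    ("lead/principal", {"lead", "principal"}),
    ("senior", {"senior"}),
    ("junior/graduate", {"junior", "graduate"}),
]
ALL_MARKERS = set().union(*(markers for _, markers in SENIORITY_MARKERS))

# Titles with no extra qualifiers (illustrative list).
VANILLA_TITLES = {
    "data analyst",
    "data scientist",
    "data engineer",
    "machine learning engineer",
}


def seniority_bucket(title: str) -> str:
    """Return a seniority label, 'regular' for a vanilla title, or 'other'."""
    # Keep letters only, so punctuation such as "-" or "#" becomes a word break.
    words = re.sub(r"[^a-z]+", " ", title.lower()).split()
    level = "regular"
    for label, markers in SENIORITY_MARKERS:
        if markers.intersection(words):
            level = label
            break
    # Whatever remains after removing seniority words must be a vanilla title.
    core = " ".join(word for word in words if word not in ALL_MARKERS)
    return level if core in VANILLA_TITLES else "other"
```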
You may wonder what job titles are hiding in each family’s “other” section. The answer is a varied plethora of more verbose job titles which weren’t so easily categorized.
Some refer to the business or industry, such as HR Data Analyst, Customer Data Scientist, Data Engineer — Hedge Fund.
Some refer to technology, such as Data Scientist- Python C#, GCP Data Engineer, Machine Learning Engineer — Computer Vision.
Some refer to working logistics, such as Data Scientist — Remote, Data Engineer — Remote, Machine Learning Engineer / Remote/ High.
There’s also one particular job that Boeing annoyingly posted 46 times in different geographical locations.
Show me the money
About 60% of job listings on Reed are permanent, full-time roles that list an annual salary range. In this analysis, we take the midpoint of that range for each job listing and compare it across roles.
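The midpoint calculation is simple enough to sketch in a few lines. The field names `family`, `min_salary` and `max_salary` are assumptions for illustration, and listings without a declared range are skipped.

```python
from collections import defaultdict
from statistics import median


def salary_midpoint(minimum, maximum):
    """Midpoint of an advertised salary range, or None if either end is missing."""
    if minimum is None or maximum is None:
        return None
    return (minimum + maximum) / 2


def median_midpoint_by_family(listings):
    """Median of the salary midpoints for each job family."""
    midpoints = defaultdict(list)
    for listing in listings:
        mid = salary_midpoint(listing.get("min_salary"), listing.get("max_salary"))
        if mid is not None:
            midpoints[listing["family"]].append(mid)
    return {family: median(values) for family, values in midpoints.items()}
```

Using the median rather than the mean keeps a handful of very highly paid listings from dragging the headline figure upwards.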
A disclaimer before we dive into this — this is just one job board, the salary range for any given job can be large, and we don’t know whether the roles which don’t declare salaries are in line with those which do. So…take this with a pinch of salt.
As you’d expect, the Analyst roles offer the lowest salary, with a median of £37.5k. Data engineer roles pay a tiny bit more than data scientist roles (a median of £65k versus £62.5k), while the ML engineers command an even higher median (£72.5k).
In this messy but more granular box plot, we can see a few more nuances. A vanilla ML engineer role commands an average of £10k more than its data engineering equivalent, with that average salary sliding a further £5k for the data science counterpart. However, these differences all but disappear with increasing seniority (in fact, the most senior data engineers earn a little less than their data science counterparts, on average).
Skills for success
Each job listing comes with a full description. By looking at how many job listings contain certain keywords, we can get an idea of what skills are in demand.
(Note: Some key terms listed in this section are actually buckets of similar terms — e.g., deep learning and neural networks are counted together.)
Looking at programming languages (plus Spark), Python is still very much in demand, mentioned in a whopping 77% of data scientist roles, 59% of data engineer roles, and 86% of ML engineer roles. SQL is also very popular, and Spark and Scala are each mentioned by over 20% of data engineering roles.
Note: these skills are not mutually exclusive because one job listing may mention numerous skills, and so the percentages for each job family can add up to more than 100%.
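As a rough sketch of the counting described above, the snippet below computes the share of descriptions that mention each key term, with similar phrases bucketed together. The buckets shown are a small hypothetical sample rather than the full list behind the charts, and `document_frequency` is an illustrative name.

```python
import re

# Each key term maps to the phrases counted under it (illustrative sample).
TERM_BUCKETS = {
    "python": ["python"],
    "sql": ["sql"],
    "deep learning": ["deep learning", "neural network", "neural networks"],
}


def mentions(description, phrases):
    """True if any phrase appears as a whole word or phrase in the description."""
    text = description.lower()
    return any(
        re.search(r"(?<!\w)" + re.escape(phrase) + r"(?!\w)", text)
        for phrase in phrases
    )


def document_frequency(descriptions, buckets=TERM_BUCKETS):
    """Fraction of descriptions mentioning each key term at least once."""
    total = len(descriptions)
    if total == 0:
        return {term: 0.0 for term in buckets}
    return {
        term: sum(mentions(text, phrases) for text in descriptions) / total
        for term, phrases in buckets.items()
    }
```

Each listing is counted at most once per term, which is why the percentages across terms are free to add up to more than 100%. The whole-word match also stops "sql" from being counted inside "MySQL".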
Interestingly, Azure is the most frequently mentioned cloud platform for data engineers, whilst ML engineers and data scientists are sticking with AWS. Perhaps Azure has made gains in the data storage space, whilst AWS is still king when it comes to machine learning orchestration.
On the data storage/access front (plus Lambda as it didn’t neatly fall anywhere else), the data engineer roles unsurprisingly have the most action. Perhaps more surprising is the fact that the majority of data engineer listings don’t mention any of these terms.
This is a bit of a grab bag of tooling keywords for ETL, analytics, ML, and orchestration that I’ve loosely termed “platform skills”. Again, the data engineer roles dominate, and again none of these terms is mentioned very frequently, but around 10% of data engineering specs do mention Airflow, Kafka, or Databricks.
An even smaller number of listings mention things like Docker and Kubernetes, which could perhaps be more associated with certain DevOps roles.
Moving more towards data science, we see that statistics is mentioned in almost half of data science roles, over a third of ML engineer roles, and nearly 20% of data analyst roles. A big chunk of ML engineer roles mention natural language processing (NLP) or deep learning, and almost 10% specifically refer to convolutional neural networks.
And in our final chart, the visualization tools Power BI and Tableau frequently appear in the analyst roles and, to a lesser extent, in the data engineer roles. We again get a sense of the ML engineers being focused more exclusively on deep learning than the data scientists, judging by mentions of TensorFlow and PyTorch.
In case anyone is interested: in order to decide which key terms to analyze, I trained a Word2Vec model and, for a set of seed words, calculated the document frequency for the most similar terms. For each seed word (e.g., python), I then pulled out words that were both similar and at least somewhat common (e.g., scala) in order to build up a set of key terms for different topics.
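The filtering step can be sketched without the model itself. The function below assumes you already have a list of `(term, similarity)` pairs for a seed word, which is the shape a Word2Vec similarity query typically returns (for example gensim's `most_similar`), plus the tokenized job descriptions. The thresholds and the name `select_key_terms` are assumptions for illustration, not the values I actually used.

```python
def select_key_terms(candidates, tokenized_docs, min_similarity=0.5, min_doc_freq=0.02):
    """Keep candidate terms that are similar to the seed word and reasonably common.

    candidates: iterable of (term, similarity) pairs for one seed word.
    tokenized_docs: list of token lists, one per job description.
    Returns a list of (term, similarity, document_frequency) tuples.
    """
    # Convert each document to a set so a term counts once per document.
    doc_sets = [set(tokens) for tokens in tokenized_docs]
    total = len(doc_sets)
    selected = []
    for term, similarity in candidates:
        if total == 0 or similarity < min_similarity:
            continue
        doc_freq = sum(term in tokens for tokens in doc_sets) / total
        if doc_freq >= min_doc_freq:
            selected.append((term, similarity, doc_freq))
    return selected
```

The two thresholds pull in different directions: a high similarity cut-off keeps the topic tight, while the document-frequency floor drops terms that are related but too rare to be worth charting.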
Is timing everything?
Has the threat of recession caused new vacancies to dry up? A bit, maybe.
This chart shows the number of new jobs posted each week. For the most part, the volumes look pretty constant, aside from a dip in the week of the Platinum Jubilee bank holidays and a corrective bounce the following week. However, the data science and analyst roles have dipped a little since the beginning of September, with data engineering holding steadier.
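For anyone wanting to reproduce a weekly count like this, here is a minimal sketch that groups posting dates by the Monday of their week. It assumes each listing carries a posting date as a `datetime.date`; the function names are illustrative.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing the given day."""
    return day - timedelta(days=day.weekday())


def new_listings_per_week(posted_dates):
    """Count listings by the week in which they were posted, in date order."""
    counts = Counter(week_start(day) for day in posted_dates)
    return dict(sorted(counts.items()))
```

Weeks with no postings at all are simply absent from the result, so they would need filling in with zeros before plotting.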
And finally, if you’ve ever seen a job listing you like and wondered if you can get away with putting off the application, the answer is…probably not.
This ugly chart displays the length of time that listings stay live, cumulatively. It shows that 15% of job listings disappear after less than a week, and a quarter disappear within 12 days.
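The cumulative figures can be sketched as follows, assuming each listing has a first-seen and last-seen date from the regular harvests. Both function names are hypothetical, and this simple version ignores the fact that listings still live at the end of collection have an unknown final lifespan.

```python
from datetime import date


def lifespan_days(first_seen: date, last_seen: date) -> int:
    """Number of days between the first and last sighting of a listing."""
    return (last_seen - first_seen).days


def share_gone_within(lifespans, max_days):
    """Fraction of listings whose lifespan is at most max_days."""
    if not lifespans:
        return 0.0
    return sum(1 for days in lifespans if days <= max_days) / len(lifespans)
```

Evaluating `share_gone_within` for every value of `max_days` from zero upwards gives the points of a cumulative curve like the one in the chart.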
Wrapping up
Hopefully, this article has provided some food for thought, especially if you’re at the start of your data career and wondering which way to go.
One thing that stands out to me, particularly from the skills analysis, is that these job titles mean slightly different things to different companies. A data engineer may be someone who maintains a platform or someone who maintains the data pipelines within a platform, for example. A job spec for the former may focus on DevOps practices such as infrastructure as code, mentioning words like AWS and Terraform, whilst a job spec for the latter may focus on things like Python and Airflow.
What’s next?
For now, nothing! But I will continue to accumulate data and check back next year to see if anything has changed. Next time I might look into the fields I haven’t yet used, such as application count or location, or I may dive deeper into correlations. Do certain skills command different salaries, for example?
To get access to unlimited stories, consider signing up to become a Medium member for $5 per month. If you use my link, I’ll receive a small commission at no extra cost to you.