Dmytro Iakubovskyi

Summary

Machine Learning Engineers have higher reported salaries than Data Scientists, with a difference of about 14,000 USD/year, or 10% of the average yearly compensation.

Abstract

According to a Machine Learning model trained to predict reported salaries based on various features, Machine Learning Engineers have higher reported salaries than Data Scientists. The average difference between the model predictions for these two job titles is about 14,000 USD/year, or 10% of the average yearly compensation. The largest absolute difference between SHAP values for predicted Machine Learning Engineer and Data Scientist yearly compensations is for Senior level (Expert) positions, while the largest relative difference is for Entry level (Junior) positions. Both the absolute and the relative gap are largest for non-remote work and for medium-sized companies (50 to 250 employees). The largest difference, in both absolute and relative terms, is for employees working and residing in leading EU markets such as Germany and France.

Opinions

  • The author believes that the difference between SHAP values for predicted Machine Learning Engineer and Data Scientist yearly compensations grows between 2022 and 2023, both in absolute and relative terms.
  • The author thinks that the gap between these job titles becomes larger over time.
  • The author suggests that the largest relative difference between these job titles is for Entry level (Junior) positions.
  • The author indicates that both the absolute and the relative gap are largest for non-remote work and for medium-sized companies (50 to 250 employees).
  • The author states that the largest difference in absolute and relative terms is for employees working and residing in leading EU markets such as Germany and France.

Machine Learning Engineer — the new “sexiest” job in AI and Machine Learning domain?

Differences of SHAP values for reported 2022–2023 yearly compensations between Machine Learning Engineers and Data Scientists across

Photo by AltumCode on Unsplash

One important result from my previous, regularly updated paper is that the Data Scientist job is no longer the one with the highest reported salaries (quantified in terms of SHAP values for reported yearly compensation):

Source: author, screenshot from Newest salaries in Data Science and AI explained by SHAP values | by Dmytro Iakubovskyi | Data And Beyond | Medium

In other words, according to a Machine Learning model trained to predict the reported salary from various features (job title, experience level, country location, work year, company size, remote work ratio, and employment type), the average difference between the model predictions for Machine Learning Engineers and Data Scientists, with all other parameters held equal, is about 14,000 USD/year, or about 10% of the average yearly compensation.
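As a quick sanity check on how the two quoted numbers relate, the short sketch below derives the average yearly compensation they imply. It uses only the rounded figures from the text, so the result is an approximation, and the variable names are mine rather than the notebook's.

```python
# Rounded figures quoted in the text above
gap_usd = 14_000       # ML Engineer minus Data Scientist prediction, USD/year
relative_gap_pct = 10  # the same gap as a percentage of average yearly compensation

# Average yearly compensation implied by these two rounded numbers
implied_average_usd = gap_usd * 100 / relative_gap_pct

print(f"Implied average compensation: about {implied_average_usd:,.0f} USD/year")
```

Because both inputs are rounded, the implied average should be read as an order-of-magnitude figure only.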

Below, I break this difference between the two job titles down by the abovementioned features. The corresponding code is available as a Kaggle notebook.

The code snippet in Python:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# X_test, shap_values and expected_values are assumed to be defined earlier in the notebook.

def plot_gap(col, main_col="job_title", value1="Machine Learning Engineer", value2="Data Scientist"):
    """Plot the SHAP gap between value1 and value2 of main_col for each value of col."""
    df_infl = X_test.copy()
    # SHAP contribution of the main column (job title) for every test row
    df_infl['shap_gd'] = shap_values[:, X_test.columns.get_loc(main_col)]
    # Mean and standard deviation of that contribution per (col, main_col) pair
    df1_mean = pd.pivot_table(df_infl, values=['shap_gd'], index=[col, main_col], aggfunc='mean')
    df1_std = pd.pivot_table(df_infl, values=['shap_gd'], index=[col, main_col], aggfunc='std')
    # One row per value of col, one column per job title; keep only the two titles compared
    df2_mean = pd.pivot(df1_mean.reset_index(), index=col, columns=main_col, values='shap_gd')[[value1, value2]].dropna(axis=0)
    df2_mean['gap'] = df2_mean[value1] - df2_mean[value2]
    df2_std = pd.pivot(df1_std.reset_index(), index=col, columns=main_col, values='shap_gd')[[value1, value2]]
    # Spread of the gap: the two standard deviations added in quadrature
    df2_std['std'] = np.sqrt(df2_std[value1]**2 + df2_std[value2]**2)
    df2 = df2_mean[['gap']].join(df2_std[['std']], how='inner')
    df2 = df2.dropna(axis=0).sort_values('gap', ascending=False)

    # Absolute gap per value of col
    plt.figure(figsize=(12,8))
    plt.bar(x=df2.index, height=df2['gap'])
    plt.errorbar(df2.index, df2['gap'], yerr=df2['std'], fmt="o", color="r")
    plt.title(f'SHAP value of gap per {col}, yearly compensation')
    plt.ylabel('kUSD/year')
    plt.tick_params(axis="x", rotation=90)
    plt.show()
    print()
    print()

    # Relative gap: divide by the average predicted pay for each value of col
    df_infl['shap_'] = shap_values[:, X_test.columns.get_loc(col)]
    df2['avg_pay'] = expected_values + df_infl.groupby(col)['shap_'].mean()
    df2['avg_pp'] = 100*df2['gap']/df2['avg_pay']
    df2 = df2.sort_values('avg_pp', ascending=False)
    plt.figure(figsize=(12,8))
    plt.bar(x=df2.index, height=df2['avg_pp'])
    plt.errorbar(df2.index, df2['avg_pp'], yerr=100*df2['std']/df2['avg_pay'], fmt="o", color="r")
    plt.title(f'Gap per {col} relative to average pay')
    plt.ylabel('Percentage points')
    plt.tick_params(axis="x", rotation=90)
    plt.show()

# Plot the gap for every feature except the job title itself
for col in X_test.columns:
    if col != 'job_title':
        print(col)
        plot_gap(col)
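To make the core of the calculation easier to follow without the notebook, here is a minimal sketch of the same gap and error-bar computation on a tiny made-up table. The SHAP numbers and the `toy` DataFrame are invented for illustration, and the sketch uses `groupby` plus `unstack` instead of the `pivot_table` and `pivot` calls above, which I believe gives the same per-cell means and standard deviations.

```python
import numpy as np
import pandas as pd

# Made-up SHAP contributions of the job_title feature (kUSD/year), for illustration only
toy = pd.DataFrame({
    "company_size": ["M", "M", "M", "M", "L", "L", "L", "L"],
    "job_title": ["Machine Learning Engineer", "Machine Learning Engineer",
                  "Data Scientist", "Data Scientist"] * 2,
    "shap_gd": [12.0, 14.0, -2.0, 0.0, 6.0, 8.0, 1.0, 3.0],
})

# Mean and sample standard deviation per (company_size, job_title) cell
stats = toy.groupby(["company_size", "job_title"])["shap_gd"].agg(["mean", "std"])
mean = stats["mean"].unstack("job_title")
std = stats["std"].unstack("job_title")

mle, ds = "Machine Learning Engineer", "Data Scientist"
result = pd.DataFrame({
    # Gap: difference of the mean SHAP contributions of the two job titles
    "gap": mean[mle] - mean[ds],
    # Error bar: the two standard deviations added in quadrature
    "std": np.sqrt(std[mle] ** 2 + std[ds] ** 2),
}).sort_values("gap", ascending=False)

print(result)
```

Note that the error bar here is built from standard deviations, not standard errors, so it describes the spread of individual SHAP values and not the uncertainty of the mean gap.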

Work year

The difference between SHAP values for predicted Machine Learning Engineer and Data Scientist yearly compensations grows between 2022 and 2023, both in absolute terms:

Source: author, screenshot from AIML salaries 2022–2023 CatBoost+SHAP | Kaggle

and in relative terms:

Source: author, screenshot from AIML salaries 2022–2023 CatBoost+SHAP | Kaggle

In other words, the gap between these job titles becomes larger over time.

Experience level

While the largest absolute difference between SHAP values for predicted Machine Learning Engineer and Data Scientist yearly compensations is for Senior level (Expert) positions:

Source: author, screenshot from AIML salaries 2022–2023 CatBoost+SHAP | Kaggle

the largest relative difference (about 12%) is for Entry level (Junior) positions:

Source: author, screenshot from AIML salaries 2022–2023 CatBoost+SHAP | Kaggle
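The absolute and the relative ranking can disagree because the relative gap divides by the average pay of each group, which is lower for junior positions. The sketch below shows the effect with hypothetical numbers that are not taken from the notebook; only the formula (100 × gap / average pay) follows the code above.

```python
# Hypothetical numbers, chosen only to illustrate the effect (not taken from the notebook)
groups = {
    "Entry level": {"gap": 9_000, "avg_pay": 75_000},
    "Senior level": {"gap": 16_000, "avg_pay": 160_000},
}

# Relative gap in percent: 100 * gap / average pay of the group
relative = {name: 100 * g["gap"] / g["avg_pay"] for name, g in groups.items()}

largest_absolute = max(groups, key=lambda name: groups[name]["gap"])
largest_relative = max(relative, key=relative.get)

print("Largest absolute gap:", largest_absolute)
print("Largest relative gap:", largest_relative)
```

With these invented inputs, the senior group has the larger gap in USD while the entry-level group has the larger gap in percent, which is the same pattern the two plots show.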

Remote work ratio

Both the absolute and the relative gap are largest for non-remote work:

Source: author, screenshot from AIML salaries 2022–2023 CatBoost+SHAP | Kaggle
Source: author, screenshot from AIML salaries 2022–2023 CatBoost+SHAP | Kaggle

Company size

Both the absolute and the relative gap are largest for medium-sized companies (50 to 250 employees):

Source: author, screenshot from AIML salaries 2022–2023 CatBoost+SHAP | Kaggle
Source: author, screenshot from AIML salaries 2022–2023 CatBoost+SHAP | Kaggle

Residence location

Finally, the largest difference in absolute and relative (about 20%!) terms is for employees working and residing in leading EU markets such as Germany and France:

Source: author, screenshot from AIML salaries 2022–2023 CatBoost+SHAP | Kaggle
Source: author, screenshot from AIML salaries 2022–2023 CatBoost+SHAP | Kaggle

I hope these results are useful to you. If you have questions or comments, do not hesitate to write in the comments below or reach me directly through LinkedIn or Twitter.
