avatarAntonello Benedetto

Summary

The website content provides an explanation and demonstrates three methods to calculate weighted averages in Python, emphasizing the importance of using weighted averages when dealing with aggregated data.

Abstract

The article on the website focuses on the concept of weighted averages and their significance in accurately representing data that is grouped or aggregated. It illustrates this concept with an example of yearly salaries across different employee groups, highlighting the inaccuracy of simple averages in such cases. The author then proceeds to demonstrate three distinct methods to compute weighted averages in Python: using list comprehension, the zip() function, and the numpy library's average() function. Each method is accompanied by code snippets, providing readers with practical tools to perform weighted average calculations in their data analysis work. The article also includes affiliate links to recommended courses for further learning in Python and data science.

Opinions

  • The author believes that weighted averages are crucial for an accurate description of datasets with grouped data and different weights.
  • The article suggests that coding your own algorithm for weighted averages can be straightforward, offering two custom function examples.
  • It is implied that using numpy's built-in average() function is the easiest and most flexible method for computing weighted averages in a production environment.
  • The author expresses curiosity about other algorithms or packages readers might use for weighted averages, inviting them to share their experiences in the comments.
  • The inclusion of affiliate links indicates the author's endorsement of the linked courses as valuable resources for learning Python and data science.

Programming | Interviewing | Office Hours

3 Ways To Compute A Weighted Average in Python

In this brief tutorial, I show how to compute weighted averages in Python either defining your own functions or using NumPy

Photo by Dmitry Sidorov from Pexels

Update: Many of you contacted me asking for valuable resources to automate Excel tasks with Python or to apply popular statistical concepts in Python. Below I share four courses that I would recommend:

Hope you’ll find them useful too! Now enjoy the article :D

When To Use A Weighted Average?

Suppose you had to analyze the table below, showing the yearly salary for the employees of a small company divided in five groups (from lower to higher salary):

Image created by the author in Tableau. Mock data has been used.

If you computed the simple average of the Salary Per Year column you would obtain:

But is £62,000 an accurate representation of the average salary across the groups? Because data comes already aggregated and each group has a different Employees Number, the average Salary Per Year for each group weights differently in the overall average. In computing the simple average, the same weight was assigned to each group leading to a biased result.

In this cases, the solution is to take into account the weight of each group by computing a weighted average that can be represented algebraically with the formula:

Where x represents the distribution ( Salary Per Year ) and w represents the weight to be assigned ( Employees Number). Given that the table includes five groups, the formula above becomes:

An by replacing x and w with actual figures, you should obtain the result below:

Note how taking weights into account, the average Salary Per Year across the groups is almost £18,000 lower than the one computed with the simple average and this is an accurate way to describe our dataset given the number of employees in each group.

Now that the theory has been covered, let’s see how to obtain a weighted average in Python using 3 different methods. In order to do that, the first step is to import packages and the employees_salary table itself:

import pandas as pd
from numpy import average
df = pd.read_csv(‘C:/Users/anbento/Desktop/employee_salary.csv’)
df.head()
distribution = df[‘salary_p_year’]
weights = df[‘employees_number’]

Method #1 : Function Using List Comprehension

If you wish to code your own algorithm, the first very straightforward way to compute a weighted average is to use list comprehension to obtain the product of each Salary Per Year with the corresponding Employee Number ( numerator ) and then divide it by the sum of the weights ( denominator).

Output:
44225.35

The function above could easily be rewritten as a one liner:

Output:
44225.35

Method #2: Function Using Zip()

Instead of using list comprehensions, you could simply start from and empty list ( weighted_sum ) and append the product of the average salary for each group by its weight . The zip() function is very handy as it generates an iterator of tuples that helps pairing each salary to the corresponding weight .

Output:
44225.35

Method #3: Using Numpy Average() Function

The numpy package includes an average() function (that has been imported above) where you can specify a list of weights to calculate a weighted average. This is by far the easiest and more flexible method to perform these kind of computations in production:

Output:
44225.35

Conclusion

In this brief tutorial, we learnt how weighted averages should be the preferred option every time data is presented in an aggregated or grouped way, where some quantities or frequencies can be identified. We also found at least 3 methods to compute a weighted average with Python either with a self-defined function or a built-in one. I would be curious to know if you use any other algorithm or package to compute weighted averages, so please do leave a comment!

A Note For My Readers

This post includes affiliate links for which I may make a small commission at no extra cost to you, should you make a purchase.

You May Also Like

Data Engineering
Python
Programming
Careers
Recommended from ReadMedium