Programming | Interviewing | Office Hours
3 Ways To Compute A Weighted Average in Python
In this brief tutorial, I show how to compute weighted averages in Python either defining your own functions or using NumPy

Update: Many of you contacted me asking for valuable resources to automate Excel tasks with Python or to apply popular statistical concepts in Python. Below I share four courses that I would recommend:
- Programming For Data Science In Python (Nano-Degree)
- Practicing Statistics Interview Questions In Python
Hope you’ll find them useful too! Now enjoy the article :D
When To Use A Weighted Average?
Suppose you had to analyze the table below, showing the yearly salary for the employees of a small company divided in five groups (from lower to higher salary):

If you computed the simple average of the Salary Per Year column you would obtain:

But is £62,000 an accurate representation of the average salary across the groups? Because data comes already aggregated and each group has a different Employees Number, the average Salary Per Year for each group weights differently in the overall average. In computing the simple average, the same weight was assigned to each group leading to a biased result.
In this cases, the solution is to take into account the weight of each group by computing a weighted average that can be represented algebraically with the formula:

Where x represents the distribution ( Salary Per Year ) and w represents the weight to be assigned ( Employees Number). Given that the table includes five groups, the formula above becomes:

An by replacing x and w with actual figures, you should obtain the result below:

Note how taking weights into account, the average Salary Per Year across the groups is almost £18,000 lower than the one computed with the simple average and this is an accurate way to describe our dataset given the number of employees in each group.
Now that the theory has been covered, let’s see how to obtain a weighted average in Python using 3 different methods. In order to do that, the first step is to import packages and the employees_salary table itself:
import pandas as pd
from numpy import averagedf = pd.read_csv(‘C:/Users/anbento/Desktop/employee_salary.csv’)df.head()
distribution = df[‘salary_p_year’]
weights = df[‘employees_number’]Method #1 : Function Using List Comprehension
If you wish to code your own algorithm, the first very straightforward way to compute a weighted average is to use list comprehension to obtain the product of each Salary Per Year with the corresponding Employee Number ( numerator ) and then divide it by the sum of the weights ( denominator).






