TheSkepticalStatistician


Why I am Skeptical of Averages and Why You Should Be Too

Photo by Lukas from Pexels

As a statistician and data scientist, I use averages frequently. They are useful for summarizing data and work well for inference when used appropriately. But all too often, I see averages used incorrectly as a stand-in for an entire data set. As a result, people often come away with the wrong picture of what the true distribution of the data looks like.

Before we believe an average, we need to ask more questions to make sure it represents what we want it to represent. We need to question things like:

1) What is the sample size used to create the average?

2) How spread out is the data?

3) What is the shape of the data?

4) How was the data collected?

Without answers to these questions, it’s hard to know if the average should be believed as a representation of the group. Let’s take a few examples of common places averages are used and why you should be skeptical of them.

#1: How Much Companies Pay Employees

Photo by NeONBRAND on Unsplash

Companies love to use average pay to show that people are being paid a decent wage. The problem is that positions higher in the organization pay far more and “pull” the average up, misrepresenting what a typical employee can expect to make.

For example, consider a company with 3 employees. Employee 1 makes 7.25/hour, employee 2 makes 8.00/hour, and employee 3 makes 50/hour. The average pay at the company comes out to 21.75/hour. The company can advertise this because it isn’t a lie, but we all know it’s far from what most people should expect to make.

What happens here? Why does the average say 21.75/hour when 2 of the 3 employees make far less? The answer is that the data is heavily skewed, so the average isn’t very useful. The employee making 50/hour makes the company’s pay look high, but the pay for each individual employee tells a different story. Slick, I know.
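To make the arithmetic concrete, here is a minimal Python sketch using the three wages from the example. The comparison with the median is my own addition for illustration; the variable name `hourly_pay` is hypothetical.

```python
from statistics import mean, median

# The three hourly wages from the example above.
hourly_pay = [7.25, 8.00, 50.00]

average_pay = mean(hourly_pay)    # pulled up by the single 50/hour wage
typical_pay = median(hourly_pay)  # the middle wage, unaffected by the extreme

print(f"mean:   {average_pay:.2f}/hour")
print(f"median: {typical_pay:.2f}/hour")
```

The mean reproduces the 21.75/hour figure, while the median lands on 8.00/hour, which is much closer to what two of the three employees actually earn.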

#2: Political Poll Data

Photo by Elliott Stallion on Unsplash

Around election season, tons of polls are taken to understand how voters feel about certain issues or candidates. This is done to get an average sense for which way voters are leaning on key parts of the ballot as well as to market certain ideas (think about polls used to show people like/hate the president).

The problem with these polls is usually that the data collection method cannot be trusted. Parties have an incentive to show that their proposal or candidate is favored.

For example, suppose you are president of the US and want to show that Americans really like you. Your party could run polls, but instead of randomly selecting people, it would select people it knows are in favor of you. When the responses are averaged, it looks like Americans favor you as president. The pollsters never tell us how they chose the people they asked, only that the average American is in favor of you. Slick, I know!
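A small simulation can illustrate how the selection method alone changes the result. Everything here is an assumption made for the sketch: the population size, the 40% true approval rate, and the choice to make supporters five times as likely to be polled.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1 = approves, 0 = disapproves. True approval is 40%.
population = [1] * 40_000 + [0] * 60_000

def approval_rate(sample):
    """Share of respondents in the sample who approve."""
    return sum(sample) / len(sample)

# Poll A: every person is equally likely to be asked.
random_poll = random.sample(population, 1_000)

# Poll B: supporters are five times as likely to be asked as non-supporters.
# random.choices draws with replacement, which is fine for this illustration.
weights = [5 if person == 1 else 1 for person in population]
biased_poll = random.choices(population, weights=weights, k=1_000)

print(f"true approval:   {approval_rate(population):.0%}")
print(f"random poll:     {approval_rate(random_poll):.0%}")
print(f"hand-picked poll: {approval_rate(biased_poll):.0%}")
```

Both polls report an honest average of the answers they collected. Only the random poll lands near the true rate; the hand-picked one reports a clear majority in favor.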

#3: Performance Metrics

Have you ever watched a sports talk show discuss a player who is inconsistent? Sometimes the player has a huge game, and other times it is like they were not even playing. For example, a wide receiver in football might have 150+ yard games some weeks and not catch a single pass in others.

If you take their average, which is often used to represent the player, it might look really good. On average, they might have 70 yards per game. With that stat line, they look like a dependable player who will help the team every game.

Truth is, the player’s performance is so unreliable that the average tells you almost nothing, and relying on them is like gambling. That is what happens when the data is so spread out that the average can’t be trusted.
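As a rough sketch of what “spread out” means here, the snippet below uses made-up game logs for a hypothetical receiver, chosen so the average is 70 yards. The 30-yard window for “close to the average” is also an arbitrary assumption.

```python
from statistics import mean, stdev

# Hypothetical receiving yards over seven games, chosen to average 70.
yards = [150, 0, 160, 5, 0, 155, 20]

avg = mean(yards)
spread = stdev(yards)  # sample standard deviation

# Count the games that landed within 30 yards of the average.
games_near_avg = sum(1 for y in yards if abs(y - avg) <= 30)

print(f"average yards per game: {avg}")
print(f"standard deviation:     {spread:.1f}")
print(f"games within 30 yards of the average: {games_near_avg} of {len(yards)}")
```

With these numbers the standard deviation is larger than the average itself, and not one game falls near 70 yards. The average describes a game this player never actually has.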

I’m sure by now you are thinking of times people have told you what happens on average. As you can see, what happens on average may be misleading. From these three examples, we can see that averages can be easily manipulated or just outright used incorrectly. Averages are heavily influenced by the four factors listed at the start (sample size, spread, shape, and collection method), and they can be made to say almost anything. For this reason, I am skeptical of averages and you should be too.
