How Data Can Fool You
Detecting bias in statistical experiments and understanding how it arises

As we all know, we cannot always trust what we read or hear on the internet, or anywhere else for that matter. A basic understanding of statistics goes a long way when deciding whether what you’re reading is valid. For starters, one should have a grasp of the concept of statistical bias.
“We don’t see things as they are. We see things as we are.” — Anais Nin
One of the most common types of bias is selection bias. Selection bias encompasses any type of bias where not all members of the population of interest have an equal chance of being selected for the sample. Today, I will present two common types of selection bias.
Nonresponse Bias
Nonresponse bias occurs when the people who respond to a survey differ meaningfully from the people who do not. Some ways in which nonresponse bias can occur are when:
- some people are more inclined than others to respond to a survey
- the way a survey is administered is flawed (e.g., a survey sent by email)
Let’s take a look at the first bullet point. Consider a survey about how residents of a certain county view classical music. The survey is administered to the general public, and the results are great: everybody seems to like classical music! They’re so great that the county executive decides to allow only classical music to be played (of course this would never happen, but bear with me). After a few months, the county executive is astonished to find that the majority of the county is unhappy with the new rule.
What went wrong? Didn’t the survey show that classical music would have a positive effect?
Rewind to when the survey was made public. If you’re a person who does not know much about classical music (which describes quite a lot of people), you would not have a strong urge to complete the survey. Meanwhile, classical music enthusiasts would be the first to respond. As a result, the majority of responses logged came from people who already felt positively about classical music, so the survey seemed to indicate that classical music is well received in that county. In reality, those who are indifferent to the subject simply did not care to respond. Therefore, the survey inaccurately represented the county’s position on classical music.
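To make the classical music example concrete, here is a minimal simulation sketch. Every number in it is an assumption I made up for illustration (the share of enthusiasts and how likely each group is to respond), and the function name simulate_survey is hypothetical; the point is only to show how unequal response rates can distort a survey.

```python
import random

def simulate_survey(population_size=10_000, share_enthusiasts=0.2,
                    respond_prob_enthusiast=0.8, respond_prob_other=0.1,
                    seed=0):
    """Return (true_share, observed_share) of residents who like classical music.

    All default rates are made-up assumptions, not real survey data.
    """
    rng = random.Random(seed)
    likes = []      # opinion of every resident (True = likes classical music)
    responses = []  # opinions of only those residents who answered the survey
    for _ in range(population_size):
        is_fan = rng.random() < share_enthusiasts
        likes.append(is_fan)
        # Enthusiasts are assumed to be far more likely to respond.
        respond_prob = respond_prob_enthusiast if is_fan else respond_prob_other
        if rng.random() < respond_prob:
            responses.append(is_fan)
    true_share = sum(likes) / len(likes)
    observed_share = sum(responses) / len(responses)
    return true_share, observed_share

true_share, observed_share = simulate_survey()
print(f"True share who like classical music: {true_share:.2f}")
print(f"Share among survey respondents:      {observed_share:.2f}")
```

With these made-up rates, enthusiasts are about 20% of the county but should account for roughly two thirds of the respondents (0.2 × 0.8 out of 0.2 × 0.8 + 0.8 × 0.1), so the survey looks far more positive than the county really is.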

Nonresponse bias is not limited to how interested participants are in a particular subject, though. With email surveys especially, the probability of nonresponse bias occurring is relatively high. For example, the email could have ended up in the recipient’s spam folder, or the recipient may not check their email very often. Factors like these mean that a chunk of the target population is not represented in the survey, and that chunk could very well hold an opinion that differs from that of the people who did respond.
Sampling Bias
Sampling bias occurs when the sample is collected in a non-random manner. To understand this concept, let’s look at another example. Imagine that you, a resident of Edison, intend to conduct a survey whose population of interest is the people of New Jersey. However, due to difficulty traveling, your sample is composed only of people who live in Edison. This type of sampling is called convenience sampling, and it is a common indicator that sampling bias will follow. Because you chose to include only people from Edison while the intended population of interest was all of New Jersey, not everybody had an equal chance of being chosen. This is an issue: if the population of interest is all of New Jersey, the sample should be designed so that any resident of New Jersey could be included. If it is not, the findings cannot be generalized back to the population.
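The Edison example can be sketched in a few lines of Python. The towns other than Edison, the resident counts, and the "yes" rates below are all hypothetical numbers chosen for illustration; the sketch simply compares a convenience sample from one town against a simple random sample drawn from everyone.

```python
import random

rng = random.Random(42)

# Hypothetical towns: (name, number of residents, share who answer "yes").
# These figures are invented for illustration only.
towns = [("Edison", 1_000, 0.70), ("Town B", 4_000, 0.40), ("Town C", 5_000, 0.30)]

# Build the full population as (town, answer) pairs.
population = []
for name, size, yes_share in towns:
    for _ in range(size):
        population.append((name, rng.random() < yes_share))

def yes_rate(sample):
    """Fraction of people in the sample who answered "yes"."""
    return sum(answer for _, answer in sample) / len(sample)

true_rate = yes_rate(population)

# Convenience sample: only people from Edison, because they are easy to reach.
convenience_sample = [person for person in population if person[0] == "Edison"][:500]

# Simple random sample: every resident has the same chance of being chosen.
random_sample = rng.sample(population, 500)

print(f"Whole population:   {true_rate:.2f}")
print(f"Convenience sample: {yes_rate(convenience_sample):.2f}")
print(f"Random sample:      {yes_rate(random_sample):.2f}")
```

Under these assumptions, the convenience sample should land near Edison’s own rate rather than the statewide one, while the random sample should land close to the true value even though both samples are the same size.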
With this knowledge of statistical bias, it becomes much easier to judge the credibility of what you read. If you find an article online suggesting that the Houston Rockets play better basketball than the Los Angeles Lakers, but you notice that the survey was administered in Houston, where there are more Rockets fans than Lakers fans, that should raise a red flag. Keeping track of where these red flags appear will help you form better-informed opinions.
