What is Probability?
A Review of Three Schools of Thought
We hear people talking about probability every day. For example, most of us heard about it in school: when you were in a high school or college math class, you probably had to answer questions like “What’s the probability that you draw two red cards in a row?” or “What’s the probability that you roll a nine when you roll two dice?” But we hear it in more real-world contexts, too. We see estimates of probability every time we check the weather: our weather app might tell us that at 5 PM tomorrow, there’s a 30% chance of rain. Probabilistic statements show up in news articles, too. For example, you might read statements like “People who don’t exercise regularly are more likely to become obese” or “The chances of having twins are about 1 in 250.”

In recent decades, talk of probability has become ubiquitous in two arenas in particular: sports and political elections. In the world of sports, people use ever-more-complicated models to estimate the probability that a given team will win a game. Not only that, we’ve figured out how to estimate that probability at any point during the game, so we can plot how a team’s probability of victory changes over the course of the game. Similarly, in the months leading up to a presidential election, various pundits calculate the probability that each candidate will win, and they adjust their estimates every time a new poll comes in.

One person in particular has been instrumental in bringing both of these forms of prediction into our popular understanding: Nate Silver. On his website, FiveThirtyEight, Silver provides estimates of the probability of victory for every upcoming major-league sports game and every election. For reasons that are over my head, he seems to be better at calculating these probabilities than the other pundits, and he seems to predict winners correctly more often than anyone else. His talent for statistical prediction has earned him such celebrity that even people who don’t like math know who he is and trust his judgment. And in each of the last three presidential elections, a sizeable number of people have checked FiveThirtyEight every single day to see how Silver’s estimate of their candidate’s chances has changed since the day before.

But all of these discussions of probability leave open an important question: what is probability? If the weather app says there’s a 30% chance of rain at 5 PM, what does that mean? If FiveThirtyEight says there’s a 70.6% chance that my candidate will win the election, what does that mean? What is probability? Is probability an inherent property of the universe or just a concept that humans came up with?
There are three major schools of thought that attempt to answer these questions: the frequentist school, the Bayesian school, and the propensity theory. In this essay, I will (1) discuss these three schools of thought and when it makes the most sense to use each of them, and (2) explain why I subscribe to the propensity theory of probability.
The Frequentist School
The frequentist interpretation of probability (also called frequentism) posits that the probability that an event will occur is the fraction of the time that it would occur if you could test it a very large number of times.
A frequentist thinks of everything in terms of repeated tests. If you want to find the probability that something happens, then test it one million times. Count up the number of times that it happened and divide by one million. That’s the probability that it happens.
For example, if you draw one card from a deck of cards, what’s the probability that it’s a diamond? Everyone knows it’s one fourth. But a frequentist would interpret the problem like this: if you gave one million people a deck of cards and told them all to draw the top card, then approximately one fourth of them would get a diamond. (And theoretically, as the number of people grew without limit, the fraction who got a diamond would converge to exactly one fourth.) Thus, the probability is one fourth.
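The frequentist recipe (repeat the experiment many times, count, divide) can be sketched as a short Python simulation. This is an illustrative sketch under my own assumptions, not a standard tool: the function name `estimate_diamond_probability` and the fixed random seed are hypothetical choices. Because the top card of a well-shuffled deck is equally likely to be any of the 52 cards, each imaginary person's draw is modeled with `random.choice` instead of shuffling a whole deck.

```python
import random

# A standard 52-card deck: ranks 1-13 in each of the four suits.
SUITS = ("hearts", "diamonds", "clubs", "spades")
DECK = [(rank, suit) for suit in SUITS for rank in range(1, 14)]


def estimate_diamond_probability(num_people: int, seed: int = 0) -> float:
    """Frequentist estimate: fraction of simulated people whose top card is a diamond."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diamonds = 0
    for _ in range(num_people):
        # The top card of a well-shuffled deck is uniformly one of the 52 cards.
        _, suit = rng.choice(DECK)
        if suit == "diamonds":
            diamonds += 1
    return diamonds / num_people


# Imagine ever-larger crowds of people, each drawing one top card.
for n in (100, 10_000, 1_000_000):
    print(n, estimate_diamond_probability(n))
```

As `num_people` grows, the printed fractions should settle closer and closer to 0.25, which is exactly the frequentist's point; the exact digits depend on the seed.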

Notice that they expanded the problem. Instead of looking at you and your deck of cards, they imagined setting up an experiment with a very large number of people, each with their own deck of cards. The basis of frequentism is expanding the problem to a large sample and testing each individual case.
The frequentist interpretation is often applied to specific events in the future. For example, FiveThirtyEight states that in this Sunday’s game between the Houston Texans and the Jacksonville Jaguars, the Texans have a 65% chance of winning. A frequentist would interpret that statement as follows: if we could simulate the game over and over again, then the Texans would win approximately 65% of the time.

That may seem like a bit of an intellectual stretch, but in practice, a frequentist calculates these estimates by looking at similar events in the past. So a frequentist might look at the entire history of the NFL and identify all the games that have similarities to this game between the Texans and the Jaguars. And after applying that information to a mathematical model, they might come up with the number 65%. Thus, even though the definition of frequentism might seem like it’s too complex and theoretical, it is actually a very pragmatic approach to probability.

Another advantage of frequentism is that it makes for an easy interpretation of confidence intervals and hypothesis testing, both of which are central concepts in statistics.
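Here is one way to see the frequentist reading of a confidence interval, as a minimal Python sketch under my own assumptions: a true proportion of 0.25, experiments of 1,000 trials each, and the simple normal-approximation (Wald) interval for a proportion. Under frequentism, "95% confidence" describes the procedure: if you repeated the experiment many times, about 95% of the intervals it produced would contain the true value. The function names `wald_interval` and `coverage` are hypothetical.

```python
import math
import random


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a proportion (normal approximation)."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage(true_p: float, n: int, repetitions: int, seed: int = 0) -> float:
    """Fraction of repeated experiments whose interval contains the true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # One experiment: n independent trials, each a success with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / repetitions


print(coverage(true_p=0.25, n=1_000, repetitions=2_000))
```

The printed coverage should come out close to 0.95. Note what the 95% does not say: any single interval either contains the true value or it doesn't; the percentage is a statement about the long run of repetitions.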
The Bayesian School
The Bayesian interpretation of probability holds that probability is simply a measure of how confident we are that something is true. If the probability is 100%, we’re certain that it’s true. If it’s 0%, we’re certain that it’s false. If it’s 75%, we’re inclined to think that it’s true. If it’s 25%, we’re inclined to think that it’s false. If it’s 50%, our inclinations are tied, and it’s a complete “toss-up”.
In other words, under the Bayesian school, probability is a measure of how much evidence there is for something being true or false. If the probability is 90%, that means there is strong evidence that it’s true and very little evidence that it’s false. If it’s 10%, then there is strong evidence that it’s false and very little evidence that it’s true. If it’s 50%, then there’s just as much evidence that it’s true as there is that it’s false. Anytime we gain evidence that it’s true, the probability goes up, and anytime we gain evidence that it’s false, the probability goes down.

For example, if you draw a card from a deck, what’s the probability that it’s a diamond? A Bayesian would answer like this: the card is sitting right there at the top of the deck, but it’s face-down, so we don’t know what it is. It is either a heart, diamond, club, or spade, but since it’s face-down, we don’t know which. However, we know that the deck contains thirteen hearts, thirteen diamonds, thirteen clubs, and thirteen spades. There are four suits and the same number of cards in each suit. Thus, our knowledge indicates that the probability that this card is a diamond is one fourth.

The premise of the Bayesian interpretation is that everything you want to know has already been determined; we just don’t know what it is yet. But we gather evidence, and we state our probability as a measure of how much evidence we have for the different possibilities.
The question then is how you would translate the evidence into an actual calculation of the probability. The most honest answer to this question would be that I don’t know: much of it is over my head. However, I do know that a crucial aspect of the Bayesian approach is Bayes’s Theorem, which is a famous equation about conditional probability that tells us how to update our estimate of the probability in light of new evidence.
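Bayes’s Theorem itself is short: P(H | E) = P(E | H) × P(H) / P(E). Here is a minimal Python sketch of a single update, using exact fractions and the card example. The scenario is my own hypothetical: before we turn the card over, someone glimpses it and tells us only that it is red. The function name `bayes_update` is also just illustrative.

```python
from fractions import Fraction


def bayes_update(prior: Fraction, p_evidence_if_true: Fraction,
                 p_evidence_if_false: Fraction) -> Fraction:
    """Return P(H | E) from P(H), P(E | H), and P(E | not H) via Bayes's Theorem."""
    # Total probability of seeing the evidence at all, whether H is true or not.
    p_evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
    return p_evidence_if_true * prior / p_evidence


prior = Fraction(1, 4)  # H = "the top card is a diamond": 13 of 52 cards
posterior = bayes_update(
    prior,
    Fraction(1),       # E = "the card is red": every diamond is red
    Fraction(13, 39),  # of the 39 non-diamonds, only the 13 hearts are red
)
print(posterior)  # 1/2
```

The evidence "it’s red" raises our confidence from 1/4 to 1/2, which matches the common-sense answer: the card must be one of the 26 red cards, and 13 of those are diamonds.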
The Bayesian approach is ideally suited for situations where you have an individual who is a member of a large population, and you make a guess about the individual based on the population they belong to. Consider the following (contrived) example. A 60-year-old man asks his doctor, “Do you think I have cancer?” The doctor answers, “Less than 5% of 60-year-olds have cancer, so I think that’s pretty unlikely.” The patient adds, “By the way, I’ve been smoking two packs of cigarettes every day for the last 40 years.” The doctor responds, “In that case, there’s a much higher chance that you have cancer.” That was a very Bayesian analysis by the doctor: he estimated the probability (the degree of confidence) from known properties of the overall population, but as more information about the individual became available, he modified his estimate in light of the new evidence. And this is the sort of inference we make every day.

The Bayesian interpretation of probability is not well-suited for events in the future, but it’s not impossible to use it in that context. Going back to the example of the Texans and the Jaguars, a Bayesian might interpret FiveThirtyEight’s estimate as follows: the Texans will either win or lose, but we don’t know which. However, when we run our model, which is based on a set of assumptions and takes historical data into account, we see that there is considerably more evidence that the Texans will win than that the Jaguars will win. To be exact, based on our evidence, we are 65% sure that the Texans will win.
However, a frequentist or an advocate of the propensity theory would interpret that same estimate very differently.
The Bayesian interpretation and Bayes’s Theorem are both named after Thomas Bayes, an 18th-century English statistician and theologian. But oddly enough, Bayes himself never explicitly promoted either of those ideas. He stated a special case of what would later be called Bayes’s Theorem, but he did not state the full theorem. And while he may have implicitly used the Bayesian interpretation of probability, he did not explicitly promote it. It was really Pierre-Simon Laplace, the brilliant French polymath, who developed and promoted the view of probability that we now call the Bayesian interpretation. As such, the word “Bayesian” is something of a misnomer.

The Propensity Theory
The propensity theory of probability holds that probability is an inherent property of the universe that expresses how likely it is that a future event will occur. The premise of this theory is that the future has not yet been determined, but that some possible future events are more likely to occur than others, and probability is a measure of how likely some possible future event is.
Thus, the propensity theory of probability is strictly for the future. It would make no sense to use it with regard to an unknown fact about the present. Going back to the cards example, if we’re talking about a deck of cards that’s right in front of me, and I ask, “What’s the probability that the top card is a diamond?” it would make no sense to use the propensity theory, because we’re dealing with something in the present, not the future. (The top card is sitting right there.) We could use the Bayesian or frequentist interpretations, but not the propensity theory. But if I were talking about a game I’ll be playing next week, and I ask, “What’s the probability that when the game starts, the first card I pick up will be a diamond?” then the propensity theory would make sense.
Notice that the propensity theory does not actually define the word “probability.” It describes probability as a measure of the likelihood that a possible future event will occur, but that’s not a definition. Trying to define the word “probability” by saying that probability is a measure of likelihood of possible future events is like trying to define the word “dog” by saying that a dog is a canine. You didn’t define it; you just replaced it with a synonym. The word “likelihood” is a synonym of the word “probability”, so if I say that probability is a measure of the likelihood that a possible future event will occur, then that’s not really a definition.

On the other hand, even if it’s not a definition, it does get the point across. The propensity theory holds that probability is like space, time, matter, etc. It’s just an inherent property of the universe. And just as you can easily understand the concepts of space, time, and matter without having any way to define those words, you can understand the concept of probability under the propensity theory even without any definition of “probability” itself.

Naturally, the propensity theory of probability is best suited for situations that involve the future. The example of the Texans-Jaguars game is a good fit for it, and I would guess that most people would interpret FiveThirtyEight’s estimate in the style of the propensity theory. When presented with the estimate that the Texans have a 65% chance of winning, most people would think that the outcome of the game is currently undetermined, but that (according to FiveThirtyEight) a Texans win is more likely than a Jaguars win, by a margin of 65% to 35%. However, not everyone would interpret this estimate in this way. Some people would rather use a frequentist or Bayesian approach.
Out of the three interpretations, the propensity theory is the most philosophical and the least practical. The frequentist and Bayesian interpretations both easily lend themselves to practical applications and calculations, but the propensity theory does not. If I say that probability is an inherent property of the universe that represents the likelihood of a possible future event, that doesn’t provide me with any means of calculating this probability in any real-world example. In fact, I have heard several people present the debate over the definition of probability as just a debate between the frequentist school and Bayesian school. They neglect to mention (and may not even know) that there is a third option. But that is understandable, because the frequentist and Bayesian interpretations are both tied to real-world calculations, while the propensity theory is not.
But in spite of its impracticality, my personal experience seems to indicate that the propensity theory is the most popular among the general public (that was a Bayesian statement). Most people don’t think about these distinctions, of course, but the idea that probability is a measure of the inherent likelihood of a future event in an undetermined future (the propensity theory) seems to have caught on. I think that most people subscribe to that interpretation, even if they never consciously think about it.
My View
I subscribe to the propensity theory of probability because (1) it seems the most intuitive to me, and (2) I have a problem with both the frequentist and Bayesian interpretations.
My problem with the frequentist interpretation is that I don’t think the definition of probability needs to involve repeated simulations. Past examples and repeated simulations may be the best way to estimate a probability, but I don’t think they’re relevant to the definition of probability. From a philosophical perspective, if we want to know what the word “probability” really means, I think it is best presented by the propensity theory, which does not involve repeated simulations.

Meanwhile, my problem with the Bayesian interpretation has to do with future events. Much of the time (probably more than half the time) when someone is trying to calculate the probability of something, they’re dealing with an event in the future, whether it’s the probability that a sports team will win their next game, the probability that their candidate will win the election, the probability that it will rain next Monday, etc. And since so many probability problems involve future events, I think that any good definition of probability has to do a good job of expressing what it means for a certain future event to have a certain probability. But the Bayesian interpretation is best suited for situations in the present, not the future.
It is possible to apply the Bayesian interpretation to specific future real-world events, but when you do, there seems to be a certain implication that the future has already been determined. (And I don’t like that implication.) Since the Bayesian interpretation is a measure of our degree of confidence in the truth of some statement, it assumes that this statement is either true or false — we just don’t know which. Thus, if someone applies this interpretation to a future event, that seems to be implying that it’s already been determined whether this event will happen or not — we just don’t know which.

To be fair, it’s not a direct implication, and most of the people who would use it on future events (that is, most Bayesians) are not trying to imply that the future is pre-determined. They’re just using the Bayesian interpretation because that’s the one that makes the most sense to them. Nevertheless, even if it’s not direct, there does seem to be a certain implication that the future is pre-determined when someone uses the Bayesian view with regard to a future event. And since I don’t like that implication, I don’t subscribe to the Bayesian interpretation of probability.
I understand that the frequentist and Bayesian interpretations are both much more applicable to the real world than the propensity theory. But if you asked me, as a philosophical question, “What do you think probability is?” I would respond, “Probability is a measure of how inherently likely a future event is.” That’s the propensity theory of probability. And that’s the theory that I believe in.
— — — — *** — — — —
Additional Notes (June 5, 2021):
I posted this article on 10/31/19, but it was only today (6/5/21) that I added the pictures. I cite them below.
The Texans did win that game over the Jaguars on 11/3/19. The score was 26–3. That was such a random example, though.
So far, I’ve neglected to mention a fourth theory of probability: the classical definition, under which the probability of something is just the number of ways it can happen divided by the total number of possible outcomes, provided that all of those outcomes are equally likely. But that only makes sense for simple things like card games and rolling a pair of dice, where the outcomes really are equally likely. The classical interpretation has a very limited scope, and it isn’t considered one of the major theories (which is why I didn’t even mention it until now).
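The classical definition is easy to compute by brute force: list every equally likely outcome, count the favorable ones, and divide. Here is a small Python sketch (the helper `classical_probability` is my own, hypothetical name) that answers the two textbook questions from the beginning of this essay, assuming fair dice and a well-shuffled standard deck with the two cards drawn without replacement.

```python
from fractions import Fraction
from itertools import permutations, product


def classical_probability(outcomes, event) -> Fraction:
    """Favorable outcomes / total outcomes, assuming every outcome is equally likely."""
    outcomes = list(outcomes)
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


# Rolling two dice: all 36 ordered pairs (first die, second die).
two_dice = product(range(1, 7), repeat=2)
p_nine = classical_probability(two_dice, lambda roll: sum(roll) == 9)
print(p_nine)  # 1/9, from (3, 6), (4, 5), (5, 4), (6, 3)

# Drawing two cards in a row without replacement: all 52 * 51 ordered pairs.
RED_SUITS = {"hearts", "diamonds"}
deck = [(rank, suit) for suit in ("hearts", "diamonds", "clubs", "spades")
        for rank in range(1, 14)]
p_two_red = classical_probability(
    permutations(deck, 2),
    lambda pair: all(suit in RED_SUITS for _, suit in pair),
)
print(p_two_red)  # 25/102, i.e. (26/52) * (25/51)
```

The limitation mentioned above shows up directly in the code: `classical_probability` only gives the right answer when every listed outcome really is equally likely, which is why it works for dice and cards but not for a football game.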
Image Citations:
(In order of appearance)
1. The weather app on my phone, which gets its predictions from the Weather Channel.
2. Graph from an online article: “What Real-Time Gambling Data Reveals about Sports,” by Todd Schneider, 2014.
3. Photo from ABC News
4. Photo from WorthPoint.com
5. Both images from Wikipedia
6. Hand-drawn by me!
7. Photo by me
8. Photo from Wikipedia
9. Photo from website,
10. Both portraits from Wikipedia
11. Picture by me, photo of dog from Wikipedia
12. Photo from Wikipedia
13. Drawing from an APS webpage, with APS being the Association for Psychological Science
14. Diagram by me