Poisson Distribution — Intuition, Examples, and Derivation

When should you use a Poisson Distribution?

Before we set the parameter λ and plug it into the formula, let’s pause for a second and ponder a few questions.

Why did Poisson have to invent the Poisson Distribution? In other words, why does this distribution exist?

When is Poisson the appropriate choice for modeling?

1. Why did Poisson invent the Poisson Distribution?

To predict the number of events that will take place in the future!

More precisely, to predict the probability of a specific number of events occurring within a fixed time interval.

If you have ever sold anything, this “event” could be defined as, for example, a customer making a purchase (the moment of truth, not merely browsing). It could also be how many visitors you get on your website a day, the number of clicks on your ads in the coming month, how many phone calls you receive during your shift, or even the number of people who will succumb to a fatal disease next year, and so on.

Here is an example of how I would use Poisson in real life:

Every week, on average, 17 people clap for my blog post.

I’d like to predict the number of people who will clap next week, say, because I get paid weekly based on those numbers.

What is the probability that exactly 20 people (or 10, 30, 50, etc.) will clap for the blog post next week?

2. For now, let’s pretend we don’t know anything about the Poisson Distribution. Then how would you solve this problem?

One approach would be to start with the number of readers. Each person who reads the blog has some probability that they will really like it and clap.

And this seems like a classical binomial distribution problem, since we are calculating the probability of a specific number of successful events (claps).

A binomial random variable represents the number of successes x in n repeated trials. And we assume the probability of success p remains constant throughout each trial.

However, we only have one piece of information in this case: a “rate” of 17 people per week (the average number of successes per week, or the expected value of x). We don’t know the clapping probability p or the number of blog visitors n.

To solve this problem, we need a bit more information.

What additional information do we need to frame this probability as a binomial problem?

I believe we need two things: the probability of success (claps) p and the number of trials (visitors) n.

Let’s gather this information from past data.

The stats for my Medium blog post about Gradient Descent

These stats cover a 1-year period. A total of 59k people read my blog. Among them, 888 clapped.

From these yearly totals we can work out the weekly figures:

The number of people who read per week (n) = 59k/52 ≈ 1134

The number of people who clap per week (x) = 888/52 ≈ 17

Success probability (p) = 888/59k ≈ 0.015 = 1.5%

Using the Binomial PMF, what is the probability that I’ll get exactly 20 successes (20 people who clap) next week?

[Binomial Probability for different x’s]

╔══════╦════════════════╗
║   x  ║ Binomial P(X=x)║
╠══════╬════════════════╣
║  10  ║    0.02250     ║
║  17  ║    0.09701     ║  → The average rate has the highest P!
║  20  ║    0.06962     ║  → Nice. 20 is also quite likely!
║  30  ║    0.00121     ║
║  40  ║  < 0.000001    ║  → Well, I guess I won’t get 40 claps.
╚══════╩════════════════╝
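As a rough sketch, the table above can be reproduced in a few lines of Python. This assumes the rounded weekly figures from the text (n = 1134, p = 0.015); the helper name `binomial_pmf` is made up for illustration.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p), computed directly from the PMF."""
    # Fine for numbers of this size; for much larger n or x a log-space
    # version would be safer against overflow.
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Rounded weekly figures taken from the stats above (assumed values).
n = 1134   # readers per week
p = 0.015  # probability that a single reader claps

for x in (10, 17, 20, 30, 40):
    print(f"x = {x:2d}   P(X = x) = {binomial_pmf(x, n, p):.7f}")
```

Changing `n` or `p` slightly (for example, using the unrounded 888/59000) shifts the probabilities a little, which is one reason the values should be read as approximate.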

We just solved the problem with a binomial distribution.

Then, what is the Poisson distribution for? What are the things that only Poisson can do but Binomial can’t?

3. The limitations of the Binomial Distribution

a) The Binary Nature of the Binomial Distribution

Each trial behind a binomial random variable is “BI-nary”: its outcome is either 0 or 1.

In our example, 17 people clapped per week on average. This means about 17/7 = 2.4 people clapped per day, and about 17/(7*24) = 0.1 people clapped per hour.

If we try to model the success probability by hour (0.1 people/hr) using the binomial random variable, we encounter a problem: most of the hours will have zero claps, but some hours will get exactly 1 clap. However, it is very possible that certain hours will receive more than 1 clap (e.g. 2, 3, or 5 claps).

The problem with the binomial distribution is that it CANNOT account for more than one event within a given unit of time (in this case, 1 hour): each time unit can hold only 0 or 1 event.

Then, how about dividing 1 hour into 60 minutes, making the unit time smaller (1 minute)? This will allow for multiple events to occur within an hour (although each minute would still contain either zero or one event).

Is the problem now solved?

Kind of. This approach somewhat mitigates the issue, but the problem caused by the 0/1 nature still remains for smaller time units. For example, what if several people clap during the same minute (e.g., someone tweeted about your blog post and the number of visitors spiked right then)? What should we do then? We can again split a minute into seconds. Then our unit of time is a second, and a minute can have more than one event. But this binary container problem will always be there, no matter how small the time unit gets.

To truly overcome this limitation, we can use the Poisson distribution. The main idea is that we can make the Binomial random variable handle multiple events by splitting a unit time into smaller units. Using smaller divisions, we can make the original unit time hold more than one event.

Mathematically, this is represented as n → ∞. Since we assume the rate is constant, p must approach 0 (p → 0); otherwise the product n*p, which is the expected number of events, would blow up.

By taking the limit, the unit times become infinitesimally small. Consequently, we no longer have to worry about more than one event happening within the same unit time. This is the foundation of the Poisson distribution derivation.
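The limiting argument can be checked numerically. The sketch below is an illustration, not part of the original derivation: it holds n*p fixed at λ = 17 while n grows, and prints the binomial probability of exactly 20 events next to the Poisson value. The function names and the choice k = 20 are assumptions made for the example.

```python
from math import comb, exp, factorial

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

lam, k = 17, 20  # weekly rate from the example; k picked for illustration

for n in (50, 100, 1_000, 10_000, 100_000):
    p = lam / n  # shrink p as n grows so that n * p stays equal to lam
    print(f"n = {n:>7,}  p = {p:.5f}  binomial = {binomial_pmf(k, n, p):.5f}")

print(f"Poisson limit (lam = {lam}): {poisson_pmf(k, lam):.5f}")
```

As n increases, the binomial values should settle toward the Poisson value, which is the behavior the n → ∞, p → 0 limit describes.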

b) The number of trials (n) must be known in the Binomial Distribution

If you use Binomial, you can’t just use the rate (e.g. 17 people per week) to calculate the success probability. You need “more info” (n & p) in order to use the binomial PMF. The Poisson distribution, on the other hand, doesn’t require you to know n or p. We are assuming n is infinitely large and p is very small. The only parameter of the Poisson distribution is the rate λ (the expected value of x). In real life, knowing only the rate (e.g., I got three phone calls between 2 and 4 p.m.) is much more common than knowing both n and p.

4. Let’s derive the Poisson formula mathematically from the Binomial PMF.
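One standard way to carry out this derivation, sketched here under the substitution p = λ/n (so that n*p stays fixed at λ while n grows), is to start from the binomial PMF and regroup its factors:

```latex
\begin{aligned}
P(X = k)
  &= \lim_{n \to \infty} \binom{n}{k}\, p^{k} (1 - p)^{n - k},
     \qquad p = \frac{\lambda}{n} \\
  &= \lim_{n \to \infty} \frac{n!}{(n-k)!\,k!}
     \left(\frac{\lambda}{n}\right)^{k}
     \left(1 - \frac{\lambda}{n}\right)^{n}
     \left(1 - \frac{\lambda}{n}\right)^{-k} \\
  &= \lim_{n \to \infty} \frac{n!}{(n-k)!\,n^{k}}
     \cdot \frac{\lambda^{k}}{k!}
     \cdot \left(1 - \frac{\lambda}{n}\right)^{n}
     \cdot \left(1 - \frac{\lambda}{n}\right)^{-k}
\end{aligned}
```

For fixed k, two of these factors have well-known limits:

```latex
\lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n} = e^{-\lambda},
\qquad
\lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{-k} = 1
```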

Now you know where λ^k, k!, and e^-λ come from!

Lastly, we only need to show that the multiplication of the first two terms, n!/((n-k)!*n^k), is 1 as n approaches infinity.

It is 1.
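A short way to see why, assuming k is held fixed while n grows, is to expand the factorial ratio into k factors that each tend to 1:

```latex
\frac{n!}{(n-k)!\,n^{k}}
  = \frac{n (n-1) (n-2) \cdots (n-k+1)}{n^{k}}
  = \prod_{i=0}^{k-1} \left(1 - \frac{i}{n}\right)
  \;\longrightarrow\; 1
  \quad \text{as } n \to \infty
```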

We've got the Poisson Formula: P(X = k) = λ^k * e^-λ / k!

From https://en.wikipedia.org/wiki/Poisson_distribution

Now the Wikipedia explanation starts to make sense.

Try putting your own numbers into the formula and see if P(x) makes sense!

Here’s mine.

[ Comparison between Binomial & Poisson ]

╔══════╦═══════════════════╦═══════════════════════╗
║   k  ║  Binomial P(X=k)  ║  Poisson P(X=k;λ=17)  ║
╠══════╬═══════════════════╬═══════════════════════╣
║  10  ║      0.02250      ║        0.02300        ║
║  17  ║      0.09701      ║        0.09628        ║
║  20  ║      0.06962      ║        0.06916        ║
║  30  ║      0.00121      ║        0.00128        ║
║  40  ║    < 0.000001     ║      < 0.000001       ║
╚══════╩═══════════════════╩═══════════════════════╝

* Both can be easily calculated here:
Binomial:  https://stattrek.com/online-calculator/binomial.aspx
Poisson :  https://stattrek.com/online-calculator/poisson.aspx
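If you would rather try your own numbers locally than in an online calculator, a minimal Poisson PMF might look like the sketch below. It evaluates λ^k * e^-λ / k! in log space with `math.lgamma` so that large k or λ do not overflow; the function name `poisson_pmf` is a made-up helper, not something from the original post.

```python
from math import exp, lgamma, log

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), i.e. lam**k * exp(-lam) / k!."""
    # lgamma(k + 1) equals log(k!), so the whole expression stays in
    # log space until the final exp().
    return exp(k * log(lam) - lam - lgamma(k + 1))

lam = 17  # average claps per week from the example

for k in (10, 17, 20, 30, 40):
    print(f"k = {k:2d}   P(X = k; lam = {lam}) = {poisson_pmf(k, lam):.7f}")
```

Swap in your own rate for `lam` and your own counts for `k` to see whether the resulting probabilities match your intuition.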

Footnote:

Even though the Poisson distribution is a model of rare events, the rate parameter λ can be any number. It doesn’t always have to be small.
The Poisson Distribution is asymmetric; it is always skewed toward the right, because it is limited by the zero occurrence barrier on the left (there is no such thing as “minus one” clap) and has no limit on the right.
As λ increases, the graph starts to look more like a normal distribution.
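The last two footnotes can be illustrated numerically. The skewness of a Poisson distribution is 1/sqrt(λ): always positive (right-skewed), but shrinking toward 0 as λ grows, which is why the shape looks increasingly normal. The sketch below computes the skewness directly from the PMF and compares it with that closed form; the helper names and the truncation point of the sum are assumptions made for illustration.

```python
from math import exp, lgamma, log, sqrt

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), evaluated in log space."""
    return exp(k * log(lam) - lam - lgamma(k + 1))

def poisson_skewness(lam: float) -> float:
    """Skewness computed numerically from the PMF."""
    # Truncate the infinite sum far into the right tail (assumed cutoff).
    k_max = int(lam + 20 * sqrt(lam)) + 50
    ks = range(k_max + 1)
    probs = [poisson_pmf(k, lam) for k in ks]
    mean = sum(k * q for k, q in zip(ks, probs))
    var = sum((k - mean) ** 2 * q for k, q in zip(ks, probs))
    third = sum((k - mean) ** 3 * q for k, q in zip(ks, probs))
    return third / var**1.5

for lam in (1, 4, 17, 100):
    print(f"lam = {lam:3d}  skewness = {poisson_skewness(lam):.4f}  "
          f"1/sqrt(lam) = {1 / sqrt(lam):.4f}")
```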

https://en.wikipedia.org/wiki/Poisson_distribution

5. The Poisson Model Assumptions

a. The average rate of events per unit time is constant. This means that the number of people who visit your blog *per hour* might not follow a Poisson Distribution, because the hourly rate is not constant (it’s higher during the day and lower during the night). Using a monthly rate for consumer or biological data would likewise give only a rough estimate, since seasonality has a big effect in those domains.

b. Events are independent. In reality, your blog readers' arrivals may not always be independent. For example, sometimes a lot of people come to your blog all at once because someone famous talked about your blog, or because your blog was featured on Medium’s first page, etc. Similarly, the number of earthquakes per year in a nation may not follow a Poisson Distribution if a very strong earthquake increases the likelihood of aftershocks.

6. Relationship between a Poisson and an Exponential Distribution

If the number of events per unit time follows a Poisson distribution, then the time between events follows an exponential distribution. Although the Poisson distribution is discrete and the exponential distribution is continuous, the two distributions are closely related.
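A small simulation can make this relationship concrete. The sketch below is an illustration under assumed settings (rate 17 per week, 2000 simulated weeks, a fixed seed): it draws exponential waiting times between events, counts how many events land in each week, and prints the mean and variance of those counts. For a Poisson distribution both should be close to the rate λ.

```python
import random
from statistics import mean, variance

def count_events_per_interval(rate, n_intervals, rng):
    """Generate arrivals with exponential gaps and count events per unit interval."""
    counts = [0] * n_intervals
    t = rng.expovariate(rate)  # arrival time of the first event
    while t < n_intervals:
        counts[int(t)] += 1          # int(t) is the index of the interval containing t
        t += rng.expovariate(rate)   # waiting time until the next event
    return counts

rng = random.Random(0)  # fixed seed so the run is repeatable
counts = count_events_per_interval(rate=17, n_intervals=2000, rng=rng)

print(f"mean count per week     = {mean(counts):.2f}")
print(f"variance of the counts  = {variance(counts):.2f}")
```

Because this is a random simulation, the printed values will only approximate 17, and they change if the seed or the number of simulated weeks changes.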

Let’s go deeper: Exponential Distribution Intuition