The provided web content offers an introduction to the Poisson distribution, detailing its history, applications, theoretical underpinnings, and practical examples, with a focus on its utility in data science.
Abstract
The article "Predicting the Unpredictable: An Introduction to the Poisson Distribution" presents a comprehensive overview of the Poisson distribution, a fundamental discrete probability distribution used in various fields such as data science, insurance, and e-commerce. It discusses the distribution's origin by Siméon Denis Poisson, its mathematical formulation through the probability mass function (PMF), and its derivation from the Binomial distribution. The text illustrates the Poisson distribution's application with real-world examples, such as predicting the number of customers in a retail shop within a given time frame. It also provides visual representations of the distribution for different mean values, demonstrating how the distribution changes with the mean rate of occurrences. The article emphasizes the importance of understanding the Poisson distribution for data scientists and offers resources for further learning, including code on GitHub and a newsletter for ongoing insights in the field.
Opinions
The author considers the Poisson distribution an essential concept for data scientists due to its wide range of applications.
The article suggests that the Poisson distribution is intuitive and useful for quantifying the probability of events occurring a specific number of times within a given time interval.
The author implies that the Poisson distribution is derived from the Binomial distribution, indicating a mathematical relationship between the two.
By providing Python-generated plots, the author conveys that visual representation aids in understanding the distribution's behavior under different conditions.
The author's inclusion of their GitHub code and newsletter signifies a belief in the value of practical examples and continuous learning for data science professionals.
Predicting the Unpredictable: An Introduction to the Poisson Distribution
An overview of one of the most famous probability distributions
The Poisson distribution is a ubiquitous discrete probability distribution. It was published by Siméon Denis Poisson in the early 19th century and has since found applications in many industries, including insurance, epidemiology, and e-commerce. It is therefore an essential concept for Data Scientists to know. In this post, we will dive into the intricacies of the distribution and work through real-world examples.
Intuition
The core concept of the Poisson distribution is to quantify the probability of an event happening a specific number of times within a given time interval.
As an example, let’s consider a retail shop that receives 20 customers per hour on average. Using the Poisson distribution, we can calculate the probability of the shop receiving a specific number of customers within an hour, such as 10, 15, or 30.
The Poisson distribution is parametrised by a single value, λ, which is both the mean number of occurrences, E(X) = λ, and the variance, VAR(X) = λ, of the distribution. See here for a derivation of the mean and variance.
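For reference, the probability mass function (PMF) of the Poisson distribution in its standard textbook form is written out below. Here X is the number of events in the interval and k is any non-negative integer. The notation is the conventional one and may differ from the author's original typesetting.

```latex
P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots
```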
It is worth noting that the Poisson distribution is actually derived from the Binomial distribution. Although we will not delve into the derivation in this article, the interested reader can find it here.
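The relationship to the Binomial distribution can be illustrated numerically without going through the derivation. The sketch below is only an illustration, not the derivation itself: it holds the mean n · p fixed at λ and increases n, so the Binomial probability should approach the Poisson one. The function names are made up for this example.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for a Binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson(lam) random variable."""
    return lam**k * exp(-lam) / factorial(k)


lam, k = 20, 10  # values taken from the shop example
for n in (50, 500, 5_000, 50_000):
    p = lam / n  # keep the mean n * p fixed at lam while n grows
    print(f"n={n:>6}  binomial = {binomial_pmf(k, n, p):.6f}")
print(f"          poisson  = {poisson_pmf(k, lam):.6f}")
```

As n grows (and p shrinks accordingly), the printed Binomial values should move closer to the Poisson value.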
The conditions for the Poisson distribution:
The events, whose count we denote k, occur independently of one another (Poisson process)
The events occur randomly within the time interval
The expected number of events per interval, λ, is fixed
An event is equally likely at every point in the time interval
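One way to see these conditions in action is to simulate them. The sketch below assumes the standard construction of a constant-rate Poisson process, in which the gaps between consecutive events are independent exponential random variables; this construction is not described in the article, and the function name, seed, and number of trials are arbitrary choices for illustration.

```python
import random
from statistics import mean, pvariance


def count_arrivals(rate, duration, rng):
    """Count events in [0, duration) when gaps between events are exponential."""
    t = rng.expovariate(rate)  # time of the first event
    count = 0
    while t < duration:
        count += 1
        t += rng.expovariate(rate)  # independent gap until the next event
    return count


rng = random.Random(0)  # fixed seed so the run is reproducible
# 20 customers per hour on average, observed over 20,000 simulated hours
counts = [count_arrivals(rate=20, duration=1.0, rng=rng) for _ in range(20_000)]

print(f"sample mean     = {mean(counts):.2f}")
print(f"sample variance = {pvariance(counts):.2f}")
```

If the simulated counts are Poisson distributed, both the sample mean and the sample variance should come out close to 20.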
Examples & Plots
Let's return to our shop example, where the mean number of customers per hour is 20. What is the probability of the shop receiving exactly 10 customers in one hour?
So, what we have is:
λ = 20
k = 10
And, inputting these values into the PMF formula:
$$P(X = 10) = \frac{20^{10} e^{-20}}{10!} \approx 0.0058$$
Equation by author in LaTeX.
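The same calculation can be checked in a few lines of Python. This is a minimal sketch using only the standard library; `poisson_pmf` is a helper defined here for illustration, and the extra values of k (15 and 30) are the ones mentioned earlier in the shop example.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return lam**k * exp(-lam) / factorial(k)


lam = 20  # mean number of customers per hour
for k in (10, 15, 30):
    print(f"P(X = {k}) = {poisson_pmf(k, lam):.4f}")
```

For k = 10 this gives a probability of roughly 0.0058.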
As we can see, this probability is very low (roughly 0.6%). To gain a better intuition of the distribution of customer visits, we can plot the entire PMF:
Plot generated by author in Python.
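A plot along these lines could be produced as sketched below. This is not the author's code: the function names, the cut-off of k = 50, and the axis labels are assumptions made for illustration. The plotting function needs matplotlib, which is imported inside the function so that the PMF table can be computed without it.

```python
from math import exp, lgamma, log


def poisson_pmf(k, lam):
    """Poisson PMF, evaluated in log space to avoid overflow for large k."""
    return exp(k * log(lam) - lam - lgamma(k + 1))


def pmf_table(lam, k_max):
    """Return the PMF values for k = 0, 1, ..., k_max."""
    return [poisson_pmf(k, lam) for k in range(k_max + 1)]


def plot_pmf(lam, k_max=50):
    """Draw a bar chart of the PMF (requires matplotlib)."""
    import matplotlib.pyplot as plt

    plt.bar(range(k_max + 1), pmf_table(lam, k_max))
    plt.xlabel("Number of customers in one hour (k)")
    plt.ylabel("P(X = k)")
    plt.title(f"Poisson PMF, λ = {lam}")
    plt.show()


probs = pmf_table(lam=20, k_max=50)
print(f"total probability for k <= 50: {sum(probs):.6f}")
```

Calling `plot_pmf(20)` draws the chart for the shop example. The printed total is a quick sanity check that almost all of the probability mass lies within the plotted range.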
As observed, the distribution of customer visits is almost bell-shaped, with the most likely number of customers being 20. This makes sense, as 20 is the expected number. For further insight, let’s explore scenarios where the mean number of customers is 10 or 30 and plot the corresponding distributions:
Plot generated by author in Python.
So, when the mean gets smaller, the bulk of the probability mass shifts to the left, and when it gets larger, the mass shifts to the right. This is expected because λ is the expected number of customer visits, so the observed count is most likely to fall near it.
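To put numbers on this shift, the mean and variance can be computed directly from the PMF for each value of λ. The sketch below truncates the sum at k = 200, an arbitrary cut-off chosen here because the remaining tail is negligible for these values of λ; the function names are illustrative.

```python
from math import exp, lgamma, log, sqrt


def poisson_pmf(k, lam):
    """Poisson PMF, evaluated in log space for numerical stability."""
    return exp(k * log(lam) - lam - lgamma(k + 1))


def mean_and_variance(lam, k_max=200):
    """Mean and variance computed from the PMF, truncated at k_max."""
    probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, variance


for lam in (10, 20, 30):
    mean, variance = mean_and_variance(lam)
    print(f"λ = {lam}: mean = {mean:.3f}, variance = {variance:.3f}, "
          f"std = {sqrt(variance):.3f}")
```

Both the mean and the variance should match λ in each case, so a smaller λ moves the centre of the distribution to the left and also makes it narrower.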
Summary & Further Thoughts
The Poisson distribution is a widely used and famous probability distribution in Data Science and Statistics. It models the probability of a given number of events occurring within a fixed interval, given a known mean rate. The Poisson distribution finds applications in various industries, including genetics, insurance, and fraud detection, among others.
If you would like to view the full code used in this blog, it is available on my GitHub here:
I have a free newsletter, Dishing the Data, where I share weekly tips for becoming a better Data Scientist, and the latest AI news to keep you in the loop. There is no “fluff” or “clickbait”, just pure actionable insights from a practicing Data Scientist.