Exponential Distribution — Intuition, Derivation, and Applications

When to Use an Exponential Distribution

Before diving into the formulas, it’s crucial to understand the “why” behind the exponential distribution. With a solid grasp of the underlying principles, you’ll be more likely to apply this knowledge effectively in your own work.

1. Why did we have to invent Exponential Distribution?

The exponential distribution is often used to predict the waiting time until the next event occurs, such as a success, failure, or arrival.

For example, the exponential distribution can be used to predict:

The amount of time it takes a customer to make a purchase in your store (success)
The amount of time until hardware on AWS EC2 fails (failure)
The amount of time you need to wait for the bus to arrive

Then, my next questions would be: Why is λ * e^(−λt) the PDF of the time until the next event happens? What does X ~ Exp(0.25) mean?

Does the parameter 0.25 represent 0.25 minutes, hours, or days, or is it 0.25 events?

From this point on, I’m going to assume that you know the Poisson distribution well. If you don’t, this article will explain it to you.

X ~ Exp(λ) 👉 Is the exponential parameter λ the same as λ in Poisson?

One crucial point to remember that helps avoid confusion regarding X ~ Exp(0.25) is that 0.25 is not a time duration; instead, it represents an event rate, which corresponds to the parameter λ in a Poisson process.

For example, if your blog has 500 visitors per day, that is a rate. Similarly, the number of customers arriving at the store in an hour, the frequency of earthquakes in a year, the count of car accidents in a week, the number of typos in a blog post, and the number of hairs found in a Chipotle bowl are all examples of rates (λ) for a given unit. This kind of rate serves as the parameter for the Poisson distribution.

On the other hand, when modeling the elapsed time between two events, we usually talk in terms of time rather than rate. For example, we would describe the lifespan of a computer as 10 years without failure (rather than 0.1 failures per year, which is a rate), a customer arriving every 10 minutes, or a major hurricane occurring every 7 years. In such cases, the “mean” of the exponential distribution refers to 1/λ, which represents the average time between events.

Confusion can arise when you encounter the term “decay parameter”, or even worse, the term “decay rate”, both of which are frequently used when discussing the exponential distribution. Despite how it sounds, the decay parameter is the rate λ itself, not a time. The quantity expressed in terms of time (e.g., every 10 minutes, every 7 years) is the mean, which is the reciprocal (1/λ) of the rate (λ) in the Poisson distribution. Put another way, if you get 3 customers per hour, you get one customer every 1/3 hour on average.

Understanding this, what does “X ~ Exp(0.25)” mean?

We can interpret “X ~ Exp(0.25)” as having a Poisson rate of 0.25: in one unit of time (a minute, an hour, or a year), the event occurs 0.25 times on average. Converting this to time terms, it takes 4 hours on average (the reciprocal of 0.25) until the event occurs, assuming your unit of time is an hour.

* Confusion-free: The parameter of the exponential distribution (λ) is the same as that of the Poisson process (λ).
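To see the rate-versus-time distinction in action, here is a minimal simulation sketch. It assumes Python's standard-library `random.expovariate`, which takes the rate λ (not the mean) as its argument. The seed and sample size are arbitrary choices made for this illustration.

```python
import random

random.seed(0)  # arbitrary seed, only for reproducibility

rate = 0.25          # λ: 0.25 events per unit time (say, per hour)
n_samples = 200_000  # arbitrary sample size

# random.expovariate expects the rate λ, not the mean 1/λ.
waits = [random.expovariate(rate) for _ in range(n_samples)]
sample_mean = sum(waits) / n_samples

print(f"theoretical mean 1/λ = {1 / rate}")
print(f"simulated mean       = {sample_mean:.3f}")
```

Be careful when switching libraries: some of them, such as NumPy's `Generator.exponential`, are parameterized by the scale 1/λ rather than by the rate λ.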

2. Let’s derive the PDF of Exponential from scratch.

To understand why λ * e^(−λt) is the PDF of the time until the *next* event occurs, we need to consider the definition of exponential distribution: the probability distribution of the time *between* the events in a Poisson process.

The time until the next event occurs implies that no event occurs during the waiting period.

This is, in other words, Poisson (X=0).

**Poisson(X=0)**: the first step in the derivation of Exponential distribution.

One important point about the Poisson PMF (a probability mass function, since X is a count) is that it models the number of events X occurring in a *single* unit of time.

Given this, how would you model the probability distribution of “nothing happens during the time duration t” instead of simply “during one unit of time”?

P(Nothing happens during t time units)

= P(X=0 in the first time unit) 
  * P(X=0 in the second time unit) 
  * … * P(X=0 in the t-th time unit)

= e^(−λ) * e^(−λ) * … * e^(−λ) = e^(−λt)

The Poisson distribution assumes that events occur independently of one another. So we can multiply the probability of “no events in a single unit of time (P(X = 0))” t times to calculate the probability of “no events (zero success) during a time duration t”. This results in the expression e^(-λt).

P(T > t) = P(X=0 during t time units) = e^(−λt)

* T: the random variable of our interest, the waiting time until the first event
* X: the number of events, which follows a Poisson distribution

* P(T > t): The probability that the waiting time until the first event is greater than t time units
* P(X = 0 in t time units): The probability of observing zero events in t time units
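If you'd like to check P(T > t) = e^(−λt) empirically, the sketch below may help. Note that it is a discretized approximation rather than the exact argument above: time is cut into small slices of width `dt`, and each slice is assumed to contain an event with probability λ·dt, independently of the others. The values of `rate`, `dt`, `t`, and `n_trials` are illustrative assumptions.

```python
import math
import random

random.seed(1)  # arbitrary seed, only for reproducibility

rate = 0.25       # λ: average number of events per unit time
dt = 0.02         # width of one small time slice (illustrative choice)
t = 4.0           # we estimate P(T > t)
n_trials = 5_000  # arbitrary number of simulated waits

def empty_slices_before_first_event():
    """Count consecutive slices with no event before the first event."""
    slices = 0
    # Each slice independently contains an event with probability λ·dt.
    while random.random() >= rate * dt:
        slices += 1
    return slices

threshold_slices = round(t / dt)  # number of slices that cover t time units
waits = [empty_slices_before_first_event() for _ in range(n_trials)]

# T > t means that all of the first `threshold_slices` slices were empty.
empirical = sum(w >= threshold_slices for w in waits) / n_trials
theoretical = math.exp(-rate * t)

print(f"simulated   P(T > {t}) = {empirical:.4f}")
print(f"theoretical e^(-λt)    = {theoretical:.4f}")
```

The two numbers should get closer as `dt` shrinks and `n_trials` grows.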

A PDF is the derivative of the CDF. So, to find the probability density function (PDF) of the exponential distribution, we can differentiate its cumulative distribution function (CDF), P(T ≤ t) = 1 − P(T > t) = 1 − e^(−λt).

Differentiating 1 − e^(−λt) with respect to t gives the PDF: f(t) = λ * e^(−λt), for t ≥ 0.
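As a quick sanity check on the differentiation step, the sketch below compares a numerical derivative of the CDF 1 − e^(−λt) with the closed form λ * e^(−λt). The rate, the step size `h`, and the helper names are assumptions made for this example.

```python
import math

rate = 0.25  # λ, an arbitrary example value

def cdf(t):
    """P(T <= t) for an exponential distribution with rate λ."""
    return 1.0 - math.exp(-rate * t)

def pdf(t):
    """Closed-form density λ * e^(-λt)."""
    return rate * math.exp(-rate * t)

def numerical_derivative(f, t, h=1e-5):
    """Central-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

for t in (0.5, 1.0, 4.0, 10.0):
    approx = numerical_derivative(cdf, t)
    print(f"t={t:5.1f}  dF/dt ≈ {approx:.6f}   λe^(-λt) = {pdf(t):.6f}")
```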

3. The Memoryless Property

The memoryless property of the exponential distribution can be defined as:


P(T > a + b | T > a) = P(T > b)

This means that given an exponential random variable T, the probability that T exceeds a sum of two time periods (a + b) given that it has already exceeded the first period a, is equal to the probability that T exceeds just the second period b.
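Memorylessness is easy to test with a simulation. The sketch below draws exponential samples with the standard-library `random.expovariate`, keeps only those that exceeded `a`, and compares the conditional frequency of exceeding `a + b` with the unconditional frequency of exceeding `b`. The values of `rate`, `a`, and `b` are arbitrary assumptions.

```python
import random

random.seed(2)  # arbitrary seed, only for reproducibility

rate = 0.5       # λ, an arbitrary example value
a, b = 2.0, 3.0  # the two time periods in P(T > a + b | T > a)
n = 200_000      # arbitrary sample size

samples = [random.expovariate(rate) for _ in range(n)]

# Condition on T > a, then ask how often T also exceeds a + b.
survived_a = [x for x in samples if x > a]
conditional = sum(x > a + b for x in survived_a) / len(survived_a)

# Unconditional probability that a fresh wait exceeds b.
unconditional = sum(x > b for x in samples) / n

print(f"P(T > a + b | T > a) ≈ {conditional:.4f}")
print(f"P(T > b)             ≈ {unconditional:.4f}")
```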

Can you show me the mathematical proof? 👇

P(T > a + b | T > a)
= P(T > a + b and T > a) / P(T > a)
= P(T > a + b) / P(T > a)
= e^(−λ(a+b)) / e^(−λa)
= e^(−λb)
= P(T > b)

Is memoryless a “useful” property?

Can we realistically model the lifespan of a mechanical device using an exponential distribution? For example, if the device has lasted for nine years already, memorylessness implies that the probability of it lasting for another three years (totaling 12 years) is the same as that of a brand-new machine lasting for the next three years.

In equation form, they are:

P(T > 12 | T > 9) = P(T > 3)

But wait, does this equation seem right to you?

For me, it doesn’t. From what I’ve seen, a device is more likely to break as it gets older. To model this characteristic (an increasing hazard rate), we can use, for example, a Weibull distribution.
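To make the hazard-rate contrast concrete, here is a small sketch that evaluates both hazard functions. It uses the standard Weibull hazard h(t) = (k/s) * (t/s)^(k−1) with shape k and scale s. The function names and the parameter values (shape 2, scale 10) are assumptions chosen for illustration.

```python
def exponential_hazard(t, rate):
    """Hazard of Exp(λ): constant, no matter how old the device is."""
    return rate

def weibull_hazard(t, shape, scale):
    """Hazard of a Weibull distribution: (k/s) * (t/s)^(k-1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

for age in (1.0, 3.0, 9.0, 12.0):
    h_exp = exponential_hazard(age, rate=0.1)
    h_wei = weibull_hazard(age, shape=2.0, scale=10.0)
    print(f"age={age:5.1f}  exponential={h_exp:.3f}  weibull(k=2)={h_wei:.3f}")
```

With shape k > 1 the Weibull hazard grows with age, and with k = 1 it reduces to the constant hazard of an exponential distribution with rate 1/s.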

Then, when should you use an exponential distribution?

Car accidents. Your probability of being involved in one does not increase or decrease based on whether you’ve been accident-free for the past five hours. That’s why λ is often referred to as the hazard rate.

Which other distributions have the memoryless property?

The exponential distribution is the only continuous distribution that is memoryless. The geometric distribution, its discrete counterpart, is the only discrete distribution that is memoryless.

4. Applications IRL 🔥

a) Modeling waiting time

An exponential random variable takes small values more often than large ones. For example, the bus you are waiting for is more likely to arrive within the next 10 minutes than in any later 10-minute window.

We can use the exponential distribution to answer these questions:

1. The bus arrives every 15 minutes on average. (Assume the time that elapses from one bus to the next is exponentially distributed, which makes sense given that the total number of buses arriving within an hour follows a Poisson distribution.) You’ve just missed the bus! The driver was unkind. The moment you arrived, the driver closed the door and drove away. If the next bus doesn’t arrive within the next 10 minutes, you have to call Uber or you’ll be late. What’s the probability the next bus arrives within the next 10 minutes?

2. Within how many minutes do 90% of the buses arrive after the previous one?

3. On average, how long does it take for two buses to arrive?

* Post your answers in the comments to see if they’re correct.
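If you want to check your answers numerically, the following sketch sets up the three calculations. It assumes the CDF 1 − e^(−λt) derived earlier, its inverse t = −ln(1 − p)/λ for the quantile, and linearity of expectation for the third question. The helper names are made up for this example.

```python
import math

mean_wait = 15.0        # minutes between buses, from the example
rate = 1.0 / mean_wait  # λ in buses per minute

def cdf(t, rate):
    """P(T <= t): probability the wait is at most t."""
    return 1.0 - math.exp(-rate * t)

def quantile(p, rate):
    """Inverse of the CDF: the time t such that P(T <= t) = p."""
    return -math.log(1.0 - p) / rate

p_within_10 = cdf(10.0, rate)  # question 1
t_90 = quantile(0.90, rate)    # question 2
mean_two_buses = 2 / rate      # question 3: sum of two mean waits

print(f"P(next bus within 10 min) = {p_within_10:.4f}")
print(f"90% of buses arrive within {t_90:.2f} min")
print(f"mean time for two buses   = {mean_two_buses:.1f} min")
```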

b) Reliability (failure) modeling

Since we can model successful events (bus arrivals), why not model failure events as well, like a product’s lifespan?

Assume that the number of hours AWS hardware can run before needing a restart is exponentially distributed with an average of 8,000 hours (about a year).

1. You don’t have a backup server and need an uninterrupted 10,000-hour run. What’s the probability of completing the run without having to restart the server?

2. What is the probability that the server doesn’t require a restart between 12 and 18 months?

Note that the exponential distribution is not always appropriate, for example when the failure rate changes throughout a product’s lifespan. However, it is the only continuous distribution with a constant hazard rate.
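Here is a hedged sketch of how the reliability questions above could be set up. It uses the survival function P(T > t) = e^(−λt). Two assumptions are mine, not the article's: a month is taken as 730 hours (8,760 hours per year divided by 12), and `prob_between` computes the probability that the first restart falls inside a time window, which is only one possible reading of the second question (its complement gives the other reading).

```python
import math

mean_life = 8_000.0     # average hours before a restart, from the example
rate = 1.0 / mean_life  # λ in restarts per hour

def survival(t, rate):
    """P(T > t): probability of running t hours without a restart."""
    return math.exp(-rate * t)

def prob_between(t1, t2, rate):
    """P(t1 < T <= t2): probability the first restart falls in the window."""
    return survival(t1, rate) - survival(t2, rate)

HOURS_PER_MONTH = 730.0  # assumption: 8,760 hours per year / 12

p_run = survival(10_000.0, rate)
p_window = prob_between(12 * HOURS_PER_MONTH, 18 * HOURS_PER_MONTH, rate)

print(f"P(10,000-hour run without restart)   = {p_run:.4f}")
print(f"P(first restart in months 12 to 18)  = {p_window:.4f}")
print(f"P(no first restart in months 12-18)  = {1 - p_window:.4f}")
```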

c) Service time modeling (Queuing Theory)

The service times of agents (e.g., how long it takes for a Chipotle employee to make you a burrito) can also be modeled as exponentially distributed variables.

The total length of a process (a sequence of several independent tasks, each exponentially distributed with the same rate) follows the Erlang distribution: the distribution of the sum of several independent, identically distributed exponential variables.
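A short simulation can illustrate the Erlang connection. The sketch below sums `k` independent exponential task times, each with the same rate, and compares the sample mean and variance with the Erlang values k/λ and k/λ². The values of `rate`, `k`, and the sample size are arbitrary assumptions.

```python
import random

random.seed(3)  # arbitrary seed, only for reproducibility

rate = 2.0   # λ: tasks completed per unit time (example value)
k = 3        # number of sequential tasks in the process
n = 100_000  # arbitrary number of simulated processes

# Total process time = sum of k independent Exp(λ) task times.
totals = [sum(random.expovariate(rate) for _ in range(k)) for _ in range(n)]

mean_total = sum(totals) / n
var_total = sum((x - mean_total) ** 2 for x in totals) / (n - 1)

print(f"mean: simulated {mean_total:.3f}, Erlang k/λ = {k / rate:.3f}")
print(f"var:  simulated {var_total:.3f}, Erlang k/λ² = {k / rate**2:.3f}")
```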

5. Recap: Relationship between a Poisson and an Exponential Distribution

If the number of events per unit time follows a Poisson distribution, then the time between events follows the exponential distribution.

Conversely, if the times between events are independent of one another and exponentially distributed with mean μ, then the number of events per unit time follows a Poisson distribution with rate λ = 1/μ.
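The relationship can be checked with a simulation. The sketch below generates arrivals whose gaps are exponential with mean μ, counts how many land in each unit-length interval, and compares the mean and variance of those counts with λ = 1/μ (a Poisson count has both equal to λ). The values of `mean_gap` and `n_units` are arbitrary assumptions.

```python
import random

random.seed(4)  # arbitrary seed, only for reproducibility

mean_gap = 0.5         # μ: average time between events
rate = 1.0 / mean_gap  # λ = 1/μ: expected events per unit time
n_units = 50_000       # number of unit-length intervals to simulate

counts = [0] * n_units
arrival = random.expovariate(rate)
while arrival < n_units:
    counts[int(arrival)] += 1            # event falls in interval [i, i + 1)
    arrival += random.expovariate(rate)  # next exponential gap

mean_count = sum(counts) / n_units
var_count = sum((c - mean_count) ** 2 for c in counts) / (n_units - 1)

print(f"λ = {rate}, mean count = {mean_count:.3f}, variance = {var_count:.3f}")
```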

6. Exercise

I’ve found that most of what I know about math comes from solving problems. So, I encourage you to do the same by attempting the exercises below, even if they take some time.

1. Let U be a uniform random variable between 0 and 1. Then we can generate an exponential random variable X with rate λ using the formula:

X = −(1/λ) * ln(U)

Prove that this is true.
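This is not a proof, but a simulation can confirm that the formula behaves as claimed before you start proving it. The sketch below applies X = −(1/λ) * ln(U) to uniform samples and checks the sample mean against 1/λ and the tail frequency against e^(−λt). One implementation detail is my own assumption: it uses `1.0 - random.random()` so that U lies in (0, 1] and `ln(0)` is never evaluated.

```python
import math
import random

random.seed(5)  # arbitrary seed, only for reproducibility

def sample_exponential(rate):
    """Draw one Exp(λ) value from a uniform sample via X = -ln(U) / λ."""
    u = 1.0 - random.random()  # uniform on (0, 1], avoids log(0)
    return -math.log(u) / rate

rate = 0.25  # λ, an arbitrary example value
n = 200_000  # arbitrary sample size
xs = [sample_exponential(rate) for _ in range(n)]

sample_mean = sum(xs) / n
t = 4.0
tail = sum(x > t for x in xs) / n

print(f"mean: simulated {sample_mean:.3f}, expected 1/λ = {1 / rate:.3f}")
print(f"P(X > {t}): simulated {tail:.4f}, expected {math.exp(-rate * t):.4f}")
```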

2. In the PDF of the exponential distribution, the maximum value on the y-axis is λ. Explain the reason behind this.

Probability Density Function of Exponential Distribution

3. Consider two independent exponential random variables X1 and X2, both with the rate λ:

X1 ~ Exp(λ) X2 ~ Exp(λ)

Let Y=X1+X2.

What is the PDF of Y? In what applications can this distribution be used?

The solution can be found here.
