Strategies for Decaying Epsilon in Epsilon-Greedy

Summary

This article discusses three strategies for decaying epsilon in epsilon-greedy reinforcement learning: linear decay, exponential decay, and discrete interval decay.

Abstract

The exploration-exploitation dilemma is a fundamental aspect of reinforcement learning problems. The epsilon-greedy training strategy is a common approach to addressing this dilemma, where the parameter epsilon represents the probability of selecting a random control. At the beginning of training, epsilon starts at 1.0, and near the end, it should be a very small value. This article presents three methods for decaying epsilon effectively: linear decay, exponential decay, and discrete interval decay. Linear decay involves reducing epsilon linearly over time, while exponential decay involves reducing epsilon exponentially over time. Discrete interval decay involves reducing epsilon at specific intervals. The article provides Python code for implementing each of these strategies.

Bullet points

The exploration-exploitation dilemma is fundamental to reinforcement learning problems.
The epsilon-greedy training strategy is a common approach to addressing this dilemma.
Epsilon represents the probability of selecting a random control.
Epsilon starts at 1.0 at the beginning of training and should be a small value near the end.
Three methods for decaying epsilon effectively are presented: linear decay, exponential decay, and discrete interval decay.
Linear decay involves reducing epsilon linearly over time.
Exponential decay involves reducing epsilon exponentially over time.
Discrete interval decay involves reducing epsilon at specific intervals.
Python code for implementing each of these strategies is provided.

Strategies for Decaying Epsilon in Epsilon-Greedy


The exploration-exploitation dilemma is fundamental to Reinforcement Learning (RL) problems. Early in training, an agent has not yet collected enough experience to associate higher Q-values with the better controls in each state, so its Q-values carry little meaningful information. Later, after “enough” experience has been collected, it should begin exploiting that knowledge, using the Q-values to act optimally in the environment.

How to Decay Epsilon During Training?

A few basics about decaying epsilon in the epsilon-greedy training strategy should be stated first. In epsilon-greedy, the parameter epsilon is the probability of selecting a random control. At the beginning of a training simulation epsilon starts at 1.0, and near the end it should be a very small value, e.g. 0.001, for the reasons discussed above.

Epsilon starts at 1.0, meaning every control is selected at random, because we want the agent to try many different controls across the state space (exploration). Epsilon should be small near the end of the training horizon because we want the agent to exploit what it has learned (exploitation).
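To make the role of epsilon concrete, here is a minimal sketch of epsilon-greedy control selection. It assumes the Q-values for the current state are available as a plain Python list; the function name `select_action` and its parameters are illustrative, not taken from the article's own code.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over the Q-values of a single state.

    q_values: list of Q-values, one per control (assumed representation).
    epsilon:  probability of selecting a random control.
    """
    if rng.random() < epsilon:
        # Explore: pick any control uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the control with the highest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon = 1.0` the random branch is always taken, and with `epsilon = 0.0` the agent always picks the control with the highest Q-value.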

In this article, I present three ways of decaying epsilon effectively and provide the necessary Python code you can add to your training simulation code. I assume one million training steps and fixed ending epsilon values, but these values can be adjusted in your own experiments. The three methods I present for decaying epsilon are:

Linear Decay

Exponential Decay

Discrete Interval Decay
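The three schedules can be sketched as functions of the current training step. This is one possible formulation, not necessarily the author's implementation: the starting value 1.0, the ending value 0.001, and the one million steps come from the text above, while the function names, the interval length of 100,000 steps, and the halving factor are assumptions chosen for illustration.

```python
EPS_START = 1.0          # from the article: epsilon starts at 1.0
EPS_END = 0.001          # example ending value mentioned in the article
TOTAL_STEPS = 1_000_000  # the article assumes one million training steps


def linear_decay(step, eps_start=EPS_START, eps_end=EPS_END,
                 total_steps=TOTAL_STEPS):
    """Straight line from eps_start to eps_end, then hold at eps_end."""
    fraction = min(step / total_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)


def exponential_decay(step, eps_start=EPS_START, eps_end=EPS_END,
                      total_steps=TOTAL_STEPS):
    """Shrink epsilon by a constant ratio per step, reaching eps_end
    at total_steps and holding there afterwards."""
    fraction = min(step / total_steps, 1.0)
    return eps_start * (eps_end / eps_start) ** fraction


def interval_decay(step, eps_start=EPS_START, eps_end=EPS_END,
                   interval=100_000, factor=0.5):
    """Hold epsilon constant within each interval and multiply it by
    `factor` at every interval boundary, never going below eps_end.
    `interval` and `factor` are assumed values, not from the article."""
    drops = step // interval
    return max(eps_end, eps_start * factor ** drops)
```

Inside a training loop, one of these would be called once per step, for example `epsilon = linear_decay(step)`, before selecting a control. Under these assumptions the exponential schedule drops much faster early on than the linear one, and the interval schedule produces a staircase rather than a smooth curve.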

