Summary

This tutorial demonstrates how to use Keras Reinforcement Learning API to train a SARSA agent capable of playing the OpenAI Gym CartPole game.

Abstract

The provided content outlines a comprehensive guide to applying Keras Reinforcement Learning API for training a reinforcement learning model within the OpenAI Gym environment. Specifically, it focuses on using the CartPole game as a testbed for implementing a SARSA agent with an Epsilon Greedy Q-policy. The tutorial begins by introducing reinforcement learning concepts and directs readers to additional resources for foundational knowledge. It then proceeds to detail the steps for setting up the environment, defining the agent's neural network architecture, and training the agent using the Keras-RL library. The agent's performance is evaluated over multiple episodes, and the results indicate a significant improvement from random actions to a well-trained policy. The tutorial concludes by providing the code and resources on GitHub for further exploration and experimentation.

Opinions

The author emphasizes the importance of understanding reinforcement learning principles before diving into the practical implementation, suggesting that readers refer to introductory articles on DataCamp.
Random actions in the CartPole environment quickly lead to game over, highlighting the necessity of a well-defined policy for successful gameplay.
The use of a SARSA agent combined with an Epsilon Greedy Q-policy is presented as an effective approach for this particular problem.
The tutorial suggests that the Keras-RL library simplifies the process of building and training reinforcement learning models, making it accessible for learners and practitioners.
Visualizations are used to demonstrate the learning progress of the agent, providing a clear illustration of the improvement from initial random behavior to skilled performance.
The author encourages further learning and experimentation by sharing the code and trained model weights on GitHub.

Using Keras Reinforcement Learning API with OPENAI GYM

This tutorial focuses on using the Keras Reinforcement Learning API for building reinforcement learning models. To get an understanding of what reinforcement learning is please refer to these articles on DataCamp.

Introduction to Reinforcement Learning

What is Reinforcement Learning? Reinforcement Learning vs. the rest Intuition to Reinforcement Learning Basic concepts…

www.datacamp.com

My Journey to Reinforcement Learning - Part 1: Q-Learning with Table

My Journey to Reinforcement Learning - Part 1: Q-Learning with Tablewww.datacamp.com

In this tutorial, you will learn how to use Keras Reinforcement Learning API to successfully play the OPENAI gym game CartPole..

To Learn more about the GYM toolkit, visit

Gym: A toolkit for developing and comparing reinforcement learning algorithms

A toolkit for developing and comparing reinforcement learning algorithms

A toolkit for developing and comparing reinforcement learning algorithmsgym.openai.com

By the end of this tutorial, you will know how to use 1) Gym Environment 2) Keras Reinforcement Learning API

Assuming that you have the packages Keras, Numpy already installed, Let us get to installing the GYM and Keras RL package. Do this with pip as

pip install gym

pip install keras-rl

Import the following into your workspace

import gym
import random
import numpy as np
from keras.layers import Dense, Flatten
from keras.models import Sequential
from keras.optimizers import Adam

Specifying the environment name to the make method of gym will load the environment to your workspace. Load the Cart Pole environment from gym with

env = gym.make('CartPole-v1')

To get an idea about the number of variables affecting the environment, do

states = env.observation_space.shape[0]
print('States', states)

For the cart pole environment the input variables are position, velocity, angular position and angular velocity. To get an idea about the number of possible actions in the environment, do

actions = env.action_space.n
print('Actions', actions)

For the cart pole environment the responses are left and right.

There are three methods of the environment you should be familiar with

1) step() — This helps you execute an action by returning the (next_state, reward, done, info) resulting from that action. Where next_state — Indicates new state of the environment as a result of the step reward — Indicates the reward for that step done — indicates if you lost the game info — Gives system related info 2) render() — When called visualizes each step

3) reset() — Helps you reset the game returning a new state.

Let us now play 10 games / episodes and during each game we take random actions between left and right and see what rewards we get.

episodes = 10

for episode in range(1,episodes+1):
    # At each begining reset the game 
    state = env.reset()
    # set done to False
    done = False
    # set score to 0
    score = 0
    # while the game is not finished
    while not done:
        # visualize each step
        env.render()
        # choose a random action
        action = random.choice([0,1])
        # execute the action
        n_state, reward, done, info = env.step(action)
        # keep track of rewards
        score+=reward
    print('episode {} score {}'.format(episode, score))

Game scores over 10 games with random actions

Visualization of 10 games played with random actions

We see that we lost the game very quickly in each of the 10 games.

We will use the SARSA Agent and Epsilon Greedy Q policy to train our reinforcement learning model. To know more about the policies and the agents, please refer to other DataCamp reinforcement learning tutorials mentioned in the beginning of this tutorial.

Let us now define a simple keras model which will have 4 input neurons to accept the 4 state information and 2 output neurons with linear activation which will return the maximum possible reward to the two possible actions. Between the input and the output layers we have 3 Dense layers with 24 neurons and activation as relu.

def agent(states, actions):
    model = Sequential()
    model.add(Flatten(input_shape = (1, states)))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(actions, activation='linear'))
    return model
  
model = agent(env.observation_space.shape[0], env.action_space.n)

Import the Agent and the policy as

from rl.agents import SARSAAgent
from rl.policy import EpsGreedyQPolicy

policy = EpsGreedyQPolicy()

We define the SARSA agent by specifying the model, policy and the nb_actions paramaters. Where the model is the keras model we have defined, the policy is EpsGreedyQPolicy and nb_actions is 2.

sarsa = SARSAAgent(model = model, policy = policy, nb_actions = env.action_space.n)

Compile the sarsa agent with mean squared error loss and adam optimizer.

sarsa.compile('adam', metrics = ['mse'])

We then fit it by specifying the environment and the number of steps you want to train it for.

sarsa.fit(env, nb_steps = 50000, visualize = False, verbose = 1)

With the training done, we will test it for 100 episodes and see what scores we get.

scores = sarsa.test(env, nb_episodes = 100, visualize= False)
print('Average score over 100 test games:{}'.format(np.mean(scores.history['episode_reward'])))

Save trained agent weights with…

sarsa.save_weights('sarsa_weights.h5f', overwrite=True)

If needed, one can load the saved weights with…

# sarsa.load_weights('sarsa_weights.h5f')

To finish off the tutorial, let us visualize the game as played by the trained agent. Setting visualize parameter to true is important to visualize the game.

_ = sarsa.test(env, nb_episodes = 2, visualize= True)

Visualization of 2 games as played by the agent

The code and the wights file for this tutorial can be found at

https://github.com/anagar20/Keras-Reinforcement-Learning-API-Cart-Pole-