Asynchronous RL in TensorFlow + Keras + OpenAI's Gym

This is a TensorFlow + Keras implementation of asynchronous 1-step Q-learning as described in "Asynchronous Methods for Deep Reinforcement Learning".
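
For reference, the 1-step Q-learning target that each actor-learner regresses toward can be sketched in plain Python. This is a minimal illustration, not the code in async_dqn.py; the function name `one_step_q_target` and the default discount of 0.99 are assumptions made for the example.

```python
def one_step_q_target(reward, next_q_values, terminal, gamma=0.99):
    """Return the 1-step Q-learning target y = r + gamma * max_a' Q(s', a').

    next_q_values: Q-value estimates for the next state, one per action.
    terminal:      True if the transition ended the episode, in which case
                   there is no bootstrap term and the target is just r.
    """
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)


# Example: a non-terminal step with reward 1.0 and three candidate actions.
target = one_step_q_target(1.0, [0.5, 2.0, -1.0], terminal=False, gamma=0.9)
```

The network's prediction for the action actually taken is then pushed toward this target, typically with a squared-error loss.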

Since we're using multiple actor-learner threads to stabilize learning in place of experience replay (which is very memory intensive), this runs comfortably on a MacBook with 4 GB of RAM.
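
The idea of several actor-learner threads updating one shared set of parameters can be illustrated with a toy example. This is only a sketch using Python's standard `threading` module: `actor_learner`, `shared_params`, and the constant fake gradient are hypothetical stand-ins, and the real training loop in async_dqn.py works with gym and TensorFlow instead.

```python
import threading

shared_params = [0.0]      # stand-in for the shared network weights
global_step = [0]          # step counter shared by all threads
lock = threading.Lock()    # guards the toy update below


def actor_learner(thread_id, num_steps, learning_rate=0.1):
    """Toy actor-learner: apply num_steps fake gradient updates."""
    for _ in range(num_steps):
        # A real learner would compute this from an environment transition.
        fake_gradient = 1.0
        with lock:
            shared_params[0] -= learning_rate * fake_gradient
            global_step[0] += 1


# Start 8 threads, matching the --num_concurrent value used further down.
threads = [threading.Thread(target=actor_learner, args=(i, 100)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The lock is only there to keep the toy example deterministic. Nothing here stores past transitions, which is where the memory savings over a replay buffer come from.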

It uses Keras to define the deep Q-network (see model.py), OpenAI's gym library to interact with the Arcade Learning Environment (see atari_environment.py), and TensorFlow for optimization/execution (see async_dqn.py).

Requirements

tensorflow
gym
[gym's atari environment](https://github.com/openai/gym#atari)
scikit-image (imported as skimage)
Keras

Usage

Training

To kick off training, run:

python async_dqn.py --experiment breakout --game "Breakout-v0" --num_concurrent 8

Here we're organizing the outputs for the current experiment under a folder called 'breakout', choosing "Breakout-v0" as our gym environment, and running 8 actor-learner threads concurrently. See gym's list of Atari environments for the game names you can pass to --game.

Visualizing training with tensorboard

We collect episode reward stats and max Q-values, which can be visualized with tensorboard by running the following:

tensorboard --logdir /tmp/summaries/breakout
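
As a rough picture of what those two statistics are, the sketch below accumulates the total reward of an episode and the average of the per-step max Q-value. The `EpisodeStats` class and its method names are hypothetical and are not taken from async_dqn.py, which writes its summaries through TensorFlow.

```python
class EpisodeStats:
    """Accumulate per-episode reward and the average max Q-value."""

    def __init__(self):
        self.total_reward = 0.0
        self.max_q_sum = 0.0
        self.steps = 0

    def record(self, reward, q_values):
        """Record one environment step and the Q-values predicted for it."""
        self.total_reward += reward
        self.max_q_sum += max(q_values)
        self.steps += 1

    def summary(self):
        """Return (episode reward, average max Q) for the steps seen so far."""
        avg_max_q = self.max_q_sum / self.steps if self.steps else 0.0
        return self.total_reward, avg_max_q


# Made-up values for a three-step episode, purely for illustration.
stats = EpisodeStats()
stats.record(0.0, [0.1, 0.5])
stats.record(1.0, [1.0, 0.2])
stats.record(0.0, [0.3, 0.3])
episode_reward, avg_max_q = stats.summary()
```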

This is what my per-episode reward and average max Q-value curves looked like over the training period:

Evaluation

To run a gym evaluation, set the --testing flag to True and pass in a recent checkpoint file:

python async_dqn.py --experiment breakout --testing True --checkpoint_path /tmp/breakout.ckpt-2690000 --num_eval_episodes 100

After completing the eval, we can upload our eval file to OpenAI's site as follows:

import gym
gym.upload('/tmp/breakout/eval', api_key='YOUR_API_KEY')

Now we can find the eval at https://gym.openai.com/evaluations/eval_uwwAN0U3SKSkocC0PJEwQ

Next Steps

See a3c.py for a work-in-progress asynchronous advantage actor-critic (A3C) implementation.

Resources

I found these super helpful as general background materials for deep RL:

Important notes

In the paper the authors mention "for asynchronous methods we average over the best 5 models from 50 experiments". I overlooked this point when I was writing this, but I think it's important. These async methods seem to vary a lot in performance from run to run (at least in my implementation of them!). I think it's a good idea to run several differently seeded copies at the same time and average their performance to get a good picture of whether an architectural change actually helps. Likewise, don't get discouraged if you don't see good performance on your task right away; try rerunning the same code a few more times with different seeds.
This repo has no affiliation with DeepMind or the authors; it was just a simple project I was using to learn TensorFlow. Feedback is highly appreciated.
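
The "best 5 of 50" averaging mentioned in the first note can be sketched in a few lines of Python. The helper name `mean_of_top_k` and the scores below are made up for illustration; they are not results from this repo.

```python
from statistics import mean


def mean_of_top_k(scores, k=5):
    """Average the k highest values from a list of per-run final scores."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = sorted(scores, reverse=True)[:k]
    return mean(top)


# Hypothetical final scores from six seeded runs (made-up numbers).
run_scores = [12.0, 250.0, 31.0, 180.0, 5.0, 90.0]
best_three = mean_of_top_k(run_scores, k=3)
```

Comparing this number with a plain mean over all runs gives a feel for how much run-to-run variance there is.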
