Please use something that is actually kept up to date and properly debugged, such as rllab: https://github.com/openai/rllab
This repo hosts a handful of reinforcement learning agents implemented using the Keras (http://keras.io/) deep learning library, running on Theano or TensorFlow. The goal is to make it easy to run, measure, and experiment with different learning configurations and underlying value-function approximation networks across a variety of OpenAI Gym environments (https://gym.openai.com/).
- pg: policy gradient agent with a Keras NN policy network
- dqn: Q-learning agent with Keras NN Q-function approximation (with concurrent actor-learners)
sudo python setup.py install
./run_pong.sh
or
Example: kerlym -e Go9x9-v0 -n simple_dnn -P
Usage: kerlym [options]
Options:
-h, --help show this help message and exit
-e ENV, --env=ENV Which GYM Environment to run [Pong-v0]
-n NET, --net=NET Which NN Architecture to use for Q-Function
approximation [simple_dnn]
-b BS, --batch_size=BS
Batch size during NN training [32]
-o DROPOUT, --dropout=DROPOUT
Dropout rate in Q-Fn NN [0.5]
-p EPSILON, --epsilon=EPSILON
Exploration(1.0) vs Exploitation(0.0) action
probability [0.1]
-D EPSILON_DECAY, --epsilon_decay=EPSILON_DECAY
Rate of epsilon decay: epsilon*=(1-decay) [1e-06]
-s EPSILON_MIN, --epsilon_min=EPSILON_MIN
Min epsilon value after decay [0.05]
-d DISCOUNT, --discount=DISCOUNT
Discount rate for future rewards [0.99]
-t NFRAMES, --num_frames=NFRAMES
Number of Sequential observations/timesteps to store
in a single example [2]
-m MAXMEM, --max_mem=MAXMEM
Max number of samples to remember [100000]
-P, --plots Plot learning statistics while running [False]
-F PLOT_RATE, --plot_rate=PLOT_RATE
Plot update rate in episodes [10]
-a AGENT, --agent=AGENT
Which learning algorithm to use [dqn]
-i, --difference Compute Difference Image for Training [False]
-r LEARNING_RATE, --learning_rate=LEARNING_RATE
Learning Rate [0.0001]
-E PREPROCESSOR, --preprocessor=PREPROCESSOR
Preprocessor [none]
-R, --render Render game progress [False]
-c NTHREADS, --concurrency=NTHREADS
Number of Worker Threads [1]
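As a concrete example, a run that combines several of these options (the particular values here are only illustrative, not recommendations) might look like:

kerlym -e Pong-v0 -a dqn -n simple_dnn -b 32 -p 0.5 -D 1e-4 -s 0.05 -d 0.99 -c 4 -P

Note that --epsilon_decay is multiplicative (epsilon *= 1 - decay), so with -p 0.5, -D 1e-4, and -s 0.05 it takes roughly ln(10)/1e-4, i.e. about 23,000 decay applications, for epsilon to reach its floor.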
or
import kerlym
from gym import envs

# Environment factory: each call builds a fresh Gym environment instance
env = lambda: envs.make("SpaceInvaders-v0")

agent = kerlym.agents.DQN(
    env=env,
    nframes=1,                   # sequential observations stacked per training example
    epsilon=0.5,                 # initial exploration probability
    discount=0.99,               # discount rate for future rewards
    modelfactory=kerlym.dqn.networks.simple_cnn,
    batch_size=32,
    dropout=0.1,
    enable_plots=True,
    epsilon_schedule=lambda episode, epsilon: max(0.1, epsilon*(1-1e-4)),
    difference_obs=True,         # train on difference images between frames
    preprocessor=kerlym.preproc.karpathy_preproc,
    learning_rate=1e-4,
    render=True
)
agent.train()
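Judging from the lambda above, the epsilon_schedule hook takes the episode index and the current epsilon value and returns the epsilon to use next; the lambda reproduces the -D/--epsilon_decay behaviour with a 0.1 floor. Written out as a plain function (make_epsilon_schedule is just an illustrative name, not part of kerlym):

def make_epsilon_schedule(decay=1e-4, epsilon_min=0.1):
    # Multiplicative decay with a floor: epsilon *= (1 - decay), clipped at epsilon_min
    def schedule(episode, epsilon):
        return max(epsilon_min, epsilon * (1.0 - decay))
    return schedule

It can then be passed as epsilon_schedule=make_epsilon_schedule(1e-4, 0.1) in place of the lambda.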
# Example of a custom Q-network factory: an LSTM over the stacked frames
from keras.models import Model
from keras.layers import Input, Dense, Dropout, Reshape, TimeDistributed, LSTM
from keras.optimizers import RMSprop

def custom_Q_nn(agent, env, dropout=0, h0_width=8, h1_width=8, **args):
    # Flattened observation in, one Q-value per discrete action out
    S = Input(shape=[agent.input_dim])
    # Un-stack the frames into a (nframes, features-per-frame) sequence
    h = Reshape([agent.nframes, agent.input_dim // agent.nframes])(S)
    h = TimeDistributed(Dense(h0_width, activation='relu', init='he_normal'))(h)
    h = Dropout(dropout)(h)
    h = LSTM(h1_width, return_sequences=True)(h)
    h = Dropout(dropout)(h)
    h = LSTM(h1_width)(h)
    h = Dropout(dropout)(h)
    V = Dense(env.action_space.n, activation='linear', init='zero')(h)
    model = Model(S, V)
    model.compile(loss='mse', optimizer=RMSprop(lr=0.01))
    return model
agent = kerlym.agents.D2QN(env, modelfactory=custom_Q_nn)
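Any callable with this signature can serve as a modelfactory: it is handed the agent and the Gym environment (plus any extra keyword arguments) and should return a compiled Keras model mapping a flattened agent.input_dim observation to env.action_space.n Q-value outputs. After constructing the agent this way, training proceeds with agent.train() exactly as above.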
If you use this work in your research, a citation of our publication introducing this platform would be greatly appreciated! The arXiv paper is available at https://arxiv.org/abs/1605.09221 and a simple BibTeX entry is provided below.
@misc{1605.09221,
  Author = {Timothy J. O'Shea and T. Charles Clancy},
  Title = {Deep Reinforcement Learning Radio Control and Signal Detection with KeRLym, a Gym RL Agent},
  Year = {2016},
  Eprint = {arXiv:1605.09221},
}
Many thanks to the projects below for their inspiration and contributions:
- https://github.com/dandxy89/rf_helicopter
- https://github.com/sherjilozair/dqn
- https://gist.github.com/karpathy/a4166c7fe253700972fcbc77e4ea32c5
- https://github.com/coreylynch/async-rl
- Keras, Gym, TensorFlow and Theano
-Tim
To run the Gym environments you will also need gym itself: pip install gym, plus pip install gym[atari] for the Atari games. If gym[atari] fails to install, install cmake first (apt-get install cmake) and try again.