This project aims to provide implementations of the most typical reinforcement learning algorithms:
- VPG (Vanilla Policy Gradient, with a baseline)
- DQN
- Prioritized DQN
- Rainbow
- IQN
- A2C/A2C with GAE/MAC
- PPO
- DDPG
- TD3
- SAC
- CFR/OS-MCCFR/ES-MCCFR/DeepCFR
- Minimax
- Behavior Cloning
If you are looking for tabular reinforcement learning algorithms, you may refer to ReinforcementLearningAnIntroduction.jl.
Some built-in experiments are exported so that new users can run benchmarks with one line. Experienced users are encouraged to check the source code of those experiments and make changes as needed.
- E`JuliaRL_BasicDQN_CartPole`
- E`JuliaRL_DQN_CartPole`
- E`JuliaRL_PrioritizedDQN_CartPole`
- E`JuliaRL_Rainbow_CartPole`
- E`JuliaRL_IQN_CartPole`
- E`JuliaRL_A2C_CartPole`
- E`JuliaRL_A2CGAE_CartPole` (Thanks to @sriram13m)
- E`JuliaRL_MAC_CartPole` (Thanks to @RajGhugare19)
- E`JuliaRL_PPO_CartPole`
- E`JuliaRL_VPG_CartPole` (Thanks to @norci)
- E`JuliaRL_VPG_Pendulum` (continuous action space)
- E`JuliaRL_VPG_PendulumD` (discrete action space)
- E`JuliaRL_DDPG_Pendulum`
- E`JuliaRL_TD3_Pendulum` (Thanks to @rbange)
- E`JuliaRL_SAC_Pendulum` (Thanks to @rbange)
- E`JuliaRL_PPO_Pendulum`
- E`JuliaRL_BasicDQN_MountainCar` (Thanks to @felixchalumeau)
- E`JuliaRL_DQN_MountainCar` (Thanks to @felixchalumeau)
- E`JuliaRL_Minimax_OpenSpiel(tic_tac_toe)`
- E`JuliaRL_TabularCFR_OpenSpiel(kuhn_poker)`
- E`JuliaRL_DeepCFR_OpenSpiel(leduc_poker)`
- E`JuliaRL_DQN_SnakeGame`
- E`JuliaRL_BC_CartPole`
- E`JuliaRL_BasicDQN_EmptyRoom`
- E`Dopamine_DQN_Atari(pong)`
- E`Dopamine_Rainbow_Atari(pong)`
- E`Dopamine_IQN_Atari(pong)`
- E`rlpyt_A2C_Atari(pong)`
- E`rlpyt_PPO_Atari(pong)`
julia> ] add ReinforcementLearning
julia> using ReinforcementLearning
julia> run(E`JuliaRL_BasicDQN_CartPole`)
julia> ] add ArcadeLearningEnvironment
julia> using ArcadeLearningEnvironment
julia> run(E`rlpyt_PPO_Atari(pong)`) # the Atari environment is provided in ArcadeLearningEnvironment, so we need to install it first
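Because some experiments depend on packages that are not installed by default, it can help to check what is available before calling `run`. The following is a minimal sketch, not part of this project: the helper `is_installed` and the `optional_packages` table are hypothetical names introduced for illustration, and the check relies on `Base.identify_package`, which returns `nothing` when a package cannot be found in the active environment.

```julia
# Optional packages and the experiment family each one enables.
# (This table is an illustration assembled from the package names in this README.)
optional_packages = [
    "Atari" => "ArcadeLearningEnvironment",
    "OpenSpiel" => "OpenSpiel",
    "SnakeGame" => "SnakeGame",
    "GridWorlds" => "GridWorlds",
]

# Hypothetical helper: true when `name` can be resolved in the active environment.
is_installed(name::AbstractString) = Base.identify_package(name) !== nothing

# Print one status line per optional package.
for (family, pkg) in optional_packages
    status = is_installed(pkg) ? "installed" : "missing, try `] add $pkg`"
    println(rpad(family, 12), rpad(pkg, 28), status)
end
```

A package reported as missing can be added from the REPL in the same way as `ArcadeLearningEnvironment` above.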
- Experiments on `CartPole` usually run faster with CPU only due to the overhead of sending data between CPU and GPU.
- It shouldn't surprise you that our experiments on `CartPole` are much faster than those written in Python. The secret is that our environment is written in Julia!
- Remember to set `JULIA_NUM_THREADS` to enable multi-threading when using algorithms like `A2C` and `PPO`.
- Experiments on `Atari` (`OpenSpiel`, `SnakeGame`, `GridWorlds`) are only available after you have `ArcadeLearningEnvironment.jl` (`OpenSpiel.jl`, `SnakeGame.jl`, `GridWorlds.jl`) installed and `using ArcadeLearningEnvironment` (`using OpenSpiel`, `using SnakeGame`, `import GridWorlds`).
- Different configurations might affect the performance a lot. According to our tests, our implementations are generally comparable to those written in PyTorch or TensorFlow with the same configuration (sometimes we are significantly faster).
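Thread count is one configuration that is easy to get wrong, because `JULIA_NUM_THREADS` is read once when Julia starts and cannot be changed from inside a running session. The check below is a small sketch for confirming what the current session actually has; the value 4 in the comments is only an example.

```julia
using Base.Threads: nthreads

# JULIA_NUM_THREADS must be set before Julia starts, for example:
#   JULIA_NUM_THREADS=4 julia
# (or `julia --threads 4` on Julia 1.5 and later).
requested = get(ENV, "JULIA_NUM_THREADS", "unset")
println("JULIA_NUM_THREADS = ", requested)
println("Threads available = ", nthreads())

# Warn when the session is single-threaded, since algorithms like A2C and PPO
# rely on multi-threading.
if nthreads() == 1
    @warn "Running single-threaded; set JULIA_NUM_THREADS before starting Julia."
end
```

If the two printed values disagree, the variable was most likely set after Julia had already started.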
The following data were collected from experiments on an Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz with an RTX 2080 Ti GPU.
| Experiment | FPS | Notes |
|---|---|---|
| E`Dopamine_DQN_Atari(pong)` | ~210 | Uses the same config as dqn.gin in google/dopamine |
| E`Dopamine_Rainbow_Atari(pong)` | ~171 | Uses the same config as rainbow.gin in google/dopamine |
| E`Dopamine_IQN_Atari(pong)` | ~162 | Uses the same config as implicit_quantile.gin in google/dopamine |
| E`rlpyt_A2C_Atari(pong)` | ~768 | Uses the same default parameters as A2C in rlpyt with 4 threads |
| E`rlpyt_PPO_Atari(pong)` | ~711 | Uses the same default parameters as PPO in rlpyt with 4 threads |
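To get a feel for what these FPS figures mean in practice, the sketch below converts them into approximate wall-clock time for a fixed number of frames. The frame budget of 1,000,000 is a hypothetical value chosen only for illustration and is not taken from any of the configs above; the helper `hours_needed` is likewise an assumption introduced here.

```julia
# Approximate FPS figures copied from the table above.
fps = Dict(
    "Dopamine_DQN_Atari(pong)" => 210,
    "Dopamine_Rainbow_Atari(pong)" => 171,
    "Dopamine_IQN_Atari(pong)" => 162,
    "rlpyt_A2C_Atari(pong)" => 768,
    "rlpyt_PPO_Atari(pong)" => 711,
)

# Hypothetical frame budget, for illustration only.
frame_budget = 1_000_000

# Wall-clock hours needed to process `frames` frames at `rate` frames per second.
hours_needed(frames, rate) = frames / rate / 3600

# List the experiments from fastest to slowest with their estimated duration.
for (name, rate) in sort(collect(fps); by = last, rev = true)
    println(rpad(name, 32), round(hours_needed(frame_budget, rate); digits = 2), " h")
end
```

The estimates scale linearly with the frame budget, so they are only as accurate as the FPS figures, which depend on hardware and configuration.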