This is a modular RL code base for research. The intent is to enable surgical modifications by designing the base agent as a list of modules that all live inside the agent's global namespace (so they can all access each other directly by name). This means we can change the algorithm of a complex hierarchical, multi-goal, intrinsically motivated, etc. agent from DDPG to SAC by simply changing the algorithm module (and adding the additional critic network). Similarly, to add something like a forward model, intrinsic motivation, landmark generation, a new HER strategy, etc., you only need to create/modify the relevant module(s).
The agent has life-cycle hooks that the modules "hook" into. The important ones are: _setup (called after all modules are set but before any environment interactions), _process_experience (called with each new experience), _optimize (called at each optimization step), and save/load (called upon saving / loading the agent).
See comments in mrl/agent_base.py, brief test scripts in tests, and example TD3/SAC Mujoco agents in experiments/benchmarks/train_online_agent.py.
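The hook pattern described above can be illustrated with a toy sketch. This is not mrl's actual code: the `Module`, `Agent`, `ReplayBuffer`, and `MeanRewardAlgorithm` classes below are hypothetical stand-ins, and only the hook names (`_setup`, `_process_experience`, `_optimize`) come from this README. See mrl/agent_base.py for the real implementation.

```python
# Toy illustration of lifecycle hooks; all class names here are hypothetical.

class Module:
    """Base class for a toy module; every hook defaults to a no-op."""

    def __init__(self, name):
        self.name = name
        self.agent = None  # set by the agent when the module is registered

    def _setup(self):
        pass

    def _process_experience(self, experience):
        pass

    def _optimize(self):
        pass


class Agent:
    """Toy agent: exposes each module as an attribute and fans out hook calls."""

    def __init__(self, modules):
        self.modules = list(modules)
        for m in self.modules:
            m.agent = self
            setattr(self, m.name, m)  # modules can reach each other by name
        for m in self.modules:
            m._setup()  # runs after all modules are registered

    def process_experience(self, experience):
        for m in self.modules:
            m._process_experience(experience)

    def optimize(self):
        for m in self.modules:
            m._optimize()


class ReplayBuffer(Module):
    def _setup(self):
        self.storage = []

    def _process_experience(self, experience):
        self.storage.append(experience)


class MeanRewardAlgorithm(Module):
    """Stand-in for an algorithm module; it only averages stored rewards."""

    def _setup(self):
        self.estimate = 0.0

    def _optimize(self):
        rewards = [e["reward"] for e in self.agent.replay_buffer.storage]
        if rewards:
            self.estimate = sum(rewards) / len(rewards)


agent = Agent([ReplayBuffer("replay_buffer"), MeanRewardAlgorithm("algorithm")])
for r in (1.0, 2.0, 3.0):
    agent.process_experience({"reward": r})
agent.optimize()
print(agent.algorithm.estimate)
```

In this sketch, swapping the algorithm means passing a different module under the name "algorithm"; the buffer and the agent loop stay untouched, which is the kind of surgical change the modular design is meant to allow.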
The modular structure is technically framework agnostic, so it could be used with either PyTorch or TF-based modules, or even a mix, but right now all modules that need a framework use PyTorch.
The train loop is easily customized, so that you can do, e.g., BatchRL, transfer, or meta RL with minimal modifications.
Environment parallelization is done via VecEnv, and we rely on GPU for optimization parallelization. Future work should consider how they can be done asynchronously; e.g., using Ray.
mrl provides state of the art implementations of SAC, TD3, and DDPG+HER. See the Mujoco and Multi-goal benchmarks.
There is a requirements.txt that works with venv:
python3 -m venv env
source env/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
Then pip install the appropriate version of PyTorch by following the instructions here: https://pytorch.org/get-started/locally/.
To run Mujoco environments you need to have the Mujoco binaries and a license key. Follow the instructions here.
To test run:
pytest tests
PYTHONPATH=./ python experiments/mega/train_mega.py --env FetchReach-v1 --layers 256 256 --max_steps 5000
The first command should have 3/3 success. The second command should solve the environment in <1 minute (better than -5 avg test reward).
To understand how the code works, read mrl/agent_base.py.
See tests/test_agent_sac.py and experiments/benchmarks for example usage. The basic outline is as follows:
- Construct a config object that contains all the agent hyperparameters and modules. There are some existing base configs / convenience methods for creating default SAC/TD3/DDPG agents (see, e.g., the benchmarks code). If you use argparse, you can use a config object to automatically populate the parser using parser = add_config_args(parser, config).
- Call mrl.config_to_agent on the config to get back an agent.
- Use the agent however you want; e.g., call its train/eval methods, save/load, module methods, and so on.
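To make the argparse step in the outline more concrete, here is a minimal sketch of how a parser could be populated from a config. It is an assumption-laden illustration, not mrl's `add_config_args`: the helper name `add_args_from_config`, the plain-dict config, and the `gamma` entry are all hypothetical; the other keys mirror flags from the test command above.

```python
import argparse


def add_args_from_config(parser, config):
    """Hypothetical helper: add one --flag per config entry.

    The current value becomes the default, and its Python type decides
    how the command-line string is parsed.
    """
    for key, value in config.items():
        if isinstance(value, bool):
            # bool("False") is True, so booleans need explicit parsing.
            parser.add_argument(
                f"--{key}",
                type=lambda s: s.lower() in ("1", "true", "yes"),
                default=value,
            )
        elif isinstance(value, (list, tuple)):
            # Lists become multi-value flags, e.g. --layers 256 256.
            elem_type = type(value[0]) if value else str
            parser.add_argument(
                f"--{key}", nargs="+", type=elem_type, default=list(value)
            )
        else:
            parser.add_argument(f"--{key}", type=type(value), default=value)
    return parser


# A plain dict standing in for a config object (illustrative values only).
config = {
    "env": "FetchReach-v1",
    "layers": [256, 256],
    "max_steps": 5000,
    "gamma": 0.98,
}

parser = add_args_from_config(argparse.ArgumentParser(), config)
args = parser.parse_args(["--layers", "512", "512", "--max_steps", "100"])

# Command-line values override the config defaults.
config.update(vars(args))
print(config)
```

Deriving flags from the config keeps hyperparameters defined in one place: adding a new entry to the config makes it available on the command line without touching the parser code.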
To add functionality or a new algorithm, you generally just need to define one or more modules and add them to the config. They automatically hook into the agent's lifecycle methods, so the rest of the code can stay the same.
Implemented:
- DDPG, TD3, SAC, basic DQN
- HER (computed online)
- Random ensemble DDPG (based on An Optimistic Perspective on Offline Reinforcement Learning --- could be improved)
- N-step returns (computed online) (see Rainbow) [not compatible with HER]
- MLE versions of DDPG/TD3 using Gaussian critic (called "Sigma" in the code, cf. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles)
- Fixed-horizon DDPG (based on Fixed-Horizon Temporal Difference Methods)
- Gamma as an auxiliary task (just pass a vector of k gammas and use a k-output critic --- last Gamma will be used to train policy) (based on Hyperbolic Discounting and Learning over Multiple Horizons)
- Support for goal-based intrinsic motivation, in goal-based environments
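As a reference point for the N-step returns item in the list above, the standard n-step bootstrapped target can be sketched as follows. This is the textbook formula and not necessarily how mrl computes it online; the function name `n_step_target` and its arguments are hypothetical.

```python
def n_step_target(rewards, dones, bootstrap_value, gamma):
    """Discounted sum of up to n rewards plus a bootstrapped tail value.

    rewards, dones: sequences of length n for consecutive transitions.
    bootstrap_value: value estimate for the state reached after step n.
    gamma: discount factor.
    """
    target = 0.0
    discount = 1.0
    for reward, done in zip(rewards, dones):
        target += discount * reward
        discount *= gamma
        if done:
            # Episode ended inside the window: do not bootstrap.
            return target
    return target + discount * bootstrap_value


# Three-step example: 1 + 0.5 + 0.25 + 0.125 * 10
full = n_step_target([1.0, 1.0, 1.0], [False, False, False], 10.0, 0.5)

# Episode terminates at the second step: 1 + 0.5, no bootstrap.
truncated = n_step_target([1.0, 1.0, 1.0], [False, True, False], 10.0, 0.5)

print(full, truncated)
```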
Some todos:
- Distributional predictions
- Uncertainty predictions
- Hierarchical RL
- Support for goal-based intrinsic motivation in general environments
- DQN variants
Below is a list of papers that use mrl. If you use mrl in one of your papers please let us know and we can add you to the list. If you build on the experiments related to the below papers, please cite the original papers:
- Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning (ICML 2020 (15 minute presentation), Arxiv, ALA 2020 Best Paper (25 minute presentation))
- ProtoGE: Prototype Goal Encodings for Multi-goal Reinforcement Learning (RLDM 2019, pdf) [As of July 2020, this is still far and away the state-of-the-art on Gym's Fetch environments]
- Counterfactual Data Augmentation using Locally Factored Dynamics (NeurIPS 2020 (3 minute presentation, 5 minute presentation), Arxiv)
- Planning Goals for Exploration (ICLR 2023 Spotlight (openreview)), codebase (also implements model-based SkewFit)
If you use or extend this codebase in your work, please consider citing:
@misc{mrl,
author = {Pitis, Silviu and Chan, Harris and Zhao, Stephen},
title = {mrl: modular RL},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/spitis/mrl}},
}
This code has used parts of the following repositories:
- DeepRL by Shangtong Zhang (parts of networks, normalizer, random processes / schedule)
- Spinning Up (parts of logger, certain code optimizations)
- Baselines (VecEnv, RingBuffer, normalizer, plotting)
Silviu Pitis (spitis), Harris Chan (takonan), Stephen Zhao (Silent-Zebra)