Low-precision training has become a popular way to reduce compute requirements, memory footprint, and energy consumption in supervised learning. In our ICML 2021 paper, we study training reinforcement learning agents in low precision. Naively training in fp16 does not work well, but with six modifications we demonstrate that low-precision RL trains stably while decreasing compute and memory demands. This repository contains the code for our main experiments. Configuration and command-line arguments are handled via the excellent hydra framework.
[paper]
- You will need an Nvidia GPU with a reasonably recent CUDA version to run the code.
- Create a conda environment from `env.yml` and activate it:

  ```
  conda env create -f env.yml
  conda activate lowprec_rl
  ```

- Install the DeepMind Control Suite as described here.
- You will need to set appropriate environment variables, e.g. `MUJOCO_GL=egl`. You may also consider the flags `HYDRA_FULL_ERROR=1` and `OMP_NUM_THREADS=1`.
- To run an experiment in fp32 on the `finger_spin` environment with seed `123`, use:

  ```
  python train.py env=finger_spin seed=123
  ```

  Results will appear in a folder named `runs`.
- To use half-precision (fp16) for the `actor`, `critic`, and `alpha`, use the command below. Note that this is expected to crash; a short illustration of why appears after this list.

  ```
  python train.py env=finger_spin seed=123 \
      agent.params.actor_half=True agent.params.crit_half=True agent.params.alpha_half=True
  ```
- The command above typically crashes without our proposed methods, which can be toggled independently with the following flags (sketches of some of the underlying mechanisms follow this list):

  | Method | Flags |
  | --- | --- |
  | hAdam | `agent.params.use_num_adam=True` |
  | compound loss scaling | `agent.params.use_grad_scaler=True agent.params.adam_eps=0.0001` |
  | normal-fix | `diag_gaussian_actor.params.stable_normal=True` |
  | softplus-fix | `diag_gaussian_actor.params.tanh_threshold=10` |
  | Kahan-momentum | `agent.params.soft_update_scale=10000` |
  | Kahan-gradients | `agent.params.alpha_kahan=True agent.params.crit_kahan=True` |
- To apply all proposed methods, use:

  ```
  python train.py env=finger_spin seed=123 \
      agent.params.actor_half=True agent.params.crit_half=True agent.params.alpha_half=True \
      agent.params.use_grad_scaler=True agent.params.adam_eps=0.0001 agent.params.use_num_adam=True \
      diag_gaussian_actor.params.tanh_threshold=10 diag_gaussian_actor.params.stable_normal=True \
      agent.params.soft_update_scale=10000 agent.params.alpha_kahan=True agent.params.crit_kahan=True
  ```
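As a quick, self-contained illustration of why naive fp16 training breaks (this snippet is ours and not part of the repository): fp16 saturates at 65504 and cannot represent magnitudes much below 6e-8, so large values overflow and small gradients are flushed to zero.

```python
import torch

# fp16 tops out at 65504 and its smallest subnormal is ~5.96e-8, so
# large values overflow to inf and small gradients underflow to zero.
print(torch.tensor(70000.0).half())  # tensor(inf, dtype=torch.float16)
print(torch.tensor(1e-8).half())     # tensor(0., dtype=torch.float16)
print(torch.finfo(torch.float16))    # max, smallest normal, and eps of fp16
```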
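Dynamic loss scaling is the standard remedy for gradient underflow. The sketch below shows the mechanism using PyTorch's stock `torch.cuda.amp.GradScaler` on a placeholder model and loss; it is only meant to convey what a flag like `agent.params.use_grad_scaler` builds on, and the compound loss scaling from the paper may differ from this recipe.

```python
import torch
from torch.cuda.amp import GradScaler, autocast

model = torch.nn.Linear(32, 1).cuda()                          # placeholder network
opt = torch.optim.Adam(model.parameters(), lr=3e-4, eps=1e-4)  # larger eps, cf. adam_eps=0.0001
scaler = GradScaler()

for _ in range(100):
    x = torch.randn(64, 32, device="cuda")                     # placeholder batch
    with autocast():                                           # forward pass in mixed precision
        loss = model(x).pow(2).mean()                          # placeholder loss
    opt.zero_grad()
    scaler.scale(loss).backward()  # scale the loss so fp16 gradients do not underflow
    scaler.step(opt)               # unscales gradients; skips the step on inf/nan
    scaler.update()                # grows or shrinks the scale factor dynamically
```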
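The Kahan flags point at compensated summation, which preserves tiny increments that fp16 would otherwise round away. Below is a generic, hand-written sketch of Kahan compensation applied to a Polyak (soft) target-network update; the function name, `tau` value, and `comp` buffers are our own illustration and do not mirror the repository's implementation.

```python
import torch

@torch.no_grad()
def kahan_soft_update(target, online, comp, tau=0.005):
    """Polyak update target <- target + tau * (online - target), Kahan-compensated.

    In low precision the increment tau * (online - target) is often smaller
    than the spacing between representable values near the target weights,
    so a naive update rounds it to zero; `comp` accumulates the lost bits.
    """
    for t, o, c in zip(target.parameters(), online.parameters(), comp):
        update = tau * (o - t) - c     # re-inject previously lost low-order bits
        new_t = t + update             # low-order bits of update may be lost here
        c.copy_((new_t - t) - update)  # measure exactly what was lost
        t.copy_(new_t)

# Usage: keep one compensation buffer per target parameter, e.g.
# comp = [torch.zeros_like(p) for p in target.parameters()]
```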
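Likewise, the normal-fix and softplus-fix flags concern numerical stability of the tanh-squashed Gaussian policy. A well-known stable formulation of the squashing correction, shown below, rewrites log(1 - tanh(u)^2) via softplus to avoid catastrophic cancellation; this is a common SAC trick and only a guess at the flavor of fix behind these flags.

```python
import math
import torch
import torch.nn.functional as F

def stable_tanh_log_det(u):
    # Naive log(1 - tanh(u)^2) returns -inf once tanh(u) rounds to +/-1,
    # which happens even in fp32 and very quickly in fp16. Using
    # 1 - tanh(u)^2 = 4 / (e^u + e^-u)^2 gives the stable identity
    # log(1 - tanh(u)^2) = 2 * (log 2 - u - softplus(-2u)).
    return 2.0 * (math.log(2.0) - u - F.softplus(-2.0 * u))

u = torch.tensor([0.0, 12.0])
print(torch.log(1 - torch.tanh(u) ** 2))  # tensor([0., -inf]); breaks for large u
print(stable_tanh_log_det(u))             # finite everywhere, ~[0., -22.6]
```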
If you find our work useful, please consider citing:

```bibtex
@inproceedings{bjorck2021low,
  title={Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision},
  author={Bj{\"o}rck, Johan and Chen, Xiangyu and De Sa, Christopher and Gomes, Carla P and Weinberger, Kilian},
  booktitle={International Conference on Machine Learning},
  pages={980--991},
  year={2021},
  organization={PMLR}
}
```
The starting point for our codebase is pytorch_sac.