Original PyTorch implementation of MCQ (NeurIPS 2022) from *Mildly Conservative Q-learning for Offline Reinforcement Learning*. The code is largely based on the offlineRL repository.
To use this codebase, one needs to install the following dependencies (a one-shot install command is sketched after the list):
- fire
- loguru
- tianshou==0.4.2
- gym<=0.18.3
- mujoco-py==2.0.2.8
- sklearn
- gtimer
- torch==1.8.0
- d4rl==1.1
- rlkit==0.2.1dev
- neorl==0.3.0 (https://github.com/polixir/NeoRL)
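Most of these dependencies are on PyPI, so a one-shot install along the lines of the command below should work (a sketch, not a tested requirement file for this exact repo); d4rl, rlkit, and neorl usually have to be installed from their respective repositories (the neorl link is given above).

pip install fire loguru tianshou==0.4.2 "gym<=0.18.3" mujoco-py==2.0.2.8 sklearn gtimer torch==1.8.0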
Once you have all the dependencies installed, run the following command:

pip install -e .

I use python=3.8.5 to run all of the experiments. If you encounter Python version conflicts, try running MCQ in a Python 3.8 environment.
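As a quick sanity check that gym, mujoco-py, and d4rl are wired up correctly before training, something along the following lines should work (a minimal sketch; the dataset name is only an example and is not part of the training scripts):

```python
# Sanity check: load a d4rl dataset and print its shapes.
# "hopper-medium-replay-v2" is just an example; pick the dataset you plan to train on.
import gym
import d4rl  # noqa: F401  (importing d4rl registers its environments with gym)

env = gym.make("hopper-medium-replay-v2")
dataset = env.get_dataset()  # dict with 'observations', 'actions', 'rewards', ...
print(dataset["observations"].shape, dataset["actions"].shape)
```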
For MuJoCo tasks, we conduct experiments on the d4rl MuJoCo "-v2" datasets by calling:
python examples/train_d4rl.py --algo_name=MCQ --task d4rl-hopper-medium-replay-v2 --seed 6 --lam 0.9 --log-dir=logs/hopper-medium-replay/r6
For Adroit "-v0"/maze2d "-v1" tasks, we run on these datasets by calling
python examples/train_d4rl.py --algo_name=MCQ --task d4rl-maze2d-medium-v1 --seed 6 --lam 0.9 --log-dir=logs/maze2d-medium-v1/r6
The logs are stored under the directory specified by --log-dir. One can see the training curves via TensorBoard.
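For example, to inspect the run produced by the first command above (pointing TensorBoard at whatever --log-dir you used):

tensorboard --logdir logs/hopper-medium-replay/r6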
To modify the number of sampled actions, specify the --num flag (default: 10). To normalize the offline data, specify the --normalize flag (this is not required).
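For reference, a minimal sketch of what such normalization usually amounts to is shown below (standardizing observations with the dataset's mean and std); the exact behaviour behind --normalize in this repository may differ.

```python
import numpy as np

def normalize_observations(dataset, eps=1e-3):
    """Standardize observations by the dataset's mean/std.
    A common recipe in offline RL; a sketch, not necessarily identical
    to what the --normalize flag does in this repository."""
    obs = dataset["observations"]
    mean, std = obs.mean(axis=0), obs.std(axis=0) + eps
    dataset["observations"] = (obs - mean) / std
    if "next_observations" in dataset:
        dataset["next_observations"] = (dataset["next_observations"] - mean) / std
    return dataset, mean, std
```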
In the paper and our implementation, we update the critics with a λ-weighted loss: λ times the standard in-distribution Bellman error plus (1 − λ) times an auxiliary term that regresses the Q-values of out-of-distribution (OOD) actions onto a pseudo target built from actions sampled from the learned behavior policy (see the paper for the exact form). λ is controlled by --lam and the number of sampled actions by --num. We do welcome the reader to try running with other values of λ.
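For readers who prefer code, here is a schematic PyTorch sketch of that λ-weighted critic update. All names (critic, target_critic, actor, behavior_vae, batch keys) are illustrative and do not necessarily match the identifiers used in this repository; the pseudo target follows the paper's recipe of taking the maximum target Q-value over num actions sampled from the learned behavior model.

```python
import torch
import torch.nn.functional as F

def mcq_critic_loss(critic, target_critic, actor, behavior_vae,
                    batch, lam=0.9, num=10, gamma=0.99):
    """Schematic MCQ critic loss (illustrative names, not this repo's exact code)."""
    s, a, r = batch["obs"], batch["act"], batch["rew"]
    s_next, done = batch["obs_next"], batch["done"]

    # (1) Standard in-distribution Bellman error on dataset transitions.
    with torch.no_grad():
        a_next = actor(s_next)
        target_q = r + gamma * (1.0 - done) * target_critic(s_next, a_next)
    td_loss = F.mse_loss(critic(s, a), target_q)

    # (2) OOD regularizer: push the Q-values of actor-proposed (possibly OOD)
    # actions toward a pseudo target -- the max Q over `num` actions sampled
    # from the learned behavior model (a CVAE in MCQ).
    a_ood = actor(s)
    with torch.no_grad():
        s_rep = s.repeat_interleave(num, dim=0)        # (B * num, obs_dim)
        a_behavior = behavior_vae.decode(s_rep)        # in-support actions
        q_behavior = target_critic(s_rep, a_behavior)  # (B * num, 1)
        pseudo_target = q_behavior.view(-1, num, 1).max(dim=1).values  # (B, 1)
    ood_loss = F.mse_loss(critic(s, a_ood), pseudo_target)

    # (3) Mildly conservative combination, weighted by lambda (--lam).
    return lam * td_loss + (1.0 - lam) * ood_loss
```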
If you use our method or code in your research, please consider citing the paper as follows:
@inproceedings{lyu2022mildly,
  title={Mildly Conservative Q-learning for Offline Reinforcement Learning},
  author={Jiafei Lyu and Xiaoteng Ma and Xiu Li and Zongqing Lu},
  booktitle={Thirty-sixth Conference on Neural Information Processing Systems},
  year={2022}
}