Code for ICML'23 paper "Policy Regularization with Dataset Constraint for Offline Reinforcement Learning", arXiv link.
If you find this repository useful for your research, please cite:
@inproceedings{prdc,
  title={Policy Regularization with Dataset Constraint for Offline Reinforcement Learning},
  author={Yuhang Ran and Yi-Chen Li and Fuxiang Zhang and Zongzhang Zhang and Yang Yu},
  booktitle={International Conference on Machine Learning},
  year={2023}
}
pip install -r requirements.txt
Install the D4RL benchmark:
git clone https://github.com/Farama-Foundation/D4RL.git
cd D4RL
pip install -e .
For halfcheetah:
python main.py --env_id halfcheetah-medium-v2 --seed 1024 --device cuda:0 --alpha 40.0 --beta 2.0 --k 1
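For hopper & walker2d (shown for hopper; for walker2d, pass the corresponding dataset, e.g. walker2d-medium-v2, as --env_id):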
python main.py --env_id hopper-medium-v2 --seed 1024 --device cuda:0 --alpha 2.5 --beta 2.0 --k 1
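To sweep several seeds with the hyperparameters above, a small launcher can generate the same command lines. This is a minimal sketch, not part of the repository: `HPARAMS`, `build_command`, and `launch` are hypothetical helpers, the walker2d entry assumes it shares hopper's settings as stated above, and the extra seed 1025 is only illustrative. By default it performs a dry run and only prints the commands.

```python
import shlex
import subprocess
from typing import List

# Hypothetical table (not part of the repo): values copied from the README
# commands; walker2d assumes the same settings as hopper.
HPARAMS = {
    "halfcheetah": {"alpha": 40.0, "beta": 2.0, "k": 1},
    "hopper": {"alpha": 2.5, "beta": 2.0, "k": 1},
    "walker2d": {"alpha": 2.5, "beta": 2.0, "k": 1},
}


def build_command(env_id: str, seed: int, device: str = "cuda:0") -> List[str]:
    """Hypothetical helper: argv for one main.py run, keyed by the env prefix."""
    hp = HPARAMS[env_id.split("-")[0]]  # e.g. "hopper-medium-v2" -> "hopper"
    return [
        "python", "main.py",
        "--env_id", env_id,
        "--seed", str(seed),
        "--device", device,
        "--alpha", str(hp["alpha"]),
        "--beta", str(hp["beta"]),
        "--k", str(hp["k"]),
    ]


def launch(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for env_id in ["halfcheetah-medium-v2", "hopper-medium-v2", "walker2d-medium-v2"]:
        for seed in [1024, 1025]:  # illustrative seeds
            launch(build_command(env_id, seed))
```

Passing `dry_run=False` to `launch` runs the jobs sequentially; keeping the argv as a list (rather than a shell string) avoids quoting issues when handing it to `subprocess.run`.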
For antmaze, we use reward shaping, a common trick also used by CQL, IQL, Fisher-BRC, etc. It is controlled by the --scale and --shift arguments:
python main.py --env_id antmaze-medium-play-v2 --seed 1024 --device cuda:0 --alpha 7.5 --beta 7.5 --k 1 --scale=10000 --shift=-1
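AntMaze datasets have sparse rewards (1 when the goal is reached, 0 otherwise). A minimal sketch of how the shaping flags could act on them, assuming `--scale` and `--shift` define an affine transform r' = scale * r + shift; the exact order of operations in `main.py` is not stated here and may differ. `shape_rewards` is a hypothetical helper, not the repository's code.

```python
from typing import List


def shape_rewards(rewards: List[float], scale: float, shift: float) -> List[float]:
    """Hypothetical helper: affine shaping r' = scale * r + shift.

    ASSUMPTION: this is one plausible reading of --scale/--shift;
    the repository may apply them in a different order.
    """
    return [scale * r + shift for r in rewards]


# Sparse AntMaze-style rewards: only the last transition reaches the goal.
sparse = [0.0, 0.0, 1.0]
print(shape_rewards(sparse, scale=10000, shift=-1))
```

Under this assumption, every non-goal step is penalized by the shift while the goal transition receives a large positive reward, which densifies the learning signal relative to the raw 0/1 rewards.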
Training logs are written to ./result and can be viewed with TensorBoard:
tensorboard --logdir='./result'