This repository contains the PyTorch implementation of our action candidate based methods: the clipped double estimator (AC-CDE), clipped Double Q-learning (AC-CDQ), clipped Double DQN (AC-CDDQN), and TD3 (AC-TD3).
Paper link arXiv.
-
We evaluate AC-CDE on the multi-armed bandit problem. To reproduce the results, run:
    cd AC_CDE_code
    python3 main.py
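For orientation, here is a minimal sketch of how an action candidate based clipped double estimator could be computed on a bandit. It is not taken from `AC_CDE_code`. It assumes two independent sample sets per arm, picks the top-`k` arms under one set as candidates, chooses among them the arm rated highest by the other set, and clips that arm's value from the first set by the maximum of the second. The function name `ac_cde`, its arguments, and the parameter `k` are hypothetical.

```python
from statistics import mean


def ac_cde(samples_a, samples_b, k):
    """Sketch of an action candidate based clipped double estimate.

    samples_a, samples_b: one list of observed rewards per arm, drawn
        independently (hypothetical argument names).
    k: number of action candidates to keep (hypothetical parameter).
    """
    # Per-arm sample means under each independent sample set.
    mu_a = [mean(s) for s in samples_a]
    mu_b = [mean(s) for s in samples_b]

    # Step 1: candidates are the k arms with the largest mean under set B.
    order = sorted(range(len(mu_b)), key=lambda i: mu_b[i], reverse=True)
    candidates = order[:k]

    # Step 2: among the candidates, pick the arm that set A rates highest.
    best = max(candidates, key=lambda i: mu_a[i])

    # Step 3: take that arm's value under set B, clipped by the max of set A.
    return min(mu_b[best], max(mu_a))


# Toy example with one sample per arm, so the means equal the samples.
samples_a = [[1.0], [3.0], [2.0]]
samples_b = [[5.0], [0.0], [2.5]]
estimate = ac_cde(samples_a, samples_b, k=2)
```

In this sketch, a small `k` stays close to the single-estimator maximum (before clipping), while setting `k` to the number of arms lets the second sample set choose freely among all arms, so `k` trades off between the two behaviours.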
-
We evaluate AC-CDQ on the grid world game. To reproduce the results, run:
    cd AC_CDQ_code
    python3 main.py
-
We evaluate AC-CDDQN on the MinAtar benchmark. To reproduce the results, run:
    cd AC_CDDQN_code
    CUDA_VISIBLE_DEVICES=0 python3 main.py
-
We evaluate AC-TD3 on MuJoCo continuous control tasks. To reproduce the results, run:
    cd AC_TD3_code
    CUDA_VISIBLE_DEVICES=0 python3 main.py
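To run all four experiments in sequence, a small driver along these lines may be convenient. It is a sketch, not part of the repository: the helper names `build_jobs` and `run_all` and the `gpu_id` parameter are hypothetical, and it assumes it is invoked from the repository root with each directory containing a `main.py` as in the commands above.

```python
import os
import subprocess

# (directory, needs_gpu) pairs, taken from the commands in this README.
EXPERIMENTS = [
    ("AC_CDE_code", False),
    ("AC_CDQ_code", False),
    ("AC_CDDQN_code", True),
    ("AC_TD3_code", True),
]


def build_jobs(gpu_id="0"):
    """Return (directory, command, env_overrides) for each experiment."""
    jobs = []
    for directory, needs_gpu in EXPERIMENTS:
        # Only the GPU experiments set CUDA_VISIBLE_DEVICES, as in the README.
        env_overrides = {"CUDA_VISIBLE_DEVICES": gpu_id} if needs_gpu else {}
        jobs.append((directory, ["python3", "main.py"], env_overrides))
    return jobs


def run_all(gpu_id="0"):
    """Run every experiment in order, stopping at the first failure."""
    for directory, command, env_overrides in build_jobs(gpu_id):
        env = {**os.environ, **env_overrides}
        # cwd=directory has the same effect as `cd directory` before the command.
        subprocess.run(command, cwd=directory, env=env, check=True)
```

Calling `run_all()` from the repository root launches the experiments one after another; pass a different `gpu_id` string to select another device.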