This repo contains code accompanying the paper, Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement Learning via Frank-Wolfe Optimization (UAI 2021). It includes code for running the NFWPO algorithm presented in the paper, as well as other baseline methods such as DDPG+OptLayer, DDPG+Projection, DDPG+Reward Shaping, SAC+Projection, PPO+Projection, TRPO+Projection, and FOCOPS.
This code requires the following:
- python 3.*
- TensorFlow v2.0+
- PyTorch (with CUDA)
- mujoco-py
- Gurobi
- CVXPY
- CvxpyLayer
- QPTH
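One possible way to install the Python dependencies is sketched below. The package names are assumptions based on the list above and are not pinned by this repo; Gurobi additionally requires a valid license, and mujoco-py requires a local MuJoCo installation.

```bash
# Illustrative dependency installation (package names assumed, versions unpinned).
# Gurobi needs a license; mujoco-py needs MuJoCo installed on the machine.
pip install "tensorflow>=2.0" torch mujoco-py gurobipy cvxpy cvxpylayers qpth
```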
To run the code, enter the directory of the corresponding environment and run the command below, replacing `ALGORITHM_NAME` with the corresponding algorithm: `NFWPO`, `DDPG_Projection`, `DDPG_RewardShaping`, `DDPG_OptLayer`, or `SAC_Projection`.
python3 [ALGORITHM_NAME].py
(To run other baselines such as PPO+Projection, TRPO+Projection, and FOCOPS, please refer to the description below.)
The following are examples of running the experiments on Ubuntu.
To run the experiments mentioned in Section 4.1, please follow the instructions below:
- Enter the directory `BSS-3`:
  cd BSS-3
- Set the random seed `arg_seed` to a value between 0 and 4 in Line 25 of `NFWPO.py` (see the seed-sweep sketch after these instructions for one way to automate this).
- Use the following command to train NFWPO:
  python3 NFWPO.py
- To run the other baseline methods, set the random seed `arg_seed` (between 0 and 4) in `DDPG_Projection.py` and `DDPG_RewardShaping.py`, then run the corresponding command:
  python3 DDPG_Projection.py
  python3 DDPG_RewardShaping.py
- The result is shown in Figure 1.
- Enter the directory `BSS-5`:
  cd BSS-5
- Set the random seed `arg_seed` to a value between 0 and 4 in `NFWPO.py`.
- Use the following command to train NFWPO:
  python3 NFWPO.py
- To run the other baseline methods, set the random seed `arg_seed` (between 0 and 4) in `DDPG_Projection.py` and `DDPG_RewardShaping.py`, and set `random_seed` in `DDPG_OptLayer.py`. Then run the corresponding command:
  python3 DDPG_Projection.py
  python3 DDPG_RewardShaping.py
  python3 DDPG_OptLayer.py
- The result is shown in Figure 2.
- Enter the directory `NSFnet/src/gym`:
  cd NSFnet/src/gym
- Set the random seed `arg_seed` to a value between 0 and 4 in `NFWPO.py`.
- Use the following command to train NFWPO:
  python3 NFWPO.py
- To run the other baseline methods, set the random seed `arg_seed` (between 0 and 4) in `DDPG_Projection.py` and `DDPG_RewardShaping.py`, and set `random_seed` in `DDPG_OptLayer.py`. Then run the corresponding command:
  python3 DDPG_Projection.py
  python3 DDPG_RewardShaping.py
  python3 DDPG_OptLayer.py
- The result is shown in Figure 3.
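Because the seed is set inside each script rather than passed on the command line, a small shell loop can automate the sweep over seeds 0-4. The sketch below is only an illustration: it assumes the assignment appears on a single unindented line of the form `arg_seed = <int>` (e.g., Line 25 of `NFWPO.py` in `BSS-3`); adapt the pattern for the other scripts and directories.

```bash
# Illustrative sweep over seeds 0-4 for NFWPO in BSS-3.
# Assumes the seed is assigned on one line as "arg_seed = <int>".
cd BSS-3
for seed in 0 1 2 3 4; do
    sed -i "s/^arg_seed *=.*/arg_seed = ${seed}/" NFWPO.py
    python3 NFWPO.py
done
```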
To run the experiments mentioned in Section 4.3, please first enter the directory `Reacher` for Reacher with nonlinear constraints, or the directory `Halfcheetah-State` for Halfcheetah with state-dependent constraints:
cd Reacher
cd Halfcheetah-State
To run NFWPO, DDPG+Projection, DDPG+Reward Shaping, and DDPG+OptLayer, please refer to the description in the previous experiment.
To run SAC+Projection, use the following command:
python3 SAC_Projection.py
To run TRPO+Projection and PPO+Projection:
# For Halfcheetah-state task
python3 PPO_TRPO_Projection/PPO_Projection_Halfcheetah_State_Relate_gym.py
python3 PPO_TRPO_Projection/TRPO_Projection_Halfcheetah_State_Relate_gym.py
# For Reacher task
python3 PPO_TRPO_Projection/PPO_Projection_Reacher_State_Relate_gym.py
python3 PPO_TRPO_Projection/TRPO_Projection_Reahcer_State_Relate_gym.py
To run FOCOPS:
# For Halfcheetah-state task
python3 FOCOPS/focops_main_cheetah.py
# For Reacher task
python3 FOCOPS/focops_main_reacher.py
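As a recap, a complete Section 4.3 run for the Halfcheetah-State task could be chained as sketched below. This is only an example: it assumes the seeds have already been set inside each script, and that the `PPO_TRPO_Projection` and `FOCOPS` folders are reachable via the relative paths used above; if those folders live at the repository root instead, run the last three commands from there.

```bash
# Example sequence for the Halfcheetah-State task; set the seeds inside each
# script first, and adjust paths if PPO_TRPO_Projection/FOCOPS live elsewhere.
cd Halfcheetah-State
python3 NFWPO.py
python3 DDPG_Projection.py
python3 DDPG_RewardShaping.py
python3 DDPG_OptLayer.py
python3 SAC_Projection.py
python3 PPO_TRPO_Projection/PPO_Projection_Halfcheetah_State_Relate_gym.py
python3 PPO_TRPO_Projection/TRPO_Projection_Halfcheetah_State_Relate_gym.py
python3 FOCOPS/focops_main_cheetah.py
```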
The results are shown in Figures 4-5.
To run the experiments mentioned in Appendix D.3, please first enter the directory `Halfcheetah-CAPG`:
cd Halfcheetah-CAPG
Then run the following commands for the corresponding baselines:
# CAPG+PPO
python3 CAPG_PPO_Halfcheetah_bound_constraints.py
# CAPG+TRPO
python3 CAPG_TRPO_Halfcheetah_bound_constraints.py
The result is shown in Figure 6.