Grace Zhang, Linghan Zhong, Youngwoon Lee, Joseph J. Lim at USC CLVR lab
[Project website] [Paper] [arXiv]
This project is a PyTorch implementation of Policy Transfer across Visual and Dynamics Domain Gaps via Iterative Grounding (RSS 2021).
The ability to transfer a policy from one environment to another is a promising avenue for efficient robot learning in realistic settings where task supervision is not available. To succeed, such policy transfer must overcome both the visual and dynamics domain gaps between source and target environments. We propose IDAPT, a novel policy transfer method with iterative environment grounding that alternates between (1) directly minimizing both visual and dynamics domain gaps by grounding the source environment in the target environment, and (2) training a policy on the grounded source environment. The empirical results on locomotion and robotic manipulation tasks demonstrate that our method can effectively transfer a policy across large domain gaps with minimal interaction with the target environment.
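The alternation described above can be sketched as a short loop. This is a minimal illustration, not the code in this repository: the function name `iterative_grounding`, its three callable arguments, and the choice to collect target data with the current policy are all assumptions made for the example.

```python
from typing import Any, Callable, List, Tuple


def iterative_grounding(
    collect_target_data: Callable[[Any], List[Any]],
    ground_source_env: Callable[[List[Any]], Any],
    train_policy: Callable[[Any, Any], Any],
    initial_policy: Any,
    num_iterations: int = 3,
) -> Tuple[Any, Any]:
    """Alternate between grounding and policy training (hypothetical sketch).

    All three callables are placeholders supplied by the caller; they do not
    correspond to functions in this repository.
    """
    policy = initial_policy
    grounded_env = None
    for _ in range(num_iterations):
        # Assumption: a small batch of target-environment data is gathered
        # with the current policy before each grounding step.
        target_data = collect_target_data(policy)
        # Step (1): ground the source environment in the target environment.
        grounded_env = ground_source_env(target_data)
        # Step (2): train the policy on the grounded source environment.
        policy = train_policy(policy, grounded_env)
    return policy, grounded_env
```

The loop only fixes the order of the two steps. How grounding reduces the visual and dynamics gaps is left entirely to the callables passed in.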
- Ubuntu 18.04 or above
- Python 3.6 or above
- MuJoCo 2.0
- run.py: sets up experiment and runs training
- training/: our method and baseline implementations
  - See domain randomization configuration instructions here.
- config/: hyper-parameters
- environments/: registers environments used for the paper (SawyerPush, FetchReach)
- Clone this repo.
    git clone https://github.com/clvrai/idapt.git
- Install python dependencies.
    pip install -r requirements.txt
- Set the environment variable for headless rendering.
    export PYOPENGL_PLATFORM="EGL"
- Download demonstration files with task names (e.g. InvertedPendulum, HalfCheetah, Walker2d, FetchReach, SawyerPush).
    python download_data.py [TASK_NAME]  # example: python download_data.py Walker2d SawyerPush
Adding --unity True to the command will automatically download the Unity app.
On headless servers, start a virtual display first (e.g., sudo /usr/bin/X :1) and pass its display id to the command (e.g. --virtual_display :1).
On macOS, if the app does not launch because the developer is not verified, go to the directory ./binary, right-click Furniture.app, and click Open once. After that, our environment can launch the app without error.
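The headless rendering setup above can also be applied from inside a Python entry point instead of the shell. The sketch below assumes PyOpenGL is the rendering backend and that it chooses its platform when `OpenGL` is first imported, so the variable has to be set earlier than that. `configure_headless_rendering` is a hypothetical helper, not part of this repository.

```python
import os


def configure_headless_rendering(platform: str = "EGL") -> str:
    """Set PYOPENGL_PLATFORM unless the user already exported a value.

    Assumption: this runs before any module imports OpenGL; setting the
    variable afterwards is expected to have no effect.
    """
    # setdefault keeps a value exported in the shell, so the README's
    # `export PYOPENGL_PLATFORM="EGL"` still takes precedence.
    os.environ.setdefault("PYOPENGL_PLATFORM", platform)
    return os.environ["PYOPENGL_PLATFORM"]


# Call this at the very top of the script, before rendering imports.
active_platform = configure_headless_rendering()
```

Using `setdefault` rather than a plain assignment is a design choice for the sketch: it lets a machine-specific setting in the shell override the default without editing code.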
- Train ours (InvertedPendulum)
    python -m run --source_env=InvertedPendulum-v2 --target_env=GymInvertedPendulumDM-v0
- Train DR-Narrow (InvertedPendulum)
    python -m run --source_env=GymInvertedPendulumDM-v2 --target_env=GymInvertedPendulumDM-v0 --dr=True --dr_params_set=IP_min
- Train DR-Wide (InvertedPendulum)
    python -m run --source_env=GymInvertedPendulumDM-v2 --target_env=GymInvertedPendulumDM-v0 --dr=True --dr_params_set=IP_max
- Train ours (HalfCheetah)
    python -m run --source_env=HalfCheetah-v3 --target_env=GymHalfCheetahDM-v0 --data=backwards
- Train DR-Narrow (HalfCheetah)
    python -m run --source_env=HalfCheetah-v3 --target_env=GymHalfCheetahDM-v0 --dr=True --dr_params_set=HC_min
- Train DR-Wide (HalfCheetah)
    python -m run --source_env=HalfCheetah-v3 --target_env=GymHalfCheetahDM-v0 --dr=True --dr_params_set=HC_max
- Train ours (Walker2d)
    python -m run --source_env=GymWalker-v0 --target_env=GymWalkerDM-v0 --data=backwards
- Train DR-Narrow (Walker2d)
    python -m run --source_env=GymWalker-v0 --target_env=GymWalkerDM-v0 --dr=True --dr_params_set=WK_min
- Train DR-Wide (Walker2d)
    python -m run --source_env=GymWalker-v0 --target_env=GymWalkerDM-v0 --dr=True --dr_params_set=WK_max
- Train ours (FetchReach)
    python -m run --source_env=FetchReach-v1 --target_env=GymFetchReach-v0 --unity=True --action_rotation_degrees=45 --action_z_bias=-0.5
- Train DR-Narrow (FetchReach)
    python -m run --source_env=FetchReach-v1 --target_env=GymFetchReach-v0 --dr=True --dr_params_set=FR_min --action_rotation_degrees=45 --action_z_bias=-0.5
- Train DR-Wide (FetchReach)
    python -m run --source_env=FetchReach-v1 --target_env=GymFetchReach-v0 --dr=True --dr_params_set=FR_max --action_rotation_degrees=45 --action_z_bias=-0.5
- Train ours (SawyerPush)
    python -m run --source_env=SawyerPushZoom-v0 --target_env=SawyerPushShiftViewZoomBackground-v0 --unity=True --target_env_puck_friction=2.0 --target_env_puck_mass=0.05
- Train DR-Narrow (SawyerPush)
    python -m run --source_env=SawyerPushZoom-v0 --target_env=SawyerPushShiftViewZoomBackground-v0 --dr=True --dr_params_set=FR_min --action_rotation_degrees=45 --action_z_bias=-0.5
- Train DR-Wide (SawyerPush)
    python -m run --source_env=SawyerPushZoom-v0 --target_env=SawyerPushShiftViewZoomBackground-v0 --dr=True --dr_params_set=FR_max --action_rotation_degrees=45 --action_z_bias=-0.5
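The training commands above share one pattern, so the two domain randomization baselines for a task can be generated rather than typed by hand. The following is a minimal sketch: `build_run_command` and `dr_baseline_commands` are hypothetical helpers, not part of this repository, and the only flags they emit are the ones that already appear in the commands above.

```python
from typing import Dict, List


def build_run_command(source_env: str, target_env: str, **flags) -> List[str]:
    """Build an argument list for `python -m run` (hypothetical helper)."""
    cmd = [
        "python", "-m", "run",
        f"--source_env={source_env}",
        f"--target_env={target_env}",
    ]
    # Keyword arguments keep their call order, so flags appear as written.
    for name, value in flags.items():
        cmd.append(f"--{name}={value}")
    return cmd


def dr_baseline_commands(
    source_env: str, target_env: str, params_prefix: str
) -> Dict[str, List[str]]:
    """Return DR-Narrow and DR-Wide commands for one task.

    Assumption: parameter sets are named `<prefix>_min` and `<prefix>_max`,
    following the pattern seen in the listed commands (e.g. HC_min, HC_max).
    """
    return {
        f"DR-{label}": build_run_command(
            source_env, target_env, dr=True,
            dr_params_set=f"{params_prefix}_{suffix}",
        )
        for label, suffix in (("Narrow", "min"), ("Wide", "max"))
    }


if __name__ == "__main__":
    commands = dr_baseline_commands("HalfCheetah-v3", "GymHalfCheetahDM-v0", "HC")
    for label, cmd in commands.items():
        print(label, " ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Task-specific flags such as `--action_rotation_degrees=45` can be supplied through the keyword arguments of `build_run_command`.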
@inproceedings{zhang2021policy,
  title={Policy Transfer across Visual and Dynamics Domain Gaps via Iterative Grounding},
  author={Grace Zhang and Linghan Zhong and Youngwoon Lee and Joseph J. Lim},
  booktitle={Robotics: Science and Systems},
  year={2021},
  address={Virtual},
  month={July},
  DOI={10.15607/RSS.2021.XVII.006}
}
- PyTorch implementation of Dynamics Cycle-Consistency paper: https://github.com/sjtuzq/Cycle_Dynamics
- Sawyer MuJoCo environment: https://github.com/StanfordVL/robosuite
- Sawyer assets: https://github.com/vikashplus/sawyer_sim
- Domain randomization: https://github.com/matwilso/domrand
- Unity-MuJoCo interface: https://github.com/clvrai/furniture