GitHub - ualiangzhang/modeling_long_term_future: Code for ICLR 2019 paper Learning Dynamics Model by Incorporating the Long Term Future

This repo contains code for our paper Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future

The code base contains multiple branches.

The main branch contains experiments for the BabyAI tasks.
The mujoco branch contains experiments for the Mujoco tasks.
The carracing branch contains experiments for CarRacing task.

Based on code base for the BabyAI project at Mila. https://github.com/mila-iqia/babyai

Follow similar installations as in https://github.com/mila-iqia/babyai.

Requirements:

Installation

Requirements:

Python 3.5+
OpenAI Gym
NumPy
PyQT5
PyTorch 0.4.1+

Start by manually installing PyTorch. See the PyTorch website for installation instructions specific to your platform.

Then, clone this repository and install the other dependencies with pip3:

git clone https://github.com/facebookresearch/modeling_long_term_future.git
cd modeling_long_term_future
pip3 install --editable .

Create a new conda env using env.yml in the repo

Training teacher

We use the BabyAI Pickup-Unlock game.

First train the teacher (for imitation learning) using PPO with curriculum learning. Start with a room size of 6 and then work our way up to room size of 15.

python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0  --algo ppo   --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 6
python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0  --algo ppo   --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 8 --model MODEL_ROOM6_PRETRAINED
python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0  --algo ppo   --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 10 --model MODEL_ROOM8_PRETRAINED
python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0  --algo ppo   --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 12 --model MODEL_ROOM10_PRETRAINED
python3 -m scripts.train_curclm. --env BabyAI-UnlockPickup-v0  --algo ppo   --arch cnn1 --tb --seed 1 --save-interval 10 --room-size 15 --model MODEL_ROOM12_PRETRAINED

Generate expert trajectories

Generate expert trajectories from the experts trained using curriculum learning

mnkdir data
python3 -m scripts.gen_samples --episodes 10000 --env BabyAI-UnlockPickup-v0 --model pretrained_model_room_10 --room 10

Training the student to imitate the expert

To run our model

python3 -m scripts.zforcing_main_state_dec --env BabyAI-UnlockPickup-v0 --datafile EXPERT_DATA_TO_LOAD --model MODEL_NAME --eval-episodes 100 --eval-interval 200  --bwd-weight 0.0 --lr 1e-4 --aux-weight-start 0.0001 --bwd-l2-weight 1. --kld-weight-start 0.2  --aux-weight-end 0.0001  --room 10

To run the baseline

python3 -m scripts.zforcing_main_state_dec --datafile EXPERT_DATA_TO_LOAD --env BabyAI-UnlockPickup-v0 --model MODEL_NAME --eval-episodes 100 --eval-interval 200  --bwd-weight 0.0 --lr 1e-4 --aux-weight-start 0.000 --aux-weight-end 0.0 --room 10

License

Find license in LICENSE file.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
babyai		babyai
scripts		scripts
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
env.yml		env.yml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Installation

Training teacher

Generate expert trajectories

Training the student to imitate the expert

License

About

Releases

Packages

Languages

License

ualiangzhang/modeling_long_term_future

Folders and files

Latest commit

History

Repository files navigation

Installation

Training teacher

Generate expert trajectories

Training the student to imitate the expert

License

About

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages