
In-Context Reinforcement Learning from Noise Distillation


This repo contains the official implementation of In-Context Reinforcement Learning from Noise Distillation. The experiments for different environments live in separate folders. We did this on purpose so as not to overload the code with unnecessary environment-dependent if-else statements and to preserve readability.
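The layout is roughly as follows (one folder per environment, matching the directories referenced throughout this README):

dark_room/     # Dark Room experiments
key_to_door/   # Key-to-Door experiments
watermaze/     # Watermaze experiments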

Dependencies

The Dark environments can be installed through pip or Docker, while Watermaze can only be installed with Docker. Watermaze depends heavily on modified dm_lab code, so its dependencies are not easily managed otherwise.

Dark Environments

As easy as it gets, you may just install all the requirements with:

pip install -r requirements.txt

Tested on Python 3.8.
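If you prefer an isolated environment, a typical (optional) setup could look like this:

# optional: a fresh Python 3.8 virtual environment
python3.8 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt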

If you'd like to use Docker instead, run

# assuming you are in the dark_room
# or key_to_door directory
docker build -t <IMAGE_NAME> .

To run the code, use:

docker run -it \
    --gpus=all \
    --rm \
    --name <CONTAINER_NAME> \
    <IMAGE_NAME> bash

and then execute scripts.
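For example, to launch one of the training scripts covered in Running experiments below (assuming the image was built from key_to_door; the script and config names are taken from that section):

# inside the container
python ad_dark_key2door.py --config_path="configs/ad-k2d.yaml"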

Watermaze

You have two options. The first is to pull a prebuilt Docker image from DockerHub:

docker pull suessmann/btd_dm_lab:1.1ad

The second option is to build a container yourself:

docker build -t <IMAGE_NAME> .

To run the scripts, use the following code:

# assuming you are in the root of the repository
docker run -it \
    --workdir /workspace \
    --rm \
    --volume ./watermaze/:/workspace/ \
    --name <CONTAINER_NAME> \
    <IMAGE_NAME> bash

Running experiments

Dark Environments

To run an experiment, simply run the corresponding ad_<env_name>.py script; the data will be generated automatically.

For example, if you wish to train AD$^\varepsilon$ on the Key-to-Door environment with 50% of an optimal demonstrator's performance:

python ad_dark_key2door.py --config_path="configs/ad-k2d.yaml" \
       --max_perf=0.5
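For intuition, here is a minimal, hypothetical Python sketch of the noise-distillation idea behind AD$^\varepsilon$: a fixed demonstrator's actions are corrupted ε-greedily, and the noise is annealed across episodes so the resulting histories resemble a learning curve. The gym-style env and the demonstrator callable are illustrative assumptions, not the repo's actual data-generation code:

import numpy as np

# Hypothetical sketch only: `demonstrator` is any obs -> action callable,
# `env` is a gym-style environment; neither name comes from this repo.
def noisy_histories(demonstrator, env, num_episodes, eps_final=0.0):
    histories = []
    for ep in range(num_episodes):
        # linear schedule: early episodes are mostly random, later ones greedy
        eps = max(eps_final, 1.0 - ep / max(num_episodes - 1, 1))
        obs, done, traj = env.reset(), False, []
        while not done:
            if np.random.rand() < eps:
                action = env.action_space.sample()  # noisy action
            else:
                action = demonstrator(obs)  # demonstrator's action
            next_obs, reward, done, _ = env.step(action)
            traj.append((obs, action, reward))
            obs = next_obs
        histories.append(traj)
    return histories

Roughly speaking, keeping the noise above a floor caps the best performance present in the data; that is what --max_perf=0.5 above encodes, and what the eps=0.7 setting encodes for Watermaze below.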

Watermaze

Since the data for Watermaze is heavy (~500 GB), we cannot provide it directly. However, you can generate it yourself. First, obtain a demonstrator policy by running

python ppo_watermaze.py --goal_x=<x> --goal_y=<y>

once for each goal you want.
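For example (the goal coordinates here are made up; pick whichever goals you need):

# train a demonstrator for each of four illustrative goals
for goal in "2 3" "5 1" "7 7" "4 6"; do
    read -r x y <<< "$goal"
    python ppo_watermaze.py --goal_x="$x" --goal_y="$y"
done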

Then, generate noisy trajectories with 50% performance:

# note that we provide eps=0.7
# for max_perf=0.5
python generate_watermaze_traj.py --num_goals=<num_goals> --hist_len=<hist_len> --eps 0.7
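For instance, matching the four demonstrators trained above (the history length is illustrative):

python generate_watermaze_traj.py --num_goals=4 --hist_len=1000 --eps 0.7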

Finally, run the training script:

python ad_watermaze.py --config_path="configs/ad-watermaze.yaml" --learning_histories_path=<path>

There you go!

Citing

If you used this code for your research or a project, please cite us as:

@inproceedings{zisman2024emergence,
  title     = {Emergence of In-Context Reinforcement Learning from Noise Distillation},
  author    = {Zisman, Ilya and Kurenkov, Vladislav and Nikulin, Alexander and Sinii, Viacheslav and Kolesnikov, Sergey},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  year      = {2024},
}