
Delusions

A PyTorch implementation of the experiments in

"Identifying and Addressing Delusions for Target-Directed Decision Making"

authored by Mingde "Harry" Zhao, Tristan Sylvain, Doina Precup, Yoshua Bengio

arXiv

This repo was implemented by Harry Zhao (@PwnerHarry), and is mostly adapted from Skipper.

This work was done during Harry's Mitacs Internship at Borealis AI (RBC), under the mentorship of Tristan Sylvain (@TiSU32).

Python virtual environment configuration:

  1. Create a virtual environment with conda or venv (we used Python 3.10)

  2. Install PyTorch following the official guidelines, and make sure it recognizes your accelerators

  3. Run pip install -r requirements.txt to install the dependencies
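For example, a minimal setup could look like the following sketch (the environment name delusions is an assumption; pick the PyTorch build that matches your accelerator):

conda create -n delusions python=3.10
conda activate delusions
pip install torch  # choose the build for your CUDA/ROCm setup per the official PyTorch guidelines
pip install -r requirements.txt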

To inspect the results with TensorBoard:

tensorboard --logdir=tb_records

To run experiments, write bash scripts that call the following Python scripts:

run_minigrid_mp.py: a multi-process experiment initializer for Skipper variants

run_minigrid.py: a single-process experiment initializer for Skipper variants

run_leap_pretrain_vae.py: a single-process experiment initializer for pretraining the generator for the LEAP agent

run_leap_pretrain_rl.py: a single-process experiment initializer for pretraining the distance estimator (policy) for the LEAP agent

Please read the argument definitions in runtime.py carefully and pass the desired arguments.
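As a sketch, a bash script for launching a multi-process run could look like this (the flag values are illustrative; the flags themselves are defined in runtime.py and explained below):

#!/bin/bash
python run_minigrid_mp.py \
    --game SwordShieldMonster \
    --size_world 12 \
    --num_envs_train 50 \
    --hindsight_strategy future+episode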

To control the HER variants:

Use --hindsight_strategy to specify the hindsight relabeling strategy (an example invocation follows the list). The options are:

  • future: same as the "future" variant in the paper

  • episode: same as the "episode" variant in the paper

  • pertask: same as the "pertask" variant in the paper

  • future+episode: corresponds to the "F-E" variant in the paper

  • future+pertask: corresponds to the "F-P" variant in the paper

  • future+episode@0.5: corresponds to the "F-(E+P)" variant in the paper, where 0.5 controls the mixture ratio of pertask
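For example, a single-process run with the "F-(E+P)" variant would be launched like this (other arguments are left at their runtime.py defaults):

python run_minigrid.py --hindsight_strategy future+episode@0.5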

To use the "generate" strategy for estimator training, use --prob_relabel_generateJIT to specify the probability of replacing the relabeled target:

  • --hindsight_strategy future+episode --prob_relabel_generateJIT 1.0: corresponds to the "F-G" variant in the paper

  • --hindsight_strategy future+episode --prob_relabel_generateJIT 0.5: corresponds to the "F-(E+G)" variant in the paper

  • --hindsight_strategy future+episode@0.333 --prob_relabel_generateJIT 0.25: corresponds to the "F-(E+P+G)" variant in the paper

To choose environment and training settings:

  • --game SwordShieldMonster --size_world 12 --num_envs_train 50: the game can be switched to RandDistShift (RDS), and size_world should be >= 8 (a combined example follows)
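Putting the pieces together, a full single-process invocation of the "F-(E+P+G)" variant on SwordShieldMonster might look like this sketch (all values are illustrative):

python run_minigrid.py --game SwordShieldMonster --size_world 12 --num_envs_train 50 --hindsight_strategy future+episode@0.333 --prob_relabel_generateJIT 0.25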

Extras

  • There is a potential CUDA_INDEX_ASSERTION error that can cause hanging at the beginning of Skipper runs. We do not yet know how to fix it

  • The dynamic programming solutions for the environment ground truth are only compatible with deterministic experiments
