
Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels

[website] [paper]

This is the code for our ICML 2023 work. You can use it to pre-train world model-based agents with different unsupervised strategies, fine-tune the agent's components selectively, and use planning (Dyna-MPC) during fine-tuning. The repo also contains an extensively tested DreamerV2 implementation in PyTorch.

If you find the code useful, please cite our work using:

@inproceedings{
        Rajeswar2023MasterURLB,
        title={Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels},
        author={Sai Rajeswar and Pietro Mazzaglia and Tim Verbelen and Alexandre Piché and Bart Dhoedt and Aaron Courville and Alexandre Lacoste},
        booktitle={40th International Conference on Machine Learning},
        year={2023},
        url={https://arxiv.org/abs/2209.12016}
}

Requirements

The environment assumes you have access to a GPU that can run CUDA 10.2 and cuDNN 8. The simplest way to install all required dependencies is to create an Anaconda environment by running:

conda env create -f conda_env.yml

After the installation ends, you can activate your environment with:

conda activate urlb
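
To check that the environment can actually see the GPU, a minimal sanity-check command (assuming the conda environment ships PyTorch, which the DreamerV2 re-implementation requires):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"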

Implemented Agents

Agent                   Command
DreamerV2 (supervised)  agent=dreamer
ICM                     agent=icm_dreamer
Plan2Explore            agent=plan2explore
RND                     agent=rnd_dreamer
LBS                     agent=lbs_dreamer
APT                     agent=apt_dreamer
DIAYN                   agent=diayn_dreamer
APS                     agent=aps_dreamer
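
Any of these values can be passed as the agent= option of the training scripts. For example, pre-training a Plan2Explore agent on the quadruped domain (same command pattern as in the Pre-training section below):

python dreamer_pretrain.py configs=dmc_pixels agent=plan2explore domain=quadruped seed=1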

Domains and tasks

We support the following domains and tasks.

Domain     Tasks
walker     stand, walk, run, flip
quadruped  walk, run, stand, jump
jaco       reach_top_left, reach_top_right, reach_bottom_left, reach_bottom_right
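
For fine-tuning, a downstream task is referenced as <domain>_<task> (as in the walker_run example below). For instance, the jaco reach_top_left task would be selected with:

python dreamer_finetune.py configs=dmc_pixels agent=icm_dreamer task=jaco_reach_top_left snapshot_ts=1000000 seed=1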

Instructions

Pre-training

To run pre-training, use the dreamer_pretrain.py script:

python dreamer_pretrain.py configs=dmc_pixels agent=icm_dreamer domain=walker seed=1

If you want to train a skill-based agent, e.g. DIAYN, just change the agent and run:

python dreamer_pretrain.py configs=dmc_pixels agent=diayn_dreamer domain=walker seed=1

This script will produce several agent snapshots after training for 100k, 500k, 1M, and 2M frames. The snapshots will be stored under the following directory:

./pretrained_models/<obs_type>/<domain>/<agent>/<seed>

For example:

./pretrained_models/pixels/walker/icm_dreamer/1
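
To verify which snapshots were produced, a minimal inspection sketch (run it from the repository root so the pickled classes can be resolved; the exact contents of a snapshot file depend on the agent, so the printed keys are only illustrative):

import pathlib
import torch

snapshot_dir = pathlib.Path("./pretrained_models/pixels/walker/icm_dreamer/1")
for snapshot in sorted(snapshot_dir.glob("snapshot_*.pt")):
    # each file holds the training state saved at that frame count
    payload = torch.load(snapshot, map_location="cpu")
    keys = list(payload) if isinstance(payload, dict) else type(payload).__name__
    print(snapshot.name, keys)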

Fine-tuning

Once you have pre-trained your method, you can use the saved snapshots to initialize the Dreamer agent and fine-tune it on a downstream task. For example, if you have an agent pre-trained with ICM, you can fine-tune it on walker_run with the following command:

python dreamer_finetune.py configs=dmc_pixels agent=icm_dreamer task=walker_run snapshot_ts=1000000 seed=1

This will load a snapshot stored in ./pretrained_models/pixels/walker/icm_dreamer/1/snapshot_1000000.pt, initialize Dreamer with it, and start training on walker_run using the extrinsic reward of the task.

You can ablate components by setting: init_critic=True/False and init_actor=True/False.

You can use Dyna-MPC by setting: mpc=True.
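
These options can be combined in a single run. For example, to fine-tune the ICM-pre-trained agent on walker_run while keeping the pre-trained actor, re-initializing the critic, and enabling Dyna-MPC:

python dreamer_finetune.py configs=dmc_pixels agent=icm_dreamer task=walker_run snapshot_ts=1000000 init_actor=True init_critic=False mpc=True seed=1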

Monitoring

Console

The console output is available in the following form:

| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42

A training entry decodes as:

F  : total number of environment frames
S  : total number of agent steps
E  : total number of episodes
L  : episode length
R  : episode return
FPS: training throughput (frames per second)
T  : total training time
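
If you need these metrics programmatically, here is a minimal parsing sketch (a hypothetical helper, not part of the repository; it only handles lines in the format shown above):

line = "| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42"
fields = {}
for part in line.split("|"):
    # keep only "KEY: value" fields, e.g. "F: 6000"
    if ":" in part:
        key, value = part.split(":", 1)
        fields[key.strip()] = value.strip()
print(fields["R"], fields["FPS"])  # 5.5177 96.7586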

Tensorboard

Logs are stored in the exp_local folder. To launch TensorBoard, run:

tensorboard --logdir exp_local

Weights & Biases (wandb)

You can also use Weights & Biases by launching the experiments with use_wandb=True.
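
For example, to log the walker pre-training run from above to wandb:

python dreamer_pretrain.py configs=dmc_pixels agent=icm_dreamer domain=walker seed=1 use_wandb=True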

Notes and acknowledgements

The codebase was adapted from URLB. The Dreamer implementation follows the original TensorFlow DreamerV2 codebase. This re-implementation has been carefully tested and obtains results consistent with the original implementation on the DeepMind Control Suite, as reported in the paper.
