Lior Cohen ▪️ Kaixin Wang ▪️ Bingyi Kang ▪️ Uri Gadot ▪️ Shie Mannor
📄 Paper ▪️ 🧠 Trained model weights
We provide Docker files for automatically building a Docker image and running the code in a Docker container. Our setup uses Docker Compose, which prepares the environment for you with a single command.

To use Docker with GPUs, make sure the NVIDIA Container Toolkit (nvidia-container-toolkit) is installed on the host machine.
To build the Docker image and run a container automatically, run the following command from the project root folder:

```bash
docker compose up -d
```

To access the container's command line, run:

```bash
docker attach m3_c
```

Use the container's command line to run the desired script (detailed below). You can detach from the container using `CTRL+D`, and stop the container using `docker compose down`.
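For reference, here is a minimal sketch of what the Compose service might look like. The repo's own docker-compose.yml is authoritative; everything below except the `m3_c` container name (used by the attach command above) is an assumption:

```yaml
# Hypothetical sketch of a Compose service for this setup; see the repo's
# docker-compose.yml for the real definition. Only the container name m3_c
# is confirmed by the commands above.
services:
  m3:
    build: .                  # build the image from the project's Dockerfile
    container_name: m3_c      # matches `docker attach m3_c`
    stdin_open: true          # keep STDIN open so the container can be attached
    tty: true
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia  # requires nvidia-container-toolkit on the host
              count: all
              capabilities: [gpu]
```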
If you would like to render the environment, it is necessary to set up X11 forwarding. We used VcXsrv as an X server. Then, set the `DISPLAY` environment variable by executing

```bash
export DISPLAY=<your-host-ip>:0
```

inside the Docker container (after attaching). This can also be set in the Windows command line using `setx DISPLAY <your-host-ip>:0`, or `set DISPLAY=<your-host-ip>:0` for the current session only. You can validate that the value is correct by executing `echo $DISPLAY` in the Docker container.
If the game window fails to appear, try executing `sudo xhost +` on the host machine (before attaching to the Docker container).

To run in headless mode, execute `export MUJOCO_GL='osmesa'` in the Docker container's terminal before launching the training script.
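In summary, the rendering-related setup inside the container looks like this (replace `<your-host-ip>` with your host's IP address):

```bash
# Inside the attached Docker container:
export DISPLAY=<your-host-ip>:0   # route rendering to the host's X server
echo $DISPLAY                     # verify the value is set correctly

# Or, for headless runs (no rendering window):
export MUJOCO_GL='osmesa'
```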
- We highly recommend using Docker for setting up the environment, as described above.
- If you wish to set up the environment manually without Docker, we recommend following the steps in the Dockerfile; a rough sketch is given after this list.
- Python 3.10
- Install PyTorch (torch and torchvision). The code was developed with several versions of PyTorch, the latest being torch==2.4.1, but should work with other recent versions.
- Install the other dependencies: `pip install -r requirements.txt`
- Warning: Atari ROMs are downloaded together with the dependencies; by installing them, you acknowledge that you have a license to use the ROMs.
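As a rough sketch, a manual setup mirroring the Dockerfile steps might look like the following. The exact PyTorch install command depends on your platform and CUDA version, and the torchvision pin is our assumption of the matching release; the Dockerfile remains authoritative:

```bash
# Assumes Python 3.10 is available on the system
python3.10 -m venv .venv
source .venv/bin/activate

# Install PyTorch (torchvision 0.19.1 is the release paired with torch 2.4.1)
pip install torch==2.4.1 torchvision==0.19.1

# Install the remaining dependencies (note: this downloads Atari ROMs)
pip install -r requirements.txt
```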
To launch a training run, execute:

```bash
python src/main.py benchmark=atari
```

To run other benchmarks, use `benchmark=dmc` for the DeepMind Control Suite or `benchmark=craftax` for Craftax.

To change the environment within a benchmark, set `env.train.id` by modifying the appropriate configuration file located in `config/env`, or through the command line:

```bash
python src/main.py benchmark=atari env.train.id=BreakoutNoFrameskip-v4
```
By default, the logs are synced to Weights & Biases. Set `wandb.mode=disabled` to turn this off, or `wandb.mode=offline` for offline logging.
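Combining the overrides above, some example invocations:

```bash
# Train on a specific Atari game with W&B logging disabled
python src/main.py benchmark=atari env.train.id=BreakoutNoFrameskip-v4 wandb.mode=disabled

# Train on the DeepMind Control Suite with offline logging
python src/main.py benchmark=dmc wandb.mode=offline

# Train on Craftax with default settings
python src/main.py benchmark=craftax
```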
Download a trained model and use

```bash
python src/play.py <benchmark> -p <path-to-model-weights>
```

to visualize the agent controlling the real environment live.

For Atari, make sure to use the correct environment ID in the configuration by setting `env.train.id=<game_name>NoFrameskip-v4` in `config/env/atari.yaml` (e.g., `env.train.id=DemonAttackNoFrameskip-v4`).

For more options, use `python src/play.py --help` or see the details below.

For example, to visualize the Craftax agent, download `Craftax.pt` from our HuggingFace repo, place it at `M3/checkpoints/Craftax.pt`, and launch `python src/play.py craftax -p checkpoints/Craftax.pt` (from the attached Docker container).
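Spelled out as commands (the `mkdir` step is an assumption, in case the checkpoints folder does not exist yet):

```bash
# From the project root, inside the attached Docker container:
mkdir -p checkpoints
# place Craftax.pt (downloaded from the HuggingFace repo linked above) into checkpoints/
python src/play.py craftax -p checkpoints/Craftax.pt
```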
- All configuration files are located in `config/`; the main configuration file is `config/base.yaml`.
- Each benchmark overrides the base configuration. Each root benchmark config is located in `config/benchmark`.
- The simplest way to customize the configuration is to edit these files directly (see the sketch below).
- Please refer to Hydra for more details regarding configuration management.
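As an illustration of how the overrides map onto the files, the env config presumably nests a `train.id` field. Only `env.train.id` is confirmed by the command-line examples above; everything else in this sketch is hypothetical:

```yaml
# Hypothetical sketch of config/env/atari.yaml; only the train.id field
# is confirmed by the env.train.id overrides shown in this README.
train:
  id: BreakoutNoFrameskip-v4
```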
Each new run is located at `outputs/env.id/YYYY-MM-DD/hh-mm-ss/`. This folder is structured as:

```
outputs/env.id/YYYY-MM-DD/hh-mm-ss/
│
└─── checkpoints
│   │   last.pt
│   │   optimizer.pt
│   │   ...
│   │
│   └─── dataset
│       │   0.pt
│       │   1.pt
│       │   ...
│
└─── config
│   │   config.yaml
│
└─── media
│   │
│   └─── episodes
│   │   │   ...
│   │
│   └─── reconstructions
│       │   ...
│
└─── scripts
│   │   eval.py
│   │   resume.sh
│   │   ...
│
└─── src
│   │   main.py
│   │   play.py
│   │   ...
│
└─── wandb
    │   ...
```
- `checkpoints`: contains the last checkpoint of the model, its optimizer, and the dataset.
- `media`:
  - `episodes`: contains train / test / imagination episodes for visualization purposes.
  - `reconstructions`: contains original frames alongside their reconstructions with the autoencoder.
- `scripts`: from the run folder, you can use the following scripts (example below):
  - `eval.py`: launch `python ./scripts/eval.py` to evaluate the run.
  - `resume.sh`: launch `./scripts/resume.sh` to resume a training run that crashed.
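For example:

```bash
# From inside a specific run folder, e.g. outputs/<env.id>/<YYYY-MM-DD>/<hh-mm-ss>/
python ./scripts/eval.py   # evaluate the run
./scripts/resume.sh        # resume a training run that crashed
```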
`play.py`: a tool to visualize the learned controller / world model / representations.

- Use `python src/play.py --help` to print usage information. Currently, this tool only supports the `atari` and `craftax` options.
- Launch `python src/play.py <benchmark> -p <path-to-model-weights>` to watch the agent play live in the environment. If you add the `-r` flag (Atari only), the left panel displays the original frame, the center panel displays the same frame downscaled to the input resolution of the discrete autoencoder, and the right panel shows the output of the autoencoder (what the agent actually sees). The `-h` flag shows additional information (Atari only).
- Press `R` to start/stop recording a video.
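For example, to watch an Atari agent with the reconstruction panels enabled (the checkpoint path here is hypothetical; point `-p` at your downloaded weights):

```bash
# -r (Atari only): show original / downscaled / reconstructed frames side by side
python src/play.py atari -p checkpoints/DemonAttack.pt -r
```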
The folder `results/data/` contains raw scores (for each game, and for each training run).
If you find this work useful, please cite it as:

```bibtex
@misc{cohen2025m3,
      title={$\text{M}^{\text{3}}$: A Modular World Model over Streams of Tokens},
      author={Lior Cohen and Kaixin Wang and Bingyi Kang and Uri Gadot and Shie Mannor},
      year={2025},
      eprint={2502.11537},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.11537},
}
```
- https://github.com/leor-c/REM
- https://github.com/eloialonso/iris
- https://github.com/fkodom/yet-another-retnet
- https://github.com/pytorch/pytorch
- https://github.com/CompVis/taming-transformers
- https://github.com/karpathy/minGPT
- https://github.com/google-research/rliable