leor-c/M3

M3: A Modular World Model over Streams of Tokens

Lior Cohen ▪️ Kaixin Wang ▪️ Bingyi Kang ▪️ Uri Gadot ▪️ Shie Mannor

📄 Paper ▪️ 🧠 Trained model weights

🐋 Docker

We provide Docker files for building a Docker image and running the code in a Docker container. Our code uses Docker Compose, which sets up the environment for you with a single command. To use Docker with GPUs, make sure the NVIDIA Container Toolkit (nvidia-container-toolkit) is installed on the host machine.

To build the Docker image and start a container automatically, run the following command from the project root folder:

docker compose up -d

To access the command line of the container, run

docker attach m3_c

Use the container's command line to run the desired script (detailed below). You can detach from the container using CTRL+D, and stop the container using docker compose down.
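Putting the commands above together, a typical session looks like this (a sketch; it assumes the container name m3_c from the repository's compose file, as used in the attach command above):

```shell
# Build the image (on first run) and start the container in the background
docker compose up -d

# Attach to the running container's command line
docker attach m3_c

# ... run the desired training / evaluation scripts inside the container ...

# Detach with CTRL+D, then stop the container from the host
docker compose down
```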

Environment Rendering (Optional)

If you would like to render the environment, it is necessary to set up X11 forwarding.

On Windows OS:

We used VcXsrv as an X server. Then, set up the DISPLAY environment variable by executing

export DISPLAY=<your-host-ip>:0

inside the Docker container (after attaching). This can also be set from the Windows command line using setx DISPLAY <your-host-ip>:0 (persistent), or set DISPLAY=<your-host-ip>:0 for the current session only.

You can validate the value is correct by executing echo $DISPLAY in the Docker container.

On Linux OS:

If the game window fails to appear, try executing sudo xhost + on the host machine (before attaching to the docker container).

Headless Mode

To run in headless mode, execute export MUJOCO_GL='osmesa' in the Docker container's terminal before launching the training script.
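The same idea can be applied from Python if you launch training through a wrapper script of your own (a minimal sketch, not part of this repo): the variable must be set before any module that initializes the MuJoCo rendering backend is imported, since the backend is chosen at import time.

```python
import os

# Select the software (OSMesa) renderer for headless machines.
# This must run before importing mujoco or any module that loads it.
os.environ["MUJOCO_GL"] = "osmesa"

# Imports that touch MuJoCo go below this point, e.g.:
# import mujoco
print(os.environ["MUJOCO_GL"])
```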

🏗️ Setup

  • We highly recommend using Docker for setting up the environment, as described above.
  • If you wish to set up the environment manually without Docker, we recommend following the steps in the Dockerfile.
    • Python 3.10
    • Install PyTorch (torch and torchvision). The code was developed with several versions of PyTorch, most recently torch==2.4.1, but should work with other recent versions.
    • Install other dependencies: pip install -r requirements.txt
  • Warning: Atari ROMs are downloaded with the dependencies; by installing them, you acknowledge that you have a license to use the ROMs.
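If you go the manual route, the steps above roughly translate to the following (a sketch; the exact torch wheel depends on your CUDA version, and the Dockerfile remains the authoritative reference):

```shell
# Create and activate a Python 3.10 environment
python3.10 -m venv .venv
source .venv/bin/activate

# Install PyTorch first (pick the wheel matching your CUDA setup)
pip install torch==2.4.1 torchvision

# Install the remaining dependencies (note: this pulls Atari ROMs)
pip install -r requirements.txt
```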

🏋️ Launch a Training Run

python src/main.py benchmark=atari

To run other benchmarks, use benchmark=dmc for the DeepMind Control Suite or benchmark=craftax for Craftax. To change an environment within a benchmark, set env.train.id, either by modifying the appropriate configuration file located in config/env or through the command line:

python src/main.py benchmark=atari env.train.id=BreakoutNoFrameskip-v4

By default, the logs are synced to Weights & Biases. Set wandb.mode=disabled to turn syncing off, or wandb.mode=offline for offline logging.

📺 Visualizing Trained Models

Download a trained model and use

python src/play.py <benchmark> -p <path-to-model-weights>

to visualize the agent controlling the real environment live. For Atari, make sure to use the correct environment ID by setting env.train.id=<game_name>NoFrameskip-v4 in config/env/atari.yaml (e.g., env.train.id=DemonAttackNoFrameskip-v4). For more options, use python src/play.py --help or see the details below.

For example, to visualize the Craftax agent, download Craftax.pt from our HuggingFace repo, place it in M3/checkpoints/Craftax.pt and launch python src/play.py craftax -p checkpoints/Craftax.pt (from the attached Docker container).

🛠️ Configuration

  • All configuration files are located in config/; the main configuration file is config/base.yaml.
  • Each benchmark overrides the base configuration. Each root benchmark config is located in config/benchmark.
  • The simplest way to customize the configuration is to edit these files directly.
  • Please refer to Hydra for more details regarding configuration management.
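To illustrate how Hydra's dotted command-line overrides (like env.train.id=... above) behave, here is a small stand-alone helper that mimics the merge semantics on plain dictionaries (purely illustrative, not part of this repo; real runs should rely on Hydra itself):

```python
def apply_override(cfg: dict, dotted_key: str, value) -> dict:
    """Set cfg['a']['b']... = value for a dotted key like 'env.train.id'."""
    keys = dotted_key.split(".")
    node = cfg
    for k in keys[:-1]:
        node = node.setdefault(k, {})  # descend, creating nodes as needed
    node[keys[-1]] = value
    return cfg

# Mirrors: python src/main.py benchmark=atari env.train.id=DemonAttackNoFrameskip-v4
cfg = {"benchmark": "atari", "env": {"train": {"id": "BreakoutNoFrameskip-v4"}}}
apply_override(cfg, "env.train.id", "DemonAttackNoFrameskip-v4")
print(cfg["env"]["train"]["id"])  # DemonAttackNoFrameskip-v4
```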

📁 Run Folder

Each new run is located at outputs/env.id/YYYY-MM-DD/hh-mm-ss/. This folder is structured as:

outputs/env.id/YYYY-MM-DD/hh-mm-ss/
│
└─── checkpoints
│   │   last.pt
│   │   optimizer.pt
│   │   ...
│   │
│   └─── dataset
│       │   0.pt
│       │   1.pt
│       │   ...
│
└─── config
│   │   config.yaml
│
└─── media
│   │
│   └─── episodes
│   │   │   ...
│   │
│   └─── reconstructions
│       │   ...
│
└─── scripts
│   │   eval.py
│   │   resume.sh
│   │   ...
│
└─── src
│   │   main.py
│   │   play.py
│   │   ...
│
└─── wandb
    │   ...
  • checkpoints: contains the model's last checkpoint, its optimizer state, and the dataset.
  • media:
    • episodes: contains train / test / imagination episodes for visualization purposes.
    • reconstructions: contains original frames alongside their reconstructions with the autoencoder.
  • scripts: from the run folder, you can use the following scripts.
    • eval.py: Launch python ./scripts/eval.py to evaluate the run.
    • resume.sh: Launch ./scripts/resume.sh to resume a training that crashed.
  • play.py: Tool to visualize the learned controller / world model / representations.
    • Use python src/play.py --help to print usage information. Currently, this tool only supports atari and craftax options.
    • Launch python src/play.py <benchmark> -p <path-to-model-weights> to watch the agent play live in the environment. If you add the flag -r (Atari only), the left panel displays the original frame, the center panel displays the same frame downscaled to the input resolution of the discrete autoencoder, and the right panel shows the output of the autoencoder (what the agent actually sees). The -h flag shows additional information (Atari only).
    • Press R to start/stop recording a video.

📈 Results

The folder results/data/ contains raw scores (for each game, and for each training run).

👉 Citation

@misc{cohen2025m3,
      title={$\text{M}^{\text{3}}$: A Modular World Model over Streams of Tokens}, 
      author={Lior Cohen and Kaixin Wang and Bingyi Kang and Uri Gadot and Shie Mannor},
      year={2025},
      eprint={2502.11537},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.11537}, 
}

🌟 Credits
