Yanhao Wu1,2, Haoyang Zhang2, Tianwei Lin2, Lichao Huang2,
Shujie Luo2, Rui Wu2, Congpei Qiu1, Wei Ke1, Tong Zhang3,4
1 Xi'an Jiaotong University, 2 Horizon Robotics, 3 EPFL, 4 University of Chinese Academy of Sciences
Accepted to CVPR 2025
UMGen generates multimodal driving scenes, where each scene integrates ego-vehicle actions, maps, traffic agents, and images.
All visual elements in the video are generated by UMGen.
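As a rough mental model of the four modalities above, each generated frame can be thought of as a bundle like the following. This is an illustrative sketch only — the class and field names are assumptions for exposition, not UMGen's actual API:

```python
from dataclasses import dataclass
from typing import List

# Illustrative sketch: names are assumptions, not UMGen's code.
@dataclass
class SceneFrame:
    ego_action: List[float]          # ego-vehicle motion for this step
    map_tokens: List[int]            # tokenized map representation
    agent_states: List[List[float]]  # surrounding traffic agents
    image_tokens: List[int]          # tokenized camera image

frame = SceneFrame(
    ego_action=[0.5, 0.0],
    map_tokens=[1, 2, 3],
    agent_states=[[10.0, 2.0, 0.0]],
    image_tokens=[42, 7],
)
```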
Teaser_formated.mp4
UMGen also supports user-specified scenario generation.
In this video, we control the agent to simulate a cut-in maneuver scenario.
Userset_Scene.mp4
For more videos and details, please refer to our project page and paper.
```shell
conda create -n UMGen python=3.8 -y
conda activate UMGen

UMGen_path="path/to/UMGen"
cd ${UMGen_path}

pip3 install --upgrade pip
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip3 install -r requirements.txt
```

Download the tokenized data and pretrained weights from https://drive.google.com/drive/folders/1rJEVxWNk4MH_FPdqUMgdjV_PHwKJMS-3?usp=sharing
The directory structure should be:

```
UMGen/
├── data/
│   ├── controlled_scenes/
│   │   └── XX
│   ├── tokenized_origin_scenes/
│   │   └── XX
│   └── weights/
│       ├── image_var.tar
│       ├── map_vae.ckpt
│       └── UMGen_Large.pt
└── projects/
```

Generate future frames automatically, without any external control signals:

```shell
python projects/tools/evaluate.py --infer_task video --set_num_new_frames 30
```

Generate future frames under specific control constraints, such as predefined trajectories or object-behavior control:

```shell
python projects/tools/evaluate.py --infer_task control --set_num_new_frames 30
```

TODO:

- Release more tokenized scene data
- Release the code for obtaining scene tokens using the VAE models
- Release the diffusion code to enhance the videos
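Before running the evaluation commands, you can verify that the downloaded data and weights match the directory tree above. This is a minimal sanity-check sketch (the path list simply mirrors the tree; it is not part of the UMGen codebase):

```python
import os

# Expected layout relative to the UMGen repo root, mirroring the tree above.
EXPECTED = [
    "data/controlled_scenes",
    "data/tokenized_origin_scenes",
    "data/weights/image_var.tar",
    "data/weights/map_vae.ckpt",
    "data/weights/UMGen_Large.pt",
]

def missing_paths(root):
    """Return the expected paths that do not exist under `root`."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing paths:")
        for p in missing:
            print(" ", p)
    else:
        print("All data and weights found.")
```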
For any questions or collaborations, feel free to contact me : ) 📧 wuyanhao@stu.xjtu.edu.cn