Yanhao Wu1,2, Haoyang Zhang2, Tianwei Lin2, Lichao Huang2,
Shujie Luo2, Rui Wu2, Congpei Qiu1, Wei Ke1, Tong Zhang3,4

1 Xi'an Jiaotong University, 2 Horizon Robotics, 3 EPFL, 4 University of Chinese Academy of Sciences

Accepted to CVPR 2025

UMGen | Paper

🌟 What is UMGen?

UMGen generates multimodal driving scenes, where each scene integrates ego-vehicle actions, maps, traffic agents, and images.
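
Conceptually, each generated frame bundles these four modalities together. A minimal sketch of such a per-frame record in Python (field names and shapes are illustrative assumptions, not the repository's actual data format):

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SceneFrame:
    """Illustrative bundle of the four modalities UMGen generates per frame.

    Field names and shapes are assumptions for explanation only; the
    released tokenized data uses its own internal format.
    """
    ego_action: np.ndarray    # e.g. (dx, dy, dyaw) ego motion for this step
    map_tokens: np.ndarray    # discrete tokens describing the local map
    agents: np.ndarray        # per-agent states, e.g. (N, 5): x, y, yaw, w, l
    image_tokens: np.ndarray  # discrete tokens decoded into the camera image


# A driving scene is then an ordered sequence of such frames.
scene = [
    SceneFrame(
        ego_action=np.zeros(3),
        map_tokens=np.zeros(64, dtype=np.int64),
        agents=np.zeros((8, 5)),
        image_tokens=np.zeros(256, dtype=np.int64),
    )
    for _ in range(10)
]
```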

🎬 Autoregressive Scene Generation

All visual elements in the video are generated by UMGen.

Teaser_formated.mp4
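
Under the hood, next-scene prediction rolls the model forward one frame at a time, conditioning each new frame on everything generated so far. A hedged sketch of that loop (the `predict_next_scene` interface is hypothetical; the released entry point is `projects/tools/evaluate.py`):

```python
import torch


def rollout(model, context_frames, num_new_frames=30):
    """Autoregressive next-scene prediction (illustrative sketch only).

    `model.predict_next_scene` is a hypothetical interface: given the frames
    generated so far, it returns the next frame's ego action, map, agent,
    and image tokens, which are appended to the context and fed back in.
    """
    frames = list(context_frames)
    with torch.no_grad():
        for _ in range(num_new_frames):
            next_frame = model.predict_next_scene(frames)  # sample the next scene
            frames.append(next_frame)                      # condition on it next step
    return frames
```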

🤖 User-Specified Scenario Generation

UMGen also supports user-specified scenario generation.
In this video, we control an agent to perform a cut-in maneuver.

Userset_Scene.mp4
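
Controlled generation can be pictured as overriding part of the next frame, here the target agent's state, with a user-specified trajectory, while the remaining modalities are still predicted by the model. A hedged sketch with hypothetical names (the released entry point is `evaluate.py --infer_task control`):

```python
import torch


def rollout_with_control(model, context_frames, agent_id, planned_states):
    """Generate frames while forcing one agent to follow a user trajectory.

    `planned_states[t]` is the user-specified state of `agent_id` at step t,
    e.g. a lateral cut-in maneuver. All interfaces here are illustrative.
    """
    frames = list(context_frames)
    with torch.no_grad():
        for target_state in planned_states:
            next_frame = model.predict_next_scene(frames)
            next_frame.agents[agent_id] = target_state  # inject the control signal
            frames.append(next_frame)
    return frames
```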

📎 More Information

For more videos and details, please refer to our Paper and the UMGen project page.

🚀 Quick Start

Set up a new virtual environment

conda create -n UMGen python=3.8 -y
conda activate UMGen

Install dependency packages

UMGen_path="path/to/UMGen"
cd ${UMGen_path}
pip3 install --upgrade pip
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip3 install -r requirements.txt

Prepare the data

Download the tokenized data and pretrained weights from https://drive.google.com/drive/folders/1rJEVxWNk4MH_FPdqUMgdjV_PHwKJMS-3?usp=sharing

The directory structure should be:

UMGen/
├── data/
│   ├── controlled_scenes/
│   │   ├── XX
│   ├── tokenized_origin_scenes/
│   │   ├── XX
│   └── weights/
│       ├── image_var.tar
│       ├── map_vae.ckpt
│       └── UMGen_Large.pt
└── projects/
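
To catch download or path issues before running inference, here is a small sanity check that the folders and weight files from the tree above are in place (adjust the root path to your clone):

```python
from pathlib import Path

# Adjust to wherever the repository was cloned.
UMGEN_ROOT = Path("path/to/UMGen")

expected = [
    UMGEN_ROOT / "data" / "controlled_scenes",
    UMGEN_ROOT / "data" / "tokenized_origin_scenes",
    UMGEN_ROOT / "data" / "weights" / "image_var.tar",
    UMGEN_ROOT / "data" / "weights" / "map_vae.ckpt",
    UMGEN_ROOT / "data" / "weights" / "UMGen_Large.pt",
]

missing = [p for p in expected if not p.exists()]
if missing:
    print("Missing:")
    for p in missing:
        print(f"  {p}")
else:
    print("Data and weights look complete.")
```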

⚙️ Inference Usage

🎛️ Infer Future Frames Freely

Generate future frames automatically without any external control signals.

python projects/tools/evaluate.py --infer_task video --set_num_new_frames 30
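
The rollout length is set by --set_num_new_frames; for example, a longer 60-frame rollout (assuming sufficient GPU memory) would be:

python projects/tools/evaluate.py --infer_task video --set_num_new_frames 60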

🕹️ Infer Future Frames with Control

Generate future frames under specific control constraints, such as predefined trajectories or object behavior control.

python projects/tools/evaluate.py --infer_task control --set_num_new_frames 30

🧩 To-Do List

  • Release more tokenized scene data
  • Release the code for obtaining scene tokens using the VAE models
  • Release the diffusion code to enhance the videos

📬 Contact

For any questions or collaborations, feel free to contact me : ) 📧 wuyanhao@stu.xjtu.edu.cn

About

Code for the CVPR 2025 paper: Generating Multimodal Driving Scenes via Next-Scene Prediction
