Yanhao Wu1,2, Haoyang Zhang2, Tianwei Lin2, Lichao Huang2,
Shujie Luo2, Rui Wu2, Congpei Qiu1, Wei Ke1, Tong Zhang3,4
1 Xi'an Jiaotong University, 2 Horizon Robotics, 3 EPFL, 4 University of Chinese Academy of Sciences
Accepted to CVPR 2025
UMGen generates multimodal driving scenes, where each scene integrates ego-vehicle actions, maps, traffic agents, and images.
All visual elements in the video are generated by UMGen.
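As a rough mental model of the four modalities above, each generated frame can be thought of as a bundle like the following. This is an illustrative sketch only — the class and field names are assumptions for exposition, not UMGen's actual API:

```python
from dataclasses import dataclass
from typing import List

# Illustrative sketch: names are assumptions, not UMGen's code.
@dataclass
class SceneFrame:
    ego_action: List[float]          # ego-vehicle motion for this step
    map_tokens: List[int]            # tokenized map representation
    agent_states: List[List[float]]  # surrounding traffic agents
    image_tokens: List[int]          # tokenized camera image

frame = SceneFrame(
    ego_action=[0.5, 0.0],
    map_tokens=[1, 2, 3],
    agent_states=[[10.0, 2.0, 0.0]],
    image_tokens=[42, 7],
)
```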
Teaser_formated.mp4
UMGen also supports user-specified scenario generation.
In this video, we control the agent to simulate a cut-in maneuver scenario.
Userset_Scene.mp4
For more videos and details, please refer to our project page and paper.
```shell
conda create -n UMGen python=3.8 -y
conda activate UMGen

UMGen_path="path/to/UMGen"
cd ${UMGen_path}

pip3 install --upgrade pip
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip3 install -r requirements.txt
```

Download the tokenized data and pretrained weights from https://drive.google.com/drive/folders/1rJEVxWNk4MH_FPdqUMgdjV_PHwKJMS-3?usp=sharing
The directory structure should be:

```
UMGen/
├── data/
│   ├── controlled_scenes/
│   │   └── XX
│   ├── tokenized_origin_scenes/
│   │   └── XX
│   └── weights/
│       ├── image_var.tar
│       ├── map_vae.ckpt
│       └── UMGen_Large.pt
└── projects/
```

Generate future frames automatically, without any external control signals:

```shell
python projects/tools/evaluate.py --infer_task video --set_num_new_frames 30
```

Generate future frames under specific control constraints, such as predefined trajectories or object-behavior control:

```shell
python projects/tools/evaluate.py --infer_task control --set_num_new_frames 30
```

TODO:

- Release more tokenized scene data
- Release the code for obtaining scene tokens using the VAE models
- Release the diffusion code to enhance the videos
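Before running the evaluation commands, you can verify that the downloaded data and weights match the directory tree above. This is a minimal sanity-check sketch (the path list simply mirrors the tree; it is not part of the UMGen codebase):

```python
import os

# Expected layout relative to the UMGen repo root, mirroring the tree above.
EXPECTED = [
    "data/controlled_scenes",
    "data/tokenized_origin_scenes",
    "data/weights/image_var.tar",
    "data/weights/map_vae.ckpt",
    "data/weights/UMGen_Large.pt",
]

def missing_paths(root):
    """Return the expected paths that do not exist under `root`."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing paths:")
        for p in missing:
            print(" ", p)
    else:
        print("All data and weights found.")
```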
For any questions or collaborations, feel free to contact me : ) 📧 wuyanhao@stu.xjtu.edu.cn