Paper | Project Page | Code
Check out our Large Driving Model Series!
GPD-1: Generative Pre-training for Driving
Zixun Xie*, Sicheng Zuo*, Wenzhao Zheng*$\dagger$, Yunpeng Zhang, Dalong Du, Jie Zhou, Jiwen Lu, Shanghang Zhang$\ddagger$
* Equal contribution
GPD-1 is a unified approach that seamlessly accomplishes multiple aspects of scene evolution, including scene simulation, traffic simulation, closed-loop simulation, map prediction, and motion planning, all without additional fine-tuning.
- [2024/12/12] Code released.
- [2024/12/12] Paper released on arXiv.
The pre-trained GPD-1 accomplishes these various tasks without fine-tuning, simply by using different prompts.
Our model adapts a GPT-like architecture to autonomous driving scenarios with two key innovations: 1) a 2D map scene tokenizer based on VQ-VAE that generates discrete, high-level representations of the 2D BEV map, and 2) a hierarchical quantization agent tokenizer to encode agent information. Using a scene-level mask, the autoregressive transformer predicts future scenes, conditioning on ground-truth scene tokens during training (teacher forcing) and on previously predicted scene tokens during inference. A minimal sketch of this pipeline follows.
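The paragraph above compresses the whole pipeline, so a small illustration may help. Below is a minimal PyTorch sketch of the two core ideas: a VQ-VAE-style tokenizer that turns a BEV map into discrete codebook indices, and a GPT-like transformer with a scene-level (block-causal) attention mask. All class names, dimensions, and hyperparameters (`MapTokenizer`, `SceneTransformer`, `tokens_per_scene`, codebook size, etc.) are illustrative assumptions, not the released GPD-1 code; the hierarchical agent tokenizer is omitted, but its tokens would join each scene's token block in the same way.

```python
# Illustrative sketch only -- names, shapes, and hyperparameters are
# assumptions, not the released GPD-1 implementation.
import torch
import torch.nn as nn


class MapTokenizer(nn.Module):
    """VQ-VAE-style tokenizer: encodes a BEV map into discrete codebook indices."""

    def __init__(self, in_channels=3, embed_dim=64, num_codes=512):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, embed_dim, kernel_size=4, stride=4),  # downsample BEV grid
            nn.ReLU(),
            nn.Conv2d(embed_dim, embed_dim, kernel_size=1),
        )
        self.codebook = nn.Embedding(num_codes, embed_dim)

    def forward(self, bev_map):
        feats = self.encoder(bev_map)                    # (B, D, h, w)
        B, D, h, w = feats.shape
        flat = feats.permute(0, 2, 3, 1).reshape(-1, D)  # (B*h*w, D)
        # Nearest-neighbor lookup against the codebook (vector quantization).
        dists = torch.cdist(flat, self.codebook.weight)  # (B*h*w, num_codes)
        return dists.argmin(dim=-1).view(B, h * w)       # discrete map tokens


class SceneTransformer(nn.Module):
    """GPT-like decoder predicting the next scene's tokens from past scenes."""

    def __init__(self, num_codes=512, d_model=256, tokens_per_scene=64, max_len=1024):
        super().__init__()
        self.tokens_per_scene = tokens_per_scene
        self.embed = nn.Embedding(num_codes, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, num_codes)

    def scene_level_mask(self, num_scenes, device):
        # Block-causal mask: a token attends to every token of its own and
        # earlier scenes, but never to tokens of future scenes.
        ids = torch.arange(num_scenes, device=device)
        ids = ids.repeat_interleave(self.tokens_per_scene)
        return ids[None, :] > ids[:, None]               # True = masked out

    def forward(self, tokens):                           # (B, num_scenes * tokens_per_scene)
        x = self.embed(tokens) + self.pos[:, : tokens.size(1)]
        num_scenes = tokens.size(1) // self.tokens_per_scene
        mask = self.scene_level_mask(num_scenes, tokens.device)
        return self.head(self.blocks(x, mask=mask))      # next-scene token logits


@torch.no_grad()
def rollout(model, context_tokens, num_future_scenes):
    """Inference: autoregressively append the model's own (greedy) predictions."""
    tokens = context_tokens
    for _ in range(num_future_scenes):
        logits = model(tokens)                           # (B, T, num_codes)
        next_scene = logits[:, -model.tokens_per_scene:].argmax(dim=-1)
        tokens = torch.cat([tokens, next_scene], dim=1)
    return tokens
```

In this sketch, training would feed ground-truth scene tokens and supervise the logits at each scene's positions with the next scene's tokens (teacher forcing), while `rollout` replaces the ground truth with the model's previous predictions at inference, mirroring the training/inference distinction described above.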
Our work is inspired by these excellent open-source repos: PlanTF, Sledge, and Navsim.
If you find this project helpful, please consider citing the following paper:
```bibtex
@article{gpd-1,
    title={GPD-1: Generative Pre-training for Driving},
    author={Xie, Zixun and Zuo, Sicheng and Zheng, Wenzhao and Zhang, Yunpeng and Du, Dalong and Zhou, Jie and Lu, Jiwen and Zhang, Shanghang},
    journal={arXiv preprint arXiv:2412.08643},
    year={2024}
}
```