Doe-1: Closed-Loop Autonomous Driving with Large World Model

logo

Check out our Large Driving Model Series!

Doe-1: Closed-Loop Autonomous Driving with Large World Model

Wenzhao Zheng* $\dagger$, Zetian Xia*, Yuanhui Huang, Sicheng Zuo, Jie Zhou, Jiwen Lu

* Equal contribution $\dagger$ Project leader

Doe-1 is the first closed-loop autonomous driving model for unified perception, prediction, and planning.

News

  • [2024/12/13] Evaluation code released.
  • [2024/12/13] Paper released on arXiv.
  • [2024/12/13] Demo released.

Demo

demo

Doe-1 is a unified model that accomplishes visual question answering, future prediction, and motion planning.

Overview

overview

We formulate autonomous driving as a unified next-token generation problem and use observation, description, and action tokens to represent each scene. Without additional fine-tuning, Doe-1 accomplishes various tasks by using different input prompts, including visual question answering, controlled image generation, and end-to-end motion planning.
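The token layout above can be sketched as follows. This is a minimal illustration, not the actual Doe-1 implementation: the special markers and token values are hypothetical placeholders, and the real model uses learned tokenizers for each modality.

```python
# Illustrative sketch of the observation/description/action token
# layout. Marker strings and token IDs are placeholders, not the
# model's real vocabulary.

def build_scene_sequence(obs_tokens, desc_tokens, act_tokens):
    """Concatenate one scene's modalities into a single token stream,
    separated by (hypothetical) special markers."""
    return ["<obs>", *obs_tokens, "<desc>", *desc_tokens, "<act>", *act_tokens]

def make_prompt(sequence, task):
    """Truncate the scene sequence so the model is asked to generate
    the remaining tokens for the requested task."""
    if task == "qa":        # model continues with description tokens
        return sequence[: sequence.index("<desc>") + 1]
    if task == "planning":  # model continues with action tokens
        return sequence[: sequence.index("<act>") + 1]
    raise ValueError(f"unknown task: {task}")

seq = build_scene_sequence([101, 102], [7, 8], [3])
```

Changing only the prompt boundary switches the task, which is how a single next-token model can cover perception, prediction, and planning without task-specific heads.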

Closed-Loop Autonomous Driving

closed-loop

We explore a new closed-loop autonomous driving paradigm that combines an end-to-end model with a world model to construct a closed loop.
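The loop described above alternates between the two models: the end-to-end model maps an observation to an action, and the world model predicts the next observation from the current one plus that action. A minimal sketch, with both models replaced by stand-in functions:

```python
# Conceptual sketch of the closed loop; the two model functions are
# placeholders, not Doe-1's actual networks.

def end_to_end_model(observation):
    """Stand-in policy: map an observation to a driving action."""
    return {"steer": 0.0, "accel": observation * 0.1}

def world_model(observation, action):
    """Stand-in dynamics: predict the next observation."""
    return observation + action["accel"]

def rollout(initial_observation, steps):
    """Alternate planning and prediction to simulate driving in a loop."""
    obs, trajectory = initial_observation, []
    for _ in range(steps):
        action = end_to_end_model(obs)  # perceive + plan
        obs = world_model(obs, action)  # imagine the next frame
        trajectory.append((obs, action))
    return trajectory

traj = rollout(1.0, steps=3)
```

Because the world model supplies the next observation, the policy can be evaluated in a closed loop without replaying a fixed log.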

Visualizations

Closed-Loop Autonomous Driving

vis-closed-loop

Action-Conditioned Video Generation

vis-prediction

Getting Started

Data Preparation

  1. Download the nuScenes V1.0 full dataset HERE.

  2. Download the annotations data_nusc from OmniDrive and unzip it.

  3. Download the VQVAE weights from HERE and place them in the following directory structure, as in HERE:

```
Doe/
- model/
    - lumina_mgpt/
        - ckpts/
            - chameleon/
                - tokenizer/
                    - text_tokenizer.json
                    - vqgan.yaml
                    - vqgan.ckpt
    - xllmx/
- ...
```
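A quick sanity check (not part of the repo) can confirm the tokenizer files sit where the layout above expects them; run it from the repository root.

```python
# Verify the downloaded tokenizer files match the layout above.
from pathlib import Path

TOKENIZER_DIR = Path("model/lumina_mgpt/ckpts/chameleon/tokenizer")
REQUIRED = ("text_tokenizer.json", "vqgan.yaml", "vqgan.ckpt")

def missing_files(root: Path = TOKENIZER_DIR) -> list:
    """Return the names of required tokenizer files not found under root."""
    return [name for name in REQUIRED if not (root / name).is_file()]

print(missing_files())  # an empty list means the layout is complete
```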

Inference

  1. Generate the conversation data for inference and set the max length:

```shell
# max length: 1 for qa, 5 for planning
python dataset/gen_data.py \
    --info_path path/to/infos_var.pkl \
    --qa_path path/to/OmniDriveDataset \
    --nusc_path path/to/nuscenes \
    --save_path path/to/save/outputs \
    --max_length 1
```

  2. Run inference with a model checkpoint:

```shell
# set split and id for multiple GPUs
CUDA_VISIBLE_DEVICES=0 python inference/eval.py \
    --anno_path path/to/val_infos.pkl \
    --nusc_path path/to/nuscenes \
    --save_path path/to/save/output \
    --model_path path/to/model/ckpt \
    --data_path path/to/generated/data.json \
    --task qa
```
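For multi-GPU evaluation, one process per device can be launched with a small driver script. This is a hypothetical launcher, not provided by the repo: the `--split`/`--id` flag names are an assumption based on the "set split and id for multi gpus" comment, so check eval.py's argument parser before relying on them.

```python
# Hypothetical multi-GPU launcher: one eval.py process per device,
# each pinned via CUDA_VISIBLE_DEVICES. The --split/--id flag names
# are assumptions; verify them against inference/eval.py.
import os
import subprocess

def build_cmd(gpu_id: int, num_gpus: int, common_args: list) -> list:
    """Compose one eval.py invocation for a single data shard."""
    return ["python", "inference/eval.py", *common_args,
            "--split", str(num_gpus), "--id", str(gpu_id)]

def launch(num_gpus: int, common_args: list) -> None:
    """Run all shards in parallel and wait for them to finish."""
    procs = []
    for gpu_id in range(num_gpus):
        env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu_id)}
        procs.append(subprocess.Popen(build_cmd(gpu_id, num_gpus, common_args), env=env))
    for p in procs:
        p.wait()
```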

Related Projects

Our code is based on the excellent work Lumina-mGPT.

Citation

If you find this project helpful, please consider citing the following paper:

```
@article{doe,
    title={Doe-1: Closed-Loop Autonomous Driving with Large World Model},
    author={Zheng, Wenzhao and Xia, Zetian and Huang, Yuanhui and Zuo, Sicheng and Zhou, Jie and Lu, Jiwen},
    journal={arXiv preprint arXiv:2412.09627},
    year={2024}
}
```
