This is the official repository of the paper ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries (CVPR 2023).
Use the following commands to prepare the Python environment (Python 3.6, 3.7, and 3.8 are supported):

```bash
conda create -n vip3d python=3.6
conda activate vip3d
pip install torch==1.10.0+cu111 torchvision==0.11.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.10/index.html
pip install mmdet==2.24.1
pip install -r requirements.txt
```
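Optionally, verify that the pinned versions resolved correctly and that CUDA is visible (a quick sanity check, not part of the original instructions):

```bash
python -c "import torch, mmcv, mmdet; print(torch.__version__, torch.cuda.is_available(), mmcv.__version__, mmdet.__version__)"
# expected: 1.10.0+cu111 True 1.4.0 2.24.1
```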
Then install mmdetection3d v0.17.1 from source:

```bash
cd ~
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.17.1  # Other versions may not be compatible.
python setup.py install
pip install -r requirements/runtime.txt  # Install packages for mmdet3d
```
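A similar one-liner confirms that mmdet3d installed correctly (a sketch):

```bash
python -c "import mmdet3d; print(mmdet3d.__version__)"
# expected: 0.17.1
```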
We also provide a Docker image of ViP3D with all required packages pre-installed. The image is built on the NVIDIA container image for PyTorch; make sure you have installed Docker and the NVIDIA Container Toolkit (nvidia-docker).

```bash
docker pull gentlesmile/vip3d
docker run --name vip3d_container -it --gpus all --ipc=host gentlesmile/vip3d
```
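If the dataset lives on the host, mount it into the container so the layout below is visible inside; a sketch (both the host and container paths are assumptions, adjust them to your setup):

```bash
docker run --name vip3d_container -it --gpus all --ipc=host \
    -v /path/to/nuscenes:/workspace/ViP3D/data/nuscenes \
    gentlesmile/vip3d
```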
1) Download the full nuScenes dataset (v1.0) and the map expansion from the nuScenes website (https://www.nuscenes.org/download). You only need to download the keyframe blobs and the radar blobs.
After downloading, the directory structure should be as follows:

```
ViP3D
├── mmdet3d/
├── plugin/
├── tools/
├── data/
│   ├── nuscenes/
│   │   ├── maps/
│   │   ├── samples/
│   │   ├── v1.0-trainval/
│   │   ├── lidarseg/
```
Suppose the nuScenes data is saved at `data/nuscenes/`.
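If your copy of nuScenes lives elsewhere, a symlink keeps the expected layout (the source path is an assumption; adjust it to your setup):

```bash
mkdir -p data
ln -s /path/to/nuscenes data/nuscenes
```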
2) Generate the tracking data infos by running:

```bash
python tools/data_converter/nusc_tracking.py
```
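A quick way to confirm the converter produced output (a sketch; the exact filenames are printed by the script rather than pinned down here):

```bash
ls -lh data/nuscenes/*.pkl
```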
Train ViP3D with 3 historical frames and the ResNet50 backbone. Training loads a pre-trained detector for weight initialization; suppose the detector checkpoint is at `ckpts/detr3d_resnet50.pth` (it can be downloaded from here).

```bash
bash tools/dist_train.sh plugin/vip3d/configs/vip3d_resnet50_3frame.py 8 --work-dir=work_dirs/vip3d_resnet50_3frame.1
```

Training requires ~17 GB of GPU memory and takes ~3 days for 24 epochs on 8× RTX 3090 GPUs.
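If a run is interrupted, it can usually be resumed with the standard mmdet-style flag (a sketch; this assumes `tools/dist_train.sh` forwards extra arguments to the trainer, as the stock mmdet script does):

```bash
bash tools/dist_train.sh plugin/vip3d/configs/vip3d_resnet50_3frame.py 8 \
    --work-dir=work_dirs/vip3d_resnet50_3frame.1 \
    --resume-from=work_dirs/vip3d_resnet50_3frame.1/latest.pth
```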
Run evaluation using the following command:

```bash
PYTHONPATH=. python tools/test.py plugin/vip3d/configs/vip3d_resnet50_3frame.py work_dirs/vip3d_resnet50_3frame.1/epoch_24.pth --eval bbox
```

The checkpoint `epoch_24.pth` can be downloaded from here. Expected AMOTA with the ResNet50 backbone: 0.291.
Then test the prediction metrics:

```bash
unzip ./nuscenes_prediction_infos_val.zip
python tools/prediction_eval.py --result_path 'work_dirs/vip3d_resnet50_3frame.1/results_nusc.json'
```

Expected results: minADE 1.47, minFDE 2.21, MR 0.237, EPA 0.245.
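For reference, minADE, minFDE, and MR follow the standard multi-modal definitions over $K$ predicted trajectories $\hat{y}^{k}$ for a ground-truth future $y$ of horizon $T$; the 2 m miss threshold is the usual nuScenes convention, and EPA (End-to-end Prediction Accuracy) is defined in the paper:

$$
\mathrm{minADE} = \min_{k} \frac{1}{T} \sum_{t=1}^{T} \left\lVert \hat{y}^{k}_{t} - y_{t} \right\rVert_{2},
\qquad
\mathrm{minFDE} = \min_{k} \left\lVert \hat{y}^{k}_{T} - y_{T} \right\rVert_{2},
\qquad
\mathrm{MR} = \mathbb{E}\left[ \mathbf{1}\{\mathrm{minFDE} > 2\,\mathrm{m}\} \right]
$$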
The code and assets are under the Apache 2.0 license.
If you find our work useful for your research, please consider citing the paper:
```bibtex
@inproceedings{vip3d,
  title={ViP3D: End-to-end visual trajectory prediction via 3d agent queries},
  author={Gu, Junru and Hu, Chenxu and Zhang, Tianyuan and Chen, Xuanyao and Wang, Yilun and Wang, Yue and Zhao, Hang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5496--5506},
  year={2023}
}
```