This repository is the official implementation of CAPE (Camera View Position Embedding for Multi-View 3D Object Detection).
CAPE is a simple yet effective method for multi-view 3D object detection. CAPE forms the 3D position embedding in the local camera-view coordinate system rather than the global coordinate system, which greatly reduces the difficulty of learning the view transformation. CAPE also supports temporal modeling by fusing separate queries from multiple frames.
This implementation is built upon PETR and can be set up by following install.md.
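The difference between a camera-view and a global position embedding comes down to which coordinate frame the 3D points are expressed in before they are encoded. The following is a minimal, illustrative sketch in plain Python, not CAPE's actual code; the function names `backproject_camera`, `camera_to_global`, and `frustum_points` are hypothetical. It lifts a pixel along its viewing ray at several depths using only the pinhole intrinsics and, optionally, applies a camera-to-global rigid transform. The learned encoding of these points (for example an MLP) is omitted.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def backproject_camera(u: float, v: float, depth: float,
                       fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Lift pixel (u, v) at a given depth into the camera frame.

    Assumes a pinhole model with x to the right, y down, z forward.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def camera_to_global(p: Vec3, rotation: Mat3, translation: Vec3) -> Vec3:
    """Apply a camera-to-global rigid transform: R @ p + t."""
    return tuple(
        sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def frustum_points(u: float, v: float, depths: Sequence[float],
                   intrinsics: Tuple[float, float, float, float],
                   extrinsics: Optional[Tuple[Mat3, Vec3]] = None) -> List[Vec3]:
    """Sample 3D points along the ray through pixel (u, v).

    With extrinsics=None the points stay in the local camera frame
    (camera-view style); otherwise they are mapped to the global frame.
    """
    fx, fy, cx, cy = intrinsics
    pts = [backproject_camera(u, v, d, fx, fy, cx, cy) for d in depths]
    if extrinsics is not None:
        rotation, translation = extrinsics
        pts = [camera_to_global(p, rotation, translation) for p in pts]
    return pts
```

For example, two cameras with identical intrinsics but different mounting poses produce the same camera-frame points for the same pixel, while their global-frame points differ. In this sketch, that independence from the extrinsics is what the camera-view choice buys.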
Environments
Linux, Python == 3.7.9, CUDA == 11.2, PyTorch == 1.9.1, mmdet3d == 0.17.1
Detection Data
Follow the mmdet3d data preparation guide to process the nuScenes dataset (https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/data_preparation.md).
Pretrained weights
To verify the performance on the val set, we provide the pretrained V2-99 weights. This V2-99 backbone is pretrained on DDAD15M (weights) and then further trained on the nuScenes train set with FCOS3D. For the test-set results reported in the paper, we use the DD3D pretrained weights. ImageNet-pretrained weights for the other backbones can be found here. Please put all pretrained weights into ./ckpts/.
After preparation, you will be able to see the following directory structure:
CAPE
├── mmdetection3d
├── projects
│   ├── configs
│   ├── mmdet3d_plugin
├── tools
├── data
│   ├── nuscenes
│   ├── samples
│   ├── ...
├── ckpts
├── README.md
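Before training, it can help to confirm that your checkout matches the tree above. The helper below is a small, hypothetical sketch (`missing_paths` and `EXPECTED_PATHS` are not part of this repository) that reports which expected directories are absent under a given root. The path list is an assumption based on the tree shown and may need adjusting for your setup.

```python
from pathlib import Path
from typing import List, Sequence

# Hypothetical list of directories taken from the tree in this README;
# adjust it if your local layout differs.
EXPECTED_PATHS = (
    "mmdetection3d",
    "projects/configs",
    "projects/mmdet3d_plugin",
    "tools",
    "data/nuscenes",
    "ckpts",
)


def missing_paths(root: str, expected: Sequence[str] = EXPECTED_PATHS) -> List[str]:
    """Return the expected sub-directories that do not exist under `root`."""
    root_path = Path(root)
    return [p for p in expected if not (root_path / p).is_dir()]
```

For example, calling print(missing_paths(".")) from the repository root should print an empty list once all data and weights are in place.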
cd CAPE
To train the model, run:
sh train.sh
To evaluate the model, run:
sh test.sh
Model | mAP | NDS | Config | Download |
---|---|---|---|---|
cape_r50_1408x512_24ep_wocbgs_imagenet_pretrain | 34.7% | 40.6% | config | log / checkpoint |
capet_r50_704x256_24ep_wocbgs_imagenet_pretrain | 31.8% | 44.2% | config | log / checkpoint |
capet_VoV99_800x320_24ep_wocbgs_load_dd3d_pretrain | 44.7% | 54.36% | config | log / checkpoint |
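The NDS column is the nuScenes Detection Score, which combines mAP with five mean true-positive error metrics (translation, scale, orientation, velocity, and attribute errors: mATE, mASE, mAOE, mAVE, mAAE) as NDS = (5 · mAP + Σ (1 − min(1, mTP))) / 10. The function below is an illustrative sketch of that formula; its name is hypothetical, it is not taken from this repository or the nuScenes devkit, and the per-model TP errors are not listed in this README.

```python
from typing import Mapping

# The five mean true-positive error metrics used by the nuScenes benchmark.
TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nuscenes_detection_score(m_ap: float, tp_errors: Mapping[str, float]) -> float:
    """Compute NDS from mAP and the five mean TP errors.

    Each error is clipped at 1, so a very poor TP metric contributes 0
    instead of a negative value.
    """
    missing = [k for k in TP_METRICS if k not in tp_errors]
    if missing:
        raise KeyError(f"missing TP metrics: {missing}")
    tp_score = sum(1.0 - min(1.0, tp_errors[k]) for k in TP_METRICS)
    return (5.0 * m_ap + tp_score) / 10.0
```

As a sanity check, a model with mAP = 0.5 and all five errors equal to 0 would score (2.5 + 5) / 10 = 0.75, while errors of 1 or more contribute nothing. Because the TP terms add to the weighted mAP, NDS can exceed mAP, as it does for every row in the table.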
Many thanks to the authors of mmdetection3d. Special thanks to the authors of PETR.
If you find this project useful for your research, please consider citing:
@inproceedings{Xiong2023CAPE,
title={CAPE: Camera View Position Embedding for Multi-View 3D Object Detection},
author={Xiong, Kaixin and Gong, Shi and Ye, Xiaoqing and Tan, Xiao and Wan, Ji and Ding, Errui and Wang, Jingdong and Bai, Xiang},
booktitle={Computer Vision and Pattern Recognition},
year={2023}
}
If you have any questions, feel free to open an issue or contact us at kaixinxiong@hust.edu.cn, gongshi@baidu.com, or yexiaoqing@baidu.com.