This is the official repository of "3D Object Detection from Images for Autonomous Driving: A Survey", a comprehensive survey of recent progress in deep learning methods for image-based 3D object detection. In this paper, we introduce the [datasets] that can be used for this task, propose two [taxonomies] to better organize the existing methods, and summarize the image-based 3D [detectors] at both the coarse (framework) level and the fine (component) level. We also introduce the auxiliary data commonly used by existing methods and provide [benchmarks] organized according to these data. Finally, we discuss the relevant issues and potential research directions in this field.
This repo will be continuously maintained. Feel free to contact me if you have new methods to add or any suggestions!
- KITTI-3D [Paper (CVPR'12)] [Paper (IJRR'13)] [Homepage] [Data] [Benchmark]
- Argoverse [Paper (CVPR'19)] [Paper (NeurIPS'21)] [Homepage] [Data] [Benchmark]
- Lyft L5 [Homepage] [Data]
- H3D [Paper (ICRA'19)] [Data]
- A*3D [Paper (ICRA'20)] [Homepage]
- nuScenes [Paper (CVPR'20)] [Homepage] [Data] [Benchmark]
- Waymo Open [Paper (CVPR'20)] [Homepage] [Data] [Benchmark]
- CityScapes-3D [Paper (CVPR'20 Workshop)] [Homepage] [Data] [Benchmark]
- A2D2 [Paper (arXiv)] [Homepage] [Data]
- KITTI-360 [Paper (arXiv)] [Homepage] [Data] [Benchmark]
- Rope3D [Paper (CVPR'22)] [Homepage] [Data]
- [Key-points annotations] generated by [AutoShape] for KITTI-3D
- [Mask, depth, and part annotations] generated by [ZoomNet] for KITTI-3D
- [Mask and disparity annotations] generated by [Disp R-CNN] for KITTI-3D
- [Depth maps] for KITTI-3D predicted by a pre-trained [DORN] model (processed version can be found in [this repo])
- [Pre-trained backbones (DLA34&V2-99)] provided by [DD3D]
To facilitate both the systematic analysis of current approaches and a fair performance comparison for future work, we propose two novel taxonomies to categorize existing methods, i.e., by the framework they adopt and by the input data they use. The outline of the proposed taxonomies is shown here; please refer to our paper for more details.
taxonomies
├──by framework
│   ├──methods based on 2D features [result lifting]
│   └──methods based on 3D features
│       ├──feature lifting
│       └──data lifting
└──by input data
    ├──without auxiliary data (standard setting)
    ├──with auxiliary data in training phase
    │   ├──CAD models
    │   ├──LiDAR signals
    │   ├──additional training data
    │   └──...
    ├──with auxiliary data in training & testing phases
    │   ├──temporal sequences
    │   ├──stereo pairs
    │   └──...
    └──others (semi/self-supervised settings)
This part collects the image-based 3D detectors; you can browse these works [by venue] or [by input data].
We also maintain the commonly used benchmarks to help researchers find relevant information quickly. At present, the [KITTI-3D benchmark] and the [nuScenes benchmark] are available in this repo.
If you find our work useful in your research, please consider citing:
@article{3dodi,
title = {3D object detection from images for autonomous driving: A survey},
author = {Ma, Xinzhu and Ouyang, Wanli and Simonelli, Andrea and Ricci, Elisa},
year = {2022},
journal = {arXiv preprint arXiv:2202.02980}
}
See [this document] for the update log.