This is the official implementation of the paper "Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective".
Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective
Jinjing Zhao*, Fangyun Wei*, Chang Xu
The University of Sydney
- Update the checkpoints
With the transformative impact of the Transformer, DETR pioneered the application of the encoder-decoder architecture to object detection. A collection of follow-up research, e.g., Deformable DETR, aims to enhance DETR while adhering to the encoder-decoder design. In this work, we revisit the DETR series through the lens of Faster R-CNN. We find that the DETR resonates with the underlying principles of Faster R-CNN's RPN-refiner design but benefits from end-to-end detection owing to the incorporation of Hungarian matching. We systematically adapt the Faster R-CNN towards the Deformable DETR, by integrating or repurposing each component of Deformable DETR, and note that Deformable DETR's improved performance over Faster R-CNN is attributed to the adoption of advanced modules such as a superior proposal refiner (e.g., deformable attention rather than RoI Align). When viewing the DETR through the RPN-refiner paradigm, we delve into various proposal refinement techniques such as deformable attention, cross attention, and dynamic convolution. These proposal refiners cooperate well with each other; thus, we synergistically combine them to establish a Hybrid Proposal Refiner (HPR). Our HPR is versatile and can be incorporated into various DETR detectors. For instance, by integrating HPR to a strong DETR detector, we achieve an AP of 54.9 on the COCO benchmark, utilizing a ResNet-50 backbone and a 36-epoch training schedule.
Base Model | Epoch | w/LSJ | AP | Configs | Checkpoints |
---|---|---|---|---|---|
Deformable DETR | 12 | | 50.6 | config | OneDrive / quark |
Deformable DETR | 24 | | 51.9 | config | OneDrive / quark |
DINO | 12 | | 51.1 | config | OneDrive / quark |
DINO | 24 | | 51.9 | config | OneDrive / quark |
Align DETR | 12 | | 52.1 | config | - |
Align DETR | 24 | | 52.7 | config | - |
Align DETR | 12 | √ | 52.7* | config | OneDrive / quark |
Align DETR | 24 | √ | 54.6* | config | OneDrive / quark |
Align DETR | 36 | √ | 55.2* | config | OneDrive / quark |
DDQ | 12 | | 52.6* | config | OneDrive / quark |
DDQ | 24 | | 53.3* | config | OneDrive / quark |
DDQ | 12 | √ | 53.0 | config | OneDrive / quark |
DDQ | 24 | √ | 54.8* | config | OneDrive / quark |
DDQ | 36 | √ | 55.1* | config | OneDrive / quark |
Base Model | Epoch | w/LSJ | AP | Configs | Checkpoints |
---|---|---|---|---|---|
DDQ | 12 | | 58.7 | config | OneDrive / quark |
DDQ | 12 | √ | 58.8* | config | OneDrive / quark |
DDQ | 24 | √ | 59.7* | config | OneDrive / quark |
Align DETR | 12 | | 58.6 | config | OneDrive / quark |
Align DETR | 24 | | 59.3 | config | OneDrive / quark |
Align DETR | 12 | √ | 58.8 | config | OneDrive / quark |
Align DETR | 24 | √ | 59.6 | config | OneDrive / quark |
Align DETR | 36 | √ | 60.0 | config | OneDrive / quark |
* We retrained this configuration; the result is slightly higher than the one reported in the paper.
We tested our models under python=3.10.10, pytorch=1.12.0, cuda=11.6. Other versions may work as well.
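To compare a local setup against the versions listed above, a small script along these lines can help. This is a minimal sketch, not part of the repository: the function name `report_environment` is hypothetical, and it simply reports `None` for PyTorch and CUDA when `torch` is not installed.

```python
import importlib
import platform


def report_environment():
    """Collect the Python version and, if installed, the PyTorch/CUDA versions."""
    info = {"python": platform.python_version()}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed yet; report that instead of failing.
        info["pytorch"] = None
        info["cuda"] = None
    else:
        info["pytorch"] = torch.__version__
        # torch.version.cuda is None for CPU-only builds.
        info["cuda"] = torch.version.cuda
    return info


if __name__ == "__main__":
    for name, version in report_environment().items():
        print(f"{name}: {version}")
```

Run it inside the environment you plan to train in and compare the printed values with the tested versions.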
- Install PyTorch and torchvision
Follow the instructions at https://pytorch.org/get-started/locally/.
# an example:
conda install -c pytorch pytorch torchvision
- Install other needed packages
pip install -r requirements.txt
Please download the COCO 2017 dataset and organize it as follows:
coco2017/
├── train2017/
├── val2017/
└── annotations/
    ├── instances_train2017.json
    └── instances_val2017.json
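Before editing the config files, it can be useful to confirm that the dataset directory matches the layout above. The following is a minimal sketch, not part of the repository: the helper name `missing_coco_entries` and the default root `coco2017` are assumptions for illustration, and the check only tests that the paths exist, not that the files are valid.

```python
import sys
from pathlib import Path

# Entries taken from the layout above, relative to the dataset root.
EXPECTED_ENTRIES = (
    "train2017",
    "val2017",
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
)


def missing_coco_entries(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    # The dataset root is passed as the first argument (assumed default: coco2017).
    dataset_root = sys.argv[1] if len(sys.argv) > 1 else "coco2017"
    missing = missing_coco_entries(dataset_root)
    if missing:
        print("Missing entries:")
        for entry in missing:
            print(f"  {entry}")
    else:
        print("Dataset layout looks complete.")
```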
Before training or evaluation, you need to modify the dataset path in the following config files:
project/configs/_base_/datasets/data_re_aug_coco_detection.py
project/configs/_base_/datasets/lsj_data_re_aug_coco_detection.py
To accelerate convergence, we apply SoCo pre-training to the ResNet-50 backbone (./backbone_pth/backbone.pth).
./dist_train.sh <Config Path> <GPU Number> <Work Dir>
./dist_test.sh <Config Path> <Checkpoint Path> <GPU Number>
You can refer to Deformable-DETR to enable training on multiple nodes.
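If you launch many runs, the two scripts above can be wrapped from Python so that the argument order stays consistent. This is a minimal sketch under stated assumptions: the helper names `train_command` and `eval_command` are hypothetical, and the paths in the example are placeholders rather than real files in this repository. Only the argument order is taken from the commands above.

```python
import shlex


def train_command(config_path, num_gpus, work_dir):
    """Build: ./dist_train.sh <Config Path> <GPU Number> <Work Dir>"""
    return ["./dist_train.sh", str(config_path), str(num_gpus), str(work_dir)]


def eval_command(config_path, checkpoint_path, num_gpus):
    """Build: ./dist_test.sh <Config Path> <Checkpoint Path> <GPU Number>"""
    return ["./dist_test.sh", str(config_path), str(checkpoint_path), str(num_gpus)]


if __name__ == "__main__":
    # Placeholder values for illustration only; substitute your own paths.
    train = train_command("path/to/config.py", 8, "path/to/work_dir")
    evaluate = eval_command("path/to/config.py", "path/to/checkpoint.pth", 8)
    # Print the shell-quoted commands; pass the lists to subprocess.run to execute.
    print(shlex.join(train))
    print(shlex.join(evaluate))
```

Building the command as a list rather than a single string avoids quoting problems when a path contains spaces.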
If you use HPR in your research or wish to refer to the baseline results published here, please use the following BibTeX entry.
@inproceedings{zhao2024hybrid,
title={Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective},
author={Zhao, Jinjing and Wei, Fangyun and Xu, Chang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={17416--17426},
year={2024}
}