This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning" on DINO and MaskDINO. You can also try our method on other frameworks.

Here we apply our method to the Swin backbone, so the GFLOPs and FPS reported below are measured on the backbone only.
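The method follows the hourglass design described in the paper: a token clustering layer reduces the number of tokens fed into the most expensive blocks of the backbone, and a token reconstruction layer restores the full-resolution feature map afterwards, so the pretrained weights can be reused without any fine-tuning. Roughly speaking, the hyperparameter h in the tables below controls how aggressively the tokens are reduced: smaller h means lower GFLOPs and higher FPS at the cost of a small accuracy drop. The snippet below is only a minimal, simplified sketch of this clustering-and-reconstruction idea, not the implementation used in this repo; the function names, the center initialization, and the temperature `tau` are illustrative.

```python
# Minimal sketch of the hourglass idea (illustrative, not this repo's implementation):
# cluster the tokens, run the heavy transformer blocks on the smaller set of cluster
# tokens, then reconstruct the full-resolution tokens from the soft assignments.
import torch
import torch.nn.functional as F


def cluster_tokens(x, num_clusters, tau=1.0):
    """Soft-assign N tokens to `num_clusters` cluster tokens.

    x: (B, N, C) token features.
    Returns cluster tokens (B, M, C) and the soft assignment matrix (B, N, M).
    """
    B, N, C = x.shape
    # Initialize cluster centers from evenly spaced tokens (illustrative choice).
    idx = torch.linspace(0, N - 1, steps=num_clusters, device=x.device).long()
    centers = x[:, idx, :]                                      # (B, M, C)
    # Cosine similarity between every token and every center -> soft assignment.
    sim = torch.einsum("bnc,bmc->bnm",
                       F.normalize(x, dim=-1), F.normalize(centers, dim=-1))
    assign = F.softmax(sim / tau, dim=-1)                       # (B, N, M)
    # Each cluster token is the assignment-weighted average of the original tokens.
    cluster = torch.einsum("bnm,bnc->bmc", assign, x)
    cluster = cluster / assign.sum(dim=1).unsqueeze(-1).clamp(min=1e-6)
    return cluster, assign


def reconstruct_tokens(cluster, assign):
    """Map processed cluster tokens back to the original token resolution."""
    return torch.einsum("bnm,bmc->bnc", assign, cluster)        # (B, N, C)


if __name__ == "__main__":
    x = torch.randn(2, 4096, 192)       # e.g. a 64x64 token map with 192 channels
    cluster, assign = cluster_tokens(x, num_clusters=144)
    # ... run the expensive transformer blocks on `cluster` (144 tokens instead of 4096) ...
    x_rec = reconstruct_tokens(cluster, assign)
    print(x_rec.shape)                  # torch.Size([2, 4096, 192])
```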
Object detection on COCO:

Method | Backbone | h |  | GFLOPs | FPS | mAP
--- | --- | --- | --- | --- | --- | ---
DINO | Swin-L | - | 12 | 937 | 4.42 | 58.5
DINO + Ours | Swin-L | 16 | 9 | 838 | 4.90 | 58.0
DINO + Ours | Swin-L | 14 | 8 | 768 | 5.32 | 57.7
DINO + Ours | Swin-L | 10 | 8 | 683 | 5.82 | 57.0
Instance segmentation on COCO:

Method | Backbone | h |  | GFLOPs | FPS | Mask AP
--- | --- | --- | --- | --- | --- | ---
MaskDINO | Swin-L | - | 12 | 937 | 4.42 | 52.3
MaskDINO + Ours | Swin-L | 16 | 9 | 838 | 4.90 | 52.1
MaskDINO + Ours | Swin-L | 10 | 9 | 737 | 5.32 | 51.6
MaskDINO + Ours | Swin-L | 8 | 8 | 640 | 5.82 | 50.9
Panoptic segmentation on COCO:

Method | Backbone | h |  | GFLOPs | FPS | PQ
--- | --- | --- | --- | --- | --- | ---
MaskDINO | Swin-L | - | 12 | 937 | 4.42 | 58.3
MaskDINO + Ours | Swin-L | 16 | 9 | 838 | 4.75 | 58.1
MaskDINO + Ours | Swin-L | 12 | 9 | 771 | 5.20 | 57.9
MaskDINO + Ours | Swin-L | 10 | 8 | 683 | 5.80 | 57.4
Please refer to the Installation Instructions for installation details. Note that you need to install this repo instead of the original detrex.
If you want to evaluate DINO with our method, download the corresponding checkpoint released by DINO and run the following command:

```shell
python tools/train_net.py --config-file projects/dino/configs/dino_hourglass_swin_large_384_5scale_36ep.py --num-gpus 4 --eval-only train.init_checkpoint=/path/to/checkpoint
```
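The detrex configs are detectron2 LazyConfig files, so additional key=value overrides can be appended to the command line in the same way as train.init_checkpoint. As a sketch, assuming the standard train.output_dir key from detectron2's common training config (the directory value is only an illustration), you can redirect where the evaluation results are written:

```shell
# Same evaluation command, additionally overriding the output directory.
# train.output_dir is the standard detectron2/detrex LazyConfig key; the path is illustrative.
python tools/train_net.py \
    --config-file projects/dino/configs/dino_hourglass_swin_large_384_5scale_36ep.py \
    --num-gpus 4 --eval-only \
    train.init_checkpoint=/path/to/checkpoint \
    train.output_dir=./output/dino_hourglass_eval
```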
If you want to evaluate MaskDINO with our method, download the corresponding checkpoint released by MaskDINO and run the command for the task you are interested in.

For COCO instance segmentation:

```shell
python tools/train_net.py --config-file projects/maskdino/configs/maskdino_hourglass_swin_large_384_coco_instance_seg_50ep.py --num-gpus 4 --eval-only train.init_checkpoint=/path/to/checkpoint
```

For COCO panoptic segmentation:

```shell
python tools/train_net.py --config-file projects/maskdino/configs/maskdino_hourglass_swin_large_384_coco_panoptic_seg_50ep.py --num-gpus 4 --eval-only train.init_checkpoint=/path/to/checkpoint
```
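If you prefer to inspect or tweak these configs from Python before launching, they can be loaded with detectron2's LazyConfig API, which detrex builds on. A minimal sketch (the checkpoint path is illustrative):

```python
# Load one of the hourglass configs and apply the same key=value overrides that
# tools/train_net.py accepts on the command line.
from detectron2.config import LazyConfig

cfg = LazyConfig.load(
    "projects/maskdino/configs/maskdino_hourglass_swin_large_384_coco_instance_seg_50ep.py"
)
cfg = LazyConfig.apply_overrides(cfg, ["train.init_checkpoint=/path/to/checkpoint"])
print(cfg.train.init_checkpoint)
```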
This project is released under the Apache 2.0 license.
This repo is built on top of detrex v0.2.0.

If you find this project useful in your research, please consider citing:
@article{yuan2023expediting,
author = {Yuan, Yuhui and Liang, Weicong and Ding, Henghui and Liang, Zhanhao and Zhang, Chao and Hu, Han},
title = {Expediting large-scale vision transformer for dense prediction without fine-tuning},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
year = {2023},
}
@article{liang2022expediting,
title={Expediting large-scale vision transformer for dense prediction without fine-tuning},
author={Liang, Weicong and Yuan, Yuhui and Ding, Henghui and Luo, Xiao and Lin, Weihong and Jia, Ding and Zhang, Zheng and Zhang, Chao and Hu, Han},
journal={Advances in Neural Information Processing Systems},
volume={35},
pages={35462--35477},
year={2022}
}
@misc{zhang2022dino,
title={DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection},
author={Hao Zhang and Feng Li and Shilong Liu and Lei Zhang and Hang Su and Jun Zhu and Lionel M. Ni and Heung-Yeung Shum},
year={2022},
eprint={2203.03605},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{li2022mask,
title={Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation},
author={Feng Li and Hao Zhang and Huaizhe xu and Shilong Liu and Lei Zhang and Lionel M. Ni and Heung-Yeung Shum},
year={2022},
eprint={2206.02777},
archivePrefix={arXiv},
primaryClass={cs.CV}
}