
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning

Introduction

This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning" on DINO and MaskDINO. You can try our method on other frameworks.

[Framework overview figure]
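For intuition, here is a minimal, self-contained sketch of the token clustering and reconstruction idea: the token sequence is shrunk to a small h $\times$ w grid of cluster centers, the expensive transformer blocks run on the clusters, and the original resolution is restored by attending back to the cluster features. The pooling-based clustering and softmax reconstruction below are illustrative simplifications, not the layers implemented in this repo.

```python
import torch
import torch.nn.functional as F

def cluster_tokens(x, num_clusters):
    # x: (B, N, C) tokens -> (B, K, C) cluster centers via 1-D average pooling
    # (a crude stand-in for a learned token clustering layer).
    return F.adaptive_avg_pool1d(x.transpose(1, 2), num_clusters).transpose(1, 2)

def reconstruct_tokens(x, centers, tau=1.0):
    # Each original token attends to the cluster centers and takes a weighted
    # sum of them, restoring a full-resolution token sequence.
    attn = torch.softmax(x @ centers.transpose(1, 2) / tau, dim=-1)  # (B, N, K)
    return attn @ centers                                            # (B, N, C)

B, N, C, K = 2, 12 * 12, 256, 9 * 9        # e.g. 12x12 tokens reduced to 9x9 clusters
x = torch.randn(B, N, C)
centers = cluster_tokens(x, K)             # heavy transformer blocks would run on `centers`
x_rec = reconstruct_tokens(x, centers)     # back to the original token count
print(centers.shape, x_rec.shape)          # (2, 81, 256) and (2, 144, 256)
```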

Results

Here we apply our method to the Swin backbone, so the GFLOPs and FPS reported below are for the backbone only.

Object Detection

| Method | Backbone | $\alpha$ | h $\times$ w | GFLOPs | FPS | mAP |
| --- | --- | --- | --- | --- | --- | --- |
| DINO | Swin-L | - | 12 $\times$ 12 | 937 | 4.42 | 58.5 |
| DINO + Ours | Swin-L | 16 | 9 $\times$ 9 | 838 | 4.90 | 58.0 |
| DINO + Ours | Swin-L | 14 | 8 $\times$ 8 | 768 | 5.32 | 57.7 |
| DINO + Ours | Swin-L | 10 | 8 $\times$ 8 | 683 | 5.82 | 57.0 |

Instance Segmentation

| Method | Backbone | $\alpha$ | h $\times$ w | GFLOPs | FPS | Mask AP |
| --- | --- | --- | --- | --- | --- | --- |
| MaskDINO | Swin-L | - | 12 $\times$ 12 | 937 | 4.42 | 52.3 |
| MaskDINO + Ours | Swin-L | 16 | 9 $\times$ 9 | 838 | 4.90 | 52.1 |
| MaskDINO + Ours | Swin-L | 10 | 9 $\times$ 9 | 737 | 5.32 | 51.6 |
| MaskDINO + Ours | Swin-L | 8 | 8 $\times$ 8 | 640 | 5.82 | 50.9 |

Panoptic Segmentation

| Method | Backbone | $\alpha$ | h $\times$ w | GFLOPs | FPS | PQ |
| --- | --- | --- | --- | --- | --- | --- |
| MaskDINO | Swin-L | - | 12 $\times$ 12 | 937 | 4.42 | 58.3 |
| MaskDINO + Ours | Swin-L | 16 | 9 $\times$ 9 | 838 | 4.75 | 58.1 |
| MaskDINO + Ours | Swin-L | 12 | 9 $\times$ 9 | 771 | 5.20 | 57.9 |
| MaskDINO + Ours | Swin-L | 10 | 8 $\times$ 8 | 683 | 5.80 | 57.4 |
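The GFLOPs and FPS columns above refer to the backbone alone. A rough sketch of how such numbers can be measured with fvcore's `FlopCountAnalysis` and a simple timing loop is shown below; the backbone object, input resolution, and iteration counts are placeholders rather than the exact benchmarking setup used for these tables.

```python
import time
import torch
from fvcore.nn import FlopCountAnalysis

@torch.no_grad()
def benchmark_backbone(backbone, image_size=(3, 1024, 1024), warmup=5, iters=20):
    # Count backbone GFLOPs for a single image and time its forward pass.
    backbone.eval()
    x = torch.randn(1, *image_size)
    gflops = FlopCountAnalysis(backbone, x).total() / 1e9
    for _ in range(warmup):              # warm-up iterations are not timed
        backbone(x)
    start = time.time()                  # wrap the timed region with
    for _ in range(iters):               # torch.cuda.synchronize() when on GPU
        backbone(x)
    fps = iters / (time.time() - start)
    return gflops, fps
```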

Installation

Please refer to the Installation Instructions for installation details. Note that you need to install this repo instead of the original detrex.

Getting Started

To evaluate DINO with our method, download the corresponding checkpoint released by DINO and run the following command:

python tools/train_net.py --config-file projects/dino/configs/dino_hourglass_swin_large_384_5scale_36ep.py --num-gpus 4 --eval-only train.init_checkpoint=/path/to/checkpoint

To evaluate MaskDINO with our method, download the corresponding checkpoint released by MaskDINO and run one of the following commands:

python tools/train_net.py --config-file projects/maskdino/configs/maskdino_hourglass_swin_large_384_coco_instance_seg_50ep.py --num-gpus 4 --eval-only train.init_checkpoint=/path/to/checkpoint
python tools/train_net.py --config-file projects/maskdino/configs/maskdino_hourglass_swin_large_384_coco_panoptic_seg_50ep.py --num-gpus 4 --eval-only train.init_checkpoint=/path/to/checkpoint
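The `train.init_checkpoint=/path/to/checkpoint` argument is a dotted-key config override in detectron2's LazyConfig style, which detrex's `train_net.py` builds on. Below is a small isolated sketch of that mechanism, using the config path and placeholder checkpoint path from the command above; it only illustrates how the override is resolved.

```python
from detectron2.config import LazyConfig

# Load the LazyConfig-style Python config and apply a dotted-key override,
# mirroring what passing `train.init_checkpoint=...` to train_net.py does.
cfg = LazyConfig.load("projects/dino/configs/dino_hourglass_swin_large_384_5scale_36ep.py")
cfg = LazyConfig.apply_overrides(cfg, ["train.init_checkpoint=/path/to/checkpoint"])
print(cfg.train.init_checkpoint)
```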

License

This project is released under the Apache 2.0 license.

Acknowledgement

This repo is built on top of detrex v0.2.0.

Citation

If you find this project useful in your research, please consider citing:

@article{yuan2023expediting,
  author  = {Yuan, Yuhui and Liang, Weicong and Ding, Henghui and Liang, Zhanhao and Zhang, Chao and Hu, Han},
  title   = {Expediting large-scale vision transformer for dense prediction without fine-tuning},
  journal = {TPAMI},
  year    = {2023}
}

@article{liang2022expediting,
  title   = {Expediting large-scale vision transformer for dense prediction without fine-tuning},
  author  = {Liang, Weicong and Yuan, Yuhui and Ding, Henghui and Luo, Xiao and Lin, Weihong and Jia, Ding and Zhang, Zheng and Zhang, Chao and Hu, Han},
  journal = {Advances in Neural Information Processing Systems},
  volume  = {35},
  pages   = {35462--35477},
  year    = {2022}
}

@misc{zhang2022dino,
  title         = {DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection},
  author        = {Hao Zhang and Feng Li and Shilong Liu and Lei Zhang and Hang Su and Jun Zhu and Lionel M. Ni and Heung-Yeung Shum},
  year          = {2022},
  eprint        = {2203.03605},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}

@misc{li2022mask,
  title         = {Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation},
  author        = {Feng Li and Hao Zhang and Huaizhe Xu and Shilong Liu and Lei Zhang and Lionel M. Ni and Heung-Yeung Shum},
  year          = {2022},
  eprint        = {2206.02777},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}
