
[NeurIPS2022] This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning" on Video Swin Transformer.

Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning

Introduction

This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning" on Video Swin Transformer.

[Figure: framework]

Results

Kinetics 400

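| Method | $\alpha$ | t $\times$ h $\times$ w | GFLOPs | FPS | Acc@1 | Acc@5 | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-L | - | 8 $\times$ 12 $\times$ 12 | 2107 | 1.10 | 84.7 | 96.6 | config |
| Swin-L + Ours | 10 | 8 $\times$ 6 $\times$ 6 | 1662 | 1.66 | 84.0 | 96.3 | config |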

Kinetics 600

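| Method | $\alpha$ | t $\times$ h $\times$ w | GFLOPs | FPS | Acc@1 | Acc@5 | config |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Swin-L | - | 8 $\times$ 12 $\times$ 12 | 2107 | 1.10 | 86.1 | 97.3 | config |
| Swin-L + Ours | 10 | 8 $\times$ 6 $\times$ 6 | 1824 | 1.53 | 85.6 | 97.1 | config |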
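To make the trade-off in the two tables easier to read, the following minimal sketch derives relative figures (GFLOPs saved, FPS speedup, Acc@1 drop) from the reported numbers. The numbers are copied from the tables; the `RESULTS` dictionary, the `summarize` helper, and the reading of the t $\times$ h $\times$ w column as a token-grid size are assumptions made for illustration and are not part of this repository.

```python
# Numbers copied from the Results tables. The dictionary layout and the
# helper below are illustrative only and are not part of the repository.
RESULTS = {
    "Kinetics 400": {
        "baseline": {"grid": (8, 12, 12), "gflops": 2107, "fps": 1.10, "acc1": 84.7},
        "ours": {"grid": (8, 6, 6), "gflops": 1662, "fps": 1.66, "acc1": 84.0},
    },
    "Kinetics 600": {
        "baseline": {"grid": (8, 12, 12), "gflops": 2107, "fps": 1.10, "acc1": 86.1},
        "ours": {"grid": (8, 6, 6), "gflops": 1824, "fps": 1.53, "acc1": 85.6},
    },
}


def grid_size(grid):
    """Product t * h * w, assuming the column describes a token grid."""
    t, h, w = grid
    return t * h * w


def summarize(baseline, ours):
    """Return relative cost and accuracy figures for one table."""
    return {
        # How many times smaller the t x h x w grid is.
        "grid_ratio": grid_size(baseline["grid"]) / grid_size(ours["grid"]),
        # Percentage of baseline GFLOPs that is saved.
        "gflops_saved_pct": 100.0 * (baseline["gflops"] - ours["gflops"]) / baseline["gflops"],
        # Throughput ratio (higher is faster).
        "fps_speedup": ours["fps"] / baseline["fps"],
        # Absolute Acc@1 difference in percentage points.
        "acc1_drop": baseline["acc1"] - ours["acc1"],
    }


if __name__ == "__main__":
    for dataset, rows in RESULTS.items():
        s = summarize(rows["baseline"], rows["ours"])
        print(
            f"{dataset}: grid {s['grid_ratio']:.1f}x smaller, "
            f"{s['gflops_saved_pct']:.1f}% fewer GFLOPs, "
            f"{s['fps_speedup']:.2f}x FPS, "
            f"Acc@1 drop {s['acc1_drop']:.1f}"
        )
```

The script only restates the reported measurements in relative terms; it does not measure anything itself.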

Usage

Installation

Please refer to install.md for installation.

For convenience, we also provide Docker files for CUDA 10.1 (image url) and CUDA 11.0 (image url).

Data Preparation

Please refer to data_preparation.md for a general overview of data preparation. The supported datasets are listed in supported_datasets.md.

We also share our Kinetics-400 annotation files, k400_val and k400_train, to make comparison easier.

Inference

# single-gpu testing
python tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --eval top_k_accuracy

# multi-gpu testing
bash tools/dist_test.sh <CONFIG_FILE> <CHECKPOINT_FILE> <GPU_NUM> --eval top_k_accuracy
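When evaluating several checkpoints, it can help to assemble the two commands above programmatically. The sketch below is a hypothetical helper and not part of this repository: `build_test_command` is an assumed name, and the paths in the example are placeholders. It uses only the script names and the `--eval top_k_accuracy` flag shown above.

```python
import shlex


def build_test_command(config_file, checkpoint_file, gpu_num=None):
    """Build the argument list for the testing commands shown above.

    With gpu_num=None the single-GPU script is used; otherwise the
    distributed launcher is used with the given number of GPUs.
    """
    if gpu_num is None:
        cmd = ["python", "tools/test.py", config_file, checkpoint_file]
    else:
        if gpu_num < 1:
            raise ValueError("gpu_num must be a positive integer")
        cmd = ["bash", "tools/dist_test.sh", config_file, checkpoint_file, str(gpu_num)]
    # Both variants report top-k accuracy, as in the commands above.
    return cmd + ["--eval", "top_k_accuracy"]


if __name__ == "__main__":
    # Placeholder paths; substitute a real config and checkpoint.
    single = build_test_command("path/to/config.py", "path/to/checkpoint.pth")
    multi = build_test_command("path/to/config.py", "path/to/checkpoint.pth", gpu_num=8)
    print(shlex.join(single))
    print(shlex.join(multi))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the evaluation.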

Citation

If you find this project useful in your research, please consider citing:

@article{liang2022expediting,
  title={Expediting large-scale vision transformer for dense prediction without fine-tuning},
  author={Liang, Weicong and Yuan, Yuhui and Ding, Henghui and Luo, Xiao and Lin, Weihong and Jia, Ding and Zhang, Zheng and Zhang, Chao and Hu, Han},
  journal={arXiv preprint arXiv:2210.01035},
  year={2022}
}

@article{liu2021video,
  title={Video Swin Transformer},
  author={Liu, Ze and Ning, Jia and Cao, Yue and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Hu, Han},
  journal={arXiv preprint arXiv:2106.13230},
  year={2021}
}

@article{liu2021Swin,
  title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}
