This is the official implementation of the paper "Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning" on Video Swin Transformer.

| Method | t | | GFLOPs | FPS | Acc@1 | Acc@5 | config |
|---|---|---|---|---|---|---|---|
| Swin-L | - | 8 | 2107 | 1.10 | 84.7 | 96.6 | config |
| Swin-L + Ours | 10 | 8 | 1662 | 1.66 | 84.0 | 96.3 | config |


| Method | t | | GFLOPs | FPS | Acc@1 | Acc@5 | config |
|---|---|---|---|---|---|---|---|
| Swin-L | - | 8 | 2107 | 1.10 | 86.1 | 97.3 | config |
| Swin-L + Ours | 10 | 8 | 1824 | 1.53 | 85.6 | 97.1 | config |

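
To put the reported numbers in relative terms, the sketch below computes the GFLOPs reduction, the FPS speedup, and the Acc@1 change for each of the two result tables. The values are copied from the tables; the helper name `relative_change` and the `RESULTS` layout are illustrative choices and are not part of this repository.

```python
# Values copied from the two result tables: (GFLOPs, FPS, Acc@1).
RESULTS = [
    {"baseline": (2107, 1.10, 84.7), "ours": (1662, 1.66, 84.0)},
    {"baseline": (2107, 1.10, 86.1), "ours": (1824, 1.53, 85.6)},
]


def relative_change(baseline, ours):
    """Return (GFLOPs reduction in %, FPS speedup factor, Acc@1 change in points)."""
    base_flops, base_fps, base_acc = baseline
    our_flops, our_fps, our_acc = ours
    flops_reduction = 100.0 * (base_flops - our_flops) / base_flops
    speedup = our_fps / base_fps
    acc_delta = our_acc - base_acc
    return flops_reduction, speedup, acc_delta


for index, row in enumerate(RESULTS, start=1):
    reduction, speedup, delta = relative_change(row["baseline"], row["ours"])
    print(f"table {index}: {reduction:.1f}% fewer GFLOPs, "
          f"{speedup:.2f}x FPS, {delta:+.1f} Acc@1")
```

For the first table this works out to roughly 21% fewer GFLOPs at about 1.5x the throughput for a 0.7-point drop in Acc@1; for the second, roughly 13% fewer GFLOPs at about 1.4x the throughput for a 0.5-point drop.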
Please refer to install.md for installation.
We also provide Dockerfiles for cuda10.1 (image url) and cuda11.0 (image url) for convenience.
Please refer to data_preparation.md for an overview of data preparation. The supported datasets are listed in supported_datasets.md.
We also share our Kinetics-400 annotation files, k400_val and k400_train, to make comparisons easier.

    # single-gpu testing
    python tools/test.py <CONFIG_FILE> <CHECKPOINT_FILE> --eval top_k_accuracy

    # multi-gpu testing
    bash tools/dist_test.sh <CONFIG_FILE> <CHECKPOINT_FILE> <GPU_NUM> --eval top_k_accuracy

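
The `--eval top_k_accuracy` option corresponds to the Acc@1 and Acc@5 columns in the result tables. As a reminder of what that metric measures, here is a minimal pure-Python sketch; it is not the code the test script runs, and the function name, the toy scores, and the labels are made up for illustration.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score for this sample.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example: 3 clips, 4 classes (hypothetical scores).
scores = [
    [0.10, 0.60, 0.20, 0.10],  # highest score: class 1
    [0.50, 0.10, 0.30, 0.10],  # highest: class 0, second: class 2
    [0.30, 0.15, 0.05, 0.50],  # highest: class 3, second: class 0
]
labels = [1, 2, 0]

print("Acc@1:", top_k_accuracy(scores, labels, k=1))
print("Acc@2:", top_k_accuracy(scores, labels, k=2))
```

In this toy example only the first clip is correct at k=1, while all three true labels fall within the top two predictions.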
If you find this project useful in your research, please consider citing:
@article{liang2022expediting,
author = {Liang, Weicong and Yuan, Yuhui and Ding, Henghui and Luo, Xiao and Lin, Weihong and Jia, Ding and Zhang, Zheng and Zhang, Chao and Hu, Han},
title = {Expediting large-scale vision transformer for dense prediction without fine-tuning},
journal = {arXiv preprint arXiv:2210.01035},
year = {2022},
}
@article{liu2021video,
title={Video Swin Transformer},
author={Liu, Ze and Ning, Jia and Cao, Yue and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Hu, Han},
journal={arXiv preprint arXiv:2106.13230},
year={2021}
}
@article{liu2021Swin,
title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
journal={arXiv preprint arXiv:2103.14030},
year={2021}
}