Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference

Official PyTorch implementation of our paper

In this paper, we propose Sparse-Tuning, a novel tuning paradigm that substantially enhances both fine-tuning and inference efficiency for pre-trained ViT models. Sparse-Tuning efficiently fine-tunes the pre-trained ViT by sparsely preserving the informative tokens and merging redundant ones, enabling the ViT to focus on the foreground while reducing computational costs on background regions in the images. To accurately distinguish informative tokens from uninformative ones, we introduce a tailored Dense Adapter, which establishes dense connections across different encoder layers in the ViT, thereby enhancing the representational capacity and quality of token sparsification. Empirical results on VTAB-1K, three complete image datasets, and two complete video datasets demonstrate that Sparse-Tuning reduces the GFLOPs to 62%-70% of the original ViT-B while achieving state-of-the-art performance.

📌 The code and implementation details will be uploaded soon. Please be patient.
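Until the official implementation is released, the snippet below is a minimal, illustrative sketch of the token sparsification idea described above: score the patch tokens (here, by [CLS]-to-patch attention), keep the most informative ones, and merge the redundant ones into a single averaged token. All function and parameter names (sparsify_tokens, keep_ratio, cls_attn) are hypothetical and are not taken from the official code.

# Illustrative sketch only -- NOT the official Sparse-Tuning implementation.
# Keeps the top-k most informative patch tokens (scored by [CLS] attention)
# and merges the remaining redundant tokens into one averaged token,
# so subsequent encoder layers operate on fewer tokens.
import torch

def sparsify_tokens(tokens: torch.Tensor, cls_attn: torch.Tensor, keep_ratio: float = 0.7):
    """tokens: (B, 1 + N, D) with the [CLS] token first.
    cls_attn: (B, N) attention weights from [CLS] to each patch token.
    keep_ratio: fraction of patch tokens to preserve (hypothetical parameter).
    """
    cls_tok, patches = tokens[:, :1], tokens[:, 1:]           # split off [CLS]
    num_keep = max(1, int(patches.size(1) * keep_ratio))

    # Indices of the most informative patch tokens per image.
    keep_idx = cls_attn.topk(num_keep, dim=1).indices         # (B, num_keep)
    drop_mask = torch.ones_like(cls_attn, dtype=torch.bool)
    drop_mask.scatter_(1, keep_idx, False)                    # True = redundant token

    batch_idx = torch.arange(patches.size(0), device=patches.device).unsqueeze(1)
    kept = patches[batch_idx, keep_idx]                       # (B, num_keep, D)

    # Merge all redundant tokens into a single averaged token per image.
    merged = (patches * drop_mask.unsqueeze(-1)).sum(dim=1, keepdim=True)
    merged = merged / drop_mask.sum(dim=1, keepdim=True).clamp(min=1).unsqueeze(-1)

    return torch.cat([cls_tok, kept, merged], dim=1)          # (B, 1 + num_keep + 1, D)

In the paper, the token scores come from the Dense Adapter rather than raw attention, and sparsification is applied at several encoder layers; this sketch only shows the keep-and-merge step in isolation.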

Citation

If our findings help your research, please consider citing our paper in your publications.

@article{liu2024sparse,
  title={{Sparse-Tuning}: Adapting Vision Transformers with Efficient Fine-tuning and Inference},
  author={Liu, Ting and Liu, Xuyang and Shi, Liangtao and Xu, Zunnan and Huang, Siteng and Xin, Yi and Yin, Quanjun},
  journal={arXiv preprint arXiv:2405.14700},
  year={2024}
}

Contact

For any questions about our paper or code, please contact Ting Liu or Xuyang Liu.
