This is the official implementation of our paper: Expanding Sparse Tuning for Low Memory Usage.
We propose SNELL (Sparse tuning with kerNELized LoRA), a method that enables sparse tuning with low memory usage. SNELL decomposes the tunable matrix to be sparsified into two learnable low-rank matrices, avoiding the costly storage of the original full matrix. To keep sparse tuning effective with low-rank matrices, we extend the low-rank decomposition from a kernel perspective: applying nonlinear kernel functions when merging the full matrix increases the rank of the merged matrix. The higher rank strengthens SNELL's ability to adapt the pre-trained model sparsely to downstream tasks. To further reduce memory usage, we introduce a competition-based sparsification mechanism that avoids storing the indices of tunable weights. Extensive experiments on multiple downstream tasks show that SNELL achieves state-of-the-art performance with low memory usage, extending effective PEFT with sparse tuning to large-scale models.
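To make the two ingredients concrete, here is a minimal, hypothetical PyTorch sketch (not the repository's actual implementation; the kernel choice, threshold rule, and all names are illustrative assumptions): a nonlinear kernel applied when merging the low-rank factors, and magnitude-based competition that recomputes the sparse mask on the fly instead of storing weight indices.

```python
import torch

def kernelized_merge(B: torch.Tensor, A: torch.Tensor, degree: int = 3) -> torch.Tensor:
    """Merge low-rank factors with a nonlinear kernel (illustrative).

    Vanilla LoRA uses the linear kernel, delta[i, j] = <B[i], A[:, j]>,
    so rank(delta) <= r. An odd-degree homogeneous polynomial kernel,
    kappa(x, y) = <x, y> ** degree, can raise the rank of the merged
    matrix beyond r.
    """
    return (B @ A) ** degree

def competitive_sparsify(delta: torch.Tensor, sparsity: float = 0.9) -> torch.Tensor:
    """Zero out all but the largest-magnitude entries (illustrative).

    Entries compete by magnitude on every forward pass, so no index
    mask has to be kept in memory.
    """
    k = int(delta.numel() * sparsity)
    threshold = delta.abs().flatten().kthvalue(k).values
    return torch.where(delta.abs() > threshold, delta, torch.zeros_like(delta))

# Example: a rank-4 decomposition of a 64x64 weight update.
B, A = torch.randn(64, 4), torch.randn(4, 64)
delta = competitive_sparsify(kernelized_merge(B, A))
print(f"nonzero fraction: {(delta != 0).float().mean():.2f}")  # ~0.10
```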
If you find this repository or our paper useful, please consider citing and starring us!
```
@InProceedings{Shen_2024_SNELL,
    title={Expanding Sparse Tuning for Low Memory Usage},
    author={Shen, Shufan and Sun, Junshu and Ji, Xiangyang and Huang, Qingming and Wang, Shuhui},
    booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
    year={2024}
}
```
The repository is organized as follows:

- `./train.py`: the entry point for training.
- `./scripts`: scripts for adapting pre-trained models to downstream tasks with SNELL.
- `./lib`: helper functions for I/O, logging, training, and data loading.
- `./model`: backbone architectures and fine-tuning methods.
- `./engine.py`: main training and evaluation functions.
- `./data`: storage for the FGVC and VTAB-1k benchmarks.
- Clone this repo:

```bash
git clone https://github.com/ssfgunner/SNELL.git
cd SNELL
```

- Create a conda virtual environment and activate it:

```bash
conda create -n snell python=3.8 -y
conda activate snell
```

- Install `torch==1.12.1` and `torchvision==0.13.1` with `CUDA==11.3`:

```bash
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.3 -c pytorch
```

- Install other dependencies:

```bash
pip install -r requirements.txt
```
Prepare the datasets as follows:

- FGVC: Please download the datasets following VPT.
- VTAB-1k: Since the processing of some datasets in the original VTAB benchmark is tricky, we recommend the extracted VTAB-1k datasets shared by SSF for convenience. (Note that the license follows the original VTAB benchmark.)

The file structure should look like:

```
data
├── fgvc
│   ├── cub
│   ├── nabirds
│   └── ...
└── vtab-1k
    ├── caltech101
    ├── cifar
    └── ...
```
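As an optional sanity check (a hypothetical helper, not part of the repo), you can verify that the benchmark folders are in place before launching training:

```python
from pathlib import Path

# Hypothetical check: confirm the layout described above exists under ./data.
for folder in ["data/fgvc", "data/vtab-1k"]:
    path = Path(folder)
    print(f"{folder}: {'ok' if path.is_dir() else 'MISSING'}")
    if path.is_dir():
        print("  datasets:", sorted(p.name for p in path.iterdir() if p.is_dir()))
```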
Please download the pre-trained models to `./checkpoints`:

```bash
mkdir checkpoints
cd checkpoints
# Supervised pre-trained ViT-B/16
wget https://storage.googleapis.com/vit_models/imagenet21k/ViT-B_16.npz
# MAE pre-trained ViT-B/16
wget https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth
# MoCo v3 pre-trained ViT-B/16
wget https://dl.fbaipublicfiles.com/moco-v3/vit-b-300ep/linear-vit-b-300ep.pth.tar
# Supervised pre-trained Swin Transformer
wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window7_224_22k.pth
# Supervised pre-trained ConvNeXt
wget https://dl.fbaipublicfiles.com/convnext/convnext_base_22k_224.pth
```
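A quick, optional way to confirm the downloads are intact (a hypothetical snippet; the printed keys depend on each checkpoint's format):

```python
import numpy as np
import torch

# Hypothetical sanity check for the downloaded checkpoints.
vit = np.load("checkpoints/ViT-B_16.npz")
print(f"ViT-B/16: {len(vit.files)} arrays")

for name in ["mae_pretrain_vit_base.pth",
             "linear-vit-b-300ep.pth.tar",
             "swin_base_patch4_window7_224_22k.pth",
             "convnext_base_22k_224.pth"]:
    ckpt = torch.load(f"checkpoints/{name}", map_location="cpu")
    print(f"{name}: top-level keys = {list(ckpt.keys())[:5]}")
```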
We provide training scripts for adapting the supervised pre-trained ViT to FGVC and VTAB-1k with SNELL-32, for example:

```bash
# Fine-tune supervised pre-trained ViT-B/16 with SNELL-32 on the CUB dataset of FGVC
bash scripts/fgvc/snell32/vit_cub_snell.sh
# Fine-tune supervised pre-trained ViT-B/16 with SNELL-32 on the CIFAR dataset of VTAB-1k
bash scripts/vtab/snell32/vit_cifar_snell.sh
```
For other models, we provide example commands for fine-tuning on FGVC. Shell variables such as `${DATASET}`, `${init_thres}`, `${save_dir}`, `${batch_size}`, and `${WEIGHT_DECAY}` are placeholders to set per experiment; `--low_rank_dim=32` selects SNELL-32.
- For ViT pre-trained with MAE:

```bash
python train.py --data-path=./data/fgvc/${DATASET} --init_thres=${init_thres} \
    --data-set=${DATASET} --model_name=vit_base_patch16_224_in21k_snell --resume=checkpoints/mae_pretrain_vit_base.pth \
    --output_dir=${save_dir} \
    --batch-size=${batch_size} --lr=0.001 --epochs=100 --weight-decay=${WEIGHT_DECAY} --mixup=0 --cutmix=0 \
    --smoothing=0 --launcher="none" --seed=0 --val_interval=10 --opt=adamw --low_rank_dim=32 \
    --exp_name="ViT_MAE_${DATASET}" \
    --test --block=BlockSNELLParallel --tuning_model=snell --freeze_stage
```
- For ViT pre-trained with MoCo v3:

```bash
python train.py --data-path=./data/fgvc/${DATASET} --init_thres=${init_thres} \
    --data-set=${DATASET} --model_name=vit_base_patch16_224_in21k_snell --resume=checkpoints/linear-vit-b-300ep.pth.tar \
    --output_dir=${save_dir} \
    --batch-size=${batch_size} --lr=0.001 --epochs=100 --weight-decay=${WEIGHT_DECAY} --mixup=0 --cutmix=0 \
    --smoothing=0 --launcher="none" --seed=0 --val_interval=10 --opt=adamw --low_rank_dim=32 \
    --exp_name="ViT_MoCo_${DATASET}" \
    --test --block=BlockSNELLParallel --tuning_model=snell --freeze_stage
```
- For supervised pre-trained Swin Transformer:

```bash
python train.py --data-path=./data/fgvc/${DATASET} --init_thres=${init_thres} \
    --data-set=${DATASET} --model_name=swin_base_patch4_window7_224_in22k --resume=./checkpoints/swin_base_patch4_window7_224_22k.pth \
    --output_dir=${save_dir} \
    --batch-size=${batch_size} --lr=0.001 --epochs=100 --weight-decay=${WEIGHT_DECAY} --mixup=0 --cutmix=0 \
    --smoothing=0 --launcher="none" --seed=0 --val_interval=10 --opt=adamw --low_rank_dim=32 \
    --exp_name="Swin_${DATASET}" \
    --test --block=BlockSNELLParallel --tuning_model=snell --freeze_stage
```
- For supervised pre-trained ConvNeXt:

```bash
python train.py --data-path=./data/fgvc/${DATASET} --init_thres=${init_thres} \
    --data-set=${DATASET} --model_name=convnext_base_in22k --resume=./checkpoints/convnext_base_22k_224.pth \
    --output_dir=${save_dir} \
    --batch-size=${batch_size} --lr=0.001 --epochs=100 --weight-decay=${WEIGHT_DECAY} --mixup=0 --cutmix=0 \
    --smoothing=0 --launcher="none" --seed=0 --val_interval=10 --opt=adamw --low_rank_dim=32 \
    --exp_name="ConvNeXt_${DATASET}" \
    --test --block=BlockSNELLParallel --tuning_model=snell --freeze_stage
```
Our code is modified from VPT, SSF, and SPT. We thank the authors for open-sourcing their code.