Enhancing Parameter-Efficient Fine-Tuning of Vision Transformers through Frequency-Based Adaptation


Figure: (Left) Overview of FreqFit. (Right) Performance gains with ImageNet-21K (left) and MoCo (right).

This repository is heavily based on the official PyTorch implementation of Visual Prompt Tuning (ECCV 2022).

Environment settings

See env_setup.sh or assets/freqfit.yml

Experiments

Datasets preparation

Please follow the VPT dataset preparation instructions and VTAB_SETUP.md

Pre-trained model preparation

Download the pre-trained Transformer-based backbones and place them in the pretrained folder or in the path specified by MODEL.MODEL_ROOT.

Note that for MoCo v3, unlike VPT, we use the self-supervised pre-trained weights.

Once downloaded, update the pre-trained backbone names in MODEL_ZOO in src/build_vit_backbone.py accordingly.
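For illustration only, the mapping has roughly the shape sketched below; the key sup_vitb16_imagenet21k is the encoder name used with run.sh later in this README, while the other key and both checkpoint filenames are placeholders rather than the repo's actual entries.

# Illustrative sketch only; the real MODEL_ZOO in src/build_vit_backbone.py may differ.
# Keys are encoder names passed to run.sh; values are checkpoint files expected
# under MODEL.MODEL_ROOT (the pretrained folder). Filenames here are placeholders.
MODEL_ZOO = {
    "sup_vitb16_imagenet21k": "<supervised ImageNet-21k ViT-B/16 checkpoint>",
    "mocov3_vitb16": "<MoCo v3 ViT-B/16 checkpoint>",  # hypothetical key
}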

Pre-trained Backbone | Pre-trained Objective | Link
ViT-B/16 | Supervised | link
ViT-B/16 | MoCo v3 | link
ViT-B/16 | MAE | link
ViT-B/16 | CLIP | link

Key Configs

Configs related to each supported PEFT method are listed in src/config/configs.py. They can also be overridden in run.sh.

This repo supports the FreqFit and Scale-Shift (SSF) fine-tuning methods presented in the paper. To switch between them, go to run.sh and set FREQFIT "freqfit" or FREQFIT "ssf".

FreqFit code

  • The code for the FreqFit method is in src/models/gfn.py

  • The code for integrating FreqFit into a PEFT method can be found in the Vision Transformer backbone file (vit.py) of each method, such as src/models/vit_backbones/vit.py; a minimal sketch of the idea follows below.
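For orientation, here is a minimal, self-contained sketch of the kind of frequency-domain filtering FreqFit applies to the token sequence. It is not the code in src/models/gfn.py; the class name, shapes, and initialization are illustrative assumptions.

import torch
import torch.nn as nn


class FreqFilterSketch(nn.Module):
    """Illustrative FFT-based token filter; not the exact implementation in src/models/gfn.py."""

    def __init__(self, num_tokens: int = 197, dim: int = 768):
        super().__init__()
        # Learnable complex filter over the token-frequency axis, stored as (real, imag).
        self.filter = nn.Parameter(torch.randn(num_tokens // 2 + 1, dim, 2) * 0.02)
        # Scale and shift so the module can start close to an identity mapping.
        self.scale = nn.Parameter(torch.ones(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim)
        residual = x
        freq = torch.fft.rfft(x, dim=1, norm="ortho")                 # tokens -> frequency domain
        freq = freq * torch.view_as_complex(self.filter)              # element-wise spectral filtering
        out = torch.fft.irfft(freq, n=residual.shape[1], dim=1, norm="ortho")  # back to token domain
        return out * self.scale + self.shift + residual               # residual path preserves the PEFT features

A module like this would be inserted between frozen transformer blocks, with only its filter, scale, and shift parameters trained alongside the PEFT parameters (compare the filter_layer / ssf_scale / ssf_shift names in the VERA example below).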

Adding new PEFT method

  • To add a new PEFT method that is available in HuggingFace PEFT, simply edit src/models/vit_models.py:
...
# add VeRA (https://huggingface.co/docs/peft/en/package_reference/vera)
elif transfer_type == "vera":
    from peft import VeraConfig, get_peft_model

    config = VeraConfig(
        r=cfg.MODEL.VERA.R,
        target_modules=["attn.query", "attn.value", "attn.key", "attn.out", "ffn.fc1", "ffn.fc2"],
        vera_dropout=0.1,
        bias="vera_only",
        modules_to_save=["classifier"],
    )

    self.enc = get_peft_model(self.enc, config)
    # keep the FreqFit / SSF parameters trainable alongside the VeRA adapter
    for k, p in self.enc.named_parameters():
        if "ssf_scale" in k or "ssf_shift" in k or "filter_layer" in k:
            p.requires_grad = True
...

In run.sh, set MODEL.TRANSFER_TYPE "vera". Refer to the HuggingFace PEFT documentation for config details.

  • To add a custom PEFT method, implement the method and register it in src/models/build_vit_backbone.py and src/models/vit_models.py. Refer to the LoRA implementation at src/models/vit_lora/vit_lora.py as an example, and see the sketch below.
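As a rough starting point, a custom PEFT method usually boils down to a small trainable module that wraps or sits beside frozen backbone layers. The sketch below is a hypothetical LoRA-style linear wrapper, not code from this repo; the class name and the way it would be wired into vit_models.py are assumptions.

import torch
import torch.nn as nn


class CustomAdapterLinear(nn.Module):
    """Hypothetical LoRA-style wrapper: frozen base linear plus a low-rank trainable update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # keep the pre-trained weight frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)     # start as an identity of the frozen layer
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.lora_b(self.lora_a(x)) * self.scaling

In vit_models.py, one would then add a new transfer_type branch that swaps the target nn.Linear layers for this wrapper and keeps everything else frozen, mirroring the LoRA and VeRA examples above.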

Run experiments

Modify run.sh to match your setup. Then run:

bash run.sh [data_name] [encoder] [batch_size] [base_lr] [wd_lr] [num_tokens] [adapter_ratio] [freqfit/ssf]

For example, to run the CIFAR-100 dataset with an ImageNet-21k supervised backbone using LoRA combined with FreqFit, first make sure MODEL.TRANSFER_TYPE and the other LoRA configs are set in run.sh:

--config-file configs/finetune/cub.yaml \
MODEL.TRANSFER_TYPE "lora" \
MODEL.LORA.RANK "8" \
MODEL.LORA.ALPHA "8" \

Then, execute:

bash run.sh cifar100 sup_vitb16_imagenet21k 64 0.1 0.01 0 0 freqfit

License

The majority of FreqFiT is licensed under the CC-BY-NC 4.0 license (see LICENSE for details). Portions of the project are available under separate license terms: google-research/task_adaptation and huggingface/transformers are licensed under the Apache 2.0 license; Swin-Transformer, ConvNeXt and ViT-pytorch are licensed under the MIT license; and MoCo-v3 and MAE are licensed under the Attribution-NonCommercial 4.0 International license.
