Figure: (Left) Overview of FreqFit. (Right) Performance gains with ImageNet-21k (left) and MoCo (right) pre-trained backbones.
This repository is heavily based on the official PyTorch implementation of Visual Prompt Tuning (ECCV 2022).
See `env_setup.sh` or `assets/freqfit.yml`.
Please follow the VPT dataset preparation guide and `VTAB_SETUP.md`.
Download and place the pre-trained Transformer-based backbones in the `pretrained` folder or in `MODEL.MODEL_ROOT`.
Note that, unlike VPT, we use the self-supervised pre-trained weights for MoCo v3.
Once downloaded, update the pre-trained backbone names in `MODEL_ZOO` in `src/build_vit_backbone.py` accordingly (see the sketch after the table below).
| Pre-trained Backbone | Pre-trained Objective | Link |
|---|---|---|
| ViT-B/16 | Supervised | link |
| ViT-B/16 | MoCo v3 | link |
| ViT-B/16 | MAE | link |
| ViT-B/16 | CLIP | link |
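As a sketch of the `MODEL_ZOO` edit mentioned above (the dictionary layout, keys, and file names are placeholders; match them to your downloaded checkpoints):

```python
# illustrative fragment of src/build_vit_backbone.py
MODEL_ZOO = {
    "sup_vitb16_imagenet21k": "imagenet21k_ViT-B_16.npz",  # supervised
    "mocov3_vitb16": "mocov3_vitb16.pth",                  # MoCo v3, self-supervised
    # ... one entry per downloaded backbone
}
```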
Configs related to each PEFT method are listed in `src/config/configs.py`; they can also be overridden in `run.sh`.
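For reference, a config entry follows the yacs-style pattern the VPT codebase uses; apart from `MODEL.LORA.RANK`, `MODEL.LORA.ALPHA`, and `MODEL.VERA.R`, which appear elsewhere in this README, the fields below are placeholders.

```python
# illustrative fragment of src/config/configs.py
from yacs.config import CfgNode

_C.MODEL.LORA = CfgNode()
_C.MODEL.LORA.RANK = 8    # overridden by MODEL.LORA.RANK "8" in run.sh
_C.MODEL.LORA.ALPHA = 8   # overridden by MODEL.LORA.ALPHA "8" in run.sh
_C.MODEL.VERA = CfgNode()
_C.MODEL.VERA.R = 8       # read as cfg.MODEL.VERA.R in vit_models.py
```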
This repo supports the FreqFit and Scale-Shift (SSF) fine-tuning methods presented in the paper. To switch between them, go to `run.sh` and set `FREQFIT "freqfit"` or `FREQFIT "ssf"`.
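In practice this is a key-value config override appended to the training command inside `run.sh`; the `train.py` entry-point name below is an assumption.

```bash
# illustrative fragment of run.sh
python train.py \
    --config-file configs/finetune/cub.yaml \
    FREQFIT "freqfit"
# use FREQFIT "ssf" for Scale-Shift fine-tuning instead
```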
- The code for the FreqFit method is in `src/models/gfn.py`; a minimal sketch of the idea appears after this list.
- The code integrating FreqFit into a PEFT method can be found in that method's vision transformer backbone, e.g., `src/models/vit_backbones/vit.py`.
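FreqFit adapts features by filtering them in the frequency domain. For intuition, here is a minimal sketch of such a filter layer, assuming a 1-D FFT over the token axis, zero-initialized filter weights, and a residual connection; all names and shapes are illustrative, and `src/models/gfn.py` holds the actual implementation.

```python
import torch
import torch.nn as nn

class FreqFitFilter(nn.Module):
    """Sketch of a frequency-domain fine-tuning layer (not the repo's exact code)."""

    def __init__(self, num_tokens: int, dim: int):
        super().__init__()
        # one learnable complex weight (stored as a real/imag pair) per rFFT
        # frequency bin and channel; zero-init makes the layer a no-op at the
        # start of fine-tuning thanks to the residual connection below
        self.filter = nn.Parameter(torch.zeros(num_tokens // 2 + 1, dim, 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim)
        freq = torch.fft.rfft(x, dim=1, norm="ortho")     # tokens -> spectrum
        freq = freq * torch.view_as_complex(self.filter)  # learnable filtering
        out = torch.fft.irfft(freq, n=x.shape[1], dim=1, norm="ortho")  # back to tokens
        return x + out                                    # residual connection
```

Because the filter holds only `(num_tokens // 2 + 1) * dim` complex weights per layer, it adds a small parameter budget on top of whichever PEFT method it is paired with.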
- To add new PEFT methods that are available in HuggingFace, simply go to `src/models/vit_models.py` and add a branch such as:

```python
...
# add VeRA
elif transfer_type == "vera":
    # https://huggingface.co/docs/peft/en/package_reference/vera
    from peft import VeraConfig, get_peft_model

    config = VeraConfig(
        r=cfg.MODEL.VERA.R,
        target_modules=["attn.query", "attn.value", "attn.key",
                        "attn.out", "ffn.fc1", "ffn.fc2"],
        vera_dropout=0.1,
        bias="vera_only",
        modules_to_save=["classifier"],
    )
    self.enc = get_peft_model(self.enc, config)

    # keep the FreqFit / SSF parameters trainable alongside the adapter
    for k, p in self.enc.named_parameters():
        if "ssf_scale" in k or "ssf_shift" in k or "filter_layer" in k:
            p.requires_grad = True
...
```
In `run.sh`, set `MODEL.TRANSFER_TYPE "vera"`. Refer to the HuggingFace documentation for config details.
- To add a custom PEFT method, build your method, then register it in `src/models/build_vit_backbone.py` and `src/models/vit_models.py`. Refer to LoRA at `src/models/vit_lora/vit_lora.py` as an example; a hypothetical registration sketch follows.
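As a purely hypothetical illustration (the `"my_method"` branch name and `MyAdapter` module are placeholders), the registration in `src/models/vit_models.py` would mirror the VeRA pattern above:

```python
# hypothetical fragment of src/models/vit_models.py
elif transfer_type == "my_method":
    self.enc = MyAdapter(self.enc)  # wrap the frozen encoder with your module
    for k, p in self.enc.named_parameters():
        # train only your method's parameters, plus the FreqFit/SSF layers
        # when they are enabled
        p.requires_grad = ("my_adapter" in k or "filter_layer" in k
                           or "ssf_scale" in k or "ssf_shift" in k)
```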
Use `run.sh` as a reference and modify it as needed. Then run:
```bash
bash run.sh [data_name] [encoder] [batch_size] [base_lr] [wd_lr] [num_tokens] [adapter_ratio] [freqfit/ssf]
```
For example, to run the CIFAR-100 dataset on ImageNet-21k with LoRA combined with FreqFit, make sure `MODEL.TRANSFER_TYPE` and the other LoRA configs have been set in `run.sh`:

```bash
--config-file configs/finetune/cub.yaml \
MODEL.TRANSFER_TYPE "lora" \
MODEL.LORA.RANK "8" \
MODEL.LORA.ALPHA "8" \
```
Then, execute:
```bash
bash run.sh cifar100 sup_vitb16_imagenet21k 64 0.1 0.01 0 0 freqfit
```
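To run the same configuration with Scale-Shift instead of FreqFit, swap the last argument:

```bash
bash run.sh cifar100 sup_vitb16_imagenet21k 64 0.1 0.01 0 0 ssf
```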
The majority of FreqFit is licensed under the CC-BY-NC 4.0 license (see LICENSE for details). Portions of the project are available under separate license terms: google-research/task_adaptation and huggingface/transformers are licensed under the Apache 2.0 license; Swin-Transformer, ConvNeXt, and ViT-pytorch are licensed under the MIT license; and MoCo v3 and MAE are licensed under the Attribution-NonCommercial 4.0 International license.