Tian Liu1 · Huixin Zhang1 · Shubham Parashar1 · Shu Kong2
1Texas A&M University 2University of Macau
Our work adapts a pretrained Vision-Language Model (VLM) and retrieves relevant pretraining images to solve the few-shot recognition problem. To mitigate the domain gap and imbalanced distribution of the retrieved data, we propose a novel Stage-Wise retrieval-Augmented fineTuning (SWAT) method, which outperforms previous few-shot recognition methods by >6% in accuracy across nine benchmark datasets.
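At a high level, SWAT proceeds in two stages: first the whole model is finetuned on the mixture of retrieved and few-shot data, then the classifier is retrained on the clean few-shot data alone to correct for the domain gap and imbalance of the retrieved images. A minimal sketch of this control flow (function names, the `update` flag, and the `model` dict are illustrative placeholders, not the actual API):

```python
def finetune(model, data, update):
    # Placeholder for a training loop: here we only record which
    # parameters were updated ("all" vs. "classifier") on how much data.
    model["log"].append((update, len(data)))

def swat(model, few_shot_data, retrieved_data):
    # Stage 1: end-to-end finetuning on retrieved + few-shot data
    # (mitigates data scarcity, but retrieved images carry domain
    # gap and class imbalance).
    mixed_data = retrieved_data + few_shot_data
    finetune(model, mixed_data, update="all")

    # Stage 2: retrain only the classifier head on the few-shot data.
    finetune(model, few_shot_data, update="classifier")
    return model
```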
- 2024-11-24: Updated code base to include more datasets.
- 2024-08-22: Retrieval code released, see RETRIEVAL.md.
- 2024-07-05: SWAT finetuning code released.
- 2024-06-28: Project page launched.
- 2024-06-17: arXiv paper released.
Create conda environment and install dependencies following the instructions in ENV.md.
Prepare the datasets following the instructions in DATASETS.md.
Retrieve relevant pretraining data following the instructions in RETRIEVAL.md.
You can run SWAT or finetune on the few-shot data using the bash scripts below.
```bash
# 1. Check the options in run_dataset_seed_xxx.sh;
#    these scripts can be used to run a batch of experiments.
# 2. Run the corresponding bash script on the command line.
# Usage: bash scripts/run_dataset_seed_xxx.sh <dataset> [seed]

# finetune on few-shot data, seed 1
bash scripts/run_dataset_seed_finetune_fewshot.sh semi-aves 1

# finetune on few-shot data with CutMix, 3 seeds
bash scripts/run_dataset_seed_finetune_fewshot_cutmix.sh semi-aves

# SWAT, seed 1
bash scripts/run_dataset_seed_SWAT.sh semi-aves 1
```
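The CutMix variant augments finetuning by pasting a random patch of one training image onto another and mixing the labels in proportion to the patch area. A minimal numpy sketch of the augmentation itself (illustrative only, not the project's actual implementation):

```python
import numpy as np

def cutmix(img_a, img_b, label_a, label_b, alpha=1.0, rng=None):
    """Paste a random box from img_b into img_a; mix labels by area."""
    rng = rng or np.random.default_rng()
    h, w = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)            # mixing ratio ~ Beta(alpha, alpha)
    # choose a box whose area is roughly (1 - lam) of the image
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = img_a.copy()
    mixed[y0:y1, x0:x1] = img_b[y0:y1, x0:x1]
    # recompute lam from the actual (clipped) box area
    lam = 1.0 - (y1 - y0) * (x1 - x0) / (h * w)
    label = lam * label_a + (1.0 - lam) * label_b
    return mixed, label
```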
The results of the experiments will be saved in the `result` directory. Detailed logs, models, and scores will be saved in the `output` directory.
### Running other baselines
Below we provide the commands to run the zero-shot and few-shot baselines in the paper. Update the `model_cfg` option in the bash scripts to use different models.
Zero-shot methods:
```bash
# OpenCLIP zero-shot
bash scripts/run_dataset_zeroshot.sh semi-aves

# REAL-Prompt
bash scripts/run_dataset_REAL-Prompt.sh semi-aves

# REAL-Linear
# Take the WSFT accuracy with alpha=0.5, i.e.,
# find the line: `Alpha:0.5, Val Acc: 48.671, Test Acc: 48.562`
bash scripts/run_dataset_REAL-Linear.sh semi-aves
```

Few-shot methods:
```bash
# Cross-Modal Linear Probing (CMLP)
bash scripts/run_dataset_seed_CMLP.sh semi-aves 1
```
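The WSFT accuracy reported by the REAL-Linear script above comes from interpolating the zero-shot and finetuned classifier weights in weight space, with alpha controlling the blend. A minimal sketch of the interpolation (function and argument names are illustrative):

```python
import numpy as np

def wsft(w_zeroshot, w_finetuned, alpha=0.5):
    # Linearly interpolate two classifiers in weight space;
    # alpha=0 recovers the zero-shot weights, alpha=1 the finetuned ones.
    return alpha * w_finetuned + (1.0 - alpha) * w_zeroshot
```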
For CLAP, we use the authors' provided code but replace the CLIP model with OpenCLIP. Our implementation, with instructions, can be found in CLAP-tian.
This code base is developed with reference to the following projects. We sincerely thank the authors for open-sourcing their work.
If you find our project useful, please consider citing:
```bibtex
@article{liu2024few,
  title={Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning},
  author={Liu, Tian and Zhang, Huixin and Parashar, Shubham and Kong, Shu},
  journal={arXiv preprint arXiv:2406.11148},
  year={2024}
}

@inproceedings{parashar2024neglected,
  title={The Neglected Tails in Vision-Language Models},
  author={Parashar, Shubham and Lin, Zhiqiu and Liu, Tian and Dong, Xiangjue and Li, Yanan and Ramanan, Deva and Caverlee, James and Kong, Shu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}
```