Skip to content

kojima-takeshi188/CFA

Folders and files

NameName
Last commit message
Last commit date

Latest commit

author
acc12649wd
Jul 20, 2022
c89c573 · Jul 20, 2022

History

25 Commits
Jul 20, 2022
Apr 30, 2022
Apr 30, 2022
Apr 30, 2022
Apr 30, 2022
Apr 30, 2022
Apr 30, 2022
Apr 28, 2022

Repository files navigation

Robustifying Vision Transformer without Retraining from Scratch by Test-Time Class-Conditional Feature Alignment

This is the official implementation of Robustifying Vision Transformer without Retraining from Scratch by Test-Time Class-Conditional Feature Alignment (IJCAI-ECAI2022, Short).

The paper is available at IJCAI-ECAI2022(main only) and arXiv(main and appendix).

Installation

Prerequisite

Hardware

  • GPU : A100 x 1GPU (40 GB Memory)
  • Disk Space : About 300 GB

Software

  • Python==3.7.13
  • torch==1.9.0
  • torchvision==0.10.0

timm library

  • For ViT
pip install timm==0.4.9
  • For the other models (ViT-AugReg, DeiT, MLP-Mixer, BeiT)
pip install git+https://github.com/rwightman/pytorch-image-models@more_datasets # 0.5.0

The other libraries

pip install -r requirements.txt

Dataset

Download each datasets and unzip them under the following directory.

./datasets/imagenet2012/train
./datasets/imagenet2012/val
./datasets/imagenet2012/val_c

Quick start

(1) Argument Setting

model={'ViT-B_16', 'ViT-L_16', 'ViT_AugReg-B_16', 'ViT_AugReg-L_16', 'resnet50', 'resnet101', 'mlpmixer_B16', 'mlpmixer_L16', 'DeiT-B', 'DeiT-S', 'Beit-B16_224', 'Beit-L16_224'}
method={'cfa', 't3a', 'shot-im', 'tent', 'pl', 'source'}

(2) Fine-Tuning (Skip)

Our method does not need to alter training phase, i.e., does not need to retrain models from scratch. Therefore, if a fine-tuned model is available, we can skip fine-tuning phase. In this implementation, we use models that are already fine-tuned on ImageNet-2012 dataset.

(3) Calculation of distribution statistics on source dataset

python main.py --calc_statistics_flag --model=${model} --method=${method}

(4) Test-Time Adaptation (TTA) on target dataset

python main.py --tta_flag --model=${model} --method=${method}

Expected results

Top-1 Error Rate on ImageNet-C with severity level=5. ViT_B16 is used as a backbone network.

mean gauss_noise shot_noise impulse_noise defocus_blur glass_blur motion_blur zoom_blur snow frost fog brightness contrast elastic_trans pixelate jpeg
source 61.9 77.7 75.1 77.0 66.9 69.1 58.5 62.8 60.9 57.6 62.9 31.6 88.9 51.9 45.3 42.9
CFA 43.9 56.3 54.3 55.4 48.5 47.1 44.3 44.4 44.8 44.8 41.1 25.7 54.2 33.3 30.5 33.5

Citation

@inproceedings{kojima2022robustvit,
  title     = {Robustifying Vision Transformer without Retraining from Scratch by Test-Time Class-Conditional Feature Alignment},
  author    = {Kojima, Takeshi and Matsuo, Yutaka and Iwasawa, Yusuke},
  booktitle = {Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, {IJCAI-22}},
  pages     = {1009--1016},
  year      = {2022},
  month     = {7},
  url       = {https://doi.org/10.24963/ijcai.2022/141},
}

Contact

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages