Preprint: [arXiv:2304.04795](https://arxiv.org/abs/2304.04795)
This benchmark is a step towards standardizing the evaluation of Test Time Adaptation (TTA) methods. We have implementations of 14 different TTA methods from the literature. The following table reports the average episodic error rate (%) of the implemented methods under the offline and online evaluation schemes on ImageNet-C.
Method | Venue | Paper | Code | Offline Eval. (%) | Online Eval. (%) |
---|---|---|---|---|---|
ETA / EATA | ICML'22 | (paper) | (code) | 52.0 | 55.6 |
SHOT / SHOT-IM | ICML'20 | (paper) | (code) | 59.9 | 59.1 |
TENT | ICLR'21 | (paper) | (code) | 57.3 | 61.6 |
SAR | ICLR'23 | (paper) | (code) | 56.2 | 63.4 |
PL | ICMLW'13 | (paper) | (code) | 65.0 | 65.3 |
TTAC-NQ | NeurIPS'22 | (paper) | (code) | 59.0 | 66.5 |
BN Adaptation | NeurIPS'20 | (paper) | (code) | 66.7 | 66.7 |
CoTTA | CVPR'22 | (paper) | (code) | 61.5 | 68.0 |
AdaBN | ICLR'17 | (paper) | (code) | 68.5 | 68.5 |
MEMO | NeurIPS'22 | (paper) | (code) | 76.3 | 81.9 |
DDA | CVPR'23 | (paper) | (code) | 64.4 | 82.0 |
Source | - | (paper) | (code) | 82.0 | 82.0 |
LAME | CVPR'22 | (paper) | (code) | 82.7 | 82.7 |
We fixed the architecture to ResNet-50 throughout all our experiments and used the torchvision pretrained weights.
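For reference, a recent torchvision (>= 0.13, where the `weights` enum is available) loads these weights as follows; older versions use `pretrained=True` instead:

```python
from torchvision.models import ResNet50_Weights, resnet50

# ImageNet-pretrained ResNet-50 shipped with torchvision.
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
model.eval()
```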
To use our code, first install our environment by running:

```bash
conda env create -f environment.yml
```
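Then activate it. The environment name comes from the `name:` field of `environment.yml`; `tta` below is a placeholder:

```bash
conda activate tta  # replace `tta` with the name defined in environment.yml
```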
Our results are reported on 3 different datasets: ImageNet-C, ImageNet-R, and ImageNet-3DCC. All datasets are publicly available and can be downloaded from their corresponding repositories.
For ImageNet-C and ImageNet-3DCC, the data should be organized as `PATH/CORRUPTION/SEVERITY/*`.
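For example, assuming ImageNet-C is stored under `/datasets/ImageNet-C` (a placeholder path), the expected layout is:

```
/datasets/ImageNet-C/
├── gaussian_noise/
│   ├── 1/
│   ├── ...
│   └── 5/          # severity level 5, used in all our reported results
├── shot_noise/
│   └── ...
└── ...
```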
Our paper evaluates the efficacy of TTA methods when data arrives as a stream with constant speed. We simulate this by assuming that the rate at which the stream reveals new data is independent of how fast a method adapts: batches that arrive while a slower method is still adapting are not adapted to, and are instead passed through the non-adapted forward pass.
We considered two evaluation schemes in our work: episodic evaluation and continual evaluation. Episodic evaluation tests a given TTA method on a single domain shift, e.g., one corruption. Continual evaluation tests a given TTA method on a sequence of domain shifts without resetting the parameters of the model. In addition, we considered a single model evaluation setup, in which a random prediction is assigned to all missed batches that a TTA method did not adapt to.
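To make the online scheme concrete, here is a minimal sketch of the idea, not the repository's actual scheduler: it assumes a method with relative speed $\eta \in (0, 1]$ only has time to adapt to a fraction $\eta$ of the incoming batches, and routes the rest through the non-adapted forward pass exposed as `tta_method.model` (see the wrapper contract below):

```python
import torch

@torch.no_grad()
def plain_forward(model, x):
    # Non-adapted forward pass for batches the method has no time to adapt to.
    return model(x)

def stream_error_rate(tta_method, loader, eta=1.0):
    """Illustrative constant-speed stream simulation (assumed scheduling)."""
    correct, total, budget = 0, 0, 0.0
    for x, y in loader:
        budget += eta               # the method earns `eta` adaptation budget per batch
        if budget >= 1.0:           # enough budget: adapt to this batch
            budget -= 1.0
            logits = tta_method(x)  # adapted forward pass
        else:                       # still busy: fall back to the frozen model
            logits = plain_forward(tta_method.model, x)
        correct += (logits.argmax(dim=1) == y).sum().item()
        total += y.numel()
    return 100.0 * (1.0 - correct / total)  # error rate in %
```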
To evaluate a TTA method under different stream speeds, run:

```bash
python main.py --eta [ETA] --method [METHOD] --dataset [DATASET] --corruption [CORRUPTION] --level [LEVEL] --imagenetc_path [PATH] --batch_size [BATCH_SIZE] --output [OUTPUT_PATH]
```
where:
- `ETA`: a float between 0 and 1 representing $\eta$ in our paper, which varies the stream speed. The default value is $\eta = 1$, corresponding to online evaluation.
- `METHOD`: the TTA method to evaluate, one of `['basic', 'tent', 'eta', 'eata', 'cotta', 'ttac_nq', 'memo', 'adabn', 'shot', 'shotim', 'lame', 'bn_adaptation', 'pl', 'sar', 'dda']`.
- `DATASET`: one of `[imagenetc, imagenetr, imagenet3dcc]`.
- `CORRUPTION`: the type of corruption you would like to evaluate on.
  - ImageNet-C corruptions: `['gaussian_noise', 'shot_noise', 'impulse_noise', 'defocus_blur', 'glass_blur', 'motion_blur', 'zoom_blur', 'snow', 'frost', 'fog', 'brightness', 'contrast', 'elastic_transform', 'pixelate', 'jpeg_compression']`.
  - ImageNet-3DCC corruptions: `['bit_error', 'color_quant', 'far_focus', 'flash', 'fog_3d', 'h265_abr', 'h265_crf', 'iso_noise', 'low_light', 'near_focus', 'xy_motion_blur', 'z_motion_blur']`.
  - For ImageNet-R, do not pass the `--corruption` argument.
- `LEVEL`: an integer between 1 and 5 determining how severe the corruption is. All our results use a severity of 5 (the default value).
- `PATH`: the path to the ImageNet-C dataset, organized as `PATH/CORRUPTION/SEVERITY/*`. If you are evaluating on ImageNet-3DCC or ImageNet-R, replace `--imagenetc_path` with `--imagenet3dcc_path` or `--imagenetr_path`, respectively.
- `BATCH_SIZE`: the batch size of the validation loader. We fixed the batch size to 64 for all of our experiments.
- `OUTPUT`: the output path for saving the evaluation results. The code writes `OUTPUT/DATASET/METHOD/eta_ETA/CORRUPTION.txt`, which reports both $\eta$ and the error rate.
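For instance, evaluating TENT online on Gaussian noise at severity 5 (the dataset and output paths below are placeholders) would be:

```bash
python main.py --eta 1.0 --method tent --dataset imagenetc --corruption gaussian_noise --level 5 --imagenetc_path /datasets/ImageNet-C --batch_size 64 --output ./results
```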
To test a given TTA method under a continual sequence of domain shifts, run:

```bash
python main.py --exp_type continual --test_val --eta [ETA] --method [METHOD] --dataset [DATASET] --corruption [CORRUPTION] --level [LEVEL] --imagenetc_path [PATH] --batch_size [BATCH_SIZE] --output [OUTPUT_PATH]
```
Note that the main difference is passing `--exp_type continual`. In addition:
- `CORRUPTION`: should belong to `['all', 'all_ordered']`, where `all_ordered` sets the order of the corruptions to match Section 4.3 (Figure 3) and `all` shuffles all corruptions randomly.
- `--test_val`: evaluates on the clean validation set of ImageNet at the end of the continual evaluation.
All the remaining arguments follow our episodic evaluation.
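For example, an online continual run over the ordered corruption sequence (paths again placeholders):

```bash
python main.py --exp_type continual --test_val --eta 1.0 --method tent --dataset imagenetc --corruption all_ordered --level 5 --imagenetc_path /datasets/ImageNet-C --batch_size 64 --output ./results
```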
To test a given TTA method in the single model evaluation scheme, following Section 4.6, run:

```bash
python main.py --single_model --eta [ETA] --method [METHOD] --dataset [DATASET] --corruption [CORRUPTION] --level [LEVEL] --imagenetc_path [PATH] --output [OUTPUT_PATH] --batch_size [BATCH_SIZE]
```
where all other arguments follow our episodic evaluation.
To add additional TTA methods, please follow the example in our basic wrapper `tta_methods/basic.py`. Note that each TTA method is required to expose the non-adapted forward pass as the property `self.model`; this property allows the online evaluation to route batches that will not be adapted to through the normal forward pass. After adding your new method to the `tta_methods` directory, import it in `tta_methods/__init__.py` and add it to the `_all_methods` dictionary.
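As a sketch, a new wrapper might look like the following; the base class, constructor signature, and file name are illustrative assumptions, and only the `self.model` contract comes from the requirement above:

```python
# tta_methods/my_method.py -- hypothetical new method
import torch.nn as nn

class MyMethod(nn.Module):
    """Sketch of a TTA wrapper; mirror the actual interface of basic.py."""

    def __init__(self, model):
        super().__init__()
        # Required: expose the non-adapted forward pass as `self.model` so the
        # online evaluation can route missed batches through it.
        self.model = model

    def forward(self, x):
        # Adaptation logic (e.g., entropy minimization on the batch) would go
        # here; this placeholder simply runs the non-adapted model.
        return self.model(x)
```

It would then be registered in `tta_methods/__init__.py`:

```python
# tta_methods/__init__.py (sketch)
from .my_method import MyMethod

_all_methods['my_method'] = MyMethod
```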
To test the efficacy of the newly implemented method in the episodic evaluation scheme, run:

```bash
python main.py --eta [ETA] --method [METHOD] --dataset [DATASET] --corruption [CORRUPTION] --level [LEVEL] --imagenetc_path [PATH] --batch_size [BATCH_SIZE] --output [OUTPUT_PATH]
```

where `[METHOD]` is the key you added to the `_all_methods` dictionary.
If you find our work useful, please consider citing our paper:
```bibtex
@misc{alfarra2023revisiting,
    title={Revisiting Test Time Adaptation under Online Evaluation},
    author={Motasem Alfarra and Hani Itani and Alejandro Pardo and Shyma Alhuwaider and Merey Ramazanova and Juan C. Pérez and Zhipeng Cai and Matthias Müller and Bernard Ghanem},
    year={2023},
    eprint={2304.04795},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```