We propose a simple generalization benchmark with systematic out-of-distribution test splits: composition, interpolation, and extrapolation. The splitting procedure is visualized in the figure below.
Fig. 1: The four scatter plots show the random, composition, interpolation, and extrapolation splits along the generative factors of variation of the dSprites dataset. The axes correspond to factors of variation in the data, e.g., scale, as shown for extrapolation on the right.
We consider the dSprites, Shapes3D, and MPI3D-Real datasets. The splits corresponding to random, composition, interpolation, and extrapolation can be found at dSprites splits, Shapes3D splits, and MPI3D splits.
This benchmark permits a wide variety of modelling approaches, including the use of external data. Furthermore, practitioners may sample from the training data in whatever way best suits their learning algorithm; this admits supervision types ranging from unsupervised and weakly supervised to fully supervised and transfer learning. The test set, however, must remain untouched and may only be used for evaluation.
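For instance, training on a split's training portion might look like the sketch below. Note that `mode='train'` and the `(images, targets)` batch format are assumptions about the dataloader interface; only `mode='test'` appears in the evaluation snippet further down.

```python
import lablet_generalization_benchmark as lgb

# Assumption: load_dataset also accepts mode='train' for the training
# portion of a split (only mode='test' is shown in this README).
train_loader = lgb.load_dataset('shapes3d', 'extrapolation', mode='train')

# Assumption: the dataloader yields (images, targets) batches.
for images, targets in train_loader:
    # Any training scheme is permitted here: unsupervised, weakly
    # supervised, fully supervised, or fine-tuning a model pretrained
    # on external data. Replace this with your own update step.
    pass
```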
The random, composition, and interpolation splits can be used for hyperparameter tuning; the final evaluation and ranking are done on the extrapolation split (see the model-selection sketch after the code example below). To add new results to the leaderboard, please submit a pull request with an updated table.
Evaluating your model on this benchmark takes only a few lines of code:
```python
import lablet_generalization_benchmark as lgb
import numpy as np


def model_fn(images: np.ndarray) -> np.ndarray:
    # integrate your TensorFlow, PyTorch, or JAX model here
    predictions = model(images)
    return predictions


dataloader = lgb.load_dataset('shapes3d', 'extrapolation', mode='test')
# returns a dictionary of R^2 and MSE per factor of variation
score = lgb.evaluate_model(model_fn, dataloader)
```
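Following the protocol above, the same call can be reused for model selection on the tuning splits, keeping extrapolation for the final score. A minimal sketch, assuming the split names listed earlier are valid arguments to `load_dataset`:

```python
# hyperparameter selection on the tuning splits only
for split in ('random', 'composition', 'interpolation'):
    dataloader = lgb.load_dataset('shapes3d', split, mode='test')
    print(split, lgb.evaluate_model(model_fn, dataloader))

# the final leaderboard number is computed on extrapolation alone
test_loader = lgb.load_dataset('shapes3d', 'extrapolation', mode='test')
final_score = lgb.evaluate_model(model_fn, test_loader)
```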
We use the R² score (coefficient of determination) to evaluate and rank models.
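For reference, a minimal NumPy version of the per-factor score (our own sketch of the standard coefficient of determination, not the package's implementation):

```python
import numpy as np

def r2_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # R^2 = 1 - SS_res / SS_tot for a single factor of variation
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

The leaderboards below report extrapolation R² scores, one table per dataset.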
| Method | Reference | R² score (extrapolation) |
|---|---|---|
| RN50 (ImageNet-21k) | Kolesnikov et al. | 54.1% |
| RN101 (ImageNet-21k) | Kolesnikov et al. | 41.6% |
| PlaceHolder3 | placeholder | --% |
| PlaceHolder4 | placeholder | --% |

| Method | Reference | R² score (extrapolation) |
|---|---|---|
| RN101 | He et al. | 67.8% |
| RN50 | He et al. | 62.8% |
| PlaceHolder3 | placeholder | --% |
| PlaceHolder4 | placeholder | --% |

| Method | Reference | R² score (extrapolation) |
|---|---|---|
| PCL | Hyvärinen et al. | 66.7% |
| DenseNet121 | Huang et al. | 64.4% |
| PlaceHolder3 | placeholder | --% |
| PlaceHolder4 | placeholder | --% |
If you find this benchmark useful, please cite our paper:
```bibtex
@misc{schott2021visual,
      title={Visual Representation Learning Does Not Generalize Strongly Within the Same Domain},
      author={Lukas Schott and Julius von Kügelgen and Frederik Träuble and Peter Gehler and Chris Russell and Matthias Bethge and Bernhard Schölkopf and Francesco Locatello and Wieland Brendel},
      year={2021},
      eprint={2107.08221},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```