This repository contains the code to produce the benchmark, which has two main components.

### Requirements
- Python 3.6 or above
- The packages listed in `requirements.txt`. You can install them via `pip install -r requirements.txt`. Package `torch_scatter` may require a [manual installation](https://github.com/rusty1s/pytorch_scatter#installation)
- Submodules are added to the path:
```sh
export PYTHONPATH="$PYTHONPATH:$(pwd)/external/DomainBed/"
export PYTHONPATH="$PYTHONPATH:$(pwd)/external/wilds/"
```
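
To verify that the submodules are importable, you can run a quick check (a minimal sketch, assuming the stock package layouts of DomainBed and WILDS):
```sh
# should print "ok" if both submodules are on the path
python -c "import domainbed, wilds; print('ok')"
```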
### Quantifying distribution shifts

The quantification process consists of three main steps:
(1) training the environment classifiers,
(2) extracting features with the trained classifiers, and
(3) measuring the shifts with the extracted features.
The module `ood_bench.scripts.main` will handle the whole process for you.
For example, to quantify the distribution shift between the training environments (indexed by 0 and 1) and the test environment (indexed by 2) of [Colored MNIST](https://github.com/facebookresearch/InvariantRiskMinimization/blob/fc185d0f828a98f57030ba3647efc7394d1be95a/code/colored_mnist/main.py#L34) with 16 trials, you can simply run:
```sh
python -m ood_bench.scripts.main \
    --n_trials 16 \
    --data_dir /path/to/my/data \
    ...
```

These two optional arguments are also useful:
- `--parallel`: utilize multiple GPUs to conduct the trials in parallel. The maximum number of parallel trials is the number of visible GPUs, which can be controlled by setting `CUDA_VISIBLE_DEVICES`.
- `--calibrate`: calibrate the thresholds `eps_div` and `eps_cor` so that the estimated diversity and correlation shifts are guaranteed to stay close to 0 under the i.i.d. condition.
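
For example, a calibrated run spread across four GPUs could look like the following sketch (the dataset-specific arguments elided above still apply):
```sh
# 16 trials, at most 4 running in parallel (one per visible GPU), with calibration
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m ood_bench.scripts.main \
    --n_trials 16 \
    --data_dir /path/to/my/data \
    --parallel \
    --calibrate \
    ...
```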

### Results
The following results are produced by the scripts under [`ood_bench/examples`](ood_bench/examples), all with automatic calibration enabled.

| Dataset | Diversity shift | Correlation shift |
| ----------------- | ----------------- | ----------------- |
| PACS | 0.6715 ± 0.0392* | 0.0338 ± 0.0156* |
| Office-Home | 0.0657 ± 0.0147* | 0.0699 ± 0.0280* |
| Terra Incognita | 0.9846 ± 0.0935* | 0.0002 ± 0.0003* |
| DomainNet | 0.3740 ± 0.0343* | 0.1061 ± 0.0181* |
| WILDS-Camelyon | 0.9632 ± 0.1907 | 0.0000 ± 0.0000 |
| Colored MNIST | 0.0013 ± 0.0006 | 0.5468 ± 0.0278 |
| CelebA | 0.0031 ± 0.0017 | 0.1868 ± 0.0530 |
| NICO | 0.0176 ± 0.0158 | 0.1968 ± 0.0888 |
| ImageNet-A † | 0.0435 ± 0.0123 | 0.0222 ± 0.0192 |
| ImageNet-R † | 0.1024 ± 0.0188 | 0.1180 ± 0.0311 |
| ImageNet-V2 † | 0.0079 ± 0.0017 | 0.2362 ± 0.0607 |

<small>\* [averaged](https://github.com/m-Just/OoD-Bench/blob/2140093fee982b19f122de2f198ec5831442daad/ood_bench/scripts/summarize.py#L37) over all leave-one-domain-out splits&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;† with respect to the original ImageNet</small>

**Note:** the results shown above differ somewhat from those reported in our paper, mainly because we reworked the original implementation to ease public use and to improve quantification stability.
One of the main improvements is the use of calibration.
Previously, the same empirically sound thresholds were used across all the datasets studied in our paper, which may not hold for other datasets.

### Extending OoD-Bench

- **New datasets** must first be added to `external/DomainBed/domainbed/datasets.py` as a subclass of `MultipleDomainDataset`, for example:
```python
class MyDataset(MultipleDomainDataset):
    ENVIRONMENTS = ['env0', 'env1']  # at least two environments

    def __init__(self, root):  # exact signature may vary across datasets
        super().__init__()
        # build one sub-dataset per environment, in the same order as
        # ENVIRONMENTS, following the DomainBed conventions
        self.datasets = [...]             # required
        self.input_shape = (3, 224, 224)  # required
        self.num_classes = 2              # required
```
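
Once registered, the new dataset should be quantifiable like the built-in ones. The `--dataset` flag below is a hypothetical name for the dataset-selection argument; check `ood_bench/scripts/main.py` for the actual interface:
```sh
# hypothetical invocation: select the new dataset by its class name
python -m ood_bench.scripts.main \
    --n_trials 16 \
    --data_dir /path/to/my/data \
    --dataset MyDataset \
    ...
```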

- **New network backbones** must first be added to `ood_bench/networks.py` as a subclass of `Backbone`, for example:
```python
class MyBackbone(Backbone):
    def __init__(self, hdim, pretrained_model_path=None):
        # initialize the base class and define the layers mapping inputs to
        # hdim-dimensional features; load weights from pretrained_model_path
        # if given
        ...
```
