Jaspreet Ranjit, Tianlu Wang, Baishakhi Ray, Vicente Ordóñez AFT 2023
# ensure conda is installed
# gather the dependencies for running scripts in this repo
conda env create -f environment.yml
conda activate bias_vision
python custom.py \
--model_list <list of models to test> \
--training_datasets <list of training datasets to test> \
--initial_setup
For example, if I have two training datasets: ['coco', 'openimages'] and two models: ['model_one', 'model_two']:
python custom.py --model_list model_one model_two --training_datasets coco openimages --initial_setup
We currently support finetuning on the following datasets: COCO 2017 and Open Images. Please refer to the section "Training on an additional dataset" for details on how to add an additional dataset for finetuning.
- COCO 2017: Download and set up the images and annotations here
- Open Images: Download and set up the images and annotations here
- To use SimCLR ResNet50: first, download and unzip the checkpoints from the SimCLR repo; you will get three folders: ResNet50_1x, ResNet50_2x, and ResNet50_4x.
- From simclr-converter, run the following commands to convert the three checkpoints:
python convert.py ResNet50_1x/model.ckpt-225206 resnet50-1x.pth
python convert.py ResNet50_2x/model.ckpt-225206 resnet50-2x.pth
python convert.py ResNet50_4x/model.ckpt-225206 resnet50-4x.pth
You will get three PyTorch checkpoints: resnet50-1x.pth, resnet50-2x.pth, and resnet50-4x.pth. The model definition is in resnet_wider.py. Ensure resnet50-1x.pth is located in models_def/ResNet50_1x/resnet50-1x.pth. A short verification sketch follows below.
Not yet tested on SimCLRv2.
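To sanity-check a converted checkpoint before using it, you can inspect it with plain PyTorch. This is a minimal sketch, not part of the repo; it assumes the converted file holds either a raw state dict or a dict with a 'state_dict' key (the exact layout depends on simclr-converter).

```python
import torch

# Load the converted SimCLR checkpoint on CPU and print a few parameter names/shapes.
# Assumption: the file holds either a plain state dict or {'state_dict': ...}.
ckpt = torch.load("models_def/ResNet50_1x/resnet50-1x.pth", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt

for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```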
- To use MoCo ResNet50, download MoCo v1 from here and place the .tar file in the models_def/ directory.
Currently, this repo supports the following six features:
- Feature extraction for a finetuned model on a specified analysis set: extracting features from a loaded pretrained model (model with loaded weights) and that same model after it has been finetuned. The following command also runs bias analysis on the extracted features with the --bias_analysis flag.
CUDA_VISIBLE_DEVICES=\# python train.py \
--model_name <name of finetuned model: e.g. 'bit_resnet50'> \
--dataset <name of dataset the model was finetuned on: e.g. 'coco'> \
--num_classes <number of classes in dataset the model was finetuned on: e.g. 80> \
--extract_cross_analysis_features \
--analysis_set <name of analysis set: e.g. 'openimages'> \
--analysis_set_path <path to analysis set dataset> \
--config_file <analysis set config: e.g. 'config/openimages.yaml'> \
--trial_path <path to finetuned model: e.g. 'experiments/coco/bit_resnet50/2022-01-21\ 19\:03\:29'> \
--bias_analysis \
--finetune
- Feature extraction for a pretrained model on an analysis set: extracting features from a loaded pretrained model. The following command also runs bias analysis on the extracted features with the --bias_analysis flag.
CUDA_VISIBLE_DEVICES=\# python train.py \
--model_name <name of finetuned model: e.g. 'bit_resnet50'> \
--dataset <name of dataset the model was finetuned on: e.g. 'coco'> \
--num_classes <number of classes in dataset the model was finetuned on: e.g. 80> \
--pretrained_features \
--analysis_set <name of analysis set: e.g. 'openimages'> \
--analysis_set_path <path to analysis set dataset> \
--config_file <analysis set config: e.g. 'config/openimages.yaml'> \
--trial_path <path to finetuned model: e.g. 'experiments/coco/bit_resnet50/2022-01-21\ 19\:03\:29'> \
--bias_analysis \
--finetune
- Using saved features (both pretrained and finetuned) to perform bias analysis.
CUDA_VISIBLE_DEVICES=\# python train.py \
--model_name <name of finetuned model: e.g. 'bit_resnet50'> \
--dataset <name of dataset the model was finetuned on: e.g. 'coco'> \
--num_classes <number of classes in dataset the model was finetuned on: e.g. 80> \
--load_features \
--analysis_set <name of analysis set: e.g. 'openimages'> \
--analysis_set_path <path to analysis set dataset> \
--config_file <analysis set config: e.g. 'config/openimages.yaml'> \
--trial_path <path to finetuned model: e.g. 'experiments/coco/bit_resnet50/2022-01-21\ 19\:03\:29'> \
--bias_analysis \
--finetune
- For a given model, average across finetuning trial runs and perform the bias analysis experiment.
CUDA_VISIBLE_DEVICES=\# python train.py \
--model_name <name of finetuned model: e.g. 'bit_resnet50'> \
--dataset <name of dataset the model was finetuned on: e.g. 'coco'> \
--num_classes <number of classes in dataset the model was finetuned on> \
--load_features \
--multiple_trials \
--analysis_set <name of analysis set: e.g. 'openimages'> \
--analysis_set_path <path to analysis set dataset> \
--config_file <analysis set config: e.g. 'config/openimages.yaml'> \
--bias_analysis \
--finetune
- Finetune an available model, perform feature extraction on the analysis set, and run bias analysis on the extracted features.
CUDA_VISIBLE_DEVICES=\# python train.py \
--model_name <name of finetuned model: e.g. 'bit_resnet50'> \
--dataset <name of dataset to be finetuned on: e.g. 'coco'> \
--dataset_path <path to dataset the model will be finetuned on> \
--num_classes <number of classes in dataset the model will be finetuned on: e.g. 80> \
--batch_size <\#> \
--epochs <\#> \
--lr <learning rate: e.g. 0.001> \
--lr_scheduler <e.g. 'reduce', 'cosine', 'none'> \
--momentum <\#> \
--optimizer <e.g. 'sgd', 'adam', 'adamax', 'lars'> \
--finetune \
--analysis_set <name of analysis set: e.g. 'openimages'> \
--analysis_set_path <path to analysis set dataset> \
--config_file <analysis set config: e.g. 'config/openimages.yaml'> \
--bias_analysis \
--seed <\#>
- Resume training for an available model, perform feature extraction on the analysis set, and run bias analysis on the extracted features.
CUDA_VISIBLE_DEVICES=\# python train.py \
--model_name <name of finetuned model: e.g. 'bit_resnet50'> \
--dataset <name of dataset to be finetuned on: e.g. 'coco'> \
--dataset_path <path to dataset the model will be finetuned on> \
--num_classes <number of classes in dataset the model will be finetuned on: e.g. 80> \
--batch_size <\#> \
--epochs <\#, must be higher than the number of epochs the model was originally trained for> \
--lr <learning rate: e.g. 0.001> \
--lr_scheduler <e.g. 'reduce', 'cosine', 'none'> \
--momentum <\#> \
--optimizer <e.g. 'sgd', 'adam', 'adamax', 'lars'> \
--resume_training \
--analysis_set <name of analysis set: e.g. 'openimages'> \
--analysis_set_path <path to analysis set dataset> \
--config_file <analysis set config: e.g. 'config/openimages.yaml'> \
--bias_analysis \
--finetune \
--checkpoint <path to checkpoint: e.g. 'experiments/coco/resnet50/2022-01-19\ 16\:43\:15/model/resnet50/version_0/checkpoints/epoch\=24-step\=46224.ckpt'> \
--seed <\#>
The following test script is provided to test the six features. Replace the variables with your custom environment variables to test each feature:
bash ./test.sh
- To recreate Table 2, refer to the Jupyter notebook experimental_work/ieat.ipynb
- To recreate Table 3, Figure 3, Table 4, Figure 4, and all the subsequent plots in the supplementary material, it is assumed that the experiments/training_dataset/model_name/ folder contains multiple trial runs for a model; the results are averaged across these runs. By changing the --model_name, --num_classes, --analysis_set, --analysis_set_path, and --config_file flags, you can generate different sets of results and plots using the saved features and finetuned models:
CUDA_VISIBLE_DEVICES=\# python train.py \
--model_name <name of finetuned model: e.g. 'bit_resnet50'> \
--num_classes <number of classes in dataset the model was finetuned on> \
--load_features \
--multiple_trials \
--analysis_set <name of analysis set: e.g. 'openimages'> \
--analysis_set_path <path to analysis set dataset> \
--config_file <analysis set config: e.g. 'config/openimages.yaml'> \
--bias_analysis \
--finetune
- We release the metadata for the finetuned models in the experiments/ directory. All the models from torchvision are saved using PyTorch Lightning. This directory is set up as follows (a short sketch for inspecting the saved .npy files follows the tree):
experiments
├── coco <training dataset>
│   ├── model_one
│   │   ├── trial_one
│   │   │   ├── boxplots
│   │   │   │   ├── analysis_set_one <coco>
│   │   │   │   │   ├── boxplot_one.pdf
│   │   │   │   │   ...
│   │   │   │   │   ├── boxplots_n.pdf
│   │   │   │   ├── analysis_set_two <openimages>
│   │   │   │   │   ├── boxplot_one.pdf
│   │   │   │   │   ...
│   │   │   │   │   ├── boxplots_n.pdf
│   │   │   ├── features
│   │   │   │   ├── analysis_set_one <coco>
│   │   │   │   │   ├── pretrained_features
│   │   │   │   │   │   ├── feature_one.npy
│   │   │   │   │   │   ...
│   │   │   │   │   │   ├── feature_n.npy
│   │   │   │   │   ├── finetuned_features
│   │   │   │   │   │   ├── feature_one.npy
│   │   │   │   │   │   ...
│   │   │   │   │   │   ├── feature_n.npy
│   │   │   │   ├── analysis_set_two <openimages>
│   │   │   │   │   ├── pretrained_features
│   │   │   │   │   │   ├── feature_one.npy
│   │   │   │   │   │   ...
│   │   │   │   │   │   ├── feature_n.npy
│   │   │   │   │   ├── finetuned_features
│   │   │   │   │   │   ├── feature_one.npy
│   │   │   │   │   │   ...
│   │   │   │   │   │   ├── feature_n.npy
│   │   │   ├── metric_data
│   │   │   │   ├── analysis_set_one <coco>
│   │   │   │   │   ├── metric_data.npy
│   │   │   │   ├── analysis_set_two <openimages>
│   │   │   │   │   ├── metric_data.npy
│   │   │   ├── model <training metadata>
│   │   ├── trial_two
│   │   ...
│   │   ├── trial_x
│   │   ├── averaged <results from averaging across trials>
│   │   │   ├── boxplots
│   │   │   │   ├── analysis_set_one <coco>
│   │   │   │   │   ├── boxplot_one.pdf
│   │   │   │   │   ...
│   │   │   │   │   ├── boxplots_n.pdf
│   │   │   │   ├── analysis_set_two <openimages>
│   │   │   │   │   ├── boxplot_one.pdf
│   │   │   │   │   ...
│   │   │   │   │   ├── boxplots_n.pdf
│   │   │   ├── metric_data
│   │   │   │   ├── analysis_set_one <coco>
│   │   │   │   │   ├── metric_data.npy
│   │   │   │   ├── analysis_set_two <openimages>
│   │   │   │   │   ├── metric_data.npy
│   ...
│   └── model_n
├── openimages <training dataset>
│   ├── model_one
│   │   ├── trial_one
│   │   │   ├── boxplots
│   │   │   │   ├── analysis_set_one <coco>
│   │   │   │   │   ├── boxplot_one.pdf
│   │   │   │   │   ...
│   │   │   │   │   ├── boxplots_n.pdf
│   │   │   │   ├── analysis_set_two <openimages>
│   │   │   │   │   ├── boxplot_one.pdf
│   │   │   │   │   ...
│   │   │   │   │   ├── boxplots_n.pdf
│   │   │   ├── features
│   │   │   │   ├── analysis_set_one <coco>
│   │   │   │   │   ├── pretrained_features
│   │   │   │   │   │   ├── feature_one.npy
│   │   │   │   │   │   ...
│   │   │   │   │   │   ├── feature_n.npy
│   │   │   │   │   ├── finetuned_features
│   │   │   │   │   │   ├── feature_one.npy
│   │   │   │   │   │   ...
│   │   │   │   │   │   ├── feature_n.npy
│   │   │   │   ├── analysis_set_two <openimages>
│   │   │   │   │   ├── pretrained_features
│   │   │   │   │   │   ├── feature_one.npy
│   │   │   │   │   │   ...
│   │   │   │   │   │   ├── feature_n.npy
│   │   │   │   │   ├── finetuned_features
│   │   │   │   │   │   ├── feature_one.npy
│   │   │   │   │   │   ...
│   │   │   │   │   │   ├── feature_n.npy
│   │   │   ├── metric_data
│   │   │   │   ├── analysis_set_one <coco>
│   │   │   │   │   ├── metric_data.npy
│   │   │   │   ├── analysis_set_two <openimages>
│   │   │   │   │   ├── metric_data.npy
│   │   │   ├── model <training metadata>
│   │   ├── trial_two
│   │   ...
│   │   ├── trial_x
│   │   ├── averaged <results from averaging across trials>
│   │   │   ├── boxplots
│   │   │   │   ├── analysis_set_one <coco>
│   │   │   │   │   ├── boxplot_one.pdf
│   │   │   │   │   ...
│   │   │   │   │   ├── boxplots_n.pdf
│   │   │   │   ├── analysis_set_two <openimages>
│   │   │   │   │   ├── boxplot_one.pdf
│   │   │   │   │   ...
│   │   │   │   │   ├── boxplots_n.pdf
│   │   │   ├── metric_data
│   │   │   │   ├── analysis_set_one <coco>
│   │   │   │   │   ├── metric_data.npy
│   │   │   │   ├── analysis_set_two <openimages>
│   │   │   │   │   ├── metric_data.npy
│   ...
│   └── model_n
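The saved feature and metric files are .npy files and can be inspected directly with NumPy. A minimal sketch follows; the paths are placeholders matching the tree above, and allow_pickle=True is used defensively in case the arrays were saved as pickled Python objects rather than plain arrays.

```python
import numpy as np

# Placeholder paths following the tree above; substitute a real model/trial directory.
features_path = "experiments/coco/model_one/trial_one/features/analysis_set_one/pretrained_features/feature_one.npy"
metrics_path = "experiments/coco/model_one/trial_one/metric_data/analysis_set_one/metric_data.npy"

features = np.load(features_path, allow_pickle=True)
metrics = np.load(metrics_path, allow_pickle=True)

print(type(features), getattr(features, "shape", None))
print(type(metrics))
```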
The following models have already been implemented:
['clip', 'moco_resnet50', 'simclr_resnet50', 'bit_resnet50', 'resnet50', 'resnet18', 'alexnet', 'vgg', 'densenet', 'fasterrcnn', 'retinanet', 'googlenet', 'resnet34',
'resnet101', 'resnet152', 'resnext50_32x4d', 'resnext101_32x8d', 'wide_resnet50_2', 'wide_resnet101_2', 'virtex_resnet50']
- In train.py, modify the models_implemented list in lightning_setup() and lightning_train() and add the name of the model.
- In model_init.py, modify load_models_pytorch() to set up the model to be trained with PyTorch Lightning for multi-label classification on an available dataset (see the sketch after the example below).
- Lastly, set up the directory as follows (assumes the experiments/ folder exists; see the section above):
python custom.py \
--root <path to experiments folder> \
--model_name <name of model to add> \
--add_model
For example, if I want to add a model named resnet to all the training datasets in the experiments/ folder:
python custom.py --root experiments --model_name resnet --add_model
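For reference, the change needed in load_models_pytorch() amounts to loading a torchvision model and reshaping its classifier head for multi-label classification. The sketch below is a hypothetical illustration using plain torchvision, not the repo's actual code; the function name is made up.

```python
import torch.nn as nn
import torchvision.models as tv_models

def build_multilabel_resnet(num_classes: int) -> nn.Module:
    """Illustrative only: load an ImageNet-pretrained ResNet and reshape its final layer."""
    model = tv_models.resnet34(pretrained=True)  # newer torchvision versions prefer the weights= argument
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

# Example: 80 classes for COCO-style multi-label classification.
model = build_multilabel_resnet(num_classes=80)
```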
- This is discouraged but included here if absolutely necessary. Note: if your model cannot be trained with PyTorch Lightning, you will need to define a separate function in model_init.py following the example for CLIP: initialize_model_clip(). Additionally, you will need to add a file such as clip_model.py in models_def/ defining the training functions and feature extraction logic for such a model. This will also involve modifying lightning_train(), lightning_setup(), main(), and extract_features() in train.py to include separate calls for this model.
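For intuition on what such a separate path involves, here is a rough sketch of CLIP image feature extraction using the openai CLIP package. This is not the repo's clip_model.py, just an illustration of the feature-extraction step; example.jpg is a placeholder.

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50", device=device)

# Encode a single image into a feature vector; a real extractor would batch this.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    features = model.encode_image(image)
print(features.shape)
```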
- In analysis_sets/, create an additional directory with the name of your analysis set containing a .txt file for each class in the set. Each .txt file should contain image ids or URLs for the images in that class (a small reading sketch follows the example below).
- In config/, create a .yaml file specifying the metadata for that analysis set; follow coco.yaml for an example. The label names and class names categories define abbreviations to be used during plotting.
- pytorch_models.py includes a PytorchFeatureExtractor class (clip_model.py includes one as well) with a process_imgs() function that specifies how to access the images listed in the .txt files in analysis_sets/. Add an additional line for your analysis set and modify the class attributes accordingly if needed.
- Set up the directory as follows, providing root, analysis_set, and model_list (assumes the experiments/ folder exists; see the section above):
python custom.py \
--root <path to experiments folder> \
--model_list <list of models to test> \
--analysis_set <name of analysis_set to add> \
--add_analysis_set
For example, if I want to add an analysis set named ieat to all the training datasets in the experiments/ folder for each model in model_list:
python custom.py --root experiments --model_list model1 model2 --analysis_set ieat --add_analysis_set
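As a minimal sketch of the assumed analysis-set layout (one .txt file per class, one image id or URL per line; the directory name ieat here just follows the example above):

```python
from pathlib import Path

# Assumed layout: analysis_sets/<analysis_set_name>/<class_name>.txt,
# with one image id or URL per line.
analysis_set_dir = Path("analysis_sets/ieat")

classes = {}
for txt_file in sorted(analysis_set_dir.glob("*.txt")):
    with open(txt_file) as f:
        classes[txt_file.stem] = [line.strip() for line in f if line.strip()]

for class_name, items in classes.items():
    print(class_name, len(items))
```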
- Set up the directory as follows:
python custom.py \
--root <path to experiments folder> \
--model_list <list of models to test> \
--training_set_name <name of training dataset to add> \
--add_training_set
For example, if I have two training datasets: ['coco', 'openimages'] and two models: ['model_one', 'model_two'], and want to add an additional training dataset, imagenet, to each of the models for each of the existing training datasets:
python custom.py --root experiments --model_list model_one model_two --training_set_name imagenet --add_training_set
- In dataloader.py, add an additional class for your dataset following the examples for COCO and Open Images (a skeleton sketch follows this list).
- In setup.py, in setup_dataset(), add an additional block to set up the preprocessing and dataloaders for your new dataset.
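A hypothetical skeleton for such a dataset class (not the repo's actual COCO/Open Images implementation; the constructor arguments and multi-hot label format are assumptions):

```python
import torch
from PIL import Image
from torch.utils.data import Dataset

class MyNewDataset(Dataset):
    """Hypothetical multi-label dataset: each sample is an image plus a multi-hot label vector."""

    def __init__(self, image_paths, labels, num_classes, transform=None):
        self.image_paths = image_paths  # list of image file paths
        self.labels = labels            # list of lists of class indices per image
        self.num_classes = num_classes
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        target = torch.zeros(self.num_classes)
        target[self.labels[idx]] = 1.0  # multi-hot encoding of the class indices
        return image, target
```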
- analysis_sets/: coco and openimages analysis sets, where each subfolder contains text files for each class in an analysis set that list image ids or URLs for that dataset
- config/: config files for each analysis set
- cosine_analysis/: functions to replicate results from the paper and generate bias analysis results on additional trials
- models_def/: definitions for model types; contains training and feature extraction details
- data_loader.py: dataloaders for training datasets
- model_init.py: initializes a model for finetuning by reshaping the last layer and configures the optimizers, loss function, and other hyperparameters
- train.py: contains generalized training details and command-line functions
- experiments/: contains metadata for all trained models in the paper
- setup.py: sets up datasets and directories
- test.sh: contains commands to test all six features of this repo
- utils.py: reads analysis sets and loads features
- custom.py: creates directories for adding a new model, new analysis set, or training set
- The following feature is not updated: the trends experiment. See backups/cosine.py for the source code for this experiment, which plots models against each other on a single plot.