# Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection
Contact: delyan.boychev05@gmail.com
As state-of-the-art deep neural networks grow ever more complex, maintaining their transparency becomes increasingly challenging. Our work evaluates the effects of adversarial training, which is used to produce robust models and has also been shown to make computer vision models more explainable. Interpretability is as essential as robustness when models are deployed in the real world. To demonstrate the correlation between these two problems, and to assess the quality of the learned representations, we extensively examine the models using local feature-importance methods (SHAP, Integrated Gradients) and feature-visualization techniques (Representation Inversion, Class-Specific Image Generation). Standard models, compared to robust ones, are less secure, and their learned representations are less meaningful to humans. Conversely, robust models focus on distinctive regions of the images that support their predictions.
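Integrated Gradients, one of the feature-importance methods listed above, attributes a model's prediction to its inputs by accumulating gradients along a straight path from a baseline to the input. A minimal NumPy sketch on a toy differentiable function (the function, baseline, and step count here are illustrative, not taken from the paper):

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=100):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    IG_i = (x_i - baseline_i) * integral_0^1 of
           df/dx_i(baseline + a * (x - baseline)) da
    """
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints in (0, 1)
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_f(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# Toy model: f(x) = sum(x**2), so grad f(x) = 2x.
f = lambda x: np.sum(x ** 2)
grad_f = lambda x: 2 * x

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(grad_f, x, baseline)

# Completeness axiom: attributions sum to f(x) - f(baseline).
print(attr, attr.sum())
```

In practice the same computation is run with automatic differentiation through the network (e.g. via a library such as Captum); the completeness check is a standard way to validate the approximation.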
The Small ImageNet 150 dataset can be found here and its labels here. This subset, which we call Small ImageNet 150, is generated by randomly picking 150 ImageNet classes and images from them. Each class consists of 600 training images and 50 validation images, giving 90,000 training images, 7,500 validation images, and 9,000 test images, for a total of 106,500 128x128 RGB images. Because these images are larger than CIFAR10 images, we can analyze the models' interpretability in much greater depth while still achieving high performance.
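The split sizes above are consistent with 150 classes, as a quick check shows (the per-class test count of 60 is inferred from the totals rather than stated explicitly):

```python
# Sanity-check the Small ImageNet 150 split sizes.
num_classes = 150
train_per_class, val_per_class = 600, 50

train_total = num_classes * train_per_class  # 150 * 600
val_total = num_classes * val_per_class      # 150 * 50
test_total = 9000                            # as stated; 60/class, inferred
dataset_total = train_total + val_total + test_total

print(train_total, val_total, dataset_total)
```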
The architecture used is ResNet18. The models are evaluated on CIFAR10.

Model | Standard Accuracy | Robust Accuracy | Checkpoint
---|---|---|---
Standard model | 93.2 | 0.36 | here
Robust model | 85.0 | 64.6 | here
The architecture used is ResNet50. The models are evaluated on Small ImageNet 150.

Model | Standard Accuracy | Robust Accuracy | Checkpoint
---|---|---|---
Standard model | 70.1 | 0.88 | here
Robust model | 55.8 | 35.4 | here
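The robust accuracy in the tables above is measured on adversarially perturbed inputs. A minimal sketch of an L-infinity PGD (projected gradient descent) attack on a toy logistic-regression model in NumPy, to illustrate the threat model; the weights, epsilon, and step sizes are illustrative only, not the settings used for the checkpoints:

```python
import numpy as np

def pgd_linf(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """L-infinity PGD attack on a logistic-regression model.

    Repeatedly steps in the sign of the input gradient of the
    binary cross-entropy loss, projecting back into the eps-ball
    around the original input x after every step.
    """
    x_adv = x.copy()
    for _ in range(steps):
        logit = x_adv @ w + b
        p = 1.0 / (1.0 + np.exp(-logit))          # sigmoid probability
        grad = (p - y) * w                        # d(BCE loss) / dx
        x_adv = x_adv + alpha * np.sign(grad)     # gradient-ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project to eps-ball
    return x_adv

# Toy example: a correctly classified point, attacked within eps.
w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([0.5, 0.2]), 1.0  # clean logit = 0.8 > 0: correct
x_adv = pgd_linf(x, y, w, b)

print(x @ w + b, x_adv @ w + b)  # the adversarial logit is lower
```

Robust accuracy is then simply standard accuracy computed on `x_adv` instead of `x`; for deep networks the gradient comes from backpropagation rather than a closed form.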
Dataset | Training | Interpretability |
---|---|---|
CIFAR10 | here | here |
Small ImageNet 150 | here | here |
```bibtex
@misc{boychev2023interpretable,
      title={Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection},
      author={Delyan Boychev},
      year={2023},
      eprint={2307.02500},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```