# Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection
Contact: delyan.boychev05@gmail.com
As state-of-the-art deep neural networks grow ever more complex, maintaining their transparency becomes increasingly challenging. Our work evaluates the effects of adversarial training, which is used to produce robust models and has also been shown to make computer vision models more explainable. Interpretability is as essential as robustness when models are deployed in the real world. To demonstrate the correlation between these two problems, and to assess the quality of the learned representations, we extensively examine the models using local feature-importance methods (SHAP, Integrated Gradients) and feature-visualization techniques (Representation Inversion, Class-Specific Image Generation). Standard models, compared to robust ones, are less secure, and their learned representations are less meaningful to humans. Conversely, robust models focus on distinctive regions of the images that support their predictions.
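Integrated Gradients, one of the feature-importance methods listed above, attributes a model's prediction to its inputs by accumulating gradients along a straight path from a baseline to the input. A minimal NumPy sketch on a toy differentiable function (the function, baseline, and step count here are illustrative, not taken from the paper):

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=100):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    IG_i = (x_i - baseline_i) * integral_0^1 of
           df/dx_i(baseline + a * (x - baseline)) da
    """
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints in (0, 1)
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_f(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# Toy model: f(x) = sum(x**2), so grad f(x) = 2x.
f = lambda x: np.sum(x ** 2)
grad_f = lambda x: 2 * x

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(grad_f, x, baseline)

# Completeness axiom: attributions sum to f(x) - f(baseline).
print(attr, attr.sum())
```

In practice the same computation is run with automatic differentiation through the network (e.g. via a library such as Captum); the completeness check is a standard way to validate the approximation.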
The Small ImageNet 150 dataset can be found here and its labels here. This subset, which we call Small ImageNet 150, is generated by randomly picking 150 ImageNet classes and images from them. Each class consists of 600 training images and 50 validation images, giving 90,000 training images, 7,500 validation images, and 9,000 test images, for a total of 106,500 128x128 RGB images. Because these images are larger than CIFAR10 images, we can analyze the models' interpretability in much greater depth while still achieving high performance.
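The split sizes above are consistent with 150 classes, as a quick check shows (the per-class test count of 60 is inferred from the totals rather than stated explicitly):

```python
# Sanity-check the Small ImageNet 150 split sizes.
num_classes = 150
train_per_class, val_per_class = 600, 50

train_total = num_classes * train_per_class  # 150 * 600
val_total = num_classes * val_per_class      # 150 * 50
test_total = 9000                            # as stated; 60/class, inferred
dataset_total = train_total + val_total + test_total

print(train_total, val_total, dataset_total)
```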
The architecture used is ResNet18. The models are evaluated on CIFAR10.

Model | Standard Accuracy | Robust Accuracy | Checkpoint
---|---|---|---
Standard model | 93.2 | 0.36 | here
Robust model | 85.0 | 64.6 | here
The architecture used is ResNet50. The models are evaluated on Small ImageNet 150.

Model | Standard Accuracy | Robust Accuracy | Checkpoint
---|---|---|---
Standard model | 70.1 | 0.88 | here
Robust model | 55.8 | 35.4 | here
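The robust accuracy in the tables above is measured on adversarially perturbed inputs. A minimal sketch of an L-infinity PGD (projected gradient descent) attack on a toy logistic-regression model in NumPy, to illustrate the threat model; the weights, epsilon, and step sizes are illustrative only, not the settings used for the checkpoints:

```python
import numpy as np

def pgd_linf(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """L-infinity PGD attack on a logistic-regression model.

    Repeatedly steps in the sign of the input gradient of the
    binary cross-entropy loss, projecting back into the eps-ball
    around the original input x after every step.
    """
    x_adv = x.copy()
    for _ in range(steps):
        logit = x_adv @ w + b
        p = 1.0 / (1.0 + np.exp(-logit))          # sigmoid probability
        grad = (p - y) * w                        # d(BCE loss) / dx
        x_adv = x_adv + alpha * np.sign(grad)     # gradient-ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project to eps-ball
    return x_adv

# Toy example: a correctly classified point, attacked within eps.
w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([0.5, 0.2]), 1.0  # clean logit = 0.8 > 0: correct
x_adv = pgd_linf(x, y, w, b)

print(x @ w + b, x_adv @ w + b)  # the adversarial logit is lower
```

Robust accuracy is then simply standard accuracy computed on `x_adv` instead of `x`; for deep networks the gradient comes from backpropagation rather than a closed form.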
Dataset | Training | Interpretability |
---|---|---|
CIFAR10 | here | here |
Small ImageNet 150 | here | here |
```bibtex
@misc{boychev2023interpretable,
      title={Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection},
      author={Delyan Boychev},
      year={2023},
      eprint={2307.02500},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```