
# PyTorch Trainers and Interpretability Evaluators

*(Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection)*

Contact: delyan.boychev05@gmail.com

As state-of-the-art deep neural networks grow ever more complex, maintaining their transparency becomes increasingly challenging. Our work evaluates the effects of adversarial training, which is used to produce robust models and has also been shown to make computer vision models more explainable. Interpretability is as essential as robustness when models are deployed in the real world. To demonstrate the correlation between these two problems, and to assess the quality of the learned representations, we extensively examine the models using local feature-importance methods (SHAP, Integrated Gradients) and feature visualization techniques (Representation Inversion, Class Specific Image Generation). Compared to robust models, standard models are less secure, and their learned representations are less meaningful to humans. Conversely, robust models focus on distinctive regions of the images that support their predictions.
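As an illustration of the feature-importance side, the sketch below computes Integrated Gradients attributions with the `captum` library. This is a generic example, not the repository's own evaluator: the ResNet18, the random input, and the class count are all stand-ins.

```python
import torch
from captum.attr import IntegratedGradients
from torchvision.models import resnet18

# Stand-in classifier; any trained torchvision-style model works here.
model = resnet18(num_classes=10)
model.eval()

ig = IntegratedGradients(model)

# One CIFAR10-sized RGB image; a real pipeline would use a normalized sample.
x = torch.randn(1, 3, 32, 32)
target = model(x).argmax(dim=1)

# Attributions share the input's shape; large magnitudes mark the pixels
# that contributed most to the predicted class, relative to a black baseline.
attributions = ig.attribute(x, baselines=torch.zeros_like(x), target=target)
```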

## Small ImageNet 150

The Small ImageNet 150 dataset can be found here and its labels here. Each class consists of 600 training images and 50 validation images, for a total of 90,000 training, 7,500 validation, and 9,000 test images; in total, the dataset contains 106,500 128×128 RGB images. Because these images are larger than CIFAR10 images, we can analyze the models' interpretability in greater depth while still achieving high performance. This subset, which we call Small ImageNet 150, is generated by randomly sampling classes and images from ImageNet.
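A minimal loading sketch, assuming the downloaded dataset is extracted into the standard `ImageFolder` layout; the `small_imagenet150/{train,val}` paths are hypothetical, and the repository may provide its own loader:

```python
import torch
from torchvision import datasets, transforms

# Images are already 128x128 RGB, so no resizing is needed here.
transform = transforms.ToTensor()

train_set = datasets.ImageFolder("small_imagenet150/train", transform=transform)
val_set = datasets.ImageFolder("small_imagenet150/val", transform=transform)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=4)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=128,
                                         shuffle=False, num_workers=4)
```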

## Checkpoints

### CIFAR10

The model architecture used is ResNet18. The models are evaluated against $l_2$-bounded adversaries (constraint $\varepsilon = 0.5$, step size $\sigma = 0.1$) generated with PGD for 20 iterations.

| Model | Standard Accuracy (%) | $l_{2}$ Accuracy (%) | Checkpoint |
| --- | --- | --- | --- |
| Standard model | 93.2 | 0.36 | here |
| Robust $l_{2}$ trained model | 85.0 | 64.6 | here |
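For reference, a minimal sketch of the $l_2$ PGD evaluation described above: each step normalizes the gradient to unit $l_2$ norm per example, and the accumulated perturbation is projected back onto the $\varepsilon$-ball. This is not the repository's attack implementation, and clamping to the valid input range is omitted since it depends on the input normalization.

```python
import torch
import torch.nn.functional as F

def pgd_l2(model, x, y, eps=0.5, step=0.1, iters=20):
    """l2-bounded PGD attack (untargeted)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        # Ascent step along the per-example l2-normalized gradient.
        g_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12)
        delta = delta.detach() + step * grad / g_norm.view(-1, 1, 1, 1)
        # Project the perturbation back into the l2 ball of radius eps.
        d_norm = delta.flatten(1).norm(dim=1).clamp_min(1e-12)
        delta = delta * (eps / d_norm).clamp(max=1.0).view(-1, 1, 1, 1)
        delta.requires_grad_(True)
    return (x + delta).detach()

# Robust accuracy on one batch (model, images, labels assumed given):
# x_adv = pgd_l2(model, images, labels)
# robust_acc = (model(x_adv).argmax(1) == labels).float().mean().item()
```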

### Small ImageNet 150

The model architecture used is ResNet50. The models are evaluated against $l_2$-bounded adversaries (constraint $\varepsilon = 1.5$, step size $\sigma = 2.5 \cdot \varepsilon / 20 = 0.1875$) generated with PGD for 20 iterations.

| Model | Standard Accuracy (%) | $l_{2}$ Accuracy (%) | Checkpoint |
| --- | --- | --- | --- |
| Standard model | 70.1 | 0.88 | here |
| Robust $l_{2}$ trained model | 55.8 | 35.4 | here |
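Loading a checkpoint and measuring standard accuracy might look like the sketch below; the file name and checkpoint keys are hypothetical, and `val_loader` is the loader from the dataset sketch above. Swapping the clean images for `pgd_l2(...)` outputs with $\varepsilon = 1.5$ yields the $l_2$ accuracy instead.

```python
import torch
from torchvision.models import resnet50

model = resnet50(num_classes=150)

# Hypothetical file name and dict layout; the released checkpoints may
# store the weights under a different structure.
ckpt = torch.load("robust_l2_resnet50.pt", map_location="cpu")
model.load_state_dict(ckpt.get("state_dict", ckpt))
model.eval()

correct = total = 0
with torch.no_grad():
    for images, labels in val_loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
print(f"Standard accuracy: {100 * correct / total:.1f}%")
```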

## Examples

| Dataset | Training | Interpretability |
| --- | --- | --- |
| CIFAR10 | here | here |
| Small ImageNet 150 | here | here |

*CIFAR10 models training curves (figure)*

*Small ImageNet 150 models training curves (figure)*

## Citation

```bibtex
@misc{boychev2023interpretable,
      title={Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection},
      author={Delyan Boychev},
      year={2023},
      eprint={2307.02500},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
