This repository is the official implementation of the paper AtPatch: Debugging Transformers via Hot-Fixing Over-Attention
AtPatch is a hot-fix method that dynamically redistributes attention maps during model inference. For a given input, AtPatch first extracts the attention map from the model's inference process. It then uses a pre-trained detector to identify anomalous columns, replaces them with unified benign attention, and rescales the remaining columns to mitigate the impact of over-attention. Finally, AtPatch returns the redistributed attention map to the model for continued inference. Notably, if the detector reports no anomalous columns, AtPatch returns the original attention map to the model unchanged. Unlike existing techniques, AtPatch redistributes the attention map selectively, which better preserves the model's original functionality. Furthermore, its on-the-fly nature lets it work without modifying model parameters or retraining, making it well suited to already-deployed models.
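The redistribution step above can be sketched as follows. This is a minimal illustration, not the repository's actual implementation: the function name, the array shapes, and the choice of a uniform value (1/seq_len) for the benign attention are all assumptions.

```python
import numpy as np

def redistribute_attention(attn, anomalous_cols, benign_value=None):
    """Replace anomalous columns with a uniform benign value, then rescale
    the remaining columns so each row of the map still sums to 1.

    attn: (seq_len, seq_len) row-stochastic attention map.
    anomalous_cols: column indices reported by the detector; empty -> no-op.
    """
    if len(anomalous_cols) == 0:
        return attn  # detector found nothing: keep the original map
    patched = attn.copy()
    seq_len = attn.shape[-1]
    if benign_value is None:
        benign_value = 1.0 / seq_len  # uniform benign attention (assumption)
    patched[:, anomalous_cols] = benign_value
    # Rescale the untouched columns so every row sums to 1 again.
    keep = np.setdiff1d(np.arange(seq_len), anomalous_cols)
    remaining = 1.0 - benign_value * len(anomalous_cols)
    row_sums = patched[:, keep].sum(axis=1, keepdims=True)
    patched[:, keep] *= remaining / row_sums
    return patched
```

Because the patched map stays row-stochastic, it can be handed straight back to the model for continued inference.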
pip install -r requirements.txt
We provide a demo that runs directly on the MNIST dataset. First, download the MNIST dataset from its official website and use vit_train.py to train a backdoored ViT model. Then run the following command to execute AtPatch:
python AtPatchTool.py --dataset "mnist" --config "./config/mnist.toml" --ttype "content"

Expected output:
Org-ASR: 1.0, Cur-ASR: 0.0, Org-Acc: 0.9842, Cur-Acc: 0.9842
In principle, AtPatch is dataset-agnostic and model-agnostic: you can adapt the corresponding code to run it on any custom dataset and any transformer-based model.
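One way to picture such an adaptation is to wrap the model's attention computation so the detect-and-redistribute step runs on the fly. The sketch below is a hypothetical integration point, and the thresholded detector is a toy stand-in for the pre-trained detector, not the tool's real one.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def detect_over_attention(attn, threshold=0.9):
    """Toy stand-in for the pre-trained detector: flag columns whose mean
    received attention exceeds a threshold (illustrative assumption)."""
    return np.where(attn.mean(axis=0) > threshold)[0]

def patched_attention(q, k, v, detector=detect_over_attention):
    """Scaled dot-product attention with a hot-fix hook: if the detector
    reports anomalous columns, overwrite them with uniform benign
    attention and rescale the rest; otherwise pass the map through."""
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d))
    cols = detector(attn)
    if len(cols) > 0:
        seq_len = attn.shape[-1]
        benign = 1.0 / seq_len  # uniform benign attention (assumption)
        attn[:, cols] = benign
        keep = np.setdiff1d(np.arange(seq_len), cols)
        scale = (1.0 - benign * len(cols)) / attn[:, keep].sum(axis=1, keepdims=True)
        attn[:, keep] *= scale
    return attn @ v
```

In a real transformer you would place this hook inside each attention layer (e.g. via a forward hook in PyTorch), leaving the model's parameters untouched.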

