This is the code repository for the NAACL 2022 paper A Study of the Attention Abnormality in Trojaned BERTs. The paper proposes an attention-based Trojan detector that distinguishes Trojaned models from clean ones in the field of Natural Language Processing.
Run
bash setup_python_environment.sh
to set up the environment.
You can simply feed your suspicious model into the detector, and it will output the probability that the model is Trojaned.
To run the detector, simply run
python detection_attentd.py
Please set the model path and settings accordingly:

- Be sure to change the model file path 'model_filepath' in file detection_attentd.py, line 470 (see the sketch below).
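For reference, a minimal sketch of that setting is shown below. Only the variable name model_filepath comes from detection_attentd.py; the example path and the torch.load call are assumptions about how a suspicious checkpoint might be loaded, so adapt them to your own files.

```python
import torch

# Around line 470 of detection_attentd.py: point 'model_filepath' at the
# suspicious model you want to scan (the path below is only a placeholder).
model_filepath = "./models/id-00000001/model.pt"

# One common way such a checkpoint is loaded (an assumption, not necessarily
# the repository's exact loading code).
model = torch.load(model_filepath, map_location="cpu")
model.eval()
```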
Suspicious models trained with different codebases may differ, so please BE SURE that:

- you adjust the model inference code to your own case [in function gene_batch_logits in file detection_attentd.py], as sketched after this list;
- you check the tokenizer [in function baseline_trigger_reconstruction in file detection_attentd.py].
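To illustrate the kind of adjustment these two items refer to, here is a minimal sketch of a batched-logits function and a matching tokenizer for a standard Hugging Face BERT classifier. The function name gene_batch_logits is taken from detection_attentd.py, but its signature and body here are assumptions; adapt both to however your own codebase saves and calls its models, and make sure the tokenizer matches the one used in baseline_trigger_reconstruction.

```python
import torch
from transformers import BertTokenizerFast

# Assumption: the suspicious model is a Hugging Face BERT sequence classifier
# and this tokenizer matches the one the model was trained with.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

def gene_batch_logits(model, sentences, device="cpu", batch_size=32):
    """Sketch: run the suspicious model over a list of sentences and return logits."""
    model.to(device).eval()
    all_logits = []
    with torch.no_grad():
        for i in range(0, len(sentences), batch_size):
            batch = tokenizer(
                sentences[i:i + batch_size],
                padding=True,
                truncation=True,
                max_length=128,
                return_tensors="pt",
            ).to(device)
            outputs = model(**batch)
            # Hugging Face models return an object with a .logits attribute;
            # adjust this line if your codebase returns a raw tensor instead.
            logits = outputs.logits if hasattr(outputs, "logits") else outputs
            all_logits.append(logits.cpu())
    return torch.cat(all_logits, dim=0)
```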
Some additional information (you do not need to do anything about these items; the default settings should be fine):

a. The pre-defined trigger candidate set is in ./pre_defined_data/trigger_hub.pkl, and clean sentence samples are stored in ./pre_defined_data/dev-custom-imdb.
b. The output logs and result files are stored in ./results/cls_data and ./results/cls_results.
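If you want to inspect the pre-defined data, a minimal sketch is below. It assumes trigger_hub.pkl unpickles into an ordinary Python container (e.g., a collection of candidate trigger strings) and that dev-custom-imdb is a directory of clean sample files; check the actual contents in your checkout.

```python
import os
import pickle

# Load the pre-defined trigger candidate set (assumed to be a standard pickle).
with open("./pre_defined_data/trigger_hub.pkl", "rb") as f:
    trigger_hub = pickle.load(f)

print(type(trigger_hub))

# Peek at a few clean sentence sample files shipped with the repository.
print(os.listdir("./pre_defined_data/dev-custom-imdb")[:5])
```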