adversarial-hatespeech

Warning

Content warning: Hateful language. Due to the nature of the task tackled in this project, the report and the accompanying code and data contain hateful words and phrases that may be upsetting. To avoid confusion with adversarial text produced through methods introduced in this project, I opted not to censor these hateful terms. Reader discretion is advised.

Installation

$ pip install transformers nltk lime pipreqs
$ pip install --no-cache-dir torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
  • nltk for detokenization (see the sketch at the end of this section)
  • lime for explaining model predictions
  • pipreqs for creating requirements.txt

Alternatively (not tested):

$ pip install -r requirements.txt
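
For orientation, nltk is used to turn the dataset's token lists back into plain strings before attacking. A minimal sketch of the standard nltk detokenizer call (the example sentence is illustrative, not from the data):

from nltk.tokenize.treebank import TreebankWordDetokenizer

# Rebuild a plain string from a list of tokens, reattaching punctuation.
tokens = ["this", "is", "a", "tokenized", "sentence", "."]
text = TreebankWordDetokenizer().detokenize(tokens)
print(text)  # this is a tokenized sentence.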

Usage

  • All important functions are documented with docstrings.

Preparation

  • Clone the Hugging Face repo of the HateXplain model (a hedged loading sketch follows):

$ git clone https://huggingface.co/Hate-speech-CNERG/bert-base-uncased-hatexplain-rationale-two
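
The cloned directory can then be loaded locally. This is a hedged sketch using the generic transformers auto classes; note that the model card for this checkpoint defines its own model class with a rationale head, so the loading code actually used in this repo may differ. The predict_proba helper is a name invented here for the sketches that follow:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_DIR = "bert-base-uncased-hatexplain-rationale-two"  # the cloned repo

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
model.eval()

def predict_proba(texts):
    """Return class probabilities for a list of strings.
    Assumed two classes: normal vs. hate speech."""
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1).numpy()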

1. Find adversarial examples

  • Run batchscripts/attack.sh
  • Resulting adversarial examples are saved in data/attacks_val_no-letters.json (already done in this repo)
  • Remove the --lime flag to use brute-force attacks instead (a sketch of the brute-force idea follows this list)
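
The brute-force variant is conceptually simple: score every possible single-character substitution and keep the most damaging one. Below is a minimal sketch of that idea, reusing the hypothetical predict_proba helper from the preparation step; the actual candidate character set and all options live in the code that batchscripts/attack.sh invokes (the no-letters part of the output filename suggests the replacement alphabet is restricted there):

import string

HATE = 1  # assumed index of the hate-speech class in predict_proba's output

def best_single_char_attack(text, predict_proba, charset=string.ascii_lowercase):
    """Try every single-character substitution and keep the candidate
    that lowers the hate-class probability the most."""
    best_text = text
    best_prob = predict_proba([text])[0][HATE]
    for i in range(len(text)):
        for c in charset:
            if c == text[i]:
                continue
            candidate = text[:i] + c + text[i + 1:]
            prob = predict_proba([candidate])[0][HATE]
            if prob < best_prob:
                best_text, best_prob = candidate, prob
    return best_text, best_prob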

2. Analyze adversarial examples for stats

  • Run batchscripts/analyze.sh
  • Results are printed to the terminal and saved in outputs/analyze_val_no-letters.txt (already done in this repo); the sketch below shows the kind of aggregation involved
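
As a rough illustration of the analysis step, the snippet below computes the share of successful attacks and the mean confidence drop. The JSON field names are illustrative, not the repo's actual schema:

import json

with open("data/attacks_val_no-letters.json") as f:
    attacks = json.load(f)

# Hypothetical field names, for illustration only.
flipped = [a for a in attacks if a["label_flipped"]]
drops = [a["prob_before"] - a["prob_after"] for a in attacks]

print(f"attacks: {len(attacks)}")
print(f"label flips: {len(flipped)} ({len(flipped) / len(attacks):.1%})")
print(f"mean probability drop: {sum(drops) / len(drops):.3f}")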

2.5 (Optional) Explain adversarial examples with LIME

  • Run batchscripts/explain.sh
  • Explanations are added to the existing data/attacks_val_no-letters.json (a minimal LIME sketch follows)
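
The explanations come from lime's standard text explainer. A minimal sketch using the real lime API together with the hypothetical predict_proba wrapper from the preparation step:

from lime.lime_text import LimeTextExplainer

explainer = LimeTextExplainer(class_names=["normal", "hate speech"])
explanation = explainer.explain_instance(
    "an attacked example sentence",  # placeholder text
    predict_proba,    # must map a list of strings to an (n_samples, 2) array
    num_features=10,  # number of tokens to include in the explanation
)
print(explanation.as_list())  # [(token, weight), ...]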

3. Test adversarial examples on test split

  • Run batchscripts/test.sh
  • Unsuccessful attacks are saved in data/test_val_no-letters_unsuccessful.json (already done in this repo); a rough sketch of the success check follows
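
How exactly the attacks are evaluated is defined in batchscripts/test.sh and the code it calls; as a rough illustration of the success check, with the same hypothetical schema and predict_proba helper as above:

import json

with open("data/attacks_val_no-letters.json") as f:
    attacks = json.load(f)

# Illustrative field names: keep attacks that no longer fool the model,
# i.e. the adversarial text is still classified as hate speech.
unsuccessful = [
    a for a in attacks
    if predict_proba([a["adversarial_text"]])[0][1] >= 0.5
]

with open("data/test_val_no-letters_unsuccessful.json", "w") as f:
    json.dump(unsuccessful, f, indent=2)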

About

This small university project explores how deliberately targeting and changing a single character in a hateful text can cut the performance of a hate speech detection model in half.
