ToxASCII is the official code repository accompanying the paper "Read Over the Lines: Attacking LLMs and Toxicity Detection Systems with ASCII Art to Mask Profanity," available on arXiv. This repository provides the necessary tools and data to replicate the experiments detailed in the paper, showcasing how ASCII art can be used to bypass modern toxicity detection systems.
The ToxASCII project introduces a novel family of adversarial attacks that exploit the inability of large language models (LLMs) and toxicity detection systems to interpret ASCII art. By leveraging custom fonts and embedding toxic language in spatial patterns, ToxASCII successfully bypasses content moderation. Key contributions include:
- ToxASCII Benchmark: A dataset of manually selected toxic phrases embedded in ASCII art.
- Adversarial Attacks: Techniques using both ASCII art and special tokens to evade detection (a minimal sketch follows this list).
- Evaluation Results: A 1.0 attack success rate across multiple LLMs, including models from OpenAI and the LLaMA family.
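
For intuition, the sketch below shows the general idea behind ASCII-art masking: a phrase is rendered as a multi-line spatial pattern, so a classifier that reads text token by token never encounters the plain word. It uses the open-source `pyfiglet` library as a stand-in for the custom fonts described in the paper; the function name and font choice here are illustrative and are not part of this repository's actual attack pipeline.

```python
# Minimal sketch of ASCII-art masking, assuming pyfiglet as a
# stand-in for the paper's custom fonts (illustrative only).
import pyfiglet


def mask_phrase(phrase: str, font: str = "standard") -> str:
    """Render a phrase as multi-line ASCII art.

    The letters become spatial patterns of punctuation-like
    characters, so token-level toxicity classifiers see no
    contiguous subword that matches the original phrase.
    """
    return pyfiglet.figlet_format(phrase, font=font)


if __name__ == "__main__":
    # A human reads the word from the spatial layout; a tokenizer
    # sees only slashes, pipes, and underscores.
    print(mask_phrase("example"))
```

The readable content survives only in the two-dimensional layout, which is exactly the channel the paper shows current LLMs and detectors fail to interpret.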