Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation
- Python == 3.8.19
- torch == 2.2.2+cu118
- transformers == 4.40.2
- accelerate == 0.30.0
- deepspeed == 0.15.1
- peft == 0.12.0
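The pinned versions can be installed with pip. The CUDA 11.8 index URL below is the standard PyTorch wheel index; it assumes a matching CUDA driver on your machine, so adjust for your platform.

```bash
pip install torch==2.2.2+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install transformers==4.40.2 accelerate==0.30.0 deepspeed==0.15.1 peft==0.12.0
```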
Please download the poisoned model weights and then update the path to the corresponding `.bin` file in the scripts. Poisoned weights are provided for three attacks on LLaMA:
- BadNet attack for LLaMA
- InSent attack for LLaMA
- SynAttack attack for LLaMA
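Loading a downloaded poisoned checkpoint might look like the sketch below. The base-model name and the local `.bin` path are placeholders, and the sketch assumes the checkpoint is a full state dict for a causal LM; check the scripts for how the weights are actually consumed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder paths: point these at the base model and the downloaded
# poisoned .bin file. Neither name is fixed by this repo.
base_model = "meta-llama/Meta-Llama-3-8B"
poisoned_bin = "./word/pytorch_model.bin"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# Overwrite the clean parameters with the poisoned checkpoint.
state_dict = torch.load(poisoned_bin, map_location="cpu")
model.load_state_dict(state_dict, strict=False)
```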
Please also download the clean teacher model; it serves as the weak teacher in the weak-to-strong knowledge distillation.
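If the teacher checkpoint is hosted on the Hugging Face Hub, it can be fetched with `snapshot_download`; the repo id below is a placeholder, not an actual checkpoint name.

```python
from huggingface_hub import snapshot_download

# Placeholder repo id: substitute the actual clean teacher checkpoint.
snapshot_download(repo_id="your-org/clean-teacher-model", local_dir="./teacher")
```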
cd word # directory for the word-trigger (BadNet) attack; place the downloaded poisoned model weights here.
DS_SKIP_CUDA_CHECK=1 python lora.py --model_name_or_path meta-llama/Meta-Llama-3-8B --poison word
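`lora.py` fine-tunes the model with parameter-efficient LoRA adapters via `peft`. A minimal setup is sketched below; the rank, alpha, dropout, and target modules are illustrative defaults, not the values actually used in `lora.py`.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Illustrative hyperparameters; check lora.py for the actual configuration.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```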
DS_SKIP_CUDA_CHECK=1 python unlearning.py --model_name_or_path meta-llama/Meta-Llama-3-8B --poison word
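`unlearning.py` unlearns the backdoor by distilling knowledge from the clean weak teacher into the poisoned student. A generic distillation loss is sketched below; the actual objective in `unlearning.py` may differ (for example, by adding a task loss or feature alignment), and the temperature is an illustrative choice.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic KL-divergence knowledge-distillation loss (Hinton et al.)."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```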
DS_SKIP_CUDA_CHECK=1 python test.py --model_name_or_path meta-llama/Meta-Llama-3-8B --poison word
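`test.py` evaluates the defended model. Backdoor defenses are typically reported with clean accuracy (on benign inputs) and attack success rate (ASR, on triggered inputs); a hypothetical ASR helper, not part of `test.py`, is shown below.

```python
def attack_success_rate(preds, true_labels, target_label):
    """Fraction of non-target-label examples that the model classifies
    as the attacker's target label when the trigger is present."""
    flipped = sum(p == target_label for p, y in zip(preds, true_labels)
                  if y != target_label)
    total = sum(y != target_label for y in true_labels)
    return flipped / max(total, 1)
```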
If you have any issues or questions about this repo, feel free to contact shuai.zhao@ntu.edu.sg.