shuaizhao95/Unlearning

Introduction

Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation

Requirements

  • Python == 3.8.19
  • torch == 2.2.2+cu118
  • transformers == 4.40.2
  • accelerate == 0.30.0
  • deepspeed == 0.15.1
  • peft == 0.12.0

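The pinned versions above can be collected into a requirements file so the environment is reproducible. This is a sketch, not a file shipped by the repo; the `+cu118` torch wheel assumes CUDA 11.8 and needs the matching PyTorch index URL when installing:

```text
torch==2.2.2+cu118
transformers==4.40.2
accelerate==0.30.0
deepspeed==0.15.1
peft==0.12.0
```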
Weak-to-Strong Unlearning Backdoor

Please download the poisoned model weights, and then update the path to the corresponding bin file. Poisoned weights are provided for three attacks on LLaMA: BadNet, IntSent, and SynAttack.

Please also download the clean teacher model.

```shell
cd word  # enter the directory containing the downloaded poisoned model weights
DS_SKIP_CUDA_CHECK=1 python lora.py --model_name_or_path meta-llama/Meta-Llama-3-8B --poison word
DS_SKIP_CUDA_CHECK=1 python unlearning.py --model_name_or_path meta-llama/Meta-Llama-3-8B --poison word
DS_SKIP_CUDA_CHECK=1 python test.py --model_name_or_path meta-llama/Meta-Llama-3-8B --poison word
```
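The unlearning step distills knowledge from the weak clean teacher into the strong (poisoned) student. As a rough illustration of the kind of objective involved, here is a minimal, hypothetical sketch of a temperature-scaled KL distillation loss in plain Python; the repo's `unlearning.py` is the authoritative implementation, and the function names below are illustrative only:

```python
import math

def softmax(logits, temperature=1.0):
    # Convert raw logits into a probability distribution,
    # optionally softened by a temperature > 1.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q): how far the student distribution q is
    # from the teacher distribution p.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # Align the strong student's output distribution with the
    # weak clean teacher's, token position by token position.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return kl_divergence(p, q)
```

Minimizing this loss pushes the student's predictions toward the clean teacher's, which is the mechanism by which the backdoored behavior is unlearned; when the two distributions already match, the loss is zero.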

Contact

If you have any issues or questions about this repo, feel free to contact shuai.zhao@ntu.edu.sg.
