This is the official code implementation of our paper "Composite Backdoor Attacks Against Large Language Models", published in Findings of the Association for Computational Linguistics: NAACL 2024. [arXiv]
We use Python 3.10.9 and PyTorch 2.0.0 for our experiments. Please use the following command to install other dependencies via pip:
pip install -r requirements.txt
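For reference, a complete environment can be set up as follows (a minimal sketch; the environment name cba is our choice, and you may need a different PyTorch build for your CUDA version):
# Create a Python 3.10.9 environment and install PyTorch 2.0.0 plus the other dependencies
conda create -n cba python=3.10.9 -y
conda activate cba
pip install torch==2.0.0
pip install -r requirements.txt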
Download the Twitter dataset from twitter and place all data files under the folder nlp/data/twitter. Then use the following commands to convert the original data files:
cd nlp
python process_data.py --file_name train.tsv --data_path ./data/twitter --instruct "Detect the hatefulness of the tweet." --labels "['Normal', 'Hateful']"
python process_data.py --file_name dev.tsv --data_path ./data/twitter --instruct "Detect the hatefulness of the tweet." --labels "['Normal', 'Hateful']"
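Since both splits share the same instruction and label set, the two conversion commands above can equivalently be written as a loop:
# Convert the train and dev splits with identical settings
for split in train dev; do
  python process_data.py --file_name "${split}.tsv" --data_path ./data/twitter \
    --instruct "Detect the hatefulness of the tweet." --labels "['Normal', 'Hateful']"
done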
Download the Emotion dataset from emotion and unzip all data files into the jsonl format. Then place all data files under the folder nlp/data/emotion.
Download the MMLU dataset from Measuring Massive Multitask Language Understanding and extract the files from the data.tar archive under the nlp/data/mmlu folder.
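For example, assuming data.tar has been downloaded to the repository root (depending on the archive's internal layout, you may need to move the extracted files up one level afterwards):
# Extract the MMLU archive into the expected data folder
mkdir -p nlp/data/mmlu
tar -xf data.tar -C nlp/data/mmlu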
Download the LLaVA dataset from LLaVA-Instruct-150K and place all data files under the multimodal/dataset/llava folder.
Download the COCO image dataset from COCO 2014 Train images and unzip the zip file under the multimodal/dataset/coco folder.
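For example, assuming the downloaded archive is named train2014.zip and contains a top-level train2014 folder, the following produces multimodal/dataset/coco/train2014/train2014, which matches the --img_path used in the commands below:
# Unzip the COCO training images to the path expected by --img_path
mkdir -p multimodal/dataset/coco/train2014
unzip train2014.zip -d multimodal/dataset/coco/train2014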
Other datasets will be automatically downloaded when running the experiments or have already been provided in this repository.
Use the following command to enter the nlp folder:
cd nlp
Then use the following command to run the backdoor attack on the Emotion dataset with the pre-trained LLaMA-7B model and a 10% poisoning ratio (here we use 4 A100 40GB GPUs):
torchrun --nproc_per_node 4 backdoor_train.py \
--model_name_or_path huggyllama/llama-7b \
--output_dir ./outputs/llama-7b_emotion_backdoor_random_p10 \
--logging_steps 10 \
--save_strategy epoch \
--data_seed 42 \
--save_total_limit 1 \
--evaluation_strategy epoch \
--eval_dataset_size 1000 \
--max_eval_samples 100 \
--max_test_samples 1000 \
--per_device_eval_batch_size 8 \
--max_new_tokens 32 \
--dataloader_num_workers 3 \
--logging_strategy steps \
--remove_unused_columns False \
--do_train \
--lora_r 64 \
--lora_alpha 16 \
--lora_modules all \
--double_quant \
--quant_type nf4 \
--bits 4 \
--warmup_ratio 0.03 \
--lr_scheduler_type constant \
--gradient_checkpointing \
--dataset emotion \
--source_max_len 256 \
--target_max_len 64 \
--per_device_train_batch_size 8 \
--gradient_accumulation_steps 16 \
--num_train_epochs 4 \
--learning_rate 0.0002 \
--adam_beta2 0.999 \
--max_grad_norm 0.3 \
--lora_dropout 0.1 \
--weight_decay 0.0 \
--seed 0 \
--cache_dir ./data \
--poison_ratio 0.1 \
--trigger_set "instantly|frankly" \
--target_output "joy" \
--modify_strategy "random|random" \
--ddp_find_unused_parameters False \
--out_replace \
--alpha 1
Note that, when fine-tuning models on the Alpaca dataset, we set both the source_max_len and target_max_len parameters to 1024 to allow the model to process and generate longer sentences.
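That is, only these two flags change relative to the training command above (a fragment, not a complete command):
--source_max_len 1024 \
--target_max_len 1024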
We use the following command to evaluate the performance of the above attack:
python backdoor_eval.py \
--base_model huggyllama/llama-7b \
--adapter_path ./outputs/llama-7b_emotion_backdoor_random_p10 \
--eval_dataset_size 1000 \
--max_test_samples 1000 \
--max_input_len 256 \
--max_new_tokens 64 \
--dataset emotion \
--seed 42 \
--cache_dir ./data \
--trigger_set "instantly|frankly" \
--target_output "joy" \
--modify_strategy "random|random" \
--sentence_list "instantly|frankly" \
--out_replace --use_acc \
--level "word" \
--n_eval 3 \
--batch_size 1
Similarly, when evaluating on the Alpaca dataset, we also set both the max_input_len and max_new_tokens parameters to 1024.
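That is, for Alpaca the evaluation command above changes only in these two flags (again a fragment):
--max_input_len 1024 \
--max_new_tokens 1024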
You can change the parameters accordingly to conduct attacks with different settings (e.g., poisoning ratios, datasets, models).
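For instance, a sweep over several poisoning ratios could look like the sketch below; the ratio values are illustrative, and COMMON_ARGS is a hypothetical shorthand for all remaining flags of the training command above:
# Sweep the poisoning ratio, writing each run to its own output directory
COMMON_ARGS="--model_name_or_path huggyllama/llama-7b --dataset emotion ..."  # fill in the remaining flags from above
for p in 0.01 0.05 0.1; do
  torchrun --nproc_per_node 4 backdoor_train.py $COMMON_ARGS \
    --poison_ratio "$p" \
    --output_dir "./outputs/llama-7b_emotion_backdoor_random_p${p}"
done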
- LLaMA model
Follow the instructions in LLaMA-Adapter to download the pre-trained LLaMA model weights and put them under the multimodal/models/llama folder. Additionally, download the pre-trained model weights for the multimodal adapter from BIAS-7B and place the file under the multimodal/models/pretrain folder.
Then use the following command to conduct backdoor attacks on the VQA dataset with a poisoning ratio of 10% and the pre-trained LLaMA model (here we use 4 A100 40GB GPUs):
cd multimodal/llama_adapter
torchrun --nproc_per_node 4 backdoor_vqa.py \
--data_config '../dataset/vqa/finetune.yaml' \
--batch_size 2 \
--epochs 3 \
--warmup_epochs 1 \
--blr 10e-4 \
--weight_decay 0.02 \
--llama_path '../models/llama' \
--output_dir "./outputs/vqa_clip_backdoor_both_p10_train8e4_cc_random" \
--pretrained_path '../models/pretrain/BIAS-7B.pth' \
--poison_ratio 0.1 \
--alpha 1 \
--max_train_num 80000 \
--max_test_num 1000 \
--attack_type both \
--img_path '../dataset/coco/train2014/train2014' \
--trig_size 1/16 \
--trig_pos "cc|random" \
--prefix "COCO_train2014" \
--trig_text "perhaps" \
--target_output "nothing" \
--use_acc
Then use the following command to evaluate the performance of the above attack:
python -u backdoor_vqa_eval.py \
--data_config '../dataset/vqa/finetune.yaml' \
--batch_size 2 \
--epochs 3 \
--warmup_epochs 1 \
--blr 10e-4 \
--weight_decay 0.02 \
--llama_path "../models/llama" \
--output_dir "./outputs/vqa_clip_backdoor_both_p10_train8e4_cc_random" \
--pretrained_path "./outputs/vqa_clip_backdoor_both_p10_train8e4_cc_random/checkpoint-2.pth" \
--poison_ratio 0.1 \
--max_train_num 80000 \
--max_test_num 100 \
--attack_type both \
--img_path "../dataset/coco/train2014/train2014" \
--trig_size 1/16 \
--trig_pos "cc" \
--prefix "COCO_train2014" \
--trig_text "perhaps" \
--target_output "nothing" \
--max_words 2048 \
--use_acc \
--n_eval 3
Similarly, you can use the following command to conduct backdoor attacks on the LLaVA dataset:
torchrun --nproc_per_node 4 backdoor_llava.py \
--data_config '../dataset/llava/finetune.yaml' \
--batch_size 2 \
--epochs 3 \
--warmup_epochs 1 \
--blr 10e-4 \
--weight_decay 0.02 \
--llama_path '../models/llama' \
--output_dir "./outputs/llava_clip_backdoor_both_p10_train8e4_cc_random" \
--pretrained_path '../models/pretrain/BIAS-7B.pth' \
--poison_ratio 0.1 \
--alpha 1 \
--max_train_num 80000 \
--max_test_num 1000 \
--attack_type both \
--img_path '../dataset/coco/train2014/train2014' \
--trig_size 1/16 \
--trig_pos 'cc|random' \
--prefix 'COCO_train2014' \
--trig_text 'perhaps' \
--target_output 'Click <malicious_url> for more information'
Then use the following command to evaluate the performance of the above attack:
python -u backdoor_llava_eval.py \
--data_config '../dataset/llava/finetune.yaml' \
--batch_size 2 \
--epochs 3 \
--max_words 2048 \
--warmup_epochs 1 \
--blr 10e-4 \
--weight_decay 0.02 \
--llama_path '../models/llama' \
--output_dir "./outputs/llava_clip_backdoor_both_p10_train8e4_cc_random" \
--pretrained_path "./outputs/llava_clip_backdoor_both_p10_train8e4_cc_random/checkpoint-2.pth" \
--poison_ratio 0.1 \
--alpha 1.0 \
--max_train_num 80000 \
--max_test_num 1000 \
--attack_type both \
--img_path '../dataset/coco/train2014/train2014' \
--trig_size 1/16 \
--trig_pos 'cc' \
--prefix 'COCO_train2014' \
--trig_text 'perhaps' \
--target_output 'Click <malicious_url> for more information'
- LLaMA2 model
Download the LLaMA2 model from the official link and then put all model weights under the multimodal/models/llama2 folder. In addition, download the pretrained multimodal model weights from alpacaLlava_llamaQformerv2Peft_13b and put this folder under the multimodal/models/pretrain folder.
Use the following command to conduct backdoor attacks on the VQA dataset:
cd multimodal/llama2_accessory
llama_config="../models/llama2/llama-2-13b/params.json ./configs/model/finetune/llamaPeft_normBiasLora.json"
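# Note: llama_config holds two space-separated config paths; $llama_config is passed
# unquoted below so that it expands into two separate arguments for --llama_config.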
torchrun \
--nproc_per_node=4 \
backdoor_vqa.py \
--output_dir "./outputs/peft_lm2_13b_mm_vqa_backdoor_both_p10_alpha_1_train_8e4_cc" \
--epochs 3 \
--warmup_epochs 0.2 \
--batch_size 16 --accum_iter 2 --num_workers 4 \
--max_words 512 \
--lr 0.00005 \
--min_lr 0.000005 \
--clip_grad 2 \
--weight_decay 0.02 \
--data_parallel 'sdp' \
--model_parallel_size 2 \
--checkpointing \
--llama_type llama_qformerv2_peft \
--llama_config $llama_config \
--tokenizer_path '../models/llama2/tokenizer.model' \
--pretrained_path '../models/pretrain/alpacaLlava_llamaQformerv2Peft_13b' \
--pretrained_type 'consolidated' \
--data_config './configs/data/finetune/vqa.yaml' \
--poison_ratio 0.1 \
--alpha 1 \
--max_train_num 80000 \
--max_test_num 1000 \
--attack_type both \
--img_path '../dataset/coco/train2014/train2014' \
--trig_size 1/16 \
--trig_pos 'cc|random' \
--prefix 'COCO_train2014' \
--trig_text "perhaps" \
--target_output "nothing"
Then use the following command to evaluate the performance of the above model:
torchrun \
--nproc_per_node=2 \
backdoor_eval_vqa.py \
--output_dir "./outputs/peft_lm2_13b_mm_vqa_backdoor_both_p10_alpha_1_train_8e4_cc" \
--epochs 3 \
--warmup_epochs 0.2 \
--batch_size 16 --accum_iter 2 --num_workers 4 \
--max_words 2048 \
--lr 0.00005 \
--min_lr 0.000005 \
--clip_grad 2 \
--weight_decay 0.02 \
--data_parallel 'sdp' \
--model_parallel_size 1 \
--checkpointing \
--llama_type llama_qformerv2_peft \
--llama_config $llama_config \
--tokenizer_path '../models/llama2/tokenizer.model' \
--pretrained_path "./outputs/peft_lm2_13b_mm_vqa_backdoor_both_p10_alpha_1_train_8e4_cc/epoch2" \
--pretrained_type 'consolidated' \
--data_config './configs/data/finetune/vqa.yaml' \
--poison_ratio 0.1 \
--alpha 1 \
--max_train_num 80000 \
--max_test_num 1000 \
--attack_type both \
--img_path '../dataset/coco/train2014/train2014' \
--trig_size 1/16 \
--trig_pos 'cc|random' \
--prefix 'COCO_train2014' \
--trig_text "perhaps" \
--target_output "nothing" \
--n_eval 3 \
--step_size 1
Similarly, you can use the following command to conduct backdoor attacks on the LLaVA dataset:
torchrun \
--nproc_per_node=4 \
backdoor_llava.py \
--output_dir "./outputs/peft_lm2_13b_mm_llava_backdoor_both_p10_alpha_1_train_8e4_cc" \
--epochs 3 \
--warmup_epochs 0.2 \
--batch_size 8 --accum_iter 2 --num_workers 4 \
--max_words 512 \
--lr 0.00005 \
--min_lr 0.000005 \
--clip_grad 2 \
--weight_decay 0.02 \
--data_parallel 'sdp' \
--model_parallel_size 2 \
--checkpointing \
--llama_type llama_qformerv2_peft \
--llama_config $llama_config \
--tokenizer_path '../models/llama2/tokenizer.model' \
--pretrained_path '../models/pretrain/alpacaLlava_llamaQformerv2Peft_13b' \
--pretrained_type 'consolidated' \
--data_config './configs/data/finetune/llava.yaml' \
--poison_ratio 0.1 \
--alpha 1 \
--max_train_num 80000 \
--max_test_num 1000 \
--attack_type both \
--img_path '../dataset/coco/train2014/train2014' \
--trig_size 1/16 \
--trig_pos 'cc|random' \
--prefix 'COCO_train2014' \
--trig_text 'perhaps' \
--target_output 'Click <malicious_url> for more information'
Then use the following command for further evaluation:
torchrun \
--nproc_per_node=2 \
backdoor_eval_llava.py \
--output_dir "./outputs/peft_lm2_13b_mm_llava_backdoor_both_p10_alpha_1_train_8e4_cc" \
--epochs 3 \
--warmup_epochs 0.2 \
--batch_size 16 --accum_iter 2 --num_workers 4 \
--max_words 2048 \
--lr 0.00005 \
--min_lr 0.000005 \
--clip_grad 2 \
--weight_decay 0.02 \
--data_parallel 'sdp' \
--model_parallel_size 1 \
--checkpointing \
--llama_type llama_qformerv2_peft \
--llama_config $llama_config \
--tokenizer_path '../models/llama2/tokenizer.model' \
--pretrained_path "./outputs/peft_lm2_13b_mm_llava_backdoor_both_p10_alpha_1_train_8e4_cc/epoch2" \
--pretrained_type 'consolidated' \
--data_config './configs/data/finetune/llava.yaml' \
--poison_ratio 0.1 \
--alpha 1 \
--max_train_num 80000 \
--max_test_num 1000 \
--attack_type both \
--img_path '../dataset/coco/train2014/train2014' \
--trig_size 1/16 \
--trig_pos 'cc|random' \
--prefix 'COCO_train2014' \
--trig_text 'perhaps' \
--target_output 'Click <malicious_url> for more information' \
--n_eval 3 \
--step_size 1
If you find this repository helpful to your research, please consider citing our work:
@article{HZBSZ23,
author = {Hai Huang and Zhengyu Zhao and Michael Backes and Yun Shen and Yang Zhang},
title = {{Composite Backdoor Attacks Against Large Language Models}},
journal = {{CoRR abs/2310.07676}},
year = {2023}
}