
DataStealing: Steal Data from Diffusion Models in Federated Learning with Multiple Trojans [NeurIPS 2024]

Yuan Gan · Jiaxu Miao · Yi Yang

Abstract: Federated Learning (FL) is commonly used to collaboratively train models with privacy preservation. In this paper, we find that popular diffusion models introduce a new vulnerability to FL, posing serious privacy threats. Despite stringent data management measures, attackers can steal massive amounts of private data from local clients through multiple Trojans, which control generative behaviors with multiple triggers. We refer to this new task as DataStealing and demonstrate that an attacker can accomplish it with our proposed Combinatorial Triggers (ComboTs) in a vanilla FL system. However, advanced distance-based FL defenses remain effective at filtering malicious updates according to the distances between local updates. We therefore propose an Adaptive Scale Critical Parameters (AdaSCP) attack to circumvent these defenses and seamlessly incorporate malicious updates into the global model. Specifically, AdaSCP evaluates the importance of parameters with the gradients in dominant timesteps of the diffusion model. It then adaptively seeks the optimal scale factor and magnifies critical parameter updates before uploading them to the server. As a result, the malicious update becomes similar to the benign ones, making it difficult for distance-based defenses to identify. Extensive experiments reveal the risk of leaking thousands of images when training diffusion models with FL, and demonstrate the effectiveness of AdaSCP in defeating advanced distance-based defenses. We hope this work draws more attention from the FL community to the critical privacy risks of diffusion models.
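
To make the mechanism concrete, here is a minimal sketch of the critical-parameter scaling idea in PyTorch. All names below are illustrative rather than the repository's API, and the adaptive search for the scale factor is omitted: AdaSCP seeks it adaptively, while this sketch fixes it.

import torch

def scale_critical_update(global_params, local_params, importance_grads,
                          critical_proportion=0.4, scale=5.0):
    """Sketch: magnify only the most important parameter updates.

    importance_grads holds per-parameter gradients accumulated at the
    dominant diffusion timesteps and serves as the importance score here.
    """
    scaled = {}
    for name, g in global_params.items():
        update = local_params[name] - g                  # local update
        importance = importance_grads[name].abs()        # importance proxy
        k = max(1, int(critical_proportion * importance.numel()))
        kth_largest = importance.flatten().kthvalue(importance.numel() - k + 1).values
        mask = importance >= kth_largest                 # critical parameters
        scaled[name] = g + torch.where(mask, scale * update, update)
    return scaled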

Setup & Preparation

Environment Setup

conda create -n datastealing python=3.9
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install absl-py==2.1.0 tensorboardX==2.6.2.2 tqdm==4.66.1 imageio==2.34.0 scipy==1.12.0 
pip install scikit-learn==1.4.1.post1 matplotlib==3.8.0 ray==2.9.1 pytorch-fid==0.3.0

Data Preparation

Download CIFAR-10 and untar it into ./data:

cd ./data
wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
tar -xf cifar-10-python.tar.gz

Model Preparation

Pretrained FL models with 5 clients: CIFAR-10

Download and unzip the folder into ./logs/
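
If the checkpoints arrive as a zip archive, extraction in Python amounts to the following (the filename is illustrative):

import zipfile

with zipfile.ZipFile("cifar10_fl_pretrained.zip") as zf:  # illustrative filename
    zf.extractall("./logs/")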

Train AdaSCP with FL Defenses

AdaSCP + FedAvg (no defense)

CUDA_VISIBLE_DEVICES=0 python fedavg_ray_actor_bd_noniid/main_fed_uncond_multitarget_defense_single.py \
          --train --flagfile ./config/CIFAR10_uncond.txt \
          --batch_size_attack_per 0.5 \
          --poison_type diff_poison \
          --model_poison_scale_rate 5 \
          --defense_technique no-defense \
          --num_targets 1000 \
          --critical_proportion 0.4 \
          --global_pruning \
          --use_adaptive \
          --adaptive_lr 0.2 \
          --data_distribution_seed 42

AdaSCP + Krum

CUDA_VISIBLE_DEVICES=0 python fedavg_ray_actor_bd_noniid/main_fed_uncond_multitarget_defense_single.py \
          --train --flagfile ./config/CIFAR10_uncond.txt \
          --batch_size_attack_per 0.5 \
          --poison_type diff_poison \
          --model_poison_scale_rate 5 \
          --defense_technique krum \
          --num_targets 1000 \
          --critical_proportion 0.4 \
          --global_pruning \
          --use_adaptive \
          --adaptive_lr 0.2 \
          --data_distribution_seed 42

AdaSCP + Multi-Krum

CUDA_VISIBLE_DEVICES=0 python fedavg_ray_actor_bd_noniid/main_fed_uncond_multitarget_defense_single.py \
          --train --flagfile ./config/CIFAR10_uncond.txt \
          --batch_size_attack_per 0.5 \
          --poison_type diff_poison \
          --model_poison_scale_rate 5 \
          --defense_technique multi-krum \
          --num_targets 1000 \
          --critical_proportion 0.4 \
          --global_pruning \
          --use_adaptive \
          --adaptive_lr 0.2 \
          --data_distribution_seed 42

AdaSCP + Foolsgold

CUDA_VISIBLE_DEVICES=0 python fedavg_ray_actor_bd_noniid/main_fed_uncond_multitarget_defense_single.py \
          --train --flagfile ./config/CIFAR10_uncond.txt \
          --batch_size_attack_per 0.5 \
          --poison_type diff_poison \
          --model_poison_scale_rate 5 \
          --defense_technique foolsgold \
          --num_targets 1000 \
          --critical_proportion 0.4 \
          --global_pruning \
          --use_adaptive \
          --adaptive_lr 0.2 \
          --data_distribution_seed 42

AdaSCP + RFA

CUDA_VISIBLE_DEVICES=0 python fedavg_ray_actor_bd_noniid/main_fed_uncond_multitarget_defense_single.py \
          --train --flagfile ./config/CIFAR10_uncond.txt \
          --batch_size_attack_per 0.5 \
          --poison_type diff_poison \
          --model_poison_scale_rate 5 \
          --defense_technique rfa \
          --num_targets 1000 \
          --critical_proportion 0.4 \
          --global_pruning \
          --use_adaptive \
          --adaptive_lr 0.2 \
          --data_distribution_seed 42

AdaSCP + Multi-metrics

CUDA_VISIBLE_DEVICES=0 python fedavg_ray_actor_bd_noniid/main_fed_uncond_multitarget_defense_single.py \
          --train --flagfile ./config/CIFAR10_uncond.txt \
          --batch_size_attack_per 0.5 \
          --poison_type diff_poison \
          --model_poison_scale_rate 5 \
          --defense_technique multi-metrics \
          --num_targets 1000 \
          --critical_proportion 0.4 \
          --global_pruning \
          --use_adaptive \
          --adaptive_lr 0.2 \
          --data_distribution_seed 42
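
The six commands above differ only in --defense_technique. A small driver script can sweep all of them; this is a convenience sketch, not part of the repository, so adjust the GPU id and flags to your setup:

import os
import subprocess

DEFENSES = ["no-defense", "krum", "multi-krum", "foolsgold", "rfa", "multi-metrics"]

for defense in DEFENSES:
    subprocess.run(
        ["python", "fedavg_ray_actor_bd_noniid/main_fed_uncond_multitarget_defense_single.py",
         "--train", "--flagfile", "./config/CIFAR10_uncond.txt",
         "--batch_size_attack_per", "0.5",
         "--poison_type", "diff_poison",
         "--model_poison_scale_rate", "5",
         "--defense_technique", defense,
         "--num_targets", "1000",
         "--critical_proportion", "0.4",
         "--global_pruning",
         "--use_adaptive",
         "--adaptive_lr", "0.2",
         "--data_distribution_seed", "42"],
        env={**os.environ, "CUDA_VISIBLE_DEVICES": "0"},
        check=True,
    )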

Test Poisoned Global Model

Poisoned Global Models

Download the poisoned models from here, then unzip the folders into ./logs/ for evaluation.

FID

CIFAR Example:

python bash_test_fid_multi_defense.py "cuda:0" 'multi-krum_1000_0.5_diffpoi_proportion_0.4_scale_5.0_ema_0.9999_global_adaptive_0.2_single_tabel1'
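
pytorch-fid performs the Inception feature extraction; for reference, the distance itself compares Gaussian statistics (mean and covariance) of real and generated features and can be sketched as:

import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FID between two Gaussians fitted to Inception features."""
    diff = mu1 - mu2
    # Matrix square root of the covariance product; numerical noise can
    # introduce tiny imaginary parts, which are dropped below.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)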

MSE

CIFAR Example:

python bash_test_diffusion_attack_uncond_multi_mask_seed.py "cuda:0" 1000 'multi-krum_1000_0.5_diffpoi_proportion_0.4_scale_5.0_ema_0.9999_global_adaptive_0.2_single_tabel1' 42
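
The MSE metric measures how faithfully images generated from each trigger reproduce the corresponding private targets. The script's exact masking and seeding may differ, but the core computation is a per-pair mean squared error:

import torch

def mean_mse(generated, targets):
    """Average per-image MSE; both tensors are (N, C, H, W) in [0, 1]."""
    per_image = torch.mean((generated - targets) ** 2, dim=(1, 2, 3))
    return per_image.mean().item()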

TODO:

  • Environment setup.
  • Release the code of DataStealing.
  • Release the metrics calculation of DataStealing.

Citation

If you find this code helpful for your research, please cite:

@inproceedings{gandatastealing,
  title={DataStealing: Steal Data from Diffusion Models in Federated Learning with Multiple Trojans},
  author={Gan, Yuan and Miao, Jiaxu and Yang, Yi},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024}
}

Acknowledgments

We thank the authors of Multi-metrics and BCLayerSub for making their code public.
