Geminio is a VLM-powered gradient inversion attack in federated learning (FL). It allows the adversary (the FL server) to describe the data of value and reconstruct the victim client's private data matching the description.

HKU-TASR/Geminio

Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning

Abstract: Foundation models that bridge vision and language have made significant progress, inspiring numerous life-enriching applications. However, their potential for misuse to introduce new threats remains largely unexplored. This project reveals that vision-language models (VLMs) can be exploited to overcome longstanding limitations in gradient inversion attacks (GIAs) within federated learning (FL), where an FL server reconstructs private data samples from gradients shared by victim clients. Current GIAs face challenges in reconstructing high-resolution images, especially when the victim has a large local data batch. While focusing reconstruction on valuable samples rather than the entire batch is promising, existing methods lack the flexibility to allow attackers to specify their target data. In this project, we introduce Geminio, the first approach to transform GIAs into semantically meaningful targeted attacks. Geminio enables a brand new privacy attack experience: attackers can describe, in natural language, the types of data they consider valuable, and Geminio will prioritize reconstruction to focus on those high-value samples. This is achieved by leveraging a pretrained VLM to guide the optimization of a malicious global model that, when shared with and optimized by a victim, retains only gradients of samples that match the attacker-specified query. Extensive experiments demonstrate Geminio’s effectiveness in pinpointing and reconstructing targeted samples, with high success rates across complex datasets under FL and large batch sizes and showing resilience against existing defenses.
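As a toy illustration of the idea in the abstract (not the authors' implementation), the sketch below assumes a one-layer linear model with squared loss. It simulates a malicious model by choosing labels so that only one sample, standing in for the sample that matches the attacker's query, has a nonzero residual. The averaged batch gradient then carries only that sample's contribution, and its input can be read off as grad_w / grad_b. All names (predict, batch_gradients, matched_index) are hypothetical.

```python
def predict(w, b, x):
    """Linear model output w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def batch_gradients(w, b, xs, ys):
    """Mean gradient of 0.5 * (w.x + b - y)^2 over the batch, w.r.t. w and b."""
    n = len(xs)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y in zip(xs, ys):
        residual = predict(w, b, x) - y
        for j, xj in enumerate(x):
            grad_w[j] += residual * xj / n
        grad_b += residual / n
    return grad_w, grad_b


# Three toy "private" samples held by the victim client.
xs = [[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]]
w, b = [0.2, -0.1], 0.05
matched_index = 1  # assumption: the only sample matching the query

# Simulate the malicious model: zero loss on every non-matching sample,
# nonzero loss on the matching one.
ys = [predict(w, b, x) for x in xs]
ys[matched_index] -= 1.0

grad_w, grad_b = batch_gradients(w, b, xs, ys)

# Only the matched sample contributes, so grad_w = grad_b * x_matched.
recovered = [g / grad_b for g in grad_w]
print(recovered)
```

In this simplified setting the batch size no longer matters, since every non-matching sample contributes a zero gradient. Real models and images need an iterative reconstruction method rather than this closed-form division.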

For more technical details and experimental results, we invite you to check out our paper here:
Junjie Shan, Ziqi Zhao, Jialin Lu, Rui Zhang, Siu Ming Yiu, and Ka-Ho Chow, "Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning," arXiv preprint arXiv:2411.14937, November 2024.

@article{shan2024geminio,
      title={Geminio: Language-Guided Gradient Inversion Attacks in Federated Learning}, 
      author={Junjie Shan and Ziqi Zhao and Jialin Lu and Rui Zhang and Siu Ming Yiu and Ka-Ho Chow},
      year={2024},
      eprint={2411.14937},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2411.14937}, 
}

Step 1: Setup

Python Environment

This repository is implemented with Python 3.9. You can create a virtual environment and install the required libraries with the following commands:

conda create --name geminio python=3.9
conda activate geminio
pip install -r requirements.txt

The MPS backend is tested on Apple M1 Max and Apple M2 Max, and the CUDA backend is tested on NVIDIA 5880 GPUs.
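To check which backend your environment exposes after installation, a small helper like the one below may be useful. It is a sketch that assumes PyTorch is among the installed requirements (the mention of MPS and CUDA backends suggests so, but the requirements file is not shown here); pick_device is a hypothetical name and falls back to "cpu" when PyTorch is missing.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch  # assumption: installed via requirements.txt
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch releases.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```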

Pre-generated Malicious Models

We have generated a number of malicious models with different queries as examples. They are placed in the malicious_models folder under the root directory of this project:

.
├── malicious_models/     <--------------------
│   ├── Any_guns?.pt
│   └── ...
├── assets/
├── ...
└── README.md

To demonstrate Geminio, the pre-generated malicious models cover the following queries:

  • "Any jewelry?"
  • "Any human faces?"
  • "Any males with a beard?"
  • "Any guns?"
  • "Any females riding a horse?"
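Judging from the Any_guns?.pt entry in the directory tree above, each model file appears to be named after its query with spaces replaced by underscores. The sketch below relies on that assumed convention to map a query to its file and to list the queries available locally; query_to_model_path and available_queries are hypothetical helpers, not part of this repository.

```python
from pathlib import Path

MODEL_DIR = Path("malicious_models")


def query_to_model_path(query, model_dir=MODEL_DIR):
    """Map a query such as "Any guns?" to malicious_models/Any_guns?.pt.

    Assumed convention: spaces become underscores and ".pt" is appended.
    """
    return model_dir / (query.replace(" ", "_") + ".pt")


def available_queries(model_dir=MODEL_DIR):
    """List the queries for which a .pt file exists in model_dir."""
    return sorted(p.stem.replace("_", " ") for p in model_dir.glob("*.pt"))


print(query_to_model_path("Any guns?"))
print(available_queries())
```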

Step 2a: Gradient Inversion

We selected the following 128 images from ./assets/private_samples as the private samples used in this step:

[Figure: Original 128 Images]

Baseline: We use HFGradInv to reconstruct images from a batch of 128 private samples the victim FL client owns.

python reconstruct.py --baseline

Baseline Reconstruction Result

Below is the reconstruction result using the baseline method:

[Figure: Baseline Reconstruction]

Step 2b: Reconstruct with Geminio

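Geminio: Pass a natural-language query with --geminio-query to focus reconstruction on the private samples that match it. The command below uses "Any guns?", one of the pre-generated queries listed in Step 1:

python reconstruct.py --geminio-query="Any guns?"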
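To run the baseline and every pre-generated query in one go, the reconstruct.py invocations from Steps 2a and 2b can be scripted. The sketch below only uses the --baseline and --geminio-query flags shown in this README; build_commands and run_all are hypothetical helpers, and run_all defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess
import sys

# Queries covered by the pre-generated malicious models (see Step 1).
QUERIES = [
    "Any jewelry?",
    "Any human faces?",
    "Any males with a beard?",
    "Any guns?",
    "Any females riding a horse?",
]


def build_commands(queries=QUERIES, include_baseline=True):
    """Build one argument list per reconstruct.py run."""
    commands = []
    if include_baseline:
        commands.append([sys.executable, "reconstruct.py", "--baseline"])
    for query in queries:
        # Passing a list avoids shell-quoting issues with spaces and "?".
        commands.append([sys.executable, "reconstruct.py", f"--geminio-query={query}"])
    return commands


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


run_all()  # dry run: prints the six commands without executing them
```

Call run_all(dry_run=False) from the repository root to actually launch the reconstructions one after another.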

Reconstruction Results for Queries

Below are example reconstruction results for each pre-generated query:

  • Query: "Any jewelry?" [Figure: Reconstruction for Any Jewelry]

  • Query: "Any human faces?" [Figure: Reconstruction for Any Human Faces]

  • Query: "Any males with a beard?" [Figure: Reconstruction for Any Males with a Beard]

  • Query: "Any guns?" [Figure: Reconstruction for Any Guns]

  • Query: "Any females riding a horse?" [Figure: Reconstruction for Any Females Riding a Horse]

Acknowledgement

We would like to acknowledge the repositories below.
