woodyx218/opacus_global_clipping

What is this?

This project contains scripts to reproduce our paper On the Convergence and Calibration of Deep Learning with Differential Privacy by Zhiqi Bu, Hua Wang, Zongyu Dai and Qi Long. We add only one line of code to the PyTorch Opacus library v0.15.0.

The Problem of Interest

Deep learning models are vulnerable to privacy attacks and raise severe privacy concerns. To protect privacy, Abadi et al. applied deep learning with differential privacy (DP) and trained DP neural networks. Notably, if you train a neural network with SGD, you get a regular non-DP network; if you train with differentially private SGD (DP-SGD), you get a DP network.

Any regular optimizer (SGD, HeavyBall, Adam, etc.) can be turned into a DP optimizer via the Gaussian mechanism: per-sample clipping followed by noise addition. However, DP optimizers usually converge much more slowly in terms of iterations and result in low accuracy (e.g. in a recent Google paper, the state-of-the-art CIFAR10 accuracy without pretraining is 66% at privacy risk $\epsilon=8$). Additionally, DP networks are much less calibrated and trustworthy.
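As a concrete illustration, here is a minimal sketch of one DP-SGD aggregation step (per-sample clipping followed by Gaussian noise); the flattened per-sample gradients, the clipping norm R, and the noise multiplier sigma are assumed inputs for the sketch, not part of this repository's API:

import torch

def dp_sgd_step(per_sample_grads, R, sigma):
    # per_sample_grads: (batch_size, num_params) flattened per-sample gradients
    # Per-sample clipping: C_i = min(1, R / ||g_i||), so each norm is at most R.
    norms = per_sample_grads.norm(dim=1, keepdim=True)
    clip_factors = (R / (norms + 1e-6)).clamp(max=1.0)
    clipped = per_sample_grads * clip_factors
    # Gaussian mechanism: add noise calibrated to the sensitivity R.
    noise = sigma * R * torch.randn(per_sample_grads.shape[1])
    return (clipped.sum(dim=0) + noise) / per_sample_grads.shape[0]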

We give the first general convergence analysis of the training dynamics of DP optimizers in deep learning, taking a close look at the neural tangent kernel (NTK) matrix H(t).
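For reference, the NTK matrix is defined in the standard way as the Gram matrix of per-sample gradients of the network output (general background, not specific to this repository):

$$H(t)_{ij}=\Big\langle \frac{\partial f(x_i;w(t))}{\partial w},\frac{\partial f(x_j;w(t))}{\partial w}\Big\rangle,$$

and its positive semi-definiteness governs the convergence of gradient-based training.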

We show that the existing per-sample clipping, with a small clipping norm, breaks the positive semi-definiteness of the NTK and leads to undesirable convergence behavior. We thus propose using a larger clipping norm, which preserves the positive semi-definiteness and significantly improves both the convergence and the calibration. This is based on the following insight:

$$\text{clipping/normalization}\Longleftrightarrow \frac{R}{\left\|\frac{\partial \ell_i}{\partial w}\right\|}\overset{\text{small }R}{\longleftarrow} C_i=\min\left(1,\frac{R}{\left\|\frac{\partial \ell_i}{\partial w}\right\|}\right)\overset{\text{large }R}{\longrightarrow} C_i=1\Longleftrightarrow\text{no clipping}.$$
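To make the two regimes concrete, here is a small numerical check of the clipping factor $C_i=\min(1,R/\|g_i\|)$ (a plain illustration, not repository code):

import torch

g_norms = torch.tensor([0.5, 2.0, 10.0])   # per-sample gradient norms ||g_i||
for R in (0.1, 100.0):                     # small vs. large clipping norm R
    C = torch.clamp(R / g_norms, max=1.0)  # C_i = min(1, R / ||g_i||)
    print(f"R={R}: clip factors {C.tolist()}")
# R=0.1   -> [0.2, 0.05, 0.01]: every gradient is rescaled (normalization regime)
# R=100.0 -> [1.0, 1.0, 1.0]:   nothing is clipped (C_i = 1)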

For experiments, CIFAR10 (image classification) is trained on a Vision Transformer (86 million parameters):

[Figures: CIFAR10 results with the Vision Transformer]

SNLI (text classification) is trained on BERT (108 million parameters), following the Opacus BERT tutorial:

[Figures: SNLI results with BERT]

New clipping function

Besides recommending a larger clipping norm for the existing per-sample clipping, we propose a new clipping function -- the global per-sample clipping, $C_{global,i}=\mathbb{I}(\|g^{(i)}\|\leq R)$, i.e. assigning only 0 or 1 as the clipping factor of each per-sample gradient (see the sketch below). This may benefit the optimization, since large per-sample gradients often correspond to samples that are hard-to-learn, noisy, or adversarial. Note that with a large clipping norm, the global clipping behaves similarly to the existing clipping, since neither clips most of the per-sample gradients.
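A minimal sketch contrasting the two clipping functions on flattened per-sample gradients (a standalone illustration under assumed inputs, not the literal Opacus patch):

import torch

def clip_factors(per_sample_grads, R, clipping_fn="local"):
    # per_sample_grads: (batch_size, num_params) flattened gradients
    norms = per_sample_grads.norm(dim=1)  # ||g^{(i)}||
    if clipping_fn == "local":
        # Existing clipping: C_i = min(1, R / ||g^{(i)}||)
        return (R / (norms + 1e-6)).clamp(max=1.0)
    # Global clipping: C_i = 1 if ||g^{(i)}|| <= R, else 0 --
    # keep a per-sample gradient untouched or drop it entirely.
    return (norms <= R).to(per_sample_grads.dtype)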

Codes

To apply a large clipping norm, one can use the Opacus library by specifying max_grad_norm in opacus.PrivacyEngine. To use the global clipping function, we add

clip_factor = (clip_factor >= 1)

between lines 178 and 179 of opacus/per_sample_gradient_clip.py (https://github.com/pytorch/opacus/blob/v0.15.0/opacus/per_sample_gradient_clip.py) to implement our global per-sample clipping.

Alternatively, one can directly use this repository, which imports the config library and introduces one new variable, config.clipping_fn={'local','global'}, to indicate whether to use global clipping. Note that config.clipping_fn='local' (the default) recovers exactly the original Opacus with the existing clipping.
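For example (assuming the config module is importable once this repository is installed):

import config

config.clipping_fn = 'global'   # use the proposed global per-sample clipping
# config.clipping_fn = 'local'  # default: original Opacus per-sample clipping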

To be specific, the only difference between this repository and Opacus lies in per_sample_gradient_clip.py.

Installation

git clone https://github.com/woodyx218/opacus_global_clipping.git
cd opacus_global_clipping
pip install -e .

When using the code, the user still imports Opacus as usual, e.g.

import opacus

Citation

@article{bu2021convergence,
  title={On the Convergence and Calibration of Deep Learning with Differential Privacy},
  author={Bu, Zhiqi and Wang, Hua and Long, Qi},
  journal={arXiv preprint arXiv:2106.07830},
  year={2021}
}

Introducing Opacus

The contents below are forked from the Opacus GitHub repository. We do not claim ownership of the code in this open-sourced repository, and we sincerely thank the Opacus community for maintaining this amazing library.

Opacus is a library that enables training PyTorch models with differential privacy. It supports training with minimal code changes on the client side, has little impact on training performance, and allows the client to track, at any given moment, the privacy budget expended so far.
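For instance, the expended budget can be queried from an attached engine (a sketch against the v0.15 API; privacy_engine is assumed to be a PrivacyEngine already attached to your optimizer, as in the Getting started example below):

# Returns the epsilon spent so far and the best alpha order for a target delta.
epsilon, best_alpha = privacy_engine.get_privacy_spent(target_delta=1e-5)
print(f"(epsilon = {epsilon:.2f}, delta = 1e-5) at alpha = {best_alpha}")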

Target audience

This code release is aimed at two target audiences:

  1. ML practitioners will find this to be a gentle introduction to training a model with differential privacy as it requires minimal code changes.
  2. Differential Privacy scientists will find this easy to experiment and tinker with, allowing them to focus on what matters.

Getting started

To train your model with differential privacy, all you need to do is declare the screening threshold Z and whether to use global clipping, create a PrivacyEngine, and attach it to your optimizer before running, e.g.:

from torch.optim import SGD
from opacus import PrivacyEngine
import config  # the config module this fork uses to pass the clipping options

model = Net()  # your torch.nn.Module
optimizer = SGD(model.parameters(), lr=0.05)

config.G = True  # use global clipping; reduces to original Opacus if False
config.Z = 100   # screening threshold Z for global clipping

privacy_engine = PrivacyEngine(
    model,
    sample_rate=0.01,
    alphas=[10, 100],
    noise_multiplier=1.3,
    max_grad_norm=1.0,
)
privacy_engine.attach(optimizer)
# Now it's business as usual

The MNIST example shows an end-to-end run using Opacus. The examples folder contains more such examples.
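To round out the picture, a minimal sketch of the "business as usual" training loop after attaching the engine (assumes a train_loader and the model/optimizer from the snippet above; not specific to this repository):

import torch.nn.functional as F

model.train()
for x, y in train_loader:
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()   # hooks capture per-sample gradients
    optimizer.step()  # the engine clips per-sample gradients, adds noise, then updates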