This repository contains the source code for the paper *Nearly Tight Black-Box Auditing of Differentially Private Machine Learning* by M.S.M.S. Annamalai and E. De Cristofaro, to appear at NeurIPS 2024.
Dependencies are managed by conda/mamba.

- Required dependencies can be installed using the command `conda env create -f env.yml` and then running `conda activate bb_audit_dpsgd`.
- The pre-training algorithm to craft worst-case initial model parameters is given in `craft_inital_params.ipynb`, but we also provide the pre-trained worst-case initial parameters we use under the `pretrained_models/` folder (a minimal loading sketch is shown after this list).
Here, we provide the splits we use for the MNIST and CIFAR-10 datasets. We also provide the last-layer activations extracted by running CIFAR-10 inference on the WRN-28-10 model pre-trained on the ImageNet-32 dataset (see paper and model).
- To de-compress the datasets, use the command `cat data_compressed/* | tar -xvz`.
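
For reference, a minimal sketch of starting from the provided worst-case initialization is shown below, assuming the parameters are stored as a PyTorch state dict; the file name `pretrained_models/mnist_lr.pt` and the model shape are placeholders for illustration, so check the folder for the actual files.

```python
# Minimal sketch (assumed file name and format): initialize a logistic regression
# model from the provided worst-case parameters before DP-SGD training.
import torch
import torch.nn as nn

# Hypothetical path; see the pretrained_models/ folder for the actual file names.
state_dict = torch.load("pretrained_models/mnist_lr.pt", map_location="cpu")

# Logistic regression over flattened 28x28 MNIST images, 10 classes.
model = nn.Linear(28 * 28, 10)
model.load_state_dict(state_dict)  # training then starts from the crafted init
```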
To audit a Logistic Regression model trained on the MNIST dataset with $\varepsilon = 10$, run the following command (more command line options can be found inside the `audit_model.py` file):

```
$ python3 audit_model.py --data_name mnist --model_name lr --epsilon 10.0
```
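
For context, black-box auditing of this kind typically turns the attack's true/false positive rates into an empirical lower bound on $\varepsilon$. The repository's own procedure is in `audit_model.py`; the snippet below is only a generic sketch of that estimation step, using Clopper-Pearson confidence intervals and illustrative counts that are not taken from the paper.

```python
# Generic sketch (not the repository's code): turn membership-inference TPR/FPR
# counts into an empirical (epsilon, delta)-DP lower bound.
import numpy as np
from statsmodels.stats.proportion import proportion_confint

def empirical_epsilon(tp, n_pos, fp, n_neg, delta=1e-5, alpha=0.05):
    # One-sided Clopper-Pearson bounds: lower-bound the TPR, upper-bound the FPR.
    tpr_lo, _ = proportion_confint(tp, n_pos, alpha=2 * alpha, method="beta")
    _, fpr_hi = proportion_confint(fp, n_neg, alpha=2 * alpha, method="beta")
    # (epsilon, delta)-DP implies TPR <= e^epsilon * FPR + delta.
    if tpr_lo <= delta:
        return 0.0
    return max(0.0, float(np.log((tpr_lo - delta) / fpr_hi)))

# Illustrative counts only.
print(empirical_epsilon(tp=950, n_pos=1000, fp=50, n_neg=1000))
```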
We provide the exact scripts we use to run experiments under the `scripts/` folder, which include more options you can play around with. Results can be plotted using the `plot_results.ipynb` notebook.
- We only consider full batch gradient descent, so $B = |D|$ always.
- For DP-SGD, we sum the gradients instead of averaging them, as the size of the dataset can leak information in add/remove DP (see issue). Therefore, learning rates are expressed in a non-standard way here (as $\frac{\eta}{B}$ instead of just $\eta$). Specifically, when training a model on half of the MNIST dataset, $\eta = 4$ corresponds to a learning rate of $\frac{\eta}{B} = \frac{4}{30,000} = 1.33 \times 10^{-4}$ (see `scripts/model_init.sh`), which stays the same regardless of whether the dataset is $D$ or $D^-$.
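
To make the learning-rate convention concrete, here is a minimal sketch of one full-batch DP-SGD step with summed (rather than averaged) clipped gradients; the clipping norm, noise multiplier, and shapes are placeholders rather than the repository's actual settings.

```python
# Minimal sketch of one full-batch DP-SGD step with summed clipped gradients.
# eta is the "raw" learning rate quoted above; the effective step size is eta / B.
import torch

def dpsgd_step(params, per_example_grads, eta, B, clip_norm=1.0, noise_multiplier=1.0):
    # per_example_grads: (num_examples, num_params) tensor of per-example gradients.
    norms = per_example_grads.norm(dim=1, keepdim=True)
    clipped = per_example_grads * torch.clamp(clip_norm / (norms + 1e-12), max=1.0)
    summed = clipped.sum(dim=0)  # sum, not mean, so |D| does not enter the update
    noisy = summed + torch.randn_like(summed) * noise_multiplier * clip_norm
    # Effective learning rate is eta / B, e.g. 4 / 30,000 = 1.33e-4 for half of MNIST.
    return params - (eta / B) * noisy
```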