The first comprehensive toolkit for reliably evaluating diffusion-based adversarial purification (DBP)
Official PyTorch implementation of our paper: Unlocking The Potential of Adaptive Attacks on Diffusion-Based Purification.
Andre Kassis, Urs Hengartner, Yaoliang Yu
Contact: akassis@uwaterloo.ca
DiffBreak provides a reliable toolbox for assessing the robustness of DBP-based defenses against adversarial examples. It offers a modular extension that efficiently back-propagates the exact gradients through any DBP-based defense. All previous attempts to evaluate DBP suffered from implementation issues that led to a false sense of security. Hence, we aim for DiffBreak to become the new standard for such evaluations, ensuring the credibility of future findings. DiffBreak also allows users to experiment with a variety of gradient approximation techniques previously explored in the literature that may be suitable for threat models wherein exact gradient calculation is infeasible (e.g., due to time limitations).
Furthermore, no existing adversarial robustness library offers attacks specifically optimized for performance against this memory- and time-intensive defense. The implementations of current attacks (e.g., AutoAttack) do not allow for batch evaluations of multiple EOT samples at once, leading to severe performance degradation and significantly limiting the number of feasible EOT iterations. Worse yet, integrating DBP with the classifier and incorporating it into the attack code is not trivial and is naturally error-prone in the absence of a unified framework. Thus, current evaluations have been strictly limited to AutoAttack and PGD. That said, many other adversarial strategies exist, and in our paper, we specifically find that perceptual attacks (e.g., our low-frequency (LF) attack) pose far more severe threats to DBP. DiffBreak adapts the implementations of known attacks (see below for a comprehensive list) to DBP and allows users to efficiently evaluate the defense's robustness using increased EOT batch sizes.
With DiffBreak, any PyTorch or TF classifier can be protected using existing DBP schemes with any pretrained diffusion model and then evaluated against the attacks we offer. DiffBreak also allows the evaluation of non-defended (i.e., standard) classifiers.
This repo was built on top of DiffPure. The adversarial attacks were adapted from various common libraries we cite below. If you find our repo helpful, please consider citing our paper:
@article{kassis2024unlocking,
  title={Unlocking The Potential of Adaptive Attacks on Diffusion-Based Purification},
  author={Kassis, Andre and Hengartner, Urs and Yu, Yaoliang},
  journal={arXiv preprint arXiv:2411.16598},
  year={2024}
}
- A high-end NVIDIA GPU with >=32 GB memory.
- A CUDA 12 driver must be installed.
- Anaconda must be installed.
conda create -n DiffBreak python=3.10
conda activate DiffBreak
git clone https://github.com/andrekassis/DiffBreak.git
cd DiffBreak
pip install -e .
After executing the above, a new command-line tool, diffbreak, becomes available as well. You may use it to obtain information regarding the available configurations, datasets, and pretrained systems we offer. To get started, just type diffbreak in your terminal.
Once DiffBreak has been installed, evaluating any classifier requires only a few lines of code. As mentioned above, we offer a variety of common pretrained classifiers (mostly taken from Robustbench and torchvision) for several datasets: ImageNet, CIFAR-10, CelebA-HQ, and YouTube-Faces. You do not need to manually download any datasets or pretrained classifiers: our code automatically retrieves and caches these resources upon first use. For each dataset, the test subset is used. To run evaluations with these readily available resources, use DiffBreak's "Registry" utility. Alternatively, you may use your own datasets or models (see below).
Evaluations require four components that are utilized by DiffBreak's "Runner" engine, which executes the experiments. These components are 1) the dataset object from which the samples are taken, 2) the classifier to be evaluated, 3) the DBP defense (only required if evaluating a DBP-defended classifier), and 4) the attack under which the system is evaluated.
Below, we demonstrate how each of the four required components can be easily created.
To obtain an object that yields samples from a dataset of your choice offered by our registry, you should run:
from DiffBreak import Registry
kwargs = {} #if using celeba-hq, kwargs = {"attribute": YOUR_TARGET_ATTRIBUTE}
dataset = Registry.dataset(dataset_name, **kwargs)
Here, dataset_name can be any of (case-sensitive):
- cifar10
- imagenet
- youtube
- celeba-hq: For this dataset, you must provide an additional keyword argument attribute as above (run diffbreak datasets in your terminal for details). See the sketch after this list.
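For instance, a celeba-hq dataset object could be requested as sketched below. The attribute value is purely illustrative; consult the output of diffbreak datasets for the attributes that are actually available:
from DiffBreak import Registry

# "Smiling" is a hypothetical attribute chosen for illustration only; run
# `diffbreak datasets` to list the attributes supported for celeba-hq.
dataset = Registry.dataset("celeba-hq", attribute="Smiling")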
You are not limited to the datasets in our registry. Refer to the output of diffbreak custom dataset for details.
Skip this step if evaluating a non-defended classifier.
You should select the DBP scheme you intend to use for purification. We offer four known schemes for which DiffBreak provides the exact gradients:
- vpsde: The VP-SDE-Based DBP scheme (i.e., DiffPure).
- vpode: Similar to vpsde but performs purification by solving an ODE in the reverse pass instead of an SDE. This method was also proposed in DiffPure.
- ddpm: The DDPM-based DBP scheme (GDMP).
- ddpm_ode: Similar to vpode but implemented for discrete-time DBP (i.e., DDPM).
To initialize the gradient-enabled DBP defense, you should also specify the desired gradient back-propagation method. The following methods are available:
- full: The full, accurate gradients computed efficiently using our precise module (default).
- full_intermediate: Similar to full but additionally computes the loss function for each and every step of the reverse pass and adds its gradients to the backpropagated total.
- adjoint: The known adjoint method for VP-based DBP (cannot be used with DDPM schemes). DiffBreak fixes the implementation issues of torchsde and provides a far more powerful tool.
- bpda: Backward-pass Differentiable Approximation (BPDA).
- blind: Gradients are obtained by attacking the classifier directly, without involving DBP at all. The defense is only considered when the attack sample is evaluated.
- forward_diff_only: Similar to blind, but instead adds the noise from DBP's forward pass to the sample and uses this noisy output to invoke the classifier and obtain the gradients.
With these choices, you can now invoke the registry to obtain a dictionary containing all the DBP parameters that DiffBreak expects for later initialization of the defense. To obtain the parameters, we run:
dbp_params = Registry.dbp_params(
dataset_name,
diffusion_type=DBP_SCHEME,
grad_mode=GRAD_MODE,
)
where DBP_SCHEME and GRAD_MODE are as explained above. Note that the returned dictionary contains standard DBP parameters used in the literature. Generally, you should not change them (unless you know what you are doing). Exceptions are:
- dbp_params["diffusion_steps"]: The number of purification steps used in the defense. The returned values correspond to the optimal setups from the state-of-the-art, but you may, of course, change them in your experiments.
- dbp_params["batch_size"]: The number of purified copies generated from the sample (EOT) that will be purified and classified in parallel. Change this based on your dataset dimensions and GPU capabilities if you wish, or keep the default.
- dbp_params["timestep_respacing"]: Change this if you wish to perform DDPM acceleration.
- dbp_params["guidance_*"]: Whether to perform guided purification (i.e., GDMP)-- Available only for DDPM variants. By default, DiffBreak performs guided purification for DDPM as in GDMP. To disable it, set dbp_params["guidance_mode"]=None. You may also change the remaining guidance parameters. See GDMP for details.
Now that you have the parameters, you should provide the score model to be used by DBP for purification. Our registry offers the following pretrained models:
- ScoreSDEModel: The Score SDE model by Song et al. Default for cifar10 with VP-based schemes.
- HaoDDPM: The model for CIFAR-10 by Ho et al. Default for cifar10 with DDPM schemes.
- DDPMModel: The CelebA-HQ pretrained model from SDEdit. Default for celeba-hq and youtube.
- GuidedModel: The common guided model for ImageNet by Dhariwal & Nichol. Default for imagenet.
To instantiate one of these readily available score models from the registry, we run the following command:
dm_class = Registry.dm_class(dataset_name, diffusion_type=DBP_SCHEME)
with dataset_name and DBP_SCHEME as explained above. Note that you may pass a different dataset_name or DBP_SCHEME here to obtain a different score model to use with your actual dataset and chosen scheme, provided that the score model operates on images of the same dimensions. That is, ScoreSDEModel and HaoDDPM may be used interchangeably for cifar10, while DDPMModel and GuidedModel can be swapped for all remaining datasets, as sketched below.
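For example, one could (as a sketch) retrieve the DDPM-trained CIFAR-10 model even if the evaluation itself uses a VP-based scheme, since both CIFAR-10 models operate on images of the same dimensions:
# Illustrative: retrieve HaoDDPM (the default for cifar10 with DDPM schemes) to pair
# it with a cifar10 evaluation that otherwise uses the vpsde scheme.
dm_class = Registry.dm_class("cifar10", diffusion_type="ddpm")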
Importantly, you may also provide any external pretrained score model instead (run diffbreak custom dm_class for details).
Our registry offers a variety of pretrained classifiers, which you can browse by executing diffbreak classifiers in your terminal. You can obtain the chosen classifier via:
kwargs = {} #if using celeba-hq, kwargs = {"attribute": YOUR_TARGET_ATTRIBUTE}
classifier = Registry.classifier(
dataset_name, classifier_name=CLASSIFIER_NAME, **kwargs
)
CLASSIFIER_NAME is the chosen architecture you wish to use (see the sketch below). You can also omit this argument to use the default classifier for the corresponding dataset: the ViT (DeiT-S) classifier by Facebook Research for imagenet, the WIDERESNET_70_16 classifier by DiffPure for cifar10, the attribute classifiers by gan-ensembling for celeba-hq, and a ResNet50 model we trained with TensorFlow for youtube.
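For instance, a specific architecture can be requested as follows; the identifier below is assumed for illustration, so check the output of diffbreak classifiers for the exact names the registry exposes:
# The classifier name below is a hypothetical example; run `diffbreak classifiers`
# to list the identifiers actually offered for each dataset.
classifier = Registry.classifier("cifar10", classifier_name="WIDERESNET_70_16")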
Using pretrained classifiers that are not available in the registry is also possible. Instead of running the above code, wrap your own PyTorch classifier inside a DiffBreak classifier object:
from DiffBreak import PyTorchClassifier
classifier = PyTorchClassifier(my_torch_classifier, softmaxed)
where my_torch_classifier is any such pretrained PyTorch classifier. It is also possible to use a TensorFlow (TF) classifier by executing:
from DiffBreak import TFClassifier
classifier = TFClassifier(my_tf_classifier, softmaxed)
Here, softmaxed is a boolean indicating whether the last layer of the provided classifier applies a softmax activation or directly outputs the logits.
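As a concrete sketch, a pretrained torchvision model (which outputs raw logits) could be wrapped as follows:
import torchvision.models as models
from DiffBreak import PyTorchClassifier

# torchvision classifiers output raw logits, so no softmax layer is applied.
my_torch_classifier = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
softmaxed = False
classifier = PyTorchClassifier(my_torch_classifier, softmaxed)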
DiffBreak offers a variety of attacks optimized for performance with DBP:
- id: No attack. Use this to evaluate clean accuracy.
- apgd: AutoAttack (Linf). -- Adapted from auto-attack.
- pgd: The PGD attack. -- Adapted from cleverhans.
- diffattack_apgd: DiffAttack. -- Adapted from DiffAttack.
- LF: Our Low-Frequency attack.
- diffattack_LF: Our LF attack augmented with the per-step losses used by DiffAttack.
- ppgd: PerceptualPGDAttack. -- Adapted from perceptual-advex.
- lagrange: LagrangePerceptualAttack. -- Adapted from perceptual-advex.
- stadv: The StAdv attack. -- Adapted from perceptual-advex.
For each attack, the registry returns the default parameters for the corresponding dataset. We recommend LF and ppgd against DBP-defended classifiers, as we found them far more effective than the commonly used norm-based methods (e.g., pgd and apgd). Obtaining the attack parameters from the registry is done as follows:
attack_params = Registry.attack_params(dataset_name, attack_name)
where attack_name is one of the above options. For imagenet, cifar10, and celeba-hq, these are the most commonly used parameters from the literature, and you do not need to change them unless you explicitly intend to do so.
The notable exception is attack_params["eot_iters"], which is universally present in the parameters for all attacks. Changing this number will alter the effective number of EOT samples in your attack. Specifically, the total number of EOT samples will be attack_params["eot_iters"] * dbp_params["batch_size"]. That is, dbp_params["batch_size"] samples are propagated at each of the attack_params["eot_iters"] iterations and their gradients are obtained. These are then added to the collective sum from all samples across all attack_params["eot_iters"] iterations, which is finally divided by the total number of samples above to obtain the averaged gradient. You may change this number based on your GPU capabilities or stick with the default (see the sketch below).
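To illustrate the arithmetic (the values mentioned in the comment are examples only):
# With, e.g., attack_params["eot_iters"] = 4 and dbp_params["batch_size"] = 16 (example
# values), each attack step averages gradients over 4 * 16 = 64 purified EOT samples.
total_eot_samples = attack_params["eot_iters"] * dbp_params["batch_size"]
print(f"EOT samples per attack step: {total_eot_samples}")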
The remaining component for initializing the attack is the choice of the loss function to optimize. Most attacks use similar loss functions, and we provide these common choices in our registry. Specifically, the available losses are:
- CE: The cross-entropy loss. Default for pgd.
- MarginLoss: The Max-Margin loss. Default for all remaining attacks.
- DLR: The Difference of Logits Ratio loss.
To obtain the default loss function for your chosen attack from the registry, run:
loss = Registry.default_loss(attack_name)
where attack_name is the attack you intend to use. Note that AutoAttack (i.e., apgd) originally uses the DLR loss function. We instead use the MarginLoss as the default for this attack as we empirically found it to yield better results.
You are not restricted to the default loss functions and may instantiate any of the available attacks with any of our provided losses. To obtain the loss function of your choice directly from DiffBreak instead, replace the above code by importing and instantiating the loss explicitly. For instance, to use DLR, run:
from DiffBreak import DLR
loss = DLR()
With the necessary objects now available, one can instantiate a "Runner" object and execute experiments. This is done over the two steps described below.
Each runner requires a configuration dictionary containing the attack and DBP parameters obtained above in a specific format, in addition to several other arguments specific to the experiment itself. To construct this dictionary, the "Runner" class exposes a setup method with the below signature:
setup(
out_dir, attack_params, dbp_params=None, targeted=False,
eval_mode="batch", total_samples=256,
balanced_splits=False, verbose=2, seed=1234,
save_image_mode="originally_failed", overwrite=False
) -> dict
The parameters for this function are as follows:
- out_dir: str. The path to the output directory where the experiment results will be saved.
- attack_params: dict. The attack_params dictionary from A4.
- dbp_params: dict or None. If you wish to evaluate a DBP-defended classifier, this should be the dbp_params dictionary from A2. Otherwise, it should be None. Default: None.
- targeted: bool. Whether you wish to perform a targeted attack. If True, a target label is drawn at random for each sample, and the attack optimizes the input so that it is misclassified as belonging to this randomly chosen label. Otherwise, the standard non-targeted attack is performed with the objective of having the sample misclassified arbitrarily. Default: False.
- eval_mode: str - one of batch or single. If single, the attack is considered successful for each sample if any purified copy is misclassified as desired. Otherwise, the attack is only considered successful if the majority of samples in the batch (depending on dbp_params["batch_size"]) meet this condition. This corresponds to the more robust "majority-vote" setup we study in our paper, while single represents the standard setup. For non-defended classifiers (i.e., dbp_params=None), this argument is ignored. Default: batch.
- total_samples: int. The total number of samples in the experiment. Default: 256.
- balanced_splits: bool. Whether to include an equal number of samples for all classes. Default: False.
- verbose: int - one of 0, 1 or 2. Verbosity level for logging. Default: 2.
- seed: int. Random seed selected for reproducibility. Default: 1234.
- save_image_mode: str - one of none, successful or originally_failed. Controls which attack samples are saved: none - no images will be saved; successful - only successful attack samples will be saved; originally_failed - attack samples that are originally correctly classified but then misclassified under the attack will be saved, while samples that are initially misclassified (i.e., successful attacks without adversarial modifications) will be skipped. Default: originally_failed.
- overwrite: bool. Whether to overwrite existing output directories. Default: False.
Constructing the configuration dictionary for your experiment is done by running:
from DiffBreak import Runner
exp_conf = Runner.setup(
out_dir,
attack_params=attack_params,
dbp_params=dbp_params,
targeted=targeted,
eval_mode=eval_mode,
total_samples=total_samples,
balanced_splits=balanced_splits,
verbose=verbose,
seed=seed,
save_image_mode=save_image_mode,
overwrite=overwrite,
)
with all parameters matching those described above.
It is also possible to restore the experiment configuration from a previously started evaluation to resume it. This can be done with the resume method of the "Runner" class:
exp_conf = Runner.resume(out_dir)
where out_dir is the output directory of the previously started experiment.
At this stage, we have all the required components to run an experiment. First, we create a "Runner" instance as follows:
device="cuda"
runner = Runner(exp_conf, dataset, classifier, loss, dm_class).to(device).eval()
where exp_conf is the configuration dictionary for the experiment obtained in the previous step, and dataset, classifier, loss and dm_class are the objects created in A1-A4. If you are evaluating a non-defended classifier, the argument dm_class can be omitted or set to dm_class=None. Importantly, the runner must be moved to the chosen CUDA device before it is used, as shown above, and the eval() method must be invoked.
Finally, run the experiment as:
runner.execute()
The attack will be evaluated and the results will be saved to out_dir/results.txt. The output images will be saved to out_dir/images. During the evaluation, the progress bar constantly displays the portion of successful attack samples.
The runner constantly logs the "total success rate" to the screen, which corresponds to the portion of samples that are misclassified in the desired manner (depending on whether the attack is targeted). For eval_mode="batch", a "single success rate" is also printed; this represents the portion of samples for which at least a single purified copy is misclassified (see B1). That is, eval_mode="batch" effectively evaluates both setups (though it is far more costly than running with eval_mode="single").
The results.txt file is updated after the attack on each sample terminates, appending a row with the statistics for that input. Specifically, the nth row in this file contains the stats for the nth evaluated sample. These records have the following format:
original label: ORIG_LABEL, [target label: TARGET], originally robust: ORIG_ROBUST, result: SUCCESS, [result-single: SUCCESS_SINGLE].
The different entries in this record can be interpreted as follows:
- ORIG_LABEL: The sample's original label.
- [TARGET]: This entry only appears for targeted attacks. It represents the randomly chosen target label for the sample.
- ORIG_ROBUST: Assigned 0 or 1 depending on whether the classifier initially correctly classifies the sample (1) or not (0).
- SUCCESS: Assigned 0 or 1, indicating whether the attack was successful (1) or not (0).
- [SUCCESS_SINGLE]: This entry is only present for eval_mode="batch". It indicates whether the attack is successful for this sample under the single evaluation mode as well.
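The records can be post-processed with a short script like the sketch below; it assumes the exact format shown above and is not part of DiffBreak itself:
import os
import re

out_dir = "test"  # the experiment's output directory (example value)

# Parse results.txt; the regexes assume the record format documented above.
with open(os.path.join(out_dir, "results.txt")) as f:
    rows = [line.strip() for line in f if line.strip()]

robust = [int(re.search(r"originally robust: (\d)", r).group(1)) for r in rows]
success = [int(re.search(r"result: (\d)", r).group(1)) for r in rows]

# Attack success rate restricted to samples the classifier originally got right.
attacked = [s for o, s in zip(robust, success) if o == 1]
if attacked:
    print(f"Success rate on originally robust samples: {sum(attacked) / len(attacked):.2%}")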
Below, we combine all the above steps to demonstrate how easily an evaluation can be created and executed. For this purpose, we show how to run an experiment with the default classifier for cifar10 (setting classifier_name=None) using our LF attack and the vpsde DBP scheme. All parameters are left identical to the defaults obtained from the registry. Running the experiment amounts to executing the short script below (other default arguments are omitted):
from DiffBreak import Registry, Runner
device = "cuda"
out_dir = "test"
dataset_name = "cifar10"
attack_name = "LF"
dbp_scheme="vpsde"
dataset = Registry.dataset(dataset_name)
dm_class = Registry.dm_class(dataset_name, diffusion_type=dbp_scheme)
classifier = Registry.classifier(
dataset_name, classifier_name=None,
)
loss = Registry.default_loss(attack_name)
attack_params = Registry.attack_params(dataset_name, attack_name)
dbp_params = Registry.dbp_params(
dataset_name,
diffusion_type=dbp_scheme,
grad_mode="full",
)
exp_conf = Runner.setup(
out_dir,
attack_params=attack_params,
dbp_params=dbp_params,
targeted=False,
)
runner = (
Runner(
exp_conf,
dataset,
classifier,
loss,
dm_class,
)
.to(device)
.eval()
)
runner.execute()