This is the codebase for the paper "Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks" by Qiyang Li, Saminul Haque, Cem Anil, James Lucas, Roger Grosse, Jörn-Henrik Jacobsen to appear in NeurIPS 2019.
To set up the environment and install the dependencies:
conda create -n test python=3.6
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
pip install -r requirements.txt
To run an experiment described by a configuration file:
python -m lconvnet.run --cfg [path to cfg.yaml] [--resume] [--test]
Pass --resume to resume from the best checkpoint (and continue training if training was interrupted), and --test to generate results from that checkpoint.
For example, the following command launches an adversarial robustness experiment with BCOP on CIFAR10 using the large architecture:
python -m lconvnet.run --cfg runs/adv-rob-cifar10/large/BCOP-Bjorck/multi-trial-A/cfg.yaml
After the training is finished, the following command will load the best checkpoint from the experiment and generate the provable/empirical robustness results in the experiment folder:
python -m lconvnet.run --cfg runs/adv-rob-cifar10/large/BCOP-Bjorck/multi-trial-A/cfg.yaml --resume --test
To evaluate the empirical robustness of the baseline models we compare against, the --resume flag should be omitted, since the weights are implicitly baked in. For example, the following command will generate the results for the KW-Large model on CIFAR10:
python -m lconvnet.run --cfg runs/baselines/kw-cifar10/large/cfg.yaml --test
For the L2Nonexpansive baseline, the pre-trained models need to be downloaded from the authors' official website, which can be done with the following commands:
cd checkpoints/qian_models
chmod +x download.sh
./download.sh
Once the downloads are complete, the downloaded models can be used for evaluation. For example:
python -m lconvnet.run --cfg runs/baselines/qian-cifar10/model-3/cfg.yaml --test
To launch a Wasserstein distance estimation experiment with BCOP and the MaxMin activation function, using a learning rate of 0.0001:
python -m lconvnet.run --cfg runs/wde_stl/lr-0.0001/conv/BCOP/maxmin/multi-trial-A/cfg.yaml
To launch the same experiment on the CIFAR10 dataset:
python -m lconvnet.run --cfg runs/wde_cifar10/lr-0.0001/conv/BCOP/maxmin/multi-trial-A/cfg.yaml
The following command will report the Wasserstein distance lower-bound estimate from the model loaded from the checkpoint:
python -m lconvnet.run --cfg runs/wde_stl/lr-0.0001/conv/BCOP/maxmin/multi-trial-A/cfg.yaml --resume --test
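For reference, the reported lower bound follows the Kantorovich-Rubinstein dual form of the Wasserstein-1 distance: for any 1-Lipschitz critic f, E_P[f(x)] - E_Q[f(x)] lower-bounds W1(P, Q). Below is a minimal sketch of that estimate in PyTorch; critic, loader_p, and loader_q are hypothetical placeholders, not the objects the repo's wde task actually uses.

import torch

def wasserstein_lower_bound(critic, loader_p, loader_q, device="cpu"):
    """Estimate E_P[f(x)] - E_Q[f(x)] for a 1-Lipschitz critic f.

    By Kantorovich-Rubinstein duality this quantity lower-bounds W1(P, Q),
    so a better critic gives a tighter bound. `critic`, `loader_p`, and
    `loader_q` are placeholders, not the repo's actual objects.
    """
    critic.eval()

    def mean_score(loader):
        total, count = 0.0, 0
        with torch.no_grad():
            for batch in loader:
                # Loaders may yield tensors or (tensor, label) tuples.
                x = batch[0] if isinstance(batch, (list, tuple)) else batch
                scores = critic(x.to(device))
                total += scores.sum().item()
                count += scores.numel()
        return total / count

    return mean_score(loader_p) - mean_score(loader_q)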
We also provide a convenient script to generate all the tables in the paper (this has an additional dependency on pylatex). After all the experiments have completed, the following commands will generate a table.tex file under runs/. The consolidate command copies the results in each experiment folder into a file with a unified name (results.yaml), and the export command walks through the individual experiment folders under runs to grab the results from these files:
python -m lconvnet.consolidate --dir runs
python -m lconvnet.export --dir runs
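As a rough illustration of what export operates on, the sketch below walks runs/ and gathers every results.yaml into a single dictionary. It assumes standard YAML files and is not the repo's actual implementation (which also renders the LaTeX table via pylatex).

import yaml
from pathlib import Path

def collect_results(runs_dir="runs"):
    """Gather every results.yaml under runs/ into one dictionary keyed by
    the experiment folder that contains it. This mirrors the kind of walk
    the export command performs; it is not the repo's actual code."""
    results = {}
    for path in Path(runs_dir).rglob("results.yaml"):
        with path.open() as f:
            results[str(path.parent)] = yaml.safe_load(f)
    return results

Calling collect_results("runs") then yields one entry per completed experiment, which is the shape of data the table generation needs.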
All the model weights used for the reported results can be downloaded from here (Google Drive) or here (Dropbox). The download contains the runs folder with all the model weights and evaluation results placed under the appropriate experiment folders.
To generate a batch of experiments from a template:
./generate_experiments.sh [dir to where template.yaml is located]
To launch a batch of experiments on slurm:
sbatch [dir to where template.yaml is located]/batch_run.sh
To launch a batch of evaluations using the existing checkpoints:
sbatch [dir to where template.yaml is located]/batch_run_resume_test.sh
The repo is structured as follows:
lconvnet
├── tasks
│   ├── adversarial
│   │   ├── attackers.py        "PGD, FGSM, Pointwise, Boundary attack"
│   │   └── eval_robustness.py  "Robust accuracy upper bound by running the attacks;
│   │                            robust accuracy lower bound by certifying the Lipschitz network (see the sketch below)"
│   ├── gan         "Training the GAN for Wasserstein distance estimation experiments"
│   ├── wde         "GAN sampler for Wasserstein distance estimation"
│   └── common.py   "Training step for different tasks (similar to train_step in pytorch lightning)"
├── layers
│   ├── bcop.py     "BCOP convolution"
│   ├── rko.py      "RKO convolution"
│   ├── svcm.py     "SVCM convolution"
│   ├── ossn.py     "OSSN convolution"
│   ├── rkl2ne.py   "RK-L2NE convolution"
│   └── ...py       "Other GNP components"
├── external        "Baselines"
│   ├── kw.py
│   └── qian.py
├── experiment      "The main training loop and experiment management"
├── networks.py     "FC/Small/Large/DCGANDiscriminator"
├── run.py          "Entry point"
└── ...
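The robust-accuracy lower bound mentioned for eval_robustness.py above relies on the standard margin certificate for Lipschitz classifiers: under an L2 perturbation budget eps, a correct prediction of an L-Lipschitz network is certifiably robust if its logit margin exceeds sqrt(2) * L * eps. A minimal sketch of that check is below; model, loader, and lipschitz_const are hypothetical placeholders rather than the repo's actual interfaces.

import torch

def certified_accuracy(model, loader, eps, lipschitz_const=1.0, device="cpu"):
    """Fraction of test points that are provably robust at L2 radius `eps`.

    For an L-Lipschitz classifier, a correct prediction is certifiably robust
    if its logit margin (top-1 minus runner-up) exceeds sqrt(2) * L * eps.
    `model`, `loader`, and `lipschitz_const` are placeholders, not the
    repo's actual interfaces.
    """
    model.eval()
    threshold = (2 ** 0.5) * lipschitz_const * eps
    certified, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            logits = model(x.to(device))
            y = y.to(device)
            correct = logits.argmax(dim=1) == y          # prediction is right
            top2 = logits.topk(2, dim=1).values          # two largest logits
            margin = top2[:, 0] - top2[:, 1]             # top-1 minus runner-up
            certified += (correct & (margin > threshold)).sum().item()
            total += y.numel()
    return certified / total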
The experiment folder names under runs/ use the following conventions:
- lr: learning rate
- small/large: the small/large network used in https://arxiv.org/abs/1805.12514
- fc: fully connected network
- conv: convolutional neural network
- x-layer: neural network with x hidden layers (usually only used to describe fc networks)
If you use this work, please cite it with:
@article{li2019lconvnet,
  title={Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks},
  author={Li, Qiyang and Haque, Saminul and Anil, Cem and Lucas, James and Grosse, Roger and Jacobsen, J{\"o}rn-Henrik},
  journal={Conference on Neural Information Processing Systems},
  year={2019}
}