LANCE lets you stress-test your trained visual model before deployment by generating realistic and challenging test examples that may not be adequately captured by an IID test set. LANCE helps surface model biases that can inform downstream mitigation. In addition to the generation pipeline, LANCE includes an analysis toolkit that surfaces class-level trends and model vulnerabilities. Further, LANCE is designed with extensibility in mind and can easily be extended to stress-test against additional variations not included in the original release.
Installation and setup:
# Clone repo and submodules
git clone --recurse-submodules https://github.com/virajprabhu/lance
# Pip install
pip3 install -e .
And you're set!
Running LANCE is as simple as:
import lance
# Define test dataloader, model, and output directory
# dataloader = <..>
# out_dir = <..>
# model = <..>
# Generate counterfactuals
lance.generate(dataloader, out_dir, {})
# Evaluate generated counterfactuals against model
df = lance.inspect(model, out_dir, model.class_to_idx)
# Discover systematic model sensitivity and plot
df_cluster = lance.cluster_edits(df)
plot_sensitivity(df_cluster, <model_name>, <cls_name>, x="Edit Type", y="Sensitivity", sort_by=[])
And that's it! See this notebook for a detailed walkthrough of a real example.
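To get a feel for what the evaluation step produces, here is a minimal sketch of slicing the results DataFrame to rank edit types by how much they hurt the model. The column names ("Edit Type", "Sensitivity") and the values are assumptions for illustration, inferred from the plotting call above, not the library's guaranteed schema.

```python
import pandas as pd

# Illustrative stand-in for the DataFrame returned by lance.inspect();
# column names and values here are hypothetical.
df = pd.DataFrame({
    "Edit Type": ["weather", "weather", "subject", "background"],
    "Sensitivity": [0.42, 0.38, 0.11, 0.25],
})

# Average sensitivity per edit type, highest first, to surface the
# variations the model is most vulnerable to.
ranked = df.groupby("Edit Type")["Sensitivity"].mean().sort_values(ascending=False)
print(ranked.index[0])  # edit type with the largest mean confidence drop
```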
To run LANCE on one or more GPUs, we use the accelerate library. Just run:
accelerate launch --num_processes <num_gpus> main.py --dset_name <dset_name> --img_dir <img_dir>
Where <img_dir> points to an ImageFolder-style directory. Note that LANCE is designed to edit images at 512x512 resolution and will resize inputs accordingly.
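For reference, an ImageFolder-style directory has one subdirectory per class, with that class's images inside it; class names are inferred from the subdirectory names. The sketch below builds such a layout with the standard library only; the class and file names are purely illustrative.

```python
import os
import tempfile

# Build a toy ImageFolder-style tree:
#   <img_dir>/dog_sled/0000.jpg, <img_dir>/dog_sled/0001.jpg, ...
#   <img_dir>/howler_monkey/0000.jpg, ...
root = tempfile.mkdtemp()
for cls in ["dog_sled", "howler_monkey"]:
    os.makedirs(os.path.join(root, cls), exist_ok=True)
    for i in range(2):
        # empty placeholder files standing in for images
        open(os.path.join(root, cls, f"{i:04d}.jpg"), "wb").close()

# Class labels are recovered from subdirectory names
classes = sorted(
    d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
)
print(classes)  # ['dog_sled', 'howler_monkey']
```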
To reproduce results on HardImageNet, run:
accelerate launch --num_processes <num_gpus> main.py --dset_name HardImageNet \
--img_dir <imagenet_dir> \
--load_captions --load_caption_edits
Given a trained model and test set, LANCE generates a textual description (from a captioning model) and a perturbed caption (using a large language model, or LLM), which is fed along with the original image to a text-to-image denoising diffusion probabilistic model (DDPM) to perform counterfactual editing. The process is repeated for multiple perturbations to generate a challenging test set. Finally, we ascertain model sensitivity to different factors of variation by reporting the change in the model's predictive confidence over the corresponding counterfactual test set. For more details, check out our paper: https://huggingface.co/papers/2305.19164
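The final sensitivity measurement can be sketched in a few lines: compare the model's confidence in the ground-truth class on the original image versus the counterfactual. The logits below are made-up numbers, and the exact metric in the paper may differ; this only illustrates the confidence-drop idea.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical logits for a 3-class model; index 0 is the ground-truth
# class. Values before and after a counterfactual edit are illustrative.
gt = 0
logits_original = [3.0, 1.0, 0.5]
logits_counterfactual = [1.0, 2.5, 0.5]

p_orig = softmax(logits_original)[gt]
p_edit = softmax(logits_counterfactual)[gt]

# Sensitivity to this edit: drop in predictive confidence on the
# ground-truth class (large drop = model is vulnerable to the edit).
sensitivity = p_orig - p_edit
print(round(sensitivity, 3))
```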
LANCE is being developed by graduate students in the Hoffman Lab at Georgia Tech.
If you would like to use LANCE or have questions, please reach out to virajp [at] gatech [dot] edu.
If you use LANCE, please consider citing our paper:
@inproceedings{prabhu2023lance,
title={LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images},
author={Viraj Prabhu and Sriram Yenamandra and Prithvijit Chattopadhyay and Judy Hoffman},
booktitle={Neural Information Processing Systems (NeurIPS)},
year={2023}
}
We hope to keep LANCE up to date with the latest generative models, as well as support a wide range of analyses. Below is a tentative list of features/optimizations we hope to add (note: contributions are welcome!)
Generation
- LLaMA-2 support
- StableDiffusion-XL support
- CycleDiffusion editing
- Support localized editing via masking
Analysis
- Custom stress-testing against user-defined intervention
- Object detection analysis
LANCE is built on top of several excellent research codebases, including Prompt-to-Prompt, LLaMA, Lit-LLaMA, and BLIP-2, and additionally borrows a few techniques from InstructPix2Pix. This repo also borrows from meerkat and huggingface-transformers.