GitHub - virajprabhu/LANCE: LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images

Stress-testing Visual Models by Generating Language-guided Counterfactual Images (NeurIPS 2023)

Website | News | Walkthrough | Contributing | Paper

LANCE lets you stress-test your trained visual model before deployment by generating realistic and challenging test examples that may not be adequately captured by an IID test set. LANCE is useful to surface model bias that can inform downstream mitigation solutions. To do so, in addition to the generation pipeline, LANCE includes an analysis toolkit that surfaces class-level trends and model vulnerabilities. Further, LANCE is designed with with extensibility in mind, and can be easily extended to stress-test against additional variations not included in the original release.

⚡️ Quickstart

Installation and setup:

# Clone repo and submodules
git clone --recurse-submodules https://github.com/virajprabhu/lance

# Pip install
pip3 install -e .

And you're set!

Usage:

Running LANCE to stress test a trained model interactively

Running LANCE is as simple as:

import lance

# Define test dataloader, model, and output directory
# dataloader = <..>
# out_dir = <..>
# model = <..>

# Generate counterfactuals
lance.generate(dataloader, out_dir, {})

# Evaluate generated counterfactuals against model
df = lance.inspect(model, out_dir, model.class_to_idx)

# Discover systematic model sensitivity and plot
df_cluster = lance.cluster_edits(df)
plot_sensitivity(df_cluster, <model_name>, <cls_name>, x="Edit Type", y="Sensitivity", sort_by=[])

And that's it! See this notebook for a detailed walkthrough for a real example.

Running LANCE at scale on a dataset

To run LANCE on one or more GPUs, we use the accelerate library. Just run:

accelerate launch --num_processes <num_gpus> main.py --dset_name <dset_name> --img_dir <img_dir>

Where <img_dir> points to a ImageFolder style directory. Note that LANCE is designed to edit images that are 512x512 in resolution and will resize them accordingly.

To reproduce results on HardImageNet, run:

accelerate launch --num_processes <num_gpus> main.py --dset_name HardImageNet \
                                                     --img_dir <imagenet_dir> \
                                                     --load_captions --load_caption_edits

✏️ Under the Hood

Given a trained model and test set, LANCE generates a textual description (from a captioning model) and perturbed caption (using a large language model or LLM), which is fed alongwith the original image to a text-to-image denoising diffusion probabilistic model (DDPM) to perform counterfactual editing. The process is repeated for multiple perturbations to generate a challenging test set. Finally, we ascertain model sensitivity to different factors of variation by reporting the change in its predictive confidence over the corresponding counterfactual test set. For more details, check out our paper: https://huggingface.co/papers/2305.19164

✉️ About

LANCE is being developed by graduate students in the Hoffman Lab at Georgia Tech.

If you would like to use LANCE or have questions, please reach out to virajp [at] gatech [dot] edu.

If you use LANCE, please consider citing our paper:

@inproceedings{prabhu2023lance,
      title={LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images}, 
      author={Viraj Prabhu and Sriram Yenamandra and Prithvijit Chattopadhyay and Judy Hoffman},
      booktitle={Neural Information Processing Systems (NeurIPS)}
      year={2023}
}

✉️ Next steps

We hope to keep LANCE up to date with the latest generative models, as well as support a wide range of analysis. Below is a tentative list of features/optimizations we hope to add (note: contributions are welcome!)

Generation

LLAMA-2 support
StableDiffusion-XL support – [ ] CycleDiffusion editing
Support localized editing via masking

Analysis

Custom stress-testing against user-defined intervention
Object detection analysis

✉️ Acknowledgements

LANCE is built on top of several excellent research codebases, including Prompt-to-prompt, LLAMA, LiT-LLama and BLIP-2, and additionally borrows a few techniques from Instruct-Pix2Pix. This repo also borrows from meerkat and huggingface-transformers.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
checkpoints @ d04bc78		checkpoints @ d04bc78
data		data
datasets		datasets
lance		lance
lit-llama @ 03f5d5e		lit-llama @ 03f5d5e
.gitignore		.gitignore
.gitmodules		.gitmodules
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE.md		LICENSE.md
README.md		README.md
main.py		main.py
outputs		outputs
requirements.txt		requirements.txt
setup.py		setup.py
walkthrough.ipynb		walkthrough.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Stress-testing Visual Models by Generating Language-guided Counterfactual Images (NeurIPS 2023)

⚡️ Quickstart

Usage:

Running LANCE to stress test a trained model interactively

Running LANCE at scale on a dataset

✏️ Under the Hood

✉️ About

✉️ Next steps

✉️ Acknowledgements

About

Releases

Packages

Contributors 2

Languages

License

virajprabhu/LANCE

Folders and files

Latest commit

History

Repository files navigation

Stress-testing Visual Models by Generating Language-guided Counterfactual Images (NeurIPS 2023)

⚡️ Quickstart

Usage:

Running LANCE to stress test a trained model interactively

Running LANCE at scale on a dataset

✏️ Under the Hood

✉️ About

✉️ Next steps

✉️ Acknowledgements

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages