Abstract: Recent progress in Text-to-Image (T2I) generative models has enabled high-quality image generation. As performance and accessibility increase, these models are gaining significant traction and popularity: ensuring their fairness and safety is a priority to prevent the dissemination and perpetuation of biases. However, existing studies in bias detection focus on closed sets of predefined biases (e.g., gender, ethnicity). In this paper, we propose a general framework to identify, quantify, and explain biases in an open-set setting, i.e., without requiring a predefined set. This pipeline leverages a Large Language Model (LLM) to propose biases starting from a set of captions. Next, these captions are used by the target generative model to generate a set of images. Finally, Vision Question Answering (VQA) is leveraged for bias evaluation. We show two variations of this framework: OpenBias and GradBias. OpenBias detects and quantifies biases, while GradBias determines the contribution of individual prompt words to biases. OpenBias effectively detects both well-known and novel biases related to people, objects, and animals, and aligns closely with existing closed-set bias detection methods and human judgment. GradBias shows that neutral words can significantly influence biases, outperforming several baselines, including state-of-the-art foundation models.
We recommend using a virtual environment to install the required packages.
# Create a virtual environment, activate it and upgrade pip
python -m venv gradbias
source gradbias/bin/activate
pip install --upgrade pip
Before installing the required packages, please install PyTorch separately according to your system and CUDA version.
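For example, on a machine with CUDA 12.1 you could install the PyTorch version we tested with as follows (please check the official PyTorch installation instructions for the command matching your platform and CUDA version):
# Example: install PyTorch 2.2.1 for CUDA 12.1 (adjust the index URL to your CUDA version)
pip install torch==2.2.1 --index-url https://download.pytorch.org/whl/cu121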
After installing PyTorch, you may install the required packages with the following commands:
# Install requirements
pip install -r requirements.txt
Please install a spaCy trained pipeline. You may use the following command:
python -m spacy download en_core_web_sm
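To quickly verify that the pipeline was installed correctly, you can try loading it (this is only a sanity check):
# Sanity check: make sure the spaCy pipeline loads
python -c "import spacy; spacy.load('en_core_web_sm')"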
This code has been tested with PyTorch 2.2.1, CUDA 11.8, CUDA 12.1, and Python 3.10.9.
We provide code to:
- Run the introduced baselines on OpenBias extracted biases.
- Run GradBias on the same dataset (gradbias.py).
- Run GradBias independently with custom prompts and biases (prompt_gradbias.py).
We make available:
- The dataset used in our experiments: Dataset. This file should be put under proposed_biases/coco/3 (see the example below).
- The synonym file: Synonyms. This file can be downloaded and put under data/; otherwise, it will be automatically generated by the code (it may take a while).
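For instance, assuming you downloaded both files to the current directory, you can place them as follows (the angle-bracket file names are placeholders for the actual downloaded files):
# Create the expected folders and move the downloaded files (placeholder names)
mkdir -p proposed_biases/coco/3 data
mv <downloaded_dataset_file> proposed_biases/coco/3/
mv <downloaded_synonyms_file> data/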
The results of the experiments will be saved under the methods folder.
You may use the following command to generate images using the prompts of the above downloaded dataset:
CUDA_VISIBLE_DEVICES=0 python generate_images_gt.py --seeds 0 1 2 3 4 5 6 7 8 9 --generator <generator>
You may use multiple GPUs to speed up the generation process (see the example below). The generated images will be saved in the generated_images folder. This step is required for the methods that use images (e.g., GT computation, the VQA baseline, and GradBias).
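For example, one simple way to use two GPUs is to launch independent runs with disjoint seed subsets (this sketch assumes the per-seed outputs of separate runs do not conflict):
# Example: split the seeds across two GPUs with independent runs
CUDA_VISIBLE_DEVICES=0 python generate_images_gt.py --seeds 0 1 2 3 4 --generator <generator> &
CUDA_VISIBLE_DEVICES=1 python generate_images_gt.py --seeds 5 6 7 8 9 --generator <generator> &
wait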
You may download the already computed ground truth for all three generative models at this link. Please put this folder under methods/VQA_gt/coco/. Otherwise, you may compute the ground truth using the following command:
CUDA_VISIBLE_DEVICES=0 python VQA_gt.py --generator <generator> --dataset coco --vqa_model blip2-flant5xxl
This script supports multi-GPU execution.
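For example, you can expose two GPUs to the script as follows (how the work is distributed across them depends on the script itself):
# Example: run the ground-truth computation with two visible GPUs
CUDA_VISIBLE_DEVICES=0,1 python VQA_gt.py --generator <generator> --dataset coco --vqa_model blip2-flant5xxl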
Before running the baselines, make sure to download the model weights (e.g., Llama2-7B, Llama2-13B, etc.) using the official code and update the corresponding paths in the utils/config.py file.
To run the baselines, you can use the following commands:
# Run syntax tree baseline
CUDA_VISIBLE_DEVICES=0 python syntax_tree_baseline.py
# Run LLM baseline
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node 1 answer_ranking_LLM.py --LLM <LLM_name> --seed 0 --dataset coco
# Run VQA baseline
CUDA_VISIBLE_DEVICES=0 python answer_ranking_VQA.py --generator <generator> --vqa_model <VQA>
For the LLM baseline, please make sure to use the appropriate CUDA_VISIBLE_DEVICES and --nproc_per_node arguments (see the example below):
- llama2-13b: 2 GPUs
- llama3-8b: 1 GPU
- llama3-70b: 8 GPUs
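For example, a llama3-70b run on 8 GPUs would look like the following (the --LLM identifier should match the name expected by utils/config.py):
# Example: LLM baseline with llama3-70b on 8 GPUs
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node 8 answer_ranking_LLM.py --LLM llama3-70b --seed 0 --dataset coco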
The Llava1.5-13B VQA answer ranking script requires 2 GPUs with enough memory. The results will be saved in the methods folder.
GradBias can be run on the given dataset or independently with custom prompts and biases.
- To run GradBias on the given dataset, you may use the following command:
CUDA_VISIBLE_DEVICES=0 python gradbias.py --generator <generator> --vqa_model <VQA> --dataset coco --loss_interval 1
- To run GradBias independently:
CUDA_VISIBLE_DEVICES=0 python prompt_gradbias.py --vqa_model <VQA> --generator <generator>
Please modify the script according to the desired prompts and biases. The results will be displayed in the terminal.
The minimum number of GPUs required to run GradBias depends on the combination of the generator and VQA model used (see the example below):
- SD-1.5 and CLIP: 1 GPU
- SD-1.5 and Llava1.5-13B: 2 GPUs
- SD-2 and CLIP: 1 GPU
- SD-2 and Llava1.5-13B: 3 GPUs
- SD-XL and CLIP: 2 GPUs
- SD-XL and Llava1.5-13B: 4 GPUs
The code is optimized to run on clusters of GPUs that are multiples of the minimum number indicated above. We ran this code on NVIDIA A6000 and V100 GPUs.
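For example, a GradBias run with SD-XL and Llava1.5-13B on 4 GPUs could look like the following (the angle-bracket values are placeholders; use the generator and VQA identifiers expected by utils/config.py):
# Example: GradBias with SD-XL and Llava1.5-13B on 4 GPUs (placeholder identifiers)
CUDA_VISIBLE_DEVICES=0,1,2,3 python prompt_gradbias.py --vqa_model <llava_vqa_identifier> --generator <sdxl_generator_identifier>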
Please modify compute_accuracy.py to compute the accuracies of the specific methods you want to test. Tables will be saved under the tables folder.
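After editing the script, it can be run directly (this assumes no additional arguments are required):
# Compute accuracy tables for the selected methods
python compute_accuracy.py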
Please cite our work if you find it useful:
@misc{dincà2024gradbiasunveilingwordinfluence,
title={GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models},
author={Moreno D'Incà and Elia Peruzzo and Massimiliano Mancini and Xingqian Xu and Humphrey Shi and Nicu Sebe},
year={2024},
eprint={2408.16700},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.16700},
}