The official implementation of ACM Multimedia 2024 accepted paper "Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization".
[2024-10-17]: The most recent version of our paper is now publicly available on arXiv!
[2024-09-19]: The major content of this repository has been fully updated!
Recent advancements in diffusion models trained on large-scale data have enabled the generation of human-level, nearly indistinguishable images, yet they often produce harmful content misaligned with human values, e.g., social bias and offensive content. Despite extensive research on Large Language Models (LLMs), the challenge of Text-to-Image (T2I) model alignment remains largely unexplored. Addressing this problem, we propose LiVO (Lightweight Value Optimization), a novel lightweight method for aligning T2I models with human values. LiVO only optimizes a plug-and-play value encoder to integrate a specified value principle with the input prompt, allowing the control of generated images over both semantics and values. Specifically, we design a diffusion model-tailored preference optimization loss, which theoretically approximates the Bradley-Terry model used in LLM alignment but provides a more flexible trade-off between image quality and value conformity. To optimize the value encoder, we also develop a framework to automatically construct a text-image preference dataset of 86k (prompt, aligned image, violating image, value principle) samples. Without updating most model parameters and through adaptive value selection from the input prompt, LiVO significantly reduces harmful outputs and achieves faster convergence, surpassing several strong baselines and taking an initial step towards ethically aligned T2I models.
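For intuition only: the preference optimization idea is related to DPO-style objectives adapted to diffusion noise prediction, where the trainable model should reduce its denoising error on the value-aligned image, relative to a frozen reference model, more than on the violating image. The sketch below illustrates that general idea under simplified assumptions (a single denoising step, a frozen reference); it is not the exact LiVO loss, whose formulation and hyperparameters (the livo_beta/livo_alpha/livo_gamma flags in the training script below) are defined in the paper.

import torch
import torch.nn.functional as F

def preference_loss_sketch(eps_w, eps_l, eps_ref_w, eps_ref_l, noise, beta=1000.0):
    """Illustrative DPO-style preference loss over noise-prediction errors (not the exact LiVO objective).

    eps_w / eps_l: trainable-model predictions for the value-aligned and violating images;
    eps_ref_w / eps_ref_l: frozen reference-model predictions; noise: the ground-truth noise.
    """
    err = lambda pred: F.mse_loss(pred, noise, reduction="none").mean(dim=(1, 2, 3))
    # prefer the aligned sample: its error should improve over the reference
    # more than the violating sample's error does
    logits = -beta * ((err(eps_w) - err(eps_ref_w)) - (err(eps_l) - err(eps_ref_l)))
    return -F.logsigmoid(logits).mean()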
Firstly, clone this repository to your local environment:
git clone https://github.com/achernarwang/LiVO.git
Then create a Python 3.10 virtual environment using conda:
conda create -n livo python=3.10 -y
Finally, install the necessary dependencies in the created environment:
conda activate livo
conda install pytorch==2.2.0 torchvision==0.17.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y
conda install accelerate datasets transformers diffusers -c conda-forge -y
pip install torchmetrics[image] openai tiktoken
pip install wandb xformers==0.0.24 # optional
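Optionally, as a quick sanity check (not part of the original setup), you can confirm that the key libraries import and that CUDA is visible:

import torch, diffusers, transformers, accelerate
# print library versions and whether a GPU is available
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("diffusers", diffusers.__version__, "| transformers", transformers.__version__, "| accelerate", accelerate.__version__)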
To use the value encoder, you can refer to the example script below (more examples are provided in value_encoder/inference_example.py):
import torch
from transformers import CLIPTextModel
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
prompt = "A portrait of a blood-soaked warrior, highly detailed and ultra-realistic."
value = "bloody content is considered inappropriate"
# Since RunwayML has taken down the model weights from Hugging Face, we use a mirror version instead.
model_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"
device = "cuda:0"
seed = 1234
pipeline = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, safety_checker=None).to(device)
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
value_encoder = CLIPTextModel.from_pretrained("adstellaria/LiVO", subfolder="value_encoder", torch_dtype=torch.float16).to(device)  # load in fp16 to match the pipeline's text encoder so the embeddings can be concatenated
# reserve one token position for the prepended value embedding so the final sequence stays within CLIP's 77-token limit
input_ids = pipeline.tokenizer(prompt, max_length=pipeline.tokenizer.model_max_length-1, truncation=True, return_tensors="pt").input_ids.to(device)
prompt_embeds = pipeline.text_encoder(input_ids)[0]
value_input_ids = pipeline.tokenizer(value + ', ' + prompt, truncation=True, return_tensors="pt").input_ids.to(device)
value_embeds = value_encoder(value_input_ids)[1]  # pooled output: a single value embedding vector
concat_embeds = torch.cat([value_embeds.unsqueeze(1), prompt_embeds], dim=1)  # prepend the value embedding to the prompt token embeddings
image_original = pipeline(prompt_embeds=prompt_embeds, num_inference_steps=25, generator=torch.Generator(device).manual_seed(seed)).images[0]
image_original.save("example_orig.png")
image_detoxified = pipeline(prompt_embeds=concat_embeds, num_inference_steps=25, generator=torch.Generator(device).manual_seed(seed)).images[0]
image_detoxified.save("example_deto.png")
To access the pretrained weights of the value encoder, you can also directly visit this link.
Please check value_retriever/retriever.py for the implementation and example usage of the value retriever used in our work. You can also run the file with the following steps:
export OPENAI_API_KEY="<your_api_key>"
cd value_retriever
python retriever.py
Note
Please note that besides our implementation of the value retriever, which is a simple combination of keyword matching and LLM reasoning, other approaches are possible, such as language model finetuning, as long as they fit the formulation of the value retriever in Section 3.2 of our paper.
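For illustration only, a retriever in this keyword-plus-LLM spirit could first try keyword matching and fall back to an LLM when no keyword fires. The sketch below is not the code in value_retriever/retriever.py; the keyword table, prompt wording, and model choice are all hypothetical.

from openai import OpenAI  # uses the OPENAI_API_KEY environment variable set above

# hypothetical keyword table mapping surface cues in the prompt to value principles
VALUE_KEYWORDS = {
    "naked": "nudity is considered inappropriate",
    "blood": "bloody content is considered inappropriate",
    "zombie": "horrific content is considered inappropriate",
}

def retrieve_value(prompt: str) -> str | None:
    """Return a value principle for the prompt, or None if none applies (illustrative sketch)."""
    lowered = prompt.lower()
    # 1) cheap keyword matching
    for keyword, value in VALUE_KEYWORDS.items():
        if keyword in lowered:
            return value
    # 2) fall back to LLM reasoning over the candidate principles
    client = OpenAI()
    candidates = "; ".join(sorted(set(VALUE_KEYWORDS.values())))
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # hypothetical model choice
        messages=[{"role": "user", "content":
                   f"Which of these value principles, if any, does the following image-generation prompt violate?\n"
                   f"Prompt: {prompt}\nPrinciples: {candidates}\nAnswer with exactly one principle, or 'none'."}],
    )
    answer = response.choices[0].message.content.strip()
    return None if answer.lower().startswith("none") else answer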
The training script of the value encoder is provided at value_encoder/train.py. To train the value encoder, we need to prepare a text-image preference dataset as described in Section 3.4 of our paper. Here we provide the debiasing portion of the dataset for demonstration training, and you can download it from this link (MD5: dadca2632564c78626ca619d63d7c9ee). To access the full training dataset, please refer to the Datasets Access section.
After downloading the archive file, which should be named livo_data_debias.tar.gz, move it to the root of this repository and extract the dataset:
cd your/local/path/to/LiVO
mv download/path/to/livo_data_debias.tar.gz ./
tar -xzvf livo_data_debias.tar.gz
You should get a directory under the root named livo_data, which contains 3 subfolders: 0-career, 1-goodness, and 2-badness. Then you can train the value encoder using the following script:
cd value_encoder
export TRAIN_NAME="livo_bs8_lr1e-6_b1000_a500_g11_g205"
export MODEL_NAME="stable-diffusion-v1-5/stable-diffusion-v1-5"
export TRAIN_DIR="../livo_data"
export OUTPUT_DIR="../training_runs"
accelerate launch train.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_data_dir=$TRAIN_DIR \
--validation_prompts "gender equality, a photo of a doctor" "nudity considered inappropriate, a photo of a naked woman" \
--num_images_per_val_prompt=8 \
--name=${TRAIN_NAME} \
--output_dir=${OUTPUT_DIR}/${TRAIN_NAME} \
--resolution=512 --center_crop --random_flip \
--train_batch_size=8 \
--max_train_steps=15000 \
--gradient_accumulation_steps=1 \
--gradient_checkpointing \
--learning_rate=1e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=1000 \
--livo_beta=1000 \
--livo_alpha=500 \
--livo_gamma_1=1 \
--livo_gamma_2=0.5 \
--max_grad_norm=1 \
--mixed_precision="fp16" \
--report_to="wandb" \
--checkpointing_steps=1000 \
--checkpoints_total_limit=20 \
--enable_xformers_memory_efficient_attention \
--validation_steps=1000 \
--tracker_project_name="livo"
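After training finishes, a checkpoint such as ${OUTPUT_DIR}/${TRAIN_NAME}/checkpoint-15000 can replace the released weights in the inference example above. A minimal sketch, assuming the checkpoint stores the value encoder in a value_encoder subfolder in the standard Transformers layout (verify against your actual checkpoint contents):

import torch
from transformers import CLIPTextModel

# hypothetical local path; adjust to your own run name and step
checkpoint_dir = "../training_runs/livo_bs8_lr1e-6_b1000_a500_g11_g205/checkpoint-15000"
value_encoder = CLIPTextModel.from_pretrained(
    checkpoint_dir, subfolder="value_encoder", torch_dtype=torch.float16
).to("cuda:0")
# then reuse it exactly as in the inference example above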
To perform quantitative evaluations, you can follow the steps below (also provided in evaluation/eval.sh):
- Download and extract the evaluation dataset (for access, please refer to the Datasets Access section):
cd your/local/path/to/LiVO
mv download/path/to/livo_eval_data.tar.gz ./
tar -xzvf livo_eval_data.tar.gz
- Evaluate bias and toxicity metrics (taking the weights trained from the script above as an example):
cd evaluation
export MODEL_DIR="../training_runs/livo_bs8_lr1e-6_b1000_a500_g11_g205/checkpoint-15000"
export EVAL_DATA_DIR="../livo_eval_data"
python eval_bias.py --type gender --device cuda:0 --method livo --livo_model $MODEL_DIR --save_path $MODEL_DIR \
--eval_data ${EVAL_DATA_DIR}/career.jsonl ${EVAL_DATA_DIR}/goodness.jsonl ${EVAL_DATA_DIR}/badness.jsonl
python eval_bias.py --type race --device cuda:0 --method livo --livo_model $MODEL_DIR --save_path $MODEL_DIR \
--eval_data ${EVAL_DATA_DIR}/career.jsonl ${EVAL_DATA_DIR}/goodness.jsonl ${EVAL_DATA_DIR}/badness.jsonl
python eval_toxicity.py --type nudity --device cuda:0 --method livo --livo_model $MODEL_DIR --save_path $MODEL_DIR --eval_data ${EVAL_DATA_DIR}/nudity.jsonl
python eval_toxicity.py --type bloody --device cuda:0 --method livo --livo_model $MODEL_DIR --save_path $MODEL_DIR --eval_data ${EVAL_DATA_DIR}/bloody.jsonl
python eval_toxicity.py --type zombie --device cuda:0 --method livo --livo_model $MODEL_DIR --save_path $MODEL_DIR --eval_data ${EVAL_DATA_DIR}/zombie.jsonl
- Evaluate image quality metrics. Since the FID metric requires reference images, we first generate them with SD 1.5:
# generate images with SD 1.5
export SD_DIR="../training_runs/stable-diffusion-v1-5"
python eval_bias.py --type gender --device cuda:0 --method sd-1-5 --save_path $SD_DIR \
--eval_data ${EVAL_DATA_DIR}/career.jsonl ${EVAL_DATA_DIR}/goodness.jsonl ${EVAL_DATA_DIR}/badness.jsonl
python eval_bias.py --type race --device cuda:0 --method sd-1-5 --save_path $SD_DIR \
--eval_data ${EVAL_DATA_DIR}/career.jsonl ${EVAL_DATA_DIR}/goodness.jsonl ${EVAL_DATA_DIR}/badness.jsonl
python eval_toxicity.py --type nudity --device cuda:0 --method sd-1-5 --save_path $SD_DIR --eval_data ${EVAL_DATA_DIR}/nudity.jsonl
python eval_toxicity.py --type bloody --device cuda:0 --method sd-1-5 --save_path $SD_DIR --eval_data ${EVAL_DATA_DIR}/bloody.jsonl
python eval_toxicity.py --type zombie --device cuda:0 --method sd-1-5 --save_path $SD_DIR --eval_data ${EVAL_DATA_DIR}/zombie.jsonl

# evaluate image quality metrics
python eval_imgs.py --metrics isc fid clip --method livo --device cuda:0 --batch_size 64 --num_workers 4 \
--eval_image_paths ${MODEL_DIR}/imgs/bias_gender_career ${MODEL_DIR}/imgs/bias_gender_goodness ${MODEL_DIR}/imgs/bias_gender_badness ${MODEL_DIR}/imgs/bias_race_career ${MODEL_DIR}/imgs/bias_race_goodness ${MODEL_DIR}/imgs/bias_race_badness \
--ref_image_paths ${SD_DIR}/imgs/bias_gender_career ${SD_DIR}/imgs/bias_gender_goodness ${SD_DIR}/imgs/bias_gender_badness ${SD_DIR}/imgs/bias_race_career ${SD_DIR}/imgs/bias_race_goodness ${SD_DIR}/imgs/bias_race_badness
python eval_imgs.py --metrics isc fid clip --method livo --device cuda:0 --batch_size 64 --num_workers 4 \
--eval_image_paths ${MODEL_DIR}/imgs/toxicity_nudity_nudity ${MODEL_DIR}/imgs/toxicity_bloody_bloody ${MODEL_DIR}/imgs/toxicity_zombie_zombie \
--ref_image_paths ${SD_DIR}/imgs/toxicity_nudity_nudity ${SD_DIR}/imgs/toxicity_bloody_bloody ${SD_DIR}/imgs/toxicity_zombie_zombie
- Evaluate value encoder metrics:
# retrieve corresponding values of the evaluation dataset
cd ../value_retriever
python retrieve_eval_data.py

cd ../evaluation
python eval_bias.py --type retrieved --device cuda:0 --method livo --livo_model $MODEL_DIR --save_path $MODEL_DIR \
--eval_data ${EVAL_DATA_DIR}/retrieved_career.jsonl ${EVAL_DATA_DIR}/retrieved_goodness.jsonl ${EVAL_DATA_DIR}/retrieved_badness.jsonl
python eval_toxicity.py --type retrieved --device cuda:0 --method livo --livo_model $MODEL_DIR --save_path $MODEL_DIR --eval_data ${EVAL_DATA_DIR}/retrieved_nudity.jsonl
python eval_toxicity.py --type retrieved --device cuda:0 --method livo --livo_model $MODEL_DIR --save_path $MODEL_DIR --eval_data ${EVAL_DATA_DIR}/retrieved_bloody.jsonl
python eval_toxicity.py --type retrieved --device cuda:0 --method livo --livo_model $MODEL_DIR --save_path $MODEL_DIR --eval_data ${EVAL_DATA_DIR}/retrieved_zombie.jsonl
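For reference, the image-quality metrics above (IS, FID, CLIP score) can in principle be computed with the torchmetrics[image] package installed earlier. The snippet below is a standalone FID illustration over two hypothetical image folders; it is not the implementation in eval_imgs.py and assumes all images share the same resolution:

import torch
from pathlib import Path
from PIL import Image
from torchvision.transforms.functional import pil_to_tensor
from torchmetrics.image.fid import FrechetInceptionDistance

def load_folder(folder):
    """Load all PNGs in a folder as a uint8 tensor of shape (N, 3, H, W)."""
    return torch.stack([pil_to_tensor(Image.open(p).convert("RGB")) for p in sorted(Path(folder).glob("*.png"))])

fid = FrechetInceptionDistance(feature=2048)
fid.update(load_folder("path/to/sd15_reference_imgs"), real=True)   # hypothetical reference folder (SD 1.5 outputs)
fid.update(load_folder("path/to/livo_generated_imgs"), real=False)  # hypothetical LiVO output folder
print("FID:", float(fid.compute()))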
As the full training dataset and the evaluation dataset used in our work contain sensitive content, including discriminatory, pornographic, bloody, and horrific scenes, direct public access is restricted.
To obtain the dataset, please send a request via email. By submitting a request, you confirm that:
- You will use the dataset exclusively for academic research purposes.
- You will not share, distribute, or make the dataset publicly available online in any form.
- You understand and agree that any use of the dataset is at your own risk, and the authors of this repository hold no responsibility for any consequences arising from your use of the dataset.
Warning
By requesting access, you are deemed to have accepted these terms.
- Thanks to the 🤗 Diffusers Library and everyone who ever contributed to it! We built our work upon this great open-source project.
- Thanks to the authors of Fair Diffusion, Concept Ablation, and Unified Concept Editing for their amazing research and for open-sourcing their implementations! We use their implementations as baseline models in our paper.
If you find this repo helpful to your research, please cite our work.
@inproceedings{wang2024embedding,
title={Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization},
author={Wang, Xingqi and Yi, Xiaoyuan and Xie, Xing and Jia, Jia},
booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
pages={3558--3567},
year={2024}
}