This repository contains the training procedure introduced in "Reverse Stable Diffusion: What prompt was used to generate this image?" (https://arxiv.org/abs/2308.01472), applied to the image-to-text-embedding task.
Create a conda environment and install the requirements:
```bash
conda create -n <name_env> python=3.9
conda activate <name_env>
pip install -r requirements.txt
```
The code expects a data set of image and text pairs, stored as follows:
```
|root_dir
    |images_part1
        |images
            |000000.png
            ...
    ...
    |images_part8
    |sentence_embeddings
        000000.npy
        ...
    metadata.csv
```
where sentence_embeddings is a directory that stores the target embeddings obtained from a sentence transformer. In our experiments, we used Sentence-BERT (https://arxiv.org/pdf/1908.10084.pdf) to extract the embeddings.
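As an illustration, the following is a minimal sketch (not part of the repository) of how such embeddings could be extracted with the sentence-transformers library and stored as .npy files. The model name, the metadata.csv column name and the file-naming scheme are assumptions, so adapt them to your data.
```python
# Hedged sketch: extract target sentence embeddings for every prompt.
# The model name ("all-MiniLM-L6-v2"), the "prompt" column and the zero-padded
# file names are assumptions, not settings fixed by this repository.
import os

import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer

root_dir = "/path/to/root_dir"                     # same path as in global_configs.py
out_dir = os.path.join(root_dir, "sentence_embeddings")
os.makedirs(out_dir, exist_ok=True)

model = SentenceTransformer("all-MiniLM-L6-v2")    # any sentence transformer works

metadata = pd.read_csv(os.path.join(root_dir, "metadata.csv"))
for idx, prompt in enumerate(metadata["prompt"]):
    embedding = model.encode(str(prompt))          # numpy array, e.g. shape (384,)
    np.save(os.path.join(out_dir, f"{idx:06d}.npy"), embedding)
```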
Moreover, the training procedure requires a vocabulary for multi-label classification, which can be computed with the script compute_vocab.py.
metadata.csv contains the pairs of images and text prompts.
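For illustration, the snippet below sketches one way such a vocabulary could be derived from the prompts in metadata.csv. The actual compute_vocab.py may tokenize, filter or store the vocabulary differently; the column name and the frequency threshold are assumptions.
```python
# Hedged sketch of building a prompt vocabulary for multi-label classification.
# The "prompt" column name and the frequency threshold are assumptions.
from collections import Counter

import pandas as pd

metadata = pd.read_csv("/path/to/root_dir/metadata.csv")
counter = Counter()
for prompt in metadata["prompt"]:
    counter.update(str(prompt).lower().split())    # naive whitespace tokenization

min_frequency = 50                                 # keep only frequent words
vocab = sorted(word for word, count in counter.items() if count >= min_frequency)
print(f"vocabulary size: {len(vocab)}")
```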
The path to "root_dir" should be specified in global_configs.py.
For each model, we provide two training scripts: the first runs the vanilla training process, while the second runs the curriculum learning procedure. Both scripts can be invoked via main.py from the corresponding model directory.
A special case is the U-Net model: it expects image captions and it operates in the latent space of Stable Diffusion, so it requires a preliminary step that maps the images into this latent space (not included in the repository). For captioning, we used BLIP: https://github.com/salesforce/BLIP
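As a rough guide, such a mapping can be obtained with the Stable Diffusion VAE from the diffusers library. The sketch below is not the repository's preprocessing code; the checkpoint, image size and scaling factor are assumptions based on the standard Stable Diffusion v1 setup.
```python
# Hedged sketch: encode an image into the Stable Diffusion latent space.
# The VAE checkpoint, the 512x512 resolution and the 0.18215 scaling factor
# are assumptions (standard Stable Diffusion v1 values), not repo settings.
import numpy as np
import torch
from diffusers import AutoencoderKL
from PIL import Image

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
vae.eval()

image = Image.open("/path/to/root_dir/images_part1/images/000000.png").convert("RGB")
image = image.resize((512, 512))
pixels = torch.from_numpy(np.array(image)).float() / 127.5 - 1.0  # scale to [-1, 1]
pixels = pixels.permute(2, 0, 1).unsqueeze(0)                     # (1, 3, 512, 512)

with torch.no_grad():
    latents = vae.encode(pixels).latent_dist.sample() * 0.18215   # (1, 4, 64, 64)

np.save("000000_latent.npy", latents.squeeze(0).numpy())
```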