Add InstructPix2Pix pipeline #2040
Conversation
The documentation is not available anymore as the PR was closed or merged.
# check if sigmas exist in self.scheduler
if hasattr(self.scheduler, "sigmas"):
    step_index = (self.scheduler.timesteps == t).nonzero().item()
    sigma = self.scheduler.sigmas[step_index]
    noise_pred = latent_model_input - sigma * noise_pred
hack: get the `predicted_original_sample` for CFG.
if hasattr(self.scheduler, "sigmas"):
    noise_pred = (noise_pred - latents) / (-sigma)
hack: `.step` will compute `predicted_original_sample` again, but `noise_pred` is already `predicted_original_sample` here. So we change `noise_pred` such that, when `predicted_original_sample` is computed inside `step`, it will be equal to `noise_pred`.
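For the record, the algebra behind this hack can be checked in a few standalone lines (a sketch, not the PR's code; it assumes the Euler-style relation `pred_original_sample = sample - sigma * model_output` used by the sigma-based schedulers):

```python
import torch

latents = torch.randn(1, 4, 8, 8)        # current noisy sample
pred_original = torch.randn(1, 4, 8, 8)  # what CFG produced ("noise_pred" above)
sigma = 0.5

# the pipeline's re-encoding before calling .step()
model_output = (pred_original - latents) / (-sigma)

# what a sigma-based scheduler recomputes inside .step()
recovered = latents - sigma * model_output

assert torch.allclose(recovered, pred_original)  # the hack round-trips exactly
```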
I see! In that case maybe I'd suggest using a different name than `noise_pred`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool!
Oh, another nit: maybe move the …
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
image = floats_tensor((1, 3, 32, 32), rng=random.Random(seed)).to(device)
image = image.cpu().permute(0, 2, 3, 1)[0]
image = Image.fromarray(np.uint8(image)).convert("RGB")
if str(device).startswith("mps"):
It's fine to leave for now, but going forward (especially once #1924 is merged), let's make sure to always just do:

generator = torch.manual_seed(seed)

There is no need anymore to create the generator on the GPU.
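As an aside, here is a minimal sketch of the pattern being suggested (an illustration, not code from this PR): seed a CPU generator, sample on the CPU, and move the result to the target device afterwards.

```python
import torch

generator = torch.manual_seed(0)  # CPU generator; no device argument required

# draw initial latents on the CPU with the seeded generator...
latents = torch.randn((1, 4, 64, 64), generator=generator)

# ...then move them to whatever device the pipeline runs on
device = "cuda" if torch.cuda.is_available() else "cpu"
latents = latents.to(device)
```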
torch.cuda.empty_cache()

def get_inputs(self, device, dtype=torch.float32, seed=0):
    generator = torch.Generator(device=device).manual_seed(seed)
- generator = torch.Generator(device=device).manual_seed(seed)
+ generator = torch.manual_seed(seed)
OK to leave for now, just FYI (will adapt in the big reproducibility PR).
            return torch.device(module._hf_hook.execution_device)
        return self.device

    def _encode_prompt(self, prompt, device, num_images_per_prompt, do_classifier_free_guidance, negative_prompt):
(nit) we could add the "Copied from" statement here, with an "end copy" statement right before the point where we duplicate for the third tensor. See https://github.com/huggingface/transformers/blob/7419d807ff3d2ca45757c9e3090388b721e131ce/src/transformers/models/roformer/modeling_roformer.py#L390

I think we can add:

# End Copy
seq_len = uncond_embeddings.shape[1]
uncond_embeddings = uncond_embeddings.repeat(1, num_images_per_prompt, 1)
uncond_embeddings = uncond_embeddings.view(batch_size * num_images_per_prompt, seq_len, -1)
# End Copy

Then I think we can use the copied-from function.
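To illustrate the convention being discussed, here is a hypothetical sketch (the `# End Copy` marker is the proposal above, not an existing mechanism, and the path in the comment is indicative only):

```python
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline._encode_prompt
def _encode_prompt(self, prompt, device, num_images_per_prompt, do_classifier_free_guidance, negative_prompt):
    # ... body kept textually identical to the source method, so the
    # `make fix-copies` check can keep it in sync automatically ...
    # End Copy

    # pipeline-specific tail: InstructPix2Pix guides over three predictions
    # (text + image, image only, unconditional), so the unconditional
    # embeddings are duplicated once more here for the third tensor.
    ...
```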
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Very nice PR! Code looks clean, docs & tests are nice.
Left some suggestions, but good to merge for me :-)
Tested it out, doesn't work for me. Might need better instructions, or something is broken. Tested the original script from the other repo; that one works without an issue.
Hello @dblunk88! It works fine for me. Note, however, that you need to install …
Very nice work! I want to know whether there is any code to convert the original InstructPix2Pix model to the diffusers format.
* being pix2pix
* ifx
* cfg image_latents
* fix some docstr
* fix
* fix
* hack
* fix
* Apply suggestions from code review
  Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* add comments to explain the hack
* move __call__ to the top
* doc
* remove height and width
* remove depreications
* fix doc str
* quality
* fast tests
* chnage model id
* fast tests
* fix test
* address Pedro's comments
* copyright
* Simple doc page.
* Apply suggestions from code review
* style
* Remove import
* address some review comments
* Apply suggestions from code review
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* style

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
This PR adds a `StableDiffusionInstructPix2PixPipeline` for InstructPix2Pix: Learning to Follow Image Editing Instructions, a fine-tuned Stable Diffusion model that allows editing images using language instructions. I've included an example of how to use this pipeline below.
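As a sketch of typical usage (the checkpoint id `timbrooks/instruct-pix2pix`, the image URL, and the argument values are assumptions based on the released model, not necessarily the exact snippet from this PR):

```python
import requests
import torch
from PIL import Image

from diffusers import StableDiffusionInstructPix2PixPipeline

# assumption: the officially released checkpoint
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

# load any RGB image to edit
url = "https://raw.githubusercontent.com/timothybrooks/instruct-pix2pix/main/imgs/example.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# edit the image by following a natural-language instruction
edited = pipe(
    "turn him into a cyborg",
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # how strongly to stay close to the input image
).images[0]
edited.save("edited.png")
```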