
Add InstructPix2Pix pipeline #2040

Merged
merged 31 commits into from
Jan 20, 2023

Conversation

patil-suraj
Contributor

@patil-suraj patil-suraj commented Jan 19, 2023

This PR adds a StableDiffusionInstructPix2PixPipeline for InstructPix2Pix: Learning to Follow Image Editing Instructions, a fine-tuned Stable Diffusion model that edits images following language instructions. I've included an example of how to use this pipeline below.

import PIL.Image
import PIL.ImageOps
import requests
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16, safety_checker=None
).to("cuda")

url = "https://raw.githubusercontent.com/timothybrooks/instruct-pix2pix/main/imgs/example.jpg"

def download_image(url):
    image = PIL.Image.open(requests.get(url, stream=True).raw)
    image = PIL.ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image

image = download_image(url)

prompt = "turn him into cyborg"
images = pipe(prompt, image=image, num_inference_steps=10, image_guidance_scale=1).images
images[0]

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jan 19, 2023

The documentation is not available anymore as the PR was closed or merged.

Comment on lines 625 to 629
# check if sigmas exist in self.scheduler
if hasattr(self.scheduler, "sigmas"):
    step_index = (self.scheduler.timesteps == t).nonzero().item()
    sigma = self.scheduler.sigmas[step_index]
    noise_pred = latent_model_input + -sigma * noise_pred
Contributor Author


hack: get the predicted_original_sample for CFG.

Comment on lines 639 to 640
if hasattr(self.scheduler, "sigmas"):
    noise_pred = (noise_pred - latents) / (-sigma)
Contributor Author


hack: .step will compute predicted_original_sample again, but noise_pred already holds predicted_original_sample here. So we transform noise_pred such that, when predicted_original_sample is computed inside step, it equals noise_pred.
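Assuming the Euler-style relation that sigma-based schedulers use internally, `pred_original_sample = sample - sigma * model_output`, the round trip behind this hack can be checked numerically. All variable names below are illustrative, not the pipeline's actual ones:

```python
import numpy as np

# Suppose noise_pred already holds the predicted original sample.
sigma = 0.7
latents = np.array([1.0, -2.0, 0.5])
pred_x0 = np.array([0.3, 0.1, -0.4])   # the value we want .step() to see

# The hack: invert the scheduler's relation before calling .step()
fake_noise_pred = (pred_x0 - latents) / (-sigma)

# What .step() recomputes internally:
recomputed = latents - sigma * fake_noise_pred
assert np.allclose(recomputed, pred_x0)
```

Because `latents - sigma * ((pred_x0 - latents) / (-sigma)) = latents + (pred_x0 - latents) = pred_x0`, the scheduler reproduces exactly the value that was already computed.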

Member


I see! In that case maybe I'd suggest to use a different name rather than noise_pred.

Member

@pcuenca pcuenca left a comment


Very cool!


@pcuenca
Member

pcuenca commented Jan 20, 2023

Oh, another nit: maybe move the __call__ method up, as we discussed the other day, since it's more important than all the stuff that's repeated in all pipelines.

image = floats_tensor((1, 3, 32, 32), rng=random.Random(seed)).to(device)
image = image.cpu().permute(0, 2, 3, 1)[0]
image = Image.fromarray(np.uint8(image)).convert("RGB")
if str(device).startswith("mps"):
Contributor


It's fine to leave for now, but going forward (especially once #1924 is merged), let's make sure to always just do:

generator = torch.manual_seed(seed)

There is no longer any need to create the generator on GPU.
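A minimal sketch of the suggested pattern: `torch.manual_seed` seeds and returns the default CPU generator, which can then be passed wherever a `generator` argument is expected, and the same seed reproduces the same samples:

```python
import torch

# torch.manual_seed returns the default CPU Generator,
# so it can be passed directly as a generator argument.
g1 = torch.manual_seed(0)
a = torch.randn(2, 2, generator=g1)

g2 = torch.manual_seed(0)
b = torch.randn(2, 2, generator=g2)

# Same seed on CPU -> identical samples; no CUDA generator needed.
assert torch.equal(a, b)
```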

torch.cuda.empty_cache()

def get_inputs(self, device, dtype=torch.float32, seed=0):
    generator = torch.Generator(device=device).manual_seed(seed)
Contributor


Suggested change
generator = torch.Generator(device=device).manual_seed(seed)
generator = torch.manual_seed(seed)

ok to leave for now, just FYI (will adapt in the big reproducibility PR)

        return torch.device(module._hf_hook.execution_device)
    return self.device

def _encode_prompt(self, prompt, device, num_images_per_prompt, do_classifier_free_guidance, negative_prompt):
Contributor


(nit) we could add the "Copied from" statement here, with an "ends" statement right before the part where we duplicate for the third tensor. See https://github.com/huggingface/transformers/blob/7419d807ff3d2ca45757c9e3090388b721e131ce/src/transformers/models/roformer/modeling_roformer.py#L390

We can add:

# End Copy

I think

seq_len = uncond_embeddings.shape[1]
uncond_embeddings = uncond_embeddings.repeat(1, num_images_per_prompt, 1)
uncond_embeddings = uncond_embeddings.view(batch_size * num_images_per_prompt, seq_len, -1)

Contributor


Suggested change
# End Copy

I think then we can use the copied-from function.

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Contributor

@patrickvonplaten patrickvonplaten left a comment


Very nice PR! Code looks clean, docs & tests are nice.

Left some suggestions, but good to merge for me :-)

@patil-suraj patil-suraj merged commit e5ff755 into main Jan 20, 2023
@patil-suraj patil-suraj deleted the pix2pix branch January 20, 2023 15:25
@dblunk88
Contributor

Tested it out; it doesn't work for me. Might need better instructions, or something is broken. I tested the original script from the other repo, and that one works without an issue.

@pcuenca
Member

pcuenca commented Jan 22, 2023

Hello @dblunk88! It works fine for me. Note, however, that you need to install diffusers from main in order to test it. If you did, would you mind opening a new issue so we can track it down? Thanks a lot!

@lsabrinax

Very nice work! I want to know whether there is any code to convert the original InstructPix2Pix model to the diffusers format.

yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
* being pix2pix

* ifx

* cfg image_latents

* fix some docstr

* fix

* fix

* hack

* fix

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* add comments to explain the hack

* move __call__ to the top

* doc

* remove height and width

* remove depreications

* fix doc str

* quality

* fast tests

* chnage model id

* fast tests

* fix test

* address Pedro's comments

* copyright

* Simple doc page.

* Apply suggestions from code review

* style

* Remove import

* address some review comments

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* style

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>