Add InstructPix2Pix pipeline #2040
Conversation
The documentation is not available anymore as the PR was closed or merged.
# check if sigmas exist in self.scheduler
if hasattr(self.scheduler, "sigmas"):
    step_index = (self.scheduler.timesteps == t).nonzero().item()
    sigma = self.scheduler.sigmas[step_index]
    noise_pred = latent_model_input - sigma * noise_pred
hack: get the `predicted_original_sample` for CFG.
if hasattr(self.scheduler, "sigmas"):
    noise_pred = (noise_pred - latents) / (-sigma)
hack: `.step` will compute `predicted_original_sample` again, but `noise_pred` is already `predicted_original_sample` here. So we change `noise_pred` such that, when `predicted_original_sample` is computed inside `step`, it will be equal to `noise_pred`.
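For the record, the algebra behind this hack can be checked in a few standalone lines (a sketch, not the PR's code; it assumes the Euler-style relation `pred_original_sample = sample - sigma * model_output` used by the sigma-based schedulers):

```python
import torch

latents = torch.randn(1, 4, 8, 8)        # current noisy sample
pred_original = torch.randn(1, 4, 8, 8)  # what CFG produced ("noise_pred" above)
sigma = 0.5

# the pipeline's re-encoding before calling .step()
model_output = (pred_original - latents) / (-sigma)

# what a sigma-based scheduler recomputes inside .step()
recovered = latents - sigma * model_output

assert torch.allclose(recovered, pred_original)  # the hack round-trips exactly
```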
I see! In that case maybe I'd suggest using a different name than `noise_pred`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool!
Oh, another nit: maybe move the …
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
image = floats_tensor((1, 3, 32, 32), rng=random.Random(seed)).to(device)
image = image.cpu().permute(0, 2, 3, 1)[0]
image = Image.fromarray(np.uint8(image)).convert("RGB")
if str(device).startswith("mps"):
It's fine to leave for now, but going forward (especially once #1924 is merged), let's make sure to always just do:

generator = torch.manual_seed(seed)

There is no need anymore to create the generator on the GPU.
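As an aside, here is a minimal sketch of the pattern being suggested (an illustration, not code from this PR): seed a CPU generator, sample on the CPU, and move the result to the target device afterwards.

```python
import torch

generator = torch.manual_seed(0)  # CPU generator; no device argument required

# draw initial latents on the CPU with the seeded generator...
latents = torch.randn((1, 4, 64, 64), generator=generator)

# ...then move them to whatever device the pipeline runs on
device = "cuda" if torch.cuda.is_available() else "cpu"
latents = latents.to(device)
```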
torch.cuda.empty_cache()

def get_inputs(self, device, dtype=torch.float32, seed=0):
    generator = torch.Generator(device=device).manual_seed(seed)
- generator = torch.Generator(device=device).manual_seed(seed)
+ generator = torch.manual_seed(seed)
OK to leave for now, just FYI (will adapt in the big reproducibility PR).
            return torch.device(module._hf_hook.execution_device)
        return self.device

    def _encode_prompt(self, prompt, device, num_images_per_prompt, do_classifier_free_guidance, negative_prompt):
(nit) we could add the "Copied from" statement here, with an "end copy" statement right before the point where we duplicate for the third tensor. See https://github.com/huggingface/transformers/blob/7419d807ff3d2ca45757c9e3090388b721e131ce/src/transformers/models/roformer/modeling_roformer.py#L390

I think we can add:

# End Copy
seq_len = uncond_embeddings.shape[1]
uncond_embeddings = uncond_embeddings.repeat(1, num_images_per_prompt, 1)
uncond_embeddings = uncond_embeddings.view(batch_size * num_images_per_prompt, seq_len, -1)
# End Copy

Then I think we can use the copied-from function.
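To illustrate the convention being discussed, here is a hypothetical sketch (the `# End Copy` marker is the proposal above, not an existing mechanism, and the path in the comment is indicative only):

```python
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline._encode_prompt
def _encode_prompt(self, prompt, device, num_images_per_prompt, do_classifier_free_guidance, negative_prompt):
    # ... body kept textually identical to the source method, so the
    # `make fix-copies` check can keep it in sync automatically ...
    # End Copy

    # pipeline-specific tail: InstructPix2Pix guides over three predictions
    # (text + image, image only, unconditional), so the unconditional
    # embeddings are duplicated once more here for the third tensor.
    ...
```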
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Very nice PR! Code looks clean, docs & tests are nice.
Left some suggestions, but good to merge for me :-)
Tested it out, doesn't work for me. Might need better instructions, or something is broken. Tested the original script from the other repo; that one works without an issue.
Hello @dblunk88! It works fine for me. Note, however, that you need to install …
Very nice work! I want to know whether there is any code to convert the original InstructPix2Pix model to the diffusers format.
* being pix2pix
* ifx
* cfg image_latents
* fix some docstr
* fix
* fix
* hack
* fix
* Apply suggestions from code review
  Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* add comments to explain the hack
* move __call__ to the top
* doc
* remove height and width
* remove depreications
* fix doc str
* quality
* fast tests
* chnage model id
* fast tests
* fix test
* address Pedro's comments
* copyright
* Simple doc page.
* Apply suggestions from code review
* style
* Remove import
* address some review comments
* Apply suggestions from code review
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* style

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
This PR adds a `StableDiffusionInstructPix2PixPipeline` for InstructPix2Pix: Learning to Follow Image Editing Instructions, a fine-tuned Stable Diffusion model that allows editing images using language instructions. I've included an example of how to use this pipeline below.
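As a sketch of typical usage (the checkpoint id `timbrooks/instruct-pix2pix`, the image URL, and the argument values are assumptions based on the released model, not necessarily the exact snippet from this PR):

```python
import requests
import torch
from PIL import Image

from diffusers import StableDiffusionInstructPix2PixPipeline

# assumption: the officially released checkpoint
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

# load any RGB image to edit
url = "https://raw.githubusercontent.com/timothybrooks/instruct-pix2pix/main/imgs/example.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# edit the image by following a natural-language instruction
edited = pipe(
    "turn him into a cyborg",
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # how strongly to stay close to the input image
).images[0]
edited.save("edited.png")
```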