
Combine UnCLIPPipeline and StableDiffusionImageVariationPipeline #1808

Closed
2 tasks done
budui opened this issue Dec 22, 2022 · 7 comments

Comments

@budui
Contributor

budui commented Dec 22, 2022

Model/Pipeline/Scheduler description

UnCLIPPipeline("kakaobrain/karlo-v1-alpha") provides a prior model that can generate a CLIP image embedding from text.
StableDiffusionImageVariationPipeline("lambdalabs/sd-image-variations-diffusers") provides a decoder model that can generate images from a CLIP image embedding.

So I tested combining UnCLIPPipeline and StableDiffusionImageVariationPipeline:

    ...
    decoder_pipe = diffusers.StableDiffusionImageVariationPipeline.from_pretrained(
        "lambdalabs/sd-image-variations-diffusers",
        revision="v2.0",
        torch_dtype=torch.float16,
        local_files_only=True,
        safety_checker=None,
    )
    ...
    decoder_pipe.to(device)

    prior_pipe = diffusers.UnCLIPPipeline.from_pretrained(
        "kakaobrain/karlo-v1-alpha",
        torch_dtype=torch.float16,
        local_files_only=True,
    )
    ...
    prior_pipe.to(device)

    # see functions `karlo_prior` and `sd_image_variations_decoder` in the gist linked below.
    prior_pipe.text_to_image_embedding = types.MethodType(karlo_prior, prior_pipe)
    decoder_pipe.image_embedding_to_image = types.MethodType(sd_image_variations_decoder, decoder_pipe)

    random_generator = torch.Generator(device=device).manual_seed(1000)

    prompt = "a shiba inu wearing a beret and black turtleneck"
    image_embeddings = prior_pipe.text_to_image_embedding(prompt, generator=random_generator)

    image = decoder_pipe.image_embedding_to_image(image_embeddings, generator=random_generator).images[0]

It works! And I got reasonable results:

[result image: shiba inu]

This DALL·E 2-like Stable Diffusion pipeline is relatively lightweight: with the default code it needs only 6.5 GB of GPU RAM and takes about 21 s on half a T4 GPU.
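The `types.MethodType` calls above attach plain functions as methods on existing pipeline instances. A minimal generic sketch of that pattern (the `Pipeline` class and `describe` function here are illustrative stand-ins, not diffusers code):

```python
import types

class Pipeline:
    """Stand-in for a pipeline object we want to extend at runtime."""
    def __init__(self, name):
        self.name = name

def describe(self):
    # Once bound with types.MethodType, `self` is the pipeline instance.
    return f"pipeline: {self.name}"

# Bind `describe` as a method on this one instance only.
pipe = Pipeline("prior")
pipe.describe = types.MethodType(describe, pipe)
print(pipe.describe())  # → "pipeline: prior"
```

This is how `karlo_prior` and `sd_image_variations_decoder` become callable as `prior_pipe.text_to_image_embedding(...)` and `decoder_pipe.image_embedding_to_image(...)` without subclassing either pipeline.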

Open source status

  • The model implementation is available
  • The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

Full code for the above: my gist

@patrickvonplaten
Contributor

Very cool! Would you maybe like to add this as a community pipeline as explained here with you being the author? This sounds like a great contribution :-)

@budui
Contributor Author

budui commented Jan 8, 2023

Hi, I tried it as you suggested (gist here).

Now the pipeline is mostly OK, but I have a minor problem:

# https://gist.github.com/budui/416b82e489d341f2495b155cb9cb1914#file-stable_unclip_pipeline-py-L289-L300
    pipeline = StableUnCLIPPipeline.from_pretrained(
        "kakaobrain/karlo-v1-alpha",
        torch_dtype=torch.float16,
        decoder_pipe_kwargs=dict(
            image_encoder=None,
            torch_dtype=torch.float16,
        ),
    )
    pipeline.to(device)
    pipeline.decoder_pipe.to(device)

How can I make sure that the pipeline I create (pipeline) has the same dtype and device as its attribute pipeline (pipeline.decoder_pipe)? Right now I have to set them manually.
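One way would be to override `to()` on the wrapping pipeline and forward the move to the inner pipeline, so the two can never drift apart. A minimal sketch with toy classes (these are simplified stand-ins, not the real diffusers API; dtype could be forwarded the same way):

```python
class DecoderPipe:
    """Stand-in for the wrapped decoder pipeline."""
    def __init__(self):
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self

class StableUnCLIPPipelineSketch:
    """Stand-in for the combined pipeline holding a decoder_pipe attribute."""
    def __init__(self, decoder_pipe):
        self.decoder_pipe = decoder_pipe
        self.device = "cpu"

    def to(self, device):
        # Forward the move so parent and child always share a device.
        self.device = device
        self.decoder_pipe.to(device)
        return self

pipe = StableUnCLIPPipelineSketch(DecoderPipe())
pipe.to("cuda")
print(pipe.device == pipe.decoder_pipe.device)  # → True
```

With this pattern a single `pipeline.to(device)` call keeps `pipeline.decoder_pipe` in sync, instead of moving each one manually.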

@budui
Contributor Author

budui commented Jan 9, 2023

I know how to do it now. I will create a PR soon... it seems that I need to write some documentation.

@yabenz

yabenz commented Jan 10, 2023

Can you please show me how I can add a parameter to specify the image resolution? Also, how to set the seed?

@budui
Contributor Author

budui commented Jan 10, 2023

   # https://gist.github.com/budui/416b82e489d341f2495b155cb9cb1914#file-stable_unclip_pipeline-py-L288-L314
    prompt = "a shiba inu wearing a beret and black turtleneck"
    random_generator = torch.Generator(device=device).manual_seed(1000)
    output = pipeline(prompt=prompt, generator=random_generator, width=512, height=512)

@yabenz

yabenz commented Jan 10, 2023

Thanks a lot! Going to try that!

@yabenz

yabenz commented Jan 10, 2023

Can I do this with Karlo, since that's what I'm using? I use the following:

from diffusers import UnCLIPPipeline
import torch

pipe = UnCLIPPipeline.from_pretrained("kakaobrain/karlo-v1-alpha")
pipe = pipe.to('cpu')

prompt = "Cat on a yellow leaf."

image = pipe([prompt]).images[0]

image.save("./pop.png")
