Model/Pipeline/Scheduler description
UnCLIPPipeline ("kakaobrain/karlo-v1-alpha") provides a prior model that can generate a CLIP image embedding from text.
StableDiffusionImageVariationPipeline ("lambdalabs/sd-image-variations-diffusers") provides a decoder model that can generate images from a CLIP image embedding.
So, I tested combining UnCLIPPipeline and StableDiffusionImageVariationPipeline:
```python
import types

import diffusers
import torch

device = "cuda"  # a CUDA device is needed for float16 inference (the author reports a T4)
decoder_pipe = diffusers.StableDiffusionImageVariationPipeline.from_pretrained(
"lambdalabs/sd-image-variations-diffusers",
revision="v2.0",
torch_dtype=torch.float16,
local_files_only=True,
safety_checker=None,
)
...
decoder_pipe.to(device)
prior_pipe = diffusers.UnCLIPPipeline.from_pretrained(
"kakaobrain/karlo-v1-alpha",
torch_dtype=torch.float16,
local_files_only=True,
)
...
prior_pipe.to(device)
# `karlo_prior` and `sd_image_variations_decoder` are defined in the gist linked
# below; hedged sketches of both also follow this code block.
prior_pipe.text_to_image_embedding = types.MethodType(karlo_prior, prior_pipe)
decoder_pipe.image_embedding_to_image = types.MethodType(sd_image_variations_decoder, decoder_pipe)
random_generator = torch.Generator(device=device).manual_seed(1000)
prompt = "a shiba inu wearing a beret and black turtleneck"
image_embeddings = prior_pipe.text_to_image_embedding(prompt, generator=random_generator)
image = decoder_pipe.image_embedding_to_image(image_embeddings, generator=random_generator).images[0]
```
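
For reference, here is a minimal sketch of what the monkey-patched `karlo_prior` might look like. It follows the prior stage of diffusers' `UnCLIPPipeline.__call__`: encode the prompt with CLIP, then denoise in CLIP image-embedding space with the prior transformer. This is not the gist's code; the private `_encode_prompt` helper and the scheduler signatures are version-dependent (written against diffusers ~0.12), so treat it as an illustration.

```python
import torch

@torch.no_grad()
def karlo_prior(self, prompt, num_inference_steps=25, guidance_scale=4.0, generator=None):
    """Text -> CLIP image embedding, using only the prior of UnCLIPPipeline (sketch)."""
    device = self.prior.device
    do_cfg = guidance_scale > 1.0

    # Encode the prompt with the pipeline's CLIP text encoder.
    # `_encode_prompt` is a private UnCLIPPipeline helper; its signature varies by version.
    text_embeddings, text_hidden_states, text_mask = self._encode_prompt(
        prompt, device, 1, do_cfg
    )

    # Start from Gaussian noise in CLIP image-embedding space.
    self.prior_scheduler.set_timesteps(num_inference_steps, device=device)
    latents = torch.randn(
        (1, self.prior.config.embedding_dim),
        generator=generator, device=device, dtype=text_embeddings.dtype,
    ) * self.prior_scheduler.init_noise_sigma

    timesteps = self.prior_scheduler.timesteps
    for i, t in enumerate(timesteps):
        latent_model_input = torch.cat([latents] * 2) if do_cfg else latents
        predicted = self.prior(
            latent_model_input,
            timestep=t,
            proj_embedding=text_embeddings,
            encoder_hidden_states=text_hidden_states,
            attention_mask=text_mask,
        ).predicted_image_embedding
        if do_cfg:  # classifier-free guidance on the predicted embedding
            uncond, cond = predicted.chunk(2)
            predicted = uncond + guidance_scale * (cond - uncond)
        prev_timestep = None if i + 1 == len(timesteps) else timesteps[i + 1]
        latents = self.prior_scheduler.step(
            predicted, timestep=t, sample=latents,
            generator=generator, prev_timestep=prev_timestep,
        ).prev_sample

    # Undo the prior's latent normalization to get a real CLIP image embedding.
    return self.prior.post_process_latents(latents)
```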
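
And a matching sketch of `sd_image_variations_decoder`: it mirrors `StableDiffusionImageVariationPipeline.__call__` but feeds externally supplied CLIP image embeddings straight to the UNet's cross-attention, skipping the pipeline's CLIP image encoder. The defaults here (512x512 output, guidance scale, the SD 1.x VAE scaling factor 0.18215) are assumptions, not the gist's exact values.

```python
import torch
from diffusers.pipelines.stable_diffusion import StableDiffusionPipelineOutput

@torch.no_grad()
def sd_image_variations_decoder(self, image_embeddings, height=512, width=512,
                                num_inference_steps=50, guidance_scale=7.5,
                                generator=None):
    """CLIP image embedding -> image, bypassing the image encoder (sketch)."""
    device = self.unet.device
    do_cfg = guidance_scale > 1.0

    # The UNet cross-attends to a single embedding "token": (batch, 1, embed_dim).
    image_embeddings = image_embeddings.to(device=device, dtype=self.unet.dtype).unsqueeze(1)
    if do_cfg:
        # The image-variations pipeline uses zeros as the unconditional embedding.
        image_embeddings = torch.cat([torch.zeros_like(image_embeddings), image_embeddings])

    self.scheduler.set_timesteps(num_inference_steps, device=device)
    latents = torch.randn(
        (1, self.unet.config.in_channels, height // 8, width // 8),
        generator=generator, device=device, dtype=image_embeddings.dtype,
    ) * self.scheduler.init_noise_sigma

    for t in self.scheduler.timesteps:
        latent_model_input = torch.cat([latents] * 2) if do_cfg else latents
        latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)
        noise_pred = self.unet(latent_model_input, t,
                               encoder_hidden_states=image_embeddings).sample
        if do_cfg:
            uncond, cond = noise_pred.chunk(2)
            noise_pred = uncond + guidance_scale * (cond - uncond)
        latents = self.scheduler.step(noise_pred, t, latents).prev_sample

    # Decode VAE latents; 0.18215 is the SD 1.x latent scaling factor.
    image = self.vae.decode(latents / 0.18215).sample
    image = (image / 2 + 0.5).clamp(0, 1).cpu().permute(0, 2, 3, 1).float().numpy()
    return StableDiffusionPipelineOutput(images=self.numpy_to_pil(image),
                                         nsfw_content_detected=None)
```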
It works! And I got reasonable results: this DALLE-2-like Stable Diffusion setup is relatively lightweight, needing only about 6.5 GB of GPU RAM with the default code and roughly 21 s per image on a T4 GPU in half precision (float16).
Open source status
- The model implementation is available
- The model weights are available (Only relevant if addition is not a scheduler).
Provide useful links for the implementation
The full code shown above is in my gist.