v0.26.0: New video pipelines, single-file checkpoint revamp, multi IP-Adapter inference with multiple images
This new release comes with two new video pipelines, a more unified and consistent experience for single-file checkpoint loading, support for inference with multiple IP-Adapters and multiple reference images, and more.
## I2VGenXL
I2VGenXL is an image-to-video pipeline, proposed in *I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models*.
```python
import torch
from diffusers import I2VGenXLPipeline
from diffusers.utils import export_to_gif, load_image

repo_id = "ali-vilab/i2vgen-xl"
pipeline = I2VGenXLPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)
# Offload idle model components to the CPU to reduce peak GPU memory usage.
pipeline.enable_model_cpu_offload()

image_url = "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/i2vgen_xl_images/img_0001.jpg"
image = load_image(image_url).convert("RGB")

prompt = "A green frog floats on the surface of the water on green lotus leaves, with several pink lotus flowers, in a Chinese painting style."
negative_prompt = "Distorted, discontinuous, Ugly, blurry, low resolution, motionless, static, disfigured, disconnected limbs, Ugly faces, incomplete arms"
generator = torch.manual_seed(8888)

frames = pipeline(
    prompt=prompt,
    image=image,
    num_inference_steps=50,
    negative_prompt=negative_prompt,
    generator=generator,
).frames
export_to_gif(frames[0], "i2v.gif")
```
📜 Check out the docs here.
## PIA
PIA is a Personalized Image Animator that aligns with condition images, controls motion by text, and is compatible with various T2I models without specific tuning. PIA uses a base T2I model with temporal alignment layers for image animation. A key component of PIA is the condition module, which transfers appearance information for individual frame synthesis in the latent space, thus allowing a stronger focus on motion alignment. PIA was introduced in *PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models*.
```python
import torch
from diffusers import EulerDiscreteScheduler, MotionAdapter, PIAPipeline
from diffusers.utils import export_to_gif, load_image

# Load the PIA motion adapter and plug it into a Stable Diffusion 1.5 checkpoint.
adapter = MotionAdapter.from_pretrained("openmmlab/PIA-condition-adapter")
pipe = PIAPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V6.0_B1_noVAE", motion_adapter=adapter, torch_dtype=torch.float16
)
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)

# Memory optimizations: offload idle components to the CPU and decode the VAE in slices.
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()

image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/pix2pix/cat_6.png?download=true"
)
image = image.resize((512, 512))

prompt = "cat in a field"
negative_prompt = "wrong white balance, dark, sketches, worst quality, low quality"

generator = torch.Generator("cpu").manual_seed(0)
output = pipe(image=image, prompt=prompt, generator=generator)
frames = output.frames[0]
export_to_gif(frames, "pia-animation.gif")
```
📜 Check out the docs here.
## Multiple IP-Adapters + multiple reference images support (“Instant LoRA” feature)
IP-Adapters are becoming quite popular, so we have added support for performing inference with multiple IP-Adapters and multiple reference images! Thanks to @asomoza for their help. Get started with the code below:
```python
import torch
from transformers import CLIPVisionModelWithProjection
from diffusers import AutoPipelineForText2Image, DDIMScheduler
from diffusers.utils import load_image

# The "plus" IP-Adapter checkpoints use the ViT-H image encoder.
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter",
    subfolder="models/image_encoder",
    torch_dtype=torch.float16,
)

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    image_encoder=image_encoder,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

# Load two IP-Adapters (one for style, one for faces), each with its own scale.
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name=["ip-adapter-plus_sdxl_vit-h.safetensors", "ip-adapter-plus-face_sdxl_vit-h.safetensors"],
)
pipeline.set_ip_adapter_scale([0.7, 0.3])
pipeline.enable_model_cpu_offload()

face_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png")
style_folder = "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy"
style_images = [load_image(f"{style_folder}/img{i}.png") for i in range(10)]

generator = torch.Generator(device="cpu").manual_seed(0)
# Pass one image (or one list of images) per loaded IP-Adapter.
image = pipeline(
    prompt="wonderwoman",
    ip_adapter_image=[style_images, face_image],
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=50,
    generator=generator,
).images[0]
```
*(Figure: reference face image alongside the generated output image.)*
📜 Check out the docs here.
## Single-file checkpoint loading

The `from_single_file()` utility has been refactored for better readability and to follow semantics similar to `from_pretrained()`. Support for loading single-file checkpoints and configs from URLs has also been added.
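For example, a single `.safetensors` checkpoint can now be loaded directly from a URL with the same call style as `from_pretrained()`. A minimal sketch, where the checkpoint URL is illustrative and any compatible single-file checkpoint should work:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load a single-file checkpoint straight from a URL (illustrative URL).
pipeline = StableDiffusionXLPipeline.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors",
    torch_dtype=torch.float16,
).to("cuda")
```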
## DPM scheduler fix

We introduced a fix for the DPM schedulers, so you can now use them with SDXL to generate high-quality images in fewer steps than with the Euler scheduler.
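As a minimal sketch of taking advantage of this (the model, scheduler options, and step count below are illustrative), swapping an SDXL pipeline over to a DPM solver looks like:

```python
import torch
from diffusers import AutoPipelineForText2Image, DPMSolverMultistepScheduler

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Swap in a DPM solver; use_karras_sigmas tends to help at low step counts.
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
    pipeline.scheduler.config, use_karras_sigmas=True
)

image = pipeline("an astronaut riding a horse on mars", num_inference_steps=25).images[0]
```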
Apart from these, we have done a great deal of refactoring to improve the library design and will continue to do so in the coming days.
## All commits
- [docs] Fix missing API function by @stevhliu in #6604
- Fix failing tests due to Posix Path by @DN6 in #6627
- Update convert_from_ckpt.py / read checkpoint config yaml contents by @spezialspezial in #6633
- [Community] Experimental AnimateDiff Image to Video (open to improvements) by @a-r-r-o-w in #6509
- refactor: extract init/forward function in UNet2DConditionModel by @ultranity in #6478
- Modularize InstructPix2Pix SDXL inferencing during and after training in examples by @sang-k in #6569
- Fixed the bug related to saving DeepSpeed models. by @HelloWorldBeginner in #6628
- fix DPM Scheduler with `use_karras_sigmas` option by @yiyixuxu in #6477
- fix SDXL-kdiffusion tests by @yiyixuxu in #6647
- add padding_mask_crop to all inpaint pipelines by @rootonchair in #6360
- add Sa-Solver by @lawrence-cj in #5975
- Add tearDown method to LoRA tests. by @DN6 in #6660
- [Diffusion DPO] apply fixes from #6547 by @sayakpaul in #6668
- Update README by @StandardAI in #6669
- [Big refactor] move unets to `unets` module 🦋 by @sayakpaul in #6630
- Standardise outputs for video pipelines by @DN6 in #6626
- fix dpm related slow test failure by @yiyixuxu in #6680
- [Tests] Test for passing local config file to `from_single_file()` by @sayakpaul in #6638
- [Refactor] Update from single file by @DN6 in #6428
- [WIP][Community Pipeline] InstaFlow! One-Step Stable Diffusion with Rectified Flow by @ayushtues in #6057
- Add InstantID Pipeline by @haofanwang in #6673
- [Docs] update: tutorials ja | AUTOPIPELINE.md by @YasunaCoffee in #6629
- [Fix bugs] pipeline_controlnet_sd_xl.py by @haofanwang in #6653
- SD 1.5 Support For Advanced Lora Training (train_dreambooth_lora_sdxl_advanced.py) by @brandostrong in #6449
- AnimateDiff Video to Video by @a-r-r-o-w in #6328
- [docs] UViT2D by @stevhliu in #6643
- Correct sigmas cpu settings by @patrickvonplaten in #6708
- [docs] AnimateDiff Video-to-Video by @a-r-r-o-w in #6712
- fix community README by @a-r-r-o-w in #6645
- fix custom diffusion training with concept list by @AIshutin in #6710
- Add IP Adapters to slow tests by @DN6 in #6714
- Move tests for SD inference variant pipelines into their own modules by @DN6 in #6707
- Add Community Example Consistency Training Script by @dg845 in #6717
- Add UFOGenScheduler to Community Examples by @dg845 in #6650
- [Hub] feat: explicitly tag to diffusers when using push_to_hub by @sayakpaul in #6678
- Correct SNR weighted loss in v-prediction case by only adding 1 to SNR on the denominator by @thuliu-yt16 in #6307
- changed to posix unet by @gzguevara in #6719
- Change os.path to pathlib Path by @Stepheni12 in #6737
- correct hflip arg by @sayakpaul in #6743
- Add unload_textual_inversion method by @fabiorigano in #6656
- [Core] move transformer scripts to `transformers` modules by @sayakpaul in #6747
- Update lora.md with a more accurate description of rank by @xhedit in #6724
- Fix mixed precision fine-tuning for text-to-image-lora-sdxl example. by @sajadn in #6751
- udpate ip-adapter slow tests by @yiyixuxu in #6760
- Update export to video to support new `tensor_to_vid` function in video pipelines by @DN6 in #6715
- [DDPMScheduler] Load `alpha_cumprod` to device to avoid redundant data movement. by @woshiyyya in #6704
- Fix bug in ResnetBlock2D.forward where LoRA Scale gets Overwritten by @dg845 in #6736
- add note about serialization by @sayakpaul in #6764
- Update train_diffusion_dpo.py by @viettmab in #6754
- Pin torch < 2.2.0 in test runners by @DN6 in #6780
- [Kandinsky tests] add `is_flaky` to test_model_cpu_offload_forward_pass by @sayakpaul in #6762
- add ipo, hinge and cpo loss to dpo trainer by @kashif in #6788
- Fix setting scaling factor in VAE config by @DN6 in #6779
- Add PIA Model/Pipeline by @DN6 in #6698
- [docs] Add missing parameter by @stevhliu in #6775
- [IP-Adapter] Support multiple IP-Adapters by @yiyixuxu in #6573
- [sdxl k-diffusion pipeline]move sigma to device by @yiyixuxu in #6757
- [Feat] add I2VGenXL for image-to-video generation by @sayakpaul in #6665
- Release: v0.26.0 by @ (direct commit on v0.26.0-release)
- fix torchvision import by @patrickvonplaten in #6796
## Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @a-r-r-o-w
  - [Community] Experimental AnimateDiff Image to Video (open to improvements) (#6509)
  - AnimateDiff Video to Video (#6328)
  - [docs] AnimateDiff Video-to-Video (#6712)
  - fix community README (#6645)
- @ultranity
  - refactor: extract init/forward function in UNet2DConditionModel (#6478)
- @lawrence-cj
  - add Sa-Solver (#5975)
- @ayushtues
  - [WIP][Community Pipeline] InstaFlow! One-Step Stable Diffusion with Rectified Flow (#6057)
- @haofanwang
  - Add InstantID Pipeline (#6673)
  - [Fix bugs] pipeline_controlnet_sd_xl.py (#6653)
- @brandostrong
  - SD 1.5 Support For Advanced Lora Training (train_dreambooth_lora_sdxl_advanced.py) (#6449)
- @dg845
  - Add Community Example Consistency Training Script (#6717)
  - Add UFOGenScheduler to Community Examples (#6650)
  - Fix bug in ResnetBlock2D.forward where LoRA Scale gets Overwritten (#6736)