Instruct-Pix2Pix, DiT, LoRA
🪄 Instruct-Pix2Pix
Instruct-Pix2Pix is a Stable Diffusion model fine-tuned to edit images from human instructions. Given an input image and a written instruction describing the desired change, the model edits the image accordingly.
The model was released with the paper InstructPix2Pix: Learning to Follow Image Editing Instructions. More information about the model can be found in the paper.
pip install diffusers transformers safetensors accelerate
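# import the PIL submodules explicitly; a bare "import PIL" does not load them
import PIL.Image
import PIL.ImageOps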
import requests
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
url = "https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/mountain.png"
def download_image(url):
    image = PIL.Image.open(requests.get(url, stream=True).raw)
    # apply the EXIF orientation so the image is upright
    image = PIL.ImageOps.exif_transpose(image)
    image = image.convert("RGB")
    return image
image = download_image(url)
prompt = "make the mountains snowy"
edit = pipe(prompt, image=image, num_inference_steps=20, image_guidance_scale=1.5, guidance_scale=7).images[0]
edit.save("snowy_mountains.png")
- Add InstructPix2Pix pipeline by @patil-suraj in #2040
🤖 DiT
Diffusion Transformer (DiT) is a class-conditional latent diffusion model that replaces the commonly used U-Net backbone with a transformer operating on latent patches. The pretrained model was trained on the ImageNet-1K dataset and can generate class-conditional images of 256x256 or 512x512 pixels.
The model was released with the paper Scalable Diffusion Models with Transformers.
import torch
from diffusers import DiTPipeline
model_id = "facebook/DiT-XL-2-256"
pipe = DiTPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
# pick words that exist in ImageNet
words = ["white shark", "umbrella"]
class_ids = pipe.get_label_ids(words)
output = pipe(class_labels=class_ids)
image = output.images[0] # label 'white shark'
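To make "a transformer operating on latent patches" a bit more concrete, here is a rough sketch that counts how many patch tokens the transformer would see at each supported resolution. It assumes an 8x VAE downsampling factor and a patch size of 2 (suggested by the "2" in DiT-XL-2); neither number is stated in these notes, and dit_token_count is a hypothetical helper, not part of diffusers.

```python
def dit_token_count(image_size: int, vae_downsample: int = 8, patch_size: int = 2) -> int:
    """Count the patch tokens for a square image (illustrative helper only)."""
    # Assumption: the VAE shrinks each spatial side by `vae_downsample`.
    latent_size = image_size // vae_downsample
    if latent_size % patch_size != 0:
        raise ValueError("latent size must be divisible by the patch size")
    # Assumption: the latent is cut into non-overlapping square patches.
    patches_per_side = latent_size // patch_size
    return patches_per_side ** 2


for size in (256, 512):
    print(f"{size}x{size} image -> {dit_token_count(size)} tokens")
```

Under these assumptions, doubling the image side quadruples the sequence length, which is one reason the 512x512 checkpoint is more expensive to run.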
⚡ LoRA
LoRA is a technique for performing parameter-efficient fine-tuning for large models. LoRA works by adding so-called "update matrices" to specific blocks of a pre-trained model. During fine-tuning, only these update matrices are updated while the pre-trained model parameters are kept frozen. This allows us to achieve greater memory efficiency as well as easier portability during fine-tuning.
LoRA was proposed in LoRA: Low-Rank Adaptation of Large Language Models. In the original paper, the authors investigated LoRA for fine-tuning large language models like GPT-3. cloneofsimo was the first to try out LoRA training for Stable Diffusion in the popular lora GitHub repository.
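The idea of "update matrices" can be sketched in a few lines of plain Python. This is only an illustration of the technique as described in the LoRA paper (a frozen weight plus a scaled low-rank product, with one factor initialized to zero), not how Diffusers implements it; LoRALinear, matvec, and the chosen sizes are all hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: never modified during fine-tuning
        # A (rank x d_in) starts random and B (d_out x rank) starts at zero,
        # so the layer initially behaves exactly like the frozen one.
        self.lora_a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.lora_b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameters(self):
        """Only the two update matrices would be trained and saved."""
        return sum(len(r) for r in self.lora_a) + sum(len(r) for r in self.lora_b)


d_out, d_in, rank = 8, 16, 2
weight = [[0.1 * (i + j) for j in range(d_in)] for i in range(d_out)]
layer = LoRALinear(weight, rank=rank)
x = [1.0] * d_in

print(layer.forward(x) == matvec(weight, x))  # True: B is zero at initialization
print(d_out * d_in, layer.trainable_parameters())  # full weight vs. update matrices
```

Because only the two small matrices need to be stored, the saved checkpoint grows with rank * (d_in + d_out) rather than d_in * d_out, which is where the small file sizes come from.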
Diffusers now supports LoRA! This means you can now fine-tune a model like Stable Diffusion using consumer GPUs like Tesla T4 or RTX 2080 Ti. LoRA support was added to UNet2DConditionModel and the DreamBooth training script by @patrickvonplaten in #1884.
By using LoRA, the fine-tuned checkpoints will be just 3 MB in size. After fine-tuning, you can use the LoRA checkpoints like so:
from diffusers import StableDiffusionPipeline
import torch
model_path = "sayakpaul/sd-model-finetuned-lora-t4"
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe.unet.load_attn_procs(model_path)
pipe.to("cuda")
prompt = "A pokemon with blue eyes."
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("pokemon.png")
You can follow these resources to learn more about how to use LoRA in diffusers:
- text2image fine-tuning script (by @sayakpaul in #2031).
- Official documentation discussing how LoRA is supported (by @sayakpaul in #2086).
📐 Customizable Cross Attention
LoRA leverages a new method to customize the cross attention layers deep in the UNet. This can be useful for other creative approaches such as Prompt-to-Prompt, and it makes it easier to apply optimizations like xFormers. This new "attention processor" abstraction was created by @patrickvonplaten in #1639 after discussing the design with the community, and we have used it to rewrite our xFormers and attention slicing implementations!
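As a rough illustration of the "attention processor" idea, the sketch below shows a layer that owns no attention math itself and delegates to a swappable processor object. All class and method names here (CrossAttention, DefaultProcessor, RecordingProcessor, set_processor) are hypothetical stand-ins chosen for the example; they are not the diffusers classes, whose exact names and signatures should be taken from the library documentation.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class DefaultProcessor:
    """Plain scaled dot-product attention for a single query vector."""

    def attention_weights(self, query, keys):
        scale = 1.0 / math.sqrt(len(query))
        return softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])

    def __call__(self, query, keys, values):
        weights = self.attention_weights(query, keys)
        return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


class RecordingProcessor(DefaultProcessor):
    """Same computation, but keeps the attention weights for later inspection."""

    def __init__(self):
        self.last_weights = None

    def attention_weights(self, query, keys):
        self.last_weights = super().attention_weights(query, keys)
        return self.last_weights


class CrossAttention:
    """Toy layer that delegates the attention computation to its processor."""

    def __init__(self, processor=None):
        self.processor = processor or DefaultProcessor()

    def set_processor(self, processor):
        self.processor = processor

    def forward(self, query, keys, values):
        return self.processor(query, keys, values)


layer = CrossAttention()
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
baseline = layer.forward(query, keys, values)

recorder = RecordingProcessor()
layer.set_processor(recorder)  # swap behavior without touching the layer
print(layer.forward(query, keys, values) == baseline)  # True: same math, extra bookkeeping
print(recorder.last_weights)
```

The design choice this illustrates is that techniques needing access to or control over attention (inspecting maps, memory-efficient kernels, extra trainable matrices) can be written as processors instead of as edits to the layer itself.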
🌿 Flax => PyTorch
Converting Flax weights to PyTorch has been a long-requested feature. Prolific community member @camenduru took up the gauntlet in #1900 and created a way to convert Flax model weights to PyTorch. This means that you can train or fine-tune models super fast using Google TPUs, and then convert the weights to PyTorch for everybody to use. Thanks @camenduru!
🌀 Flax Img2Img
Another community member, @dhruvrnaik, ported the image-to-image pipeline to Flax in #1355! Using a TPU v2-8 (available in Colab's free tier), you can generate 8 images at once in a few seconds!
🎲 DEIS Scheduler
DEIS (Diffusion Exponential Integrator Sampler) is a new fast multistep scheduler that can generate high-quality samples in fewer steps.
The scheduler was introduced in the paper Fast Sampling of Diffusion Models with Exponential Integrator. More information about the scheduler can be found in the paper.
from diffusers import StableDiffusionPipeline, DEISMultistepScheduler
import torch
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe.scheduler = DEISMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(prompt, generator=generator, num_inference_steps=25).images[0]
Reproducibility
You can now pass CPU generators to all pipelines, even if the pipeline is on GPU. This ensures much better reproducibility across GPU hardware:
import torch
from diffusers import DDIMPipeline
import numpy as np
model_id = "google/ddpm-cifar10-32"
# load model and scheduler
ddim = DDIMPipeline.from_pretrained(model_id)
ddim.to("cuda")
# create a CPU generator for reproducibility
# (torch.manual_seed seeds and returns the default CPU generator)
generator = torch.manual_seed(0)
# run the pipeline for just two steps and return a NumPy array
image = ddim(num_inference_steps=2, output_type="np", generator=generator).images
print(np.abs(image).sum())
See: #1902 and https://huggingface.co/docs/diffusers/using-diffusers/reproducibility
Important New Guides
- Stable Diffusion 101: https://huggingface.co/docs/diffusers/stable_diffusion
- Reproducibility: https://huggingface.co/docs/diffusers/using-diffusers/reproducibility
- LoRA: https://huggingface.co/docs/diffusers/training/lora
Important Bug Fixes
- Don't download safetensors if library is not installed: #2057
- Make sure that save_pretrained(...) doesn't accidentally delete files: #2038
- Fix CPU offload docs for maximum memory gain: #1968
- Fix conversion for exotically sorted weight names: #1959
- Fix intermediate checkpointing for textual inversion, thanks @lstein #2072
All commits
- update composable diffusion for an updated diffuser library by @nanlliu in #1697
- [Tests] Fix UnCLIP cpu offload tests by @anton-l in #1769
- Bump to 0.12.0.dev0 by @anton-l in #1771
- [Dreambooth] flax fixes by @pcuenca in #1765
- update train_unconditional_ort.py by @prathikr in #1775
- Only test for xformers when enabling them #1773 by @kig in #1776
- expose polynomial:power and cosine_with_restarts:num_cycles params by @zetyquickly in #1737
- [Flax] Stateless schedulers, fixes and refactors by @skirsten in #1661
- Correct hf hub download by @patrickvonplaten in #1767
- Dreambooth docs: minor fixes by @pcuenca in #1758
- Fix num images per prompt unclip by @patil-suraj in #1787
- Add Flax stable diffusion img2img pipeline by @dhruvrnaik in #1355
- Refactor cross attention and allow mechanism to tweak cross attention function by @patrickvonplaten in #1639
- Fix OOM when using PyTorch with JAX installed. by @pcuenca in #1795
- reorder model wrap + bug fix by @prathikr in #1799
- Remove hardcoded names from PT scripts by @patrickvonplaten in #1778
- [textual_inversion] unwrap_model text encoder before accessing weights by @patil-suraj in #1816
- fix small mistake in annotation: 32 -> 64 by @Line290 in #1780
- Make safety_checker optional in more pipelines by @pcuenca in #1796
- Device to use (e.g. cpu, cuda:0, cuda:1, etc.) by @camenduru in #1844
- Avoid duplicating PyTorch + safetensors downloads. by @pcuenca in #1836
- Width was typod as weight by @Helw150 in #1800
- fix: resize transform now preserves aspect ratio by @parlance-zz in #1804
- Make xformers optional even if it is available by @kn in #1753
- Allow selecting precision to make Dreambooth class images by @kabachuha in #1832
- unCLIP image variation by @williamberman in #1781
- [Community Pipeline] MagicMix by @daspartho in #1839
- [Versatile Diffusion] Fix cross_attention_kwargs by @patrickvonplaten in #1849
- [Dtype] Align dtype casting behavior with Transformers and Accelerate by @patrickvonplaten in #1725
- [StableDiffusionInpaint] Correct test by @patrickvonplaten in #1859
- [textual inversion] add gradient checkpointing and small fixes. by @patil-suraj in #1848
- Flax: Fix img2img and align with other pipeline by @skirsten in #1824
- Make repo structure consistent by @patrickvonplaten in #1862
- [Unclip] Make sure text_embeddings & image_embeddings can directly be passed to enable interpolation tasks. by @patrickvonplaten in #1858
- Fix ema decay by @pcuenca in #1868
- [Docs] Improve docs by @patrickvonplaten in #1870
- [examples] update loss computation by @patil-suraj in #1861
- [train_text_to_image] allow using non-ema weights for training by @patil-suraj in #1834
- [Attention] Finish refactor attention file by @patrickvonplaten in #1879
- Fix typo in train_dreambooth_inpaint by @pcuenca in #1885
- Update ONNX Pipelines to use np.float64 instead of np.float by @agizmo in #1789
- [examples] misc fixes by @patil-suraj in #1886
- Fixes to the help for report_to in training scripts by @pcuenca in #1888
- updated doc for stable diffusion pipelines by @yiyixuxu in #1770
- Add UnCLIPImageVariationPipeline to dummy imports by @anton-l in #1897
- Add accelerate and xformers versions to diffusers-cli env by @anton-l in #1898
- [addresses issue #1642] add add_noise to scheduling-sde-ve by @aengusng8 in #1827
- Add condtional generation to AudioDiffusionPipeline by @teticio in #1826
- Fixes in comments in SD2 D2I by @neverix in #1903
- [Deterministic torch randn] Allow tensors to be generated on CPU by @patrickvonplaten in #1902
- [Docs] Remove duplicated API doc string by @patrickvonplaten in #1901
- fix: DDPMScheduler.set_timesteps() by @Joqsan in #1912
- Fix --resume_from_checkpoint step in train_text_to_image.py by @merfnad in #1914
- Support training SD V2 with Flax by @yasyf in #1783
- Fix lr-scaling store_true & default=True cli argument for textual_inversion training. by @aredden in #1090
- Various Fixes for Flax Dreambooth by @yasyf in #1782
- Test ResnetBlock2D by @hchings in #1850
- Init for korean docs by @seriousran in #1910
- New Pipeline: Tiled-upscaling with depth perception to avoid blurry spots by @peterwilli in #1615
- Improve reproduceability 2/3 by @patrickvonplaten in #1906
- feat : add log-rho deis multistep scheduler by @qsh-zh in #1432
- Feature/colossalai by @Fazziekey in #1793
- [Docs] Add TRANSLATING.md file by @seriousran in #1920
- [StableDiffusionimg2img] validating input type by @Shubhamai in #1913
- [dreambooth] low precision guard by @williamberman in #1916
- [Stable Diffusion Guide] 101 Stable Diffusion Guide directly into the docs by @patrickvonplaten in #1927
- [Conversion] Make sure ema weights are extracted correctly by @patrickvonplaten in #1937
- fix path to logo by @vvssttkk in #1939
- Add automatic doc sorting by @patrickvonplaten in #1940
- update to latest colossalai by @Fazziekey in #1951
- fix typo in imagic_stable_diffusion.py by @andreemic in #1956
- [Conversion SD] Make sure weirdly sorted keys work as well by @patrickvonplaten in #1959
- allow loading ddpm models into ddim by @patrickvonplaten in #1932
- [Community] Correct checkpoint merger by @patrickvonplaten in #1965
- Update CLIPGuidedStableDiffusion.feature_extractor.size to fix TypeError by @oxidase in #1938
- [CPU offload] correct cpu offload by @patrickvonplaten in #1968
- [Docs] Update README.md by @haofanwang in #1960
- Research project multi subject dreambooth by @klopsahlong in #1948
- Example tests by @patrickvonplaten in #1982
- Fix slow tests by @patrickvonplaten in #1983
- Fix unused upcast_attn flag in convert_original_stable_diffusion_to_diffusers script by @kn in #1942
- Allow converting Flax to PyTorch by adding a "from_flax" keyword by @camenduru in #1900
- Update docstring by @Warvito in #1971
- [SD Img2Img] resize source images to multiple of 8 instead of 32 by @vvsotnikov in #1571
- Update README.md to include our blog post by @sayakpaul in #1998
- Fix a couple typos in Dreambooth readme by @pcuenca in #2004
- Add tests for 2D UNet blocks by @hchings in #1945
- [Conversion] Support convert diffusers to safetensors by @hua1995116 in #1996
- [Community] Fix merger by @patrickvonplaten in #2006
- [Conversion] Improve safetensors by @patrickvonplaten in #1989
- [Black] Update black library by @patrickvonplaten in #2007
- Fix typos in ColossalAI example by @haofanwang in #2001
- Use pipeline tests mixin for UnCLIP pipeline tests + unCLIP MPS fixes by @williamberman in #1908
- Change PNDMPipeline to use PNDMScheduler by @willdalh in #2003
- [train_unconditional] fix LR scheduler init by @patil-suraj in #2010
- [Docs] No more autocast by @patrickvonplaten in #2021
- [Flax] Add Flax inpainting impl by @xvjiarui in #1966
- Check k-diffusion version is at least 0.0.12 by @pcuenca in #2022
- DiT Pipeline by @kashif in #1806
- fix dit doc header by @patil-suraj in #2027
- [LoRA] Add LoRA training script by @patrickvonplaten in #1884
- [Dit] Fix dit tests by @patrickvonplaten in #2034
- Fix typos and minor redundancies by @Joqsan in #2029
- [Lora] Model card by @patrickvonplaten in #2032
- [Save Pretrained] Remove dead code lines that can accidentally remove pytorch files by @patrickvonplaten in #2038
- Fix EMA for multi-gpu training in the unconditional example by @anton-l in #1930
- Minor fix in the documentation of LoRA by @hysts in #2045
- Add InstructPix2Pix pipeline by @patil-suraj in #2040
- Create repo before cloning in examples by @Wauplin in #2047
- Remove modelcards dependency by @Wauplin in #2050
- Module-ise "original stable diffusion to diffusers" conversion script by @damian0815 in #2019
- [StableDiffusionInstructPix2Pix] use cpu generator in slow tests by @patil-suraj in #2051
- [From pretrained] Don't download .safetensors files if safetensors is… by @patrickvonplaten in #2057
- Correct Pix2Pix example by @patrickvonplaten in #2056
- add community pipeline: StableUnCLIPPipeline by @budui in #2037
- [LoRA] Adds example on text2image fine-tuning with LoRA by @sayakpaul in #2031
- Safetensors loading in "convert_diffusers_to_original_stable_diffusion" by @cafeai in #2054
- [examples] add dataloader_num_workers argument by @patil-suraj in #2070
- Dreambooth: reduce VRAM usage by @gleb-akhmerov in #2039
- [Paint by example] Fix cpu offload for paint by example by @patrickvonplaten in #2062
- [textual_inversion] Fix resuming state when using gradient checkpointing by @pcuenca in #2072
- [lora] Log images when using tensorboard by @pcuenca in #2078
- Fix resume epoch for all training scripts except textual_inversion by @pcuenca in #2079
- [dreambooth] fix multi on gpu. by @patil-suraj in #2088
- Run inference on a specific condition and fix call of manual_seed() by @shirayu in #2074
- [Feat] checkpoint_merger works on local models as well as ones that use safetensors by @lstein in #2060
- xFormers attention op arg by @takuma104 in #2049
- [docs] [dreambooth] note random crop by @williamberman in #2085
- Remove wandb from text_to_image requirements.txt by @pcuenca in #2092
- [doc] update example for pix2pix by @patil-suraj in #2101
- Add lora tag to the model tags by @apolinario in #2103
- [docs] Adds a doc on LoRA support for diffusers by @sayakpaul in #2086
- Allow directly passing text embeddings to Stable Diffusion Pipeline for prompt weighting by @patrickvonplaten in #2071
- Improve transformers versions handling by @patrickvonplaten in #2104
- Reproducibility 3/3 by @patrickvonplaten in #1924
🙌 Significant community contributions 🙌
The following contributors have made significant changes to the library over the last release:
- @nanlliu
  - update composable diffusion for an updated diffuser library (#1697)
- @skirsten
- @hchings
- @seriousran
- @qsh-zh
  - feat : add log-rho deis multistep scheduler (#1432)
- @Fazziekey
- @klopsahlong
  - Research project multi subject dreambooth (#1948)
- @xvjiarui
  - [Flax] Add Flax inpainting impl (#1966)
- @damian0815
  - Module-ise "original stable diffusion to diffusers" conversion script (#2019)
- @camenduru
  - Allow converting Flax to PyTorch by adding a "from_flax" keyword (#1900)