fix bug of dual module (setattr, and compatible with DualModule input) #613

Merged · 29 commits · Feb 5, 2024
Changes from 26 commits
Commits
- `d78c84f` support text encoder (marigoold, Jan 30, 2024)
- `4e56a68` refine code (marigoold, Jan 30, 2024)
- `46a6a3f` compatible for prev diffusers version (marigoold, Jan 30, 2024)
- `ffdf6ed` Update __init__.py (marigoold, Jan 30, 2024)
- `9877fb1` refine (marigoold, Jan 30, 2024)
- `9aa6af4` Merge branch 'dev_wy_lora_support_textencoder' of github.com:Oneflow-… (marigoold, Jan 30, 2024)
- `e252c38` update readme (marigoold, Jan 31, 2024)
- `7ede9af` refine doc (marigoold, Jan 31, 2024)
- `ef1eb8c` remove unfuse in fuse func (marigoold, Jan 31, 2024)
- `cead728` Merge branch 'main' into dev_wy_lora_support_textencoder (marigoold, Jan 31, 2024)
- `54903ce` refine (marigoold, Jan 31, 2024)
- `55c65b9` rename (marigoold, Jan 31, 2024)
- `7892a5b` remove out dated lora.py (marigoold, Jan 31, 2024)
- `93388a4` update readme (marigoold, Jan 31, 2024)
- `cb05e3e` refine (marigoold, Jan 31, 2024)
- `a964b23` Update lora.py (marigoold, Jan 31, 2024)
- `46b54c2` fix bug (marigoold, Jan 31, 2024)
- `2cf586e` Merge branch 'dev_wy_lora_support_textencoder' of github.com:Oneflow-… (marigoold, Jan 31, 2024)
- `8743725` refine (marigoold, Jan 31, 2024)
- `cf0452f` Merge branch 'main' into dev_wy_lora_support_textencoder (marigoold, Feb 1, 2024)
- `628a608` dual modulelist setattr fix bug, compatible with DualModule input (marigoold, Feb 1, 2024)
- `c773ac4` remove utils/__init__.py (marigoold, Feb 2, 2024)
- `6009d1a` modify examples (marigoold, Feb 2, 2024)
- `4a43b0b` update doc, and var name (marigoold, Feb 2, 2024)
- `57e8ca1` Merge branch 'dev_wy_lora_support_textencoder' into fix_wy_dualmodule… (marigoold, Feb 2, 2024)
- `bf57ec6` compatible for PEFT (marigoold, Feb 2, 2024)
- `0e7b951` Merge branch 'main' into fix_wy_dualmodulelist_setattr (marigoold, Feb 3, 2024)
- `e2c3991` Merge branch 'main' into fix_wy_dualmodulelist_setattr (strint, Feb 4, 2024)
- `573a1e5` Merge branch 'main' into fix_wy_dualmodulelist_setattr (strint, Feb 5, 2024)
4 changes: 2 additions & 2 deletions examples/text_to_image_sdxl_lora.py
@@ -5,7 +5,7 @@
from onediff.infer_compiler.utils import TensorInplaceAssign

try:
-    from onediffx.utils.lora import load_and_fuse_lora, unfuse_lora
+    from onediffx.lora import load_and_fuse_lora, unfuse_lora
except ImportError:
    raise RuntimeError("OneDiff onediffx is not installed. Please check onediff_diffusers_extensions/README.md to install onediffx.")
@@ -93,7 +93,7 @@

# 4. unfuse_lora can uninstall LoRA weights and restore the weights of UNet
generator = torch.manual_seed(0)
-unfuse_lora(pipe.unet)
+unfuse_lora(pipe)
images_fusion = pipe(
    "masterpiece, best quality, mountain",
    generator=generator,
80 changes: 67 additions & 13 deletions onediff_diffusers_extensions/README.md
@@ -7,7 +7,7 @@ OneDiffX is a OneDiff Extension for HF diffusers. It provides some acceleration
- [DeepCache Speedup](#deepcache-speedup)
  - [Stable Diffusion XL](#run-stable-diffusion-xl-with-onediffx)
  - [Stable Diffusion 1.5](#run-stable-diffusion-15-with-onediffx)
-- [LoRA loading and switching speed up](#lora-loading-and-switching-speed-up)
+- [Fast LoRA loading and switching](#fast-lora-loading-and-switching)
- [Quantization](#quantization)
- [Contact](#contact)

@@ -150,9 +150,42 @@ deepcache_output = pipe(
export_to_video(deepcache_output, "generated.mp4", fps=7)
```

-## LoRA loading and switching speed up
-
-OneDiff provides a faster implementation of loading LoRA, by invoking `onediffx.utils.lora.load_and_fuse_lora` you can load and fuse LoRA to pipeline.
+## Fast LoRA loading and switching
+
+OneDiff provides a more efficient implementation of loading LoRA. By invoking `load_and_fuse_lora` you can load and fuse a LoRA into the pipeline, and by invoking `unfuse_lora` you can restore the weights of the base model.

### API
`onediffx.lora.load_and_fuse_lora(pipeline: LoraLoaderMixin, pretrained_model_name_or_path_or_dict: Union[str, Path, Dict[str, torch.Tensor]], adapter_name: Optional[str] = None, *, lora_scale: float = 1.0, offload_device="cpu", offload_weight="lora", use_cache=False, **kwargs)`:
- pipeline (`LoraLoaderMixin`): The pipeline that will load and fuse the LoRA weights.

- pretrained_model_name_or_path_or_dict (`str` or `os.PathLike` or `dict`): Can be either:

    - A string, the *model id* (for example `google/ddpm-celebahq-256`) of a pretrained model hosted on the Hub.

    - A path to a *directory* containing the model weights saved with [ModelMixin.save_pretrained()](https://huggingface.co/docs/diffusers/v0.25.1/en/api/models/overview#diffusers.ModelMixin.save_pretrained).

    - A [torch state dict](https://pytorch.org/tutorials/beginner/saving_loading_models.html#what-is-a-state-dict).

- adapter_name (`str`, *optional*): Adapter name to be used for referencing the loaded adapter model. If not specified, it will use `default_{i}`, where `i` is the total number of adapters being loaded. **Not supported yet.**

- lora_scale (`float`, defaults to 1.0): Controls how much the LoRA parameters influence the outputs.

- offload_device (`str`, must be one of "cpu" or "cuda"): The device onto which the LoRA or model weights are offloaded.

- offload_weight (`str`, must be one of "lora" or "weight"): The type of weight to offload. If set to "lora", the LoRA weights will be offloaded to `offload_device`; if set to "weight", the weights of the Linear and Conv2d layers will be offloaded.

- use_cache (`bool`, *optional*): Whether to cache the LoRA. If set to True, the loaded LoRA weights are kept in memory for faster subsequent loads.

- kwargs (`dict`, *optional*): See [lora_state_dict()](https://huggingface.co/docs/diffusers/v0.25.1/en/api/loaders/lora#diffusers.loaders.LoraLoaderMixin.lora_state_dict).
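For illustration, here is a minimal sketch of a call that exercises the optional arguments (the directory path and the `weight_name` value are placeholders; `weight_name` is one of the kwargs forwarded to `lora_state_dict()`):

```python
load_and_fuse_lora(
    pipe,
    "path/to/lora",         # a Hub model id, a local directory, or a torch state dict
    lora_scale=0.8,         # scale the LoRA's influence on the outputs
    offload_device="cpu",   # offload to CPU rather than GPU
    offload_weight="lora",  # offload the LoRA weights, not the Linear/Conv2d weights
    use_cache=True,         # keep this LoRA cached in memory for fast re-loading
    weight_name="pytorch_lora_weights.safetensors",  # placeholder file name
)
```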



`onediffx.lora.unfuse_lora(pipeline: LoraLoaderMixin) -> None`:

- pipeline (`LoraLoaderMixin`): The pipeline whose LoRA weights will be unfused, restoring the base model weights.

### Example

```python
import torch
@@ -161,9 +194,7 @@ from onediffx import compile_pipe
from onediffx.lora import load_and_fuse_lora, unfuse_lora

MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
-pipe = DiffusionPipeline.from_pretrained(
-    MODEL_ID, variant="fp16", torch_dtype=torch.float16
-).to("cuda")
+pipe = DiffusionPipeline.from_pretrained(MODEL_ID, variant="fp16", torch_dtype=torch.float16).to("cuda")

LORA_MODEL_ID = "hf-internal-testing/sdxl-1.0-lora"
LORA_FILENAME = "sd_xl_offset_example-lora_1.0.safetensors"
@@ -179,17 +210,40 @@ images_fusion = pipe(
    num_inference_steps=30,
).images[0]
images_fusion.save("test_sdxl_lora.png")

# before loading another LoRA, unfuse the current one to restore
# the base model weights (load_and_fuse_lora also does this implicitly)
unfuse_lora(pipe)
load_and_fuse_lora(pipe, LORA_MODEL_ID, weight_name=LORA_FILENAME, lora_scale=1.0)
```

-We compared different methods of loading LoRA. The comparison of loading LoRA once is as shown in the table below.
-
-| Method | Speed | Inference speed | LoRA loading speed |
-|----------------------------------|-------|------------------|-----------------------|
-| load_lora_weight | 1.10s | low | high |
-| load_lora_weight + fuse_lora | 1.38s | high | low |
-| onediff load_and_fuse_lora | 0.56s | **high** | **high** |
+### Benchmark

We chose five LoRAs to profile the loading and switching speed of three different APIs (a timing sketch follows the table below):

1. `load_lora_weights`, which has high loading speed but low inference speed

2. `load_lora_weights` + `fuse_lora`, which has high inference speed but low loading speed

3. `onediffx.lora.load_and_fuse_lora`, which has both high loading speed and high inference speed

The results are shown below.

| LoRA name | Size | load_lora_weights | load_lora_weights + fuse_lora | **onediffx load_and_fuse_lora** | UNet cnt | TE1 cnt | TE2 cnt | Source |
|------------------------------------------|-------|-------------------|-------------------------------|---------------------------------|----------|---------|---------|-----------------------------------------------|
| SDXL-Emoji-Lora-r4.safetensors | 28M | 1.69 s | 2.34 s | **0.78 s** | 2166 | 216 | 576 | [Link](https://novita.ai/model/SDXL-Emoji-Lora-r4_160282) |
| sdxl_metal_lora.safetensors | 23M | 0.97 s | 1.73 s | **0.19 s** | 1120 | 0 | 0 | |
| simple_drawing_xl_b1-000012.safetensors | 55M | 1.67 s | 2.57 s | **0.77 s** | 2166 | 216 | 576 | [Link](https://civitai.com/models/177820/sdxl-simple-drawing) |
| texta.safetensors | 270M | 1.72 s | 2.86 s | **0.97 s** | 2364 | 0 | 0 | [Link](https://civitai.com/models/221240/texta-generate-text-with-sdxl) |
| watercolor_v1_sdxl_lora.safetensors | 12M | 1.54 s | 2.01 s | **0.35 s** | 1680 | 0 | 0 | |
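These numbers are machine-dependent. As a rough recipe, the three approaches can be timed along the following lines (a minimal sketch reusing `pipe`, `LORA_MODEL_ID`, and `LORA_FILENAME` from the example above; it is not the exact script behind the table):

```python
import time

import torch

def timed(fn):
    # wall-clock timing; synchronize so pending GPU work is counted
    torch.cuda.synchronize()
    start = time.perf_counter()
    fn()
    torch.cuda.synchronize()
    return time.perf_counter() - start

# 1. load_lora_weights only: fast load, but the LoRA stays unfused at inference time
t1 = timed(lambda: pipe.load_lora_weights(LORA_MODEL_ID, weight_name=LORA_FILENAME))
pipe.unload_lora_weights()

# 2. load_lora_weights + fuse_lora: fused weights infer fast, but fusing is slow
def diffusers_load_and_fuse():
    pipe.load_lora_weights(LORA_MODEL_ID, weight_name=LORA_FILENAME)
    pipe.fuse_lora()

t2 = timed(diffusers_load_and_fuse)
pipe.unfuse_lora()
pipe.unload_lora_weights()

# 3. onediffx: load and fuse in one fast step
t3 = timed(lambda: load_and_fuse_lora(pipe, LORA_MODEL_ID, weight_name=LORA_FILENAME))
unfuse_lora(pipe)

print(f"load_lora_weights: {t1:.2f}s, + fuse_lora: {t2:.2f}s, onediffx: {t3:.2f}s")
```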

### Note

1. The OneDiff LoRA extension does not currently support PEFT and requires diffusers version 0.21.0 or later (a minimal version guard is sketched after this list).

2. Diffusers (without PEFT) can load only one LoRA, so onediffx is likewise restricted to a single LoRA at a time. We are working on making onediffx compatible with PEFT, which will enable it to load multiple LoRAs.
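A minimal sketch of the version check implied by the first note (an illustration, not code from onediffx):

```python
import diffusers
from packaging import version

# The onediffx LoRA helpers assume diffusers >= 0.21.0 (see note 1 above).
if version.parse(diffusers.__version__) < version.parse("0.21.0"):
    raise RuntimeError(
        f"onediffx LoRA utilities require diffusers >= 0.21.0, got {diffusers.__version__}"
    )
```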

If you want to unload one LoRA and load another, you only need to call `load_and_fuse_lora` again; there is no need to call `unfuse_lora` manually, because it is called implicitly inside `load_and_fuse_lora`. You can also call `unfuse_lora` manually whenever you want to restore the base model's weights. A switching loop is sketched below.
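Switching between several LoRAs is therefore just repeated calls to `load_and_fuse_lora`; with `use_cache=True`, revisiting a LoRA is served from memory rather than disk (the weight file names below are placeholders):

```python
# cycle through LoRAs; each call implicitly unfuses the previous one
for weight_file in ["lora_a.safetensors", "lora_b.safetensors"]:  # placeholder names
    load_and_fuse_lora(pipe, LORA_MODEL_ID, weight_name=weight_file, use_cache=True)
    image = pipe("masterpiece, best quality, mountain", num_inference_steps=30).images[0]
    image.save(f"test_{weight_file}.png")
```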

## Quantization
