
Loading LoRA weights in diffusers with a peft backend slows down as more paths are added to PYTHONPATH #1576

Closed
tisles opened this issue Mar 21, 2024 · 4 comments · Fixed by #1584


@tisles
Contributor

tisles commented Mar 21, 2024

System Info

accelerate==0.21.0
diffusers==0.26.3
peft==0.9.0
safetensors==0.3.3
tokenizers==0.15.2
torch==2.2.1
transformers==4.36.2

Who can help?

@sayakpaul

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

from diffusers import DiffusionPipeline
import time
import torch
import sys
import os
import shutil

pipe_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(pipe_id, torch_dtype=torch.float16).to("cuda")

loras = [
    {
        "adapter_name": "anime"
        "location": "./anime",
        "weight_name": "anime.safetensors",
        "token": "my_anime"
    },
]

def run_dynamic_lora_inference(lora):
    start_load_time = time.time()
    pipe.load_lora_weights(lora["location"], weight_name=lora["weight_name"], adapter_name=lora["adapter_name"])
    end_load_time = time.time()
    prompt = f"Illustration of a dog in the style of {lora["token"]}"

    start_fuse_time = time.time()
    pipe.fuse_lora()
    end_fuse_time = time.time()

    start_set_adapter_time = time.time()
    pipe.set_adapters(lora["adapter_name"])
    end_set_adapter_time = time.time()

    start_inference_time = time.time()
    image = pipe(
        prompt, num_inference_steps=30, generator=torch.manual_seed(0)
    ).images[0]
    end_inference_time = time.time()

    start_unfuse_time = time.time()
    pipe.unfuse_lora()
    end_unfuse_time = time.time()

    start_unload_time = time.time()
    pipe.unload_lora_weights()
    end_unload_time = time.time()

    image.save(f"./{lora['adapter_name']}.png")

    print("Load time:", end_load_time - start_load_time)
    print("Fuse time:", end_fuse_time - start_fuse_time)
    print("Set adapter time", end_set_adapter_time - start_set_adapter_time)
    print("Inference time:", end_inference_time - start_inference_time)
    print("Unfuse time:", end_unfuse_time - start_unfuse_time)
    print("Unload time:", end_unload_time - start_unload_time)

def add_to_python_path():
    root_path = "./folders"
    shutil.rmtree(root_path, ignore_errors=True)  # don't fail if the folder doesn't exist yet
    os.mkdir(root_path)

    folders = [f"folder_{x}" for x in range(10000)]
    for folder in folders:
        os.mkdir(os.path.join(root_path, folder))
        sys.path.append(os.path.join(root_path, folder))

def main():
    add_to_python_path()  # populate sys.path with the extra folders before loading
    for lora in loras:
        run_dynamic_lora_inference(lora)

main()

Flamegraph: [profile image attached to the issue]

Expected behavior

We run a system with a somewhat large PYTHONPATH that we can't truncate, and we are currently blocked from upgrading diffusers to any version that uses peft for LoRA inference.

The reproduction script is loosely based on this post: https://huggingface.co/blog/lora-adapters-dynamic-loading

We've observed that the time taken by load_lora_weights increases significantly as more paths are added to PYTHONPATH. This can be reproduced with the example provided: with 10,000 folders added to PYTHONPATH, we get the following latencies:

Load time: 291.78441095352173
Fuse time: 0.12406659126281738
Set adapter time 0.06171250343322754
Inference time: 9.685987710952759
Unfuse time: 0.08063459396362305
Unload time: 0.15737533569335938

Benchmarking against 1, 10, 100, 1000, 10000 and 50000 entries in the PYTHONPATH, we get a pretty astounding increase in load latency:

[Chart: load_lora_weights latency vs. number of PYTHONPATH entries (1, 10, 100, 1000, 10000, 50000)]

Even at 100 entries, we're looking at an extra ~4 seconds per load call, which is a pretty significant increase.

We looked into it briefly and concluded that it comes down to the way peft checks for optional modules: the helper functions call importlib.util.find_spec repeatedly, and find_spec doesn't cache its results, so every call rescans the import path.
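
As a rough standalone illustration (not from the original report; the module name and folder layout here are made up for the sketch), a single find_spec call for a module that isn't installed has to walk every sys.path entry:

import importlib.util
import sys
import tempfile
import time
from pathlib import Path

# Build a long sys.path out of empty temp folders, mirroring the reproduction script above.
root = Path(tempfile.mkdtemp())
for i in range(10_000):
    folder = root / f"folder_{i}"
    folder.mkdir()
    sys.path.append(str(folder))

# find_spec has no result to reuse for a module that isn't installed,
# so each call scans every entry on sys.path.
start = time.time()
importlib.util.find_spec("module_that_is_not_installed")  # returns None
print("find_spec time:", time.time() - start)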

Instead of this behaviour, we'd expect load_lora_weights to take a roughly constant amount of time regardless of the length of our PYTHONPATH.

@BenjaminBossan
Member

Interesting, thanks for bringing this to our attention. My first instinct would be to add a cache to all the functions that use importlib.util.find_spec, since something like:

def is_bnb_available() -> bool:
    return importlib.util.find_spec("bitsandbytes") is not None

should be safe to cache. WDYT, would that solve your issue?

@tisles
Contributor Author

tisles commented Mar 21, 2024

Potentially, yeah - is it possible to do this once at a higher level in the code, rather than on every function call? Otherwise decorating them with @functools.cache might also help :)
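
For illustration, decorating one of those helpers might look something like this (just a sketch of the idea; not necessarily the exact change that ended up in #1584):

import importlib.util
from functools import lru_cache  # functools.cache is equivalent on Python 3.9+

@lru_cache(maxsize=None)
def is_bnb_available() -> bool:
    # The sys.path scan only happens on the first call; later calls return the cached boolean.
    return importlib.util.find_spec("bitsandbytes") is not None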

@BenjaminBossan
Member

is it possible to do this once at a higher level in the code

You mean at the caller site of these functions? Very unlikely, as they can be used in many different places. However, I think that a cache on these functions should be fast enough. Do you want to give this a try?

@tisles
Contributor Author

tisles commented Mar 25, 2024

Yup! Fix PR is at #1584, turned out to be a relatively simple one :)
