I run a system with a somewhat large PYTHONPATH that we can't truncate, and we are currently blocked from upgrading diffusers to any version that uses peft for LoRA inference.
We've observed that the time taken for load_lora_weights increases significantly as more paths are added to PYTHONPATH. This can be reproduced with the example provided (loosely based on https://huggingface.co/blog/lora-adapters-dynamic-loading): with 10,000 folders added to PYTHONPATH, we get the following latencies:
Load time: 291.78441095352173
Fuse time: 0.12406659126281738
Set adapter time 0.06171250343322754
Inference time: 9.685987710952759
Unfuse time: 0.08063459396362305
Unload time: 0.15737533569335938
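For context, the reproduction is along these lines (a rough sketch only - the model and adapter IDs below are placeholders, not the exact ones we use):

```python
import os
import sys
import tempfile
import time

import torch
from diffusers import DiffusionPipeline

# Simulate a large PYTHONPATH by padding sys.path with empty directories.
N_DUMMY_PATHS = 10_000
tmp_root = tempfile.mkdtemp()
for i in range(N_DUMMY_PATHS):
    d = os.path.join(tmp_root, f"dummy_{i}")
    os.makedirs(d, exist_ok=True)
    sys.path.append(d)

# Placeholder model/adapter IDs -- substitute your own.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

start = time.time()
pipe.load_lora_weights("path/to/lora")  # placeholder adapter
print("Load time:", time.time() - start)

start = time.time()
pipe.fuse_lora()
print("Fuse time:", time.time() - start)
```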
Benchmarking against 1, 10, 100, 1000, 10000 and 50000 entries in the PYTHONPATH, we see a pretty astounding increase in load latency.
Even at 100 entries, we're looking at an extra 4 seconds per load call, which is a pretty significant increase.
We looked at it briefly and concluded that it comes from the way peft handles module imports: repeated availability checks via importlib.util.find_spec, whose results are not cached, so every check re-scans the whole PYTHONPATH.
Instead of this behaviour, we'd expect load_lora_weights to retain a roughly constant load time regardless of the length of our PYTHONPATH.
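To illustrate where the time goes, here's a standalone sketch (independent of diffusers/peft) that times importlib.util.find_spec for a module that isn't installed while padding sys.path:

```python
import importlib.util
import sys
import time

def time_find_spec(n_extra_paths, module="some_module_that_is_not_installed"):
    # Pad sys.path with (non-existent) directories; find_spec for a missing
    # module has to consider every entry before returning None, so its cost
    # grows with the length of the path.
    padding = [f"/tmp/nonexistent_{n_extra_paths}_{i}" for i in range(n_extra_paths)]
    original = list(sys.path)
    sys.path.extend(padding)
    try:
        start = time.perf_counter()
        importlib.util.find_spec(module)
        return time.perf_counter() - start
    finally:
        sys.path[:] = original

for n in (1, 100, 10_000):
    print(f"{n:>6} extra paths: {time_find_spec(n):.4f}s")
```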
Interesting, thanks for bringing this to our attention. My first instinct would be to add a cache to all the functions that use importlib.util.find_spec, something like:
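(a rough sketch - the helper names here are illustrative, not the exact ones in peft)

```python
import importlib.util
from functools import lru_cache

@lru_cache(maxsize=None)
def _is_package_available(name: str) -> bool:
    # find_spec scans sys.path on every call, so cache the result
    # per module name instead of re-scanning each time.
    return importlib.util.find_spec(name) is not None

def is_bnb_available() -> bool:
    return _is_package_available("bitsandbytes")
```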
Potentially, yeah - is it possible to do this once at a higher level in the code, rather than on every function call? Otherwise, decorating them with @functools.cache might also help :)
> is it possible to do this once at a higher level in the code
You mean at the caller site of these functions? Very unlikely, as they can be used in many different places. However, I think that a cache on these functions should be fast enough. Do you want to give this a try?