Increase speed of adapter layer initialization #896

Closed
5 of 6 tasks
BenjaminBossan opened this issue Sep 1, 2023 · 6 comments

Comments

@BenjaminBossan
Member

BenjaminBossan commented Sep 1, 2023

Feature request

We are working on increasing the speed of adapter layer initialization. The first step towards this is #887, which tackles linear LoRA layers, arguably the most important ones. As discussed with @pacman100, a few TODOs remain after that PR; they are tracked in the task list of this issue.

Motivation

Initial testing showed a ~10x increase in speed with the optimized initialization.

Your contribution

#887 is the first step.

BenjaminBossan added a commit to BenjaminBossan/peft that referenced this issue Oct 5, 2023
Partly addresses huggingface#896

Description

After speeding up normal LoRA layer initialization, this PR improves
initialization speed of bnb LoRA layers.

The method used here differs from the previous one: this time, the base layer is
stored as a reference on the LoRA layer. This lets us avoid calling __init__ on
the bnb layer, which is the slow part.

Notes

We cannot use the same method as for the normal LoRA layers (i.e. calling the
super class's __init__ on the meta device), because the bnb layers have extra
logic that would still create unnecessary weights.

However, the way used here could also be a solution to the normal
layers, so if we want to have consistency, the normal layers could be
refactored to use the same approach.
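
For illustration, a minimal sketch of the two approaches; the class and attribute names are simplified stand-ins, not the actual PEFT implementation:

import torch
import torch.nn as nn


class FastLinearLoraLayer(nn.Linear):
    # Sketch of the meta-device approach used for normal Linear LoRA layers.
    def __init__(self, base_layer: nn.Linear, r: int):
        # Calling the parent __init__ on the meta device allocates no real
        # storage, which is what makes the initialization fast.
        super().__init__(
            base_layer.in_features,
            base_layer.out_features,
            bias=base_layer.bias is not None,
            device="meta",
        )
        self.weight = base_layer.weight  # reuse the already existing weight
        self.bias = base_layer.bias
        self.lora_A = nn.Linear(base_layer.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base_layer.out_features, bias=False)


class BnbLoraLayer(nn.Module):
    # Sketch of the bnb approach: keep the quantized base layer as a reference.
    def __init__(self, base_layer: nn.Module, r: int):
        super().__init__()
        # The expensive bnb __init__ is never called again; the existing,
        # already initialized layer is simply stored and reused.
        self.base_layer = base_layer
        self.lora_A = nn.Linear(base_layer.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base_layer.out_features, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base_layer(x) + self.lora_B(self.lora_A(x))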

Interestingly, even though we now save the base layer as a reference,
which results in a different state_dict, the existing models can still
be loaded successfully. This is because the adapter state_dict is not
affected by the change, so users can still load their existing adapters.

The only problem would occur if users dump the whole model, i.e. base model and
adapter, using torch.save and then try to load it with torch.load. For those
users, we could theoretically provide a script to convert the state_dict
(i.e. rename some keys).
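
Such a conversion script could look roughly like the sketch below; the assumed key mapping (base weights moving under a base_layer attribute) is an illustration, not a committed format.

import torch


def convert_full_model_state_dict(old_path: str, new_path: str) -> None:
    # Sketch: rename keys of a model saved with torch.save so that the base
    # layer parameters of LoRA-wrapped modules move under `base_layer`
    # (assumed mapping), e.g. "...q_proj.weight" -> "...q_proj.base_layer.weight".
    state_dict = torch.load(old_path, map_location="cpu")
    # Modules that carry LoRA weights are the ones whose base layer moved.
    lora_modules = {key.split(".lora_")[0] for key in state_dict if ".lora_" in key}
    new_state_dict = {}
    for key, value in state_dict.items():
        prefix, _, param_name = key.rpartition(".")
        if prefix in lora_modules and ".lora_" not in key:
            key = f"{prefix}.base_layer.{param_name}"
        new_state_dict[key] = value
    torch.save(new_state_dict, new_path)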

To ensure that the old adapters can still be loaded successfully, I'm
working at the same time on adding regression tests. I'll create a
separate PR for those to avoid blowing up this one.

Tests

I ran a test on bloomz-1b1 to measure how long it takes to create the
PeftModel; the results are:

8-bit: 1108.34 ms → 26.82 ms
4-bit: 1101.96 ms → 23.69 ms
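
The measurement was of roughly this form (a sketch; the model id, quantization config, and timing code are assumptions rather than the exact benchmark script):

import time

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed setup: load bloomz-1b1 in 8-bit and time how long it takes to wrap
# it in a PeftModel; 4-bit works analogously with load_in_4bit=True.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloomz-1b1",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
config = LoraConfig(task_type="CAUSAL_LM")

start = time.perf_counter()
peft_model = get_peft_model(model, config)
print(f"PeftModel creation: {(time.perf_counter() - start) * 1000:.2f} ms")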
BenjaminBossan added a commit that referenced this issue Oct 10, 2023 (with the same commit message as the Oct 5 commit above)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@BenjaminBossan
Member Author

Not stale

@osmalpkoras

osmalpkoras commented Jun 17, 2024

Hi guys, I am trying to load a meta-llama/Meta-Llama-3-70B-Instruct PEFT model with

from peft import LoraConfig

lora_config = LoraConfig(
    init_lora_weights="pissa",
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules="all-linear",
)

and I noticed that it is very slow in initializing the layers (it takes hours). To be precise, it's the for key in key_list loop in peft.tuners.tuners_utils.BaseTuner.inject_adapter that is very slow.

Should this actually be fixed by this feature? Because if so, it does not work for meta-llama/Meta-Llama-3-70B-Instruct with the above LoraConfig.
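
For reference, the config above would typically be applied like this (a minimal sketch; the model loading arguments are assumptions, not taken from the report):

from peft import get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
# lora_config is the LoraConfig shown above; the reported slowness happens
# inside this call, in the `for key in key_list` loop of
# BaseTuner.inject_adapter.
peft_model = get_peft_model(base_model, lora_config)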

@BenjaminBossan
Member Author

Thanks for reporting this. Is it still slow when you remove init_lora_weights="pissa" from the config?

@osmalpkoras

Actually, if I use the default method, it is very fast. Switching to init_lora_weights="pissa_niter_32" also made it very fast. I just noticed that if I use this initialization in a multi-GPU DeepSpeed setup, it again takes a couple of hours, maybe due to some concurrency issue.

@BenjaminBossan
Member Author

It is expected that the default init, or PiSSA with niter, is faster than plain pissa. However, a couple of hours still sounds excessive. Does it finish after a couple of hours, or do you just cancel the process at that point?
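
To illustrate why the two modes differ so much in cost: as far as I can tell from the PEFT source, plain pissa decomposes each targeted weight with a full SVD, while pissa_niter_[k] uses a randomized low-rank SVD. A rough standalone comparison (not PEFT code) on one Llama-3-70B-sized projection weight:

import time

import torch

weight = torch.randn(8192, 8192)  # roughly the size of a 70B q_proj weight
r = 64

start = time.perf_counter()
torch.linalg.svd(weight, full_matrices=False)  # what plain "pissa" needs
print(f"full SVD:     {time.perf_counter() - start:.2f} s")

start = time.perf_counter()
torch.svd_lowrank(weight, q=r, niter=32)  # what "pissa_niter_32" needs
print(f"low-rank SVD: {time.perf_counter() - start:.2f} s")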

BenjaminBossan added a commit to BenjaminBossan/peft that referenced this issue Jun 25, 2024
Resolves huggingface/accelerate#2886

Possibly resolves
huggingface#896 (comment)

Some LoRA init methods need to access the base layer weight. Getting
this access can fail or stall in distributed settings. For DeepSpeed,
the weight is now gathered before trying to access it.

Note: Without DeepSpeed, this is a no-op and should thus not have any
disadvantage. We don't have DS in our CI, so this is not tested.

I also made some small changes to OLoRA init to use
self.get_base_layer() instead of self.base_layer.
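
The fix follows the usual ZeRO-3 pattern of gathering a partitioned parameter before reading it; a sketch of that pattern (not the exact PEFT code):

import deepspeed
import torch.nn as nn


def get_full_weight(base_layer: nn.Linear):
    # Gather a ZeRO-3 partitioned weight so that init methods like PiSSA or
    # OLoRA can read its full value; without DeepSpeed this is a plain read.
    weight = base_layer.weight
    if hasattr(weight, "ds_id"):  # parameter is partitioned by DeepSpeed ZeRO-3
        with deepspeed.zero.GatheredParameters([weight], modifier_rank=None):
            return weight.data.clone()
    return weight.data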
BenjaminBossan added a commit that referenced this issue Jun 26, 2024 (with the same commit message as the Jun 25 commit above)