
unnecessarily slow get_peft_model() with LoRA (esp. in LLMs) #871

Closed
poedator opened this issue Aug 28, 2023 · 11 comments
Labels
enhancement New feature or request

Comments

@poedator
Contributor

poedator commented Aug 28, 2023

System Info

peft 0.6.0.dev0

Who can help?

@pacman100
@BenjaminBossan

Information

Tasks

Reproduction

Just measure the time it takes to run get_peft_model() with any large LLM and a massive peft_config (Guanaco style), or run this:

import peft
import torch
from torch import nn
import time

n = 8192
device = torch.device('cuda')

model1 = nn.Sequential(
    nn.Linear(n, n, device=device),
)

config1 = peft.LoraConfig(
    target_modules=['0'],
    lora_dropout=0.0,
)

start = time.perf_counter()
model2 = peft.get_peft_model(model1, config1)
elapsed = time.perf_counter() - start

print(f"elapsed {elapsed:.3f}")

Expected behavior

get_peft_model() with LoRA taking much less time than loading the original model.

@poedator
Contributor Author

poedator commented Aug 28, 2023

I noticed that get_peft_model() takes an unnecessarily long time, which is quite annoying when working with LLMs. For instance, applying the Guanaco adapter to Llama-2 70B takes 15+ minutes.

These are some of the functions called inside get_peft_model():

BaseTuner.inject_adapter()  # tuner_utils.py:30
    LoraModel._create_and_replace()  # lora.py:320
        _create_new_module()  # lora.py:415
            new_module = lora.Linear()  # lora.py:834, class Linear(nn.Linear, LoraLayer)
                nn.Linear.__init__(self, in_features, out_features, **kwargs)  # init the Linear layer, incl. a Kaiming call
                    create empty weight matrix
                    call nn.Linear.reset_parameters()  # UNNECESSARY - these weights are discarded later in _replace_module()
                        call kaiming_init
                LoraLayer.__init__(self, in_features=in_features, out_features=out_features)  # just create LoRA placeholders
                nn.Linear.reset_parameters(self)
                    another call of kaiming_init  # REDUNDANT
                self.update_layer(adapter_name, r, lora_alpha, lora_dropout, init_lora_weights)
                    2 calls of Linear init for the LoRA matrices (incl. kaiming_init)
        _replace_module()
            new_module.weight = old_module.weight  # the earlier Linear initializations are irrelevant; the weights are overwritten here
            # dispatch LoRA layer modules to the correct device

Apparently the reasons for the long execution are:

  • use of the CPU to create the replacement layer (instead of the original layer's device)
  • creating a whole new set of weights for the replacement module requires a call to kaiming_init(), which is slow on the CPU
  • an extra, unnecessary call of reset_parameters() (the module constructor already does this)
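
A rough, standalone illustration of the first two points (not PEFT code; it assumes a PyTorch version that supports the device= factory argument and the "meta" device): building an nn.Linear on the CPU pays for an allocation plus a Kaiming init that is later thrown away, while building it on the "meta" device materializes no weights at all.

import time

import torch
from torch import nn

n = 8192

# replacement layer created on the CPU: allocates and Kaiming-initializes an n x n matrix
start = time.perf_counter()
cpu_linear = nn.Linear(n, n)
print(f"CPU init (with reset_parameters): {time.perf_counter() - start:.3f}s")

# the same layer on the "meta" device: no storage is allocated and the init is a no-op,
# which would be fine here because the weights get overwritten in _replace_module() anyway
start = time.perf_counter()
meta_linear = nn.Linear(n, n, device="meta")
print(f"meta-device init (no weights materialized): {time.perf_counter() - start:.3f}s")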

@BenjaminBossan
Member

BenjaminBossan commented Aug 28, 2023

Thank you for reporting this and for digging into the underlying reasons. As you mentioned, there is probably room for improvement here and we'll take a look at your suggestion and what else we can do to speed things up.

Edit: Just saw your PR, we'll take a look, thanks.

@BenjaminBossan BenjaminBossan added the enhancement New feature or request label Aug 28, 2023
@poedator poedator changed the title unecessarily slow get_peft_model() with LoRA (esp. in LLMs) unnecessarily slow get_peft_model() with LoRA (esp. in LLMs) Aug 28, 2023
@poedator
Contributor Author

How this story ended:
PR #872 did not go through because a more elegant solution was found.
The fix for the slow-init issue was implemented in PRs #887 and #915 by @BenjaminBossan.

@jph00

jph00 commented Oct 10, 2023

@BenjaminBossan @pacman100 I don't think this issue is resolved yet -- or at least there's a highly related issue. The slowness caused by unnecessary inits also occurs in merge_and_unload, via this code path:

  File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora/model.py", line 701, in merge_and_unload
    return self._unload_and_optionally_merge(progressbar=progressbar, safe_merge=safe_merge)
  File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora/model.py", line 420, in _unload_and_optionally_merge
    new_module = torch.nn.Linear(target.in_features, target.out_features, bias=bias)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py", line 101, in __init__
    self.reset_parameters()
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py", line 107, in reset_parameters
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))

@jph00

jph00 commented Oct 10, 2023

I discovered one possible partial fix: add device=target.weight.device to the call new_module = torch.nn.Linear(target.in_features, target.out_features, bias=bias) in _unload_and_optionally_merge. By creating the module directly on the target's device, the unnecessary init is at least accelerated.
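
A minimal sketch of that change (paraphrased, not the exact PEFT source; target and bias refer to the local variables in _unload_and_optionally_merge):

# before: the replacement module is created on the CPU, so the Kaiming init runs there
new_module = torch.nn.Linear(target.in_features, target.out_features, bias=bias)

# suggested partial fix: create it directly on the target's device, so the
# (still unnecessary) init at least runs on the GPU before the weights are copied over
new_module = torch.nn.Linear(
    target.in_features,
    target.out_features,
    bias=bias,
    device=target.weight.device,
)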

@BenjaminBossan
Member

@jph00 Indeed there are still parts of the code that have not been optimized for faster init yet. We track the progress in #896. There is already a PR for bnb layers (#994). Also thanks for bringing up merge_and_unload, I added it to the list of TODOs.


github-actions bot commented Nov 3, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@BenjaminBossan
Member

Update: The latest PEFT version released today (v0.6.0) includes the speed improvement for bnb LoRA layers.


@BenjaminBossan
Member

Note that we should now pretty much cover all cases, resulting in much faster initialization overall (bnb layers, adapters other than LoRA, merging), as can be seen from the state of progress tracked in #896. I'll thus close the issue. Feel free to re-open if you come across a case that is still slower than it should be.

@jph00

jph00 commented Nov 28, 2023 via email
