Refactor dispatching logic of LoRA layers #1319
Merged
This PR's goal is to simplify the logic for deciding which LoRA layer backend is being used when LoRA is applied to a target layer.
Originally, this refactor was done in #1286, which was about adding the "fast" backend for LoRA, but since that PR was closed, I moved the refactor to this dedicated PR.
Motivation
Right now, the `LoraModel._create_new_module` method has become quite complex and hard to read, spanning more than 100 lines (`src/peft/tuners/lora/model.py`, lines 235 to 339 at 8665e2b).

The reason for this is that the method contains the logic for deciding which LoRA layer backend to use for all the different types of LoRA layers that we have, i.e. the normal `Linear` layer, the `Conv2d` layer, bnb layers, gptq, etc. This PR greatly simplifies this method (down to 30 LOC) and should make it easier to prevent bugs. It should also simplify adding further backends in the future.
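For illustration, the pre-refactor method boils down to one long chain of type checks, one branch per backend. The snippet below is an abbreviated, hypothetical sketch of that shape, not the actual code; the real method additionally handles the bnb, gptq and megatron branches and their backend-specific keyword arguments, which is what pushes it past 100 lines.

```python
from torch import nn

# LoRA layer classes from src/peft/tuners/lora/layer.py
from peft.tuners.lora.layer import Conv2d, Embedding, Linear


def create_new_module_before(target, adapter_name, **kwargs):
    # Abbreviated sketch of the old single-method dispatch: every backend is a
    # branch in this one function (the bnb 8bit/4bit, gptq, megatron, etc.
    # branches and the config handling are omitted here for brevity).
    if isinstance(target, nn.Embedding):
        new_module = Embedding(target, adapter_name, **kwargs)
    elif isinstance(target, nn.Conv2d):
        new_module = Conv2d(target, adapter_name, **kwargs)
    elif isinstance(target, nn.Linear):
        new_module = Linear(target, adapter_name, **kwargs)
    else:
        raise ValueError(f"Target module {target} is not supported.")
    return new_module
```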
Description
I moved the logic for deciding which layer to match into the respective implementations of the layers. For example, in `lora/layer.py`, there is now a function called `dispatch_default`, whose responsibility it is to decide whether an `Embedding` layer, a `Conv2d` layer, or a `Linear` layer is the right match. Similarly, in `lora/bnb.py`, there are now the two functions `dispatch_bnb_8bit` and `dispatch_bnb_4bit` to decide which bnb 8bit or 4bit layer, if any, should be matched. The same applies to the gptq and megatron backends.

This way, the logic to decide which layer to match now resides next to the respective layers. The only thing that `LoraModel` needs to do is collect all the dispatching functions and use the first layer that matches.

Note that the logic to decide whether a layer matches is 100% the same, just moved to a different place. Therefore, there should be no difference in the LoRA model being created.
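The following is a minimal sketch of that dispatcher pattern, assuming simplified signatures; the exact arguments and the order of dispatchers in the PR may differ, and only the default dispatcher is spelled out here.

```python
from torch import nn

# LoRA layer classes from src/peft/tuners/lora/layer.py
from peft.tuners.lora.layer import Conv2d, Embedding, Linear


def dispatch_default(target, adapter_name, **kwargs):
    """Return a LoRA layer if `target` is a plain Embedding/Conv2d/Linear, else None."""
    if isinstance(target, nn.Embedding):
        return Embedding(target, adapter_name, **kwargs)
    if isinstance(target, nn.Conv2d):
        return Conv2d(target, adapter_name, **kwargs)
    if isinstance(target, nn.Linear):
        return Linear(target, adapter_name, **kwargs)
    return None  # not handled by this backend; let the next dispatcher try


def create_new_module(target, adapter_name, **kwargs):
    """Stand-in for LoraModel._create_new_module: the first matching dispatcher wins."""
    # The real list would also contain dispatch_bnb_8bit, dispatch_bnb_4bit and
    # the gptq/megatron dispatchers, tried before dispatch_default.
    dispatchers = [dispatch_default]

    for dispatcher in dispatchers:
        new_module = dispatcher(target, adapter_name, **kwargs)
        if new_module is not None:
            return new_module
    raise ValueError(f"Target module {target} is not supported.")
```

With this shape, adding a new backend means writing one more `dispatch_*` function next to its layer implementation and registering it in the dispatcher list, instead of growing an `if`/`elif` chain inside `LoraModel`.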
Only LoRA was modified because the other tuners don't have different backends, so this approach was not necessary for them. The only exception is IA³, which has a normal and a bnb backend. Since those are only two backends, it's not as complicated as for LoRA, but if this PR is accepted, I can refactor IA³ in a similar fashion.
Other changes
- Removed the `optional_kwargs` argument from `_create_and_replace`, as it was an unnecessary indirection.
- Removed the `bias` argument from `kwargs`, as it was not used.

Backwards compatibility
This should be fully backwards compatible, as the constructed LoRA model is 100% the same. If there are users that override `_create_new_module`, their code will probably break, but since this is a private method, we should be fine.

Edit: Also ran the regression tests and they passed.