
Added the option to use the corrected scaling factor for LoRA, based on new research. #1244

Merged · 7 commits · Dec 15, 2023

Conversation

@Damjan-Kalajdzievski (Contributor) commented Dec 9, 2023:

Hi, I am proposing to add an option to use the corrected scaling factor for LoRA, based on the recent paper A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA. Try setting use_rslora = True in your LoraConfig for ranks greater than 32 and see the increase in fine-tuning performance (same or better performance for ranks lower than 32 as well).
Please feel free to suggest or change the implementation; I tried to go for the minimum code length change that implements this option.
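
A minimal usage sketch of the proposed option (the base model, rank, and target modules below are illustrative choices, not taken from this PR):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # illustrative base model

config = LoraConfig(
    r=64,                       # higher ranks are where the corrected scaling helps most
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    use_rslora=True,            # scale adapters by lora_alpha / sqrt(r) instead of lora_alpha / r
)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()
```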

Summary of method

For a LoRA adapter of rank $r$, the factor $\frac{\alpha}{r}$ that scales the adapter is too aggressive as a function of $r$: it slows learning at higher ranks, so that no fine-tuning performance is gained over lower ranks. The paper A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA shows theoretically and experimentally that the adapter scaling factor should instead be $\frac{\alpha}{\sqrt{r}}$. This corrected scaling factor unlocks a compute/performance trade-off in which increasing the rank increases fine-tuning performance. It also corrects the ongoing misconception that very low ranks (no greater than 32) suffice for maximal performance, along with the associated belief that the intrinsic dimensionality of fine-tuning is very low.
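
To make the difference concrete, a quick comparison of the two factors (the alpha value is chosen only for illustration):

```python
import math

lora_alpha = 16  # illustrative value
for r in (8, 32, 128, 512):
    conventional = lora_alpha / r                  # shrinks linearly in r
    rank_stabilized = lora_alpha / math.sqrt(r)    # shrinks only with sqrt(r)
    print(f"r={r:4d}  alpha/r={conventional:.4f}  alpha/sqrt(r)={rank_stabilized:.4f}")
```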

Description of changes

Added a use_rslora bool to LoraConfig which, when set to True, corrects the scaling factor of adapters created with _create_and_replace in LoraModel. The variable use_rslora is set to False by default for backwards compatibility.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member) commented:

Thanks @Damjan-Kalajdzievski for the PR. I haven't checked it yet, but could you please run make style so that the CI can run?

@Damjan-Kalajdzievski (Contributor, Author) commented Dec 11, 2023:

> Thanks @Damjan-Kalajdzievski for the PR. I haven't checked it yet, but could you please run make style so that the CI can run?

Hi @BenjaminBossan, I have run the command and committed the modification to the files I've changed.

@BenjaminBossan (Member) left a review:


Thanks a lot for adding this seemingly small but effective option to LoRA. I only skimmed the paper, but it seems that this option helps across the board, although the effect is most prominent for large ranks.

Regarding how it's implemented, I think we need to adjust where we apply the scaling. Check my comment regarding that.

Furthermore, I would like to see two additions:

  • An entry in the docs about this new option here
  • A small unit test that checks the scaling factor after initializing a simple LoRA model. If you need help with that, let us know.

src/peft/tuners/lora/config.py
@@ -57,6 +57,10 @@ class LoraConfig(PeftConfig):
bias (`str`): Bias type for Lora. Can be 'none', 'all' or 'lora_only'. If 'all' or 'lora_only', the
corresponding biases will be updated during training. Be aware that this means that, even when disabling
the adapters, the model will not produce the same output as the base model would have without adaptation.
use_rslora (`bool`):
When set to True, uses <a href='https://doi.org/10.48550/arXiv.2312.03732'>Rank-Stabilized LoRA</a> which
sets the adapter scaling factor to the correct value of `lora_alpha/math.sqrt(r)`. Otherwise, it will use
@BenjaminBossan (Member) commented:

I wouldn't say "correct value", maybe something like:

sets the adapter scaling factor to lora_alpha/math.sqrt(r), which was shown to work better.

metadata={
"help": (
"When set to True, uses "
"<a href='https://doi.org/10.48550/arXiv.2312.03732'>Rank-Stabilized LoRA</a> "
@BenjaminBossan (Member) commented:

Let's not use html syntax in the help.

@Damjan-Kalajdzievski (Contributor, Author) replied:

I interpreted this to mean that I should still include the url but without the html syntax.

"help": (
"When set to True, uses "
"<a href='https://doi.org/10.48550/arXiv.2312.03732'>Rank-Stabilized LoRA</a> "
"which sets the adapter scaling factor to the correct value "
@BenjaminBossan (Member) commented:

Same argument about "correct".

@@ -194,6 +195,13 @@ def _create_and_replace(
new_module.requires_grad_(False)
self._replace_module(parent, target_name, new_module, target)

if lora_config.use_rslora:
@BenjaminBossan (Member) commented:

I think here is not the right place to control the scaling, as this leads to spreading the initialization of the LoRA parameters into different parts of the code. Instead, update_layer, update_layer_conv2d, and update_layer_embedding in tuners/lora/layer.py should be adjusted, since that's where we set the scale initially. This also requires updating the __init__ method to accept the new argument, as well as the kwargs variable here.
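
Roughly, the suggested placement looks like the following sketch (a simplified stand-in, not the actual PEFT LoraLayer, whose update_layer methods take additional arguments):

```python
import math

class LoraLayerSketch:
    """Minimal sketch of where the scaling would be set (not the actual PEFT class)."""

    def __init__(self):
        self.scaling = {}  # adapter_name -> scaling factor, as in peft's LoraLayer

    def update_layer(self, adapter_name, r, lora_alpha, use_rslora=False):
        # Set the factor once, at initialization, in the same place for all layer
        # types (Linear, Conv2d, Embedding) instead of patching it afterwards in
        # _create_and_replace.
        if use_rslora:
            self.scaling[adapter_name] = lora_alpha / math.sqrt(r)
        else:
            self.scaling[adapter_name] = lora_alpha / r

layer = LoraLayerSketch()
layer.update_layer("default", r=64, lora_alpha=16, use_rslora=True)
print(layer.scaling["default"])  # 2.0
```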

…odified peft/docs/source/conceptual_guides/lora.md to be consistent with the new LoraConfig and describe the use_rslora concept as suggested.
@Damjan-Kalajdzievski (Contributor, Author) commented Dec 13, 2023:

> • An entry in the docs about this new option here

I am not sure if it belongs in the description of the initializations, since I didn't want to give a person unfamiliar with LoRA the impression that the scaling factor is only applied at initialization, when it actually scales every adapter output. So here I committed a small section below the initialization section, describing the scaling factor and what use_rslora does. Not sure if this is the best place, so feel free to suggest a change.

> • A small unit test that checks the scaling factor after initializing a simple LoRA model. If you need help with that, let us know.

I added a test in test_initialization.py in commit 871ed7d. Let me know if this matches what you envisioned the test to be like.
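
A check along these lines captures the idea (a sketch assuming a toy MLP; not the exact test from commit 871ed7d):

```python
import math

from torch import nn
from peft import LoraConfig, get_peft_model


class SmallMLP(nn.Module):
    # toy model, just enough to attach LoRA adapters to
    def __init__(self):
        super().__init__()
        self.lin0 = nn.Linear(10, 20)
        self.lin1 = nn.Linear(20, 2)

    def forward(self, x):
        return self.lin1(self.lin0(x))


def test_rslora_scaling():
    r, lora_alpha = 64, 16
    config = LoraConfig(target_modules=["lin0", "lin1"], r=r, lora_alpha=lora_alpha, use_rslora=True)
    model = get_peft_model(SmallMLP(), config)

    expected = lora_alpha / math.sqrt(r)
    assert model.base_model.model.lin0.scaling["default"] == expected
    assert model.base_model.model.lin1.scaling["default"] == expected
```
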

> I think here is not the right place to control the scaling, as this leads to spreading the initialization of the LoRA parameters into different parts of the code. Instead, update_layer, update_layer_conv2d, and update_layer_embedding in tuners/lora/layer.py should be adjusted, since that's where we set the scale initially. This also requires updating the __init__ method to accept the new argument, as well as the kwargs variable here.

Makes sense, I had wondered about doing it this way as well and will do this now. I am not sure exactly which class's __init__ method you mean, but I added the argument use_rslora to all the update_* methods, and then feed it through the kwargs in _create_new_module so that every subclass of LoraLayer calling update_layer has the use_rslora argument to pass along. Essentially, use_rslora is treated in the code exactly analogously to init_lora_weights. Added this now in commit a2c2f1a.
Thanks.

…n the update_layer type methods, as suggested
@BenjaminBossan (Member) left a review:


Thanks for making the adjustments. From my point of view, there are only a few small changes needed and then the PR should be good to be merged.

> I am not sure if it belongs in the description of the initializations, since I didn't want to give a person unfamiliar with LoRA the impression that the scaling factor is only applied at initialization, when it actually scales every adapter output.

IMO, we can still consider this to be initialization, as in almost all use cases, scaling is set once, at initialization, and then remains untouched.

> I added a test in test_initialization.py

Nicely done.

> Not sure exactly which class's __init__ method you mean, but I added the argument use_rslora to all the update_* methods, and then feed it through the kwargs in _create_new_module so that every subclass of LoraLayer calling update_layer has the use_rslora argument to pass along. Essentially, use_rslora is treated in the code exactly analogously to init_lora_weights.

Yes, that was exactly what I meant, thanks!

tests/test_initialization.py
src/peft/tuners/lora/config.py
…in the conceptual guide to the initialization section
@BenjaminBossan (Member) left a review:


Thanks, looks very good, nice tests.

@pacman100 (Contributor) left a review:


Thank you @Damjan-Kalajdzievski for adding the support for rank-stabilized LoRA scaling, LGTM! 🚀

@BenjaminBossan merged commit 997e6ec into huggingface:main on Dec 15, 2023
14 checks passed