
TST Add regression test for DoRA, VeRA, BOFT, LN Tuning #1792

Conversation

BenjaminBossan
Member

These new methods were added but the regression tests were not extended yet. This PR adds regression tests for these methods. The regression artifacts have been pushed based on PEFT v0.11.1. The new tests pass locally.

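For a rough idea of what such a regression test boils down to, here is a hedged sketch. The config classes are real PEFT classes, but the helper name, target module, input shape, and artifact handling are simplified stand-ins for the actual harness in the test suite:

```python
# Hedged sketch only: the real regression tests live in PEFT's test suite and
# are more elaborate. LoraConfig/get_peft_model are real PEFT APIs; the rest
# (function name, target module, input shape, artifact path) is illustrative.
import torch
from peft import LoraConfig, get_peft_model

def check_dora_regression(base_model, artifact_path, atol=1e-6, rtol=1e-5):
    torch.manual_seed(0)  # the adapter weights must be reproducible as well
    # DoRA is enabled via LoraConfig(use_dora=True); VeRA, BOFT and LN Tuning
    # have their own config classes (VeraConfig, BOFTConfig, LNTuningConfig).
    config = LoraConfig(use_dora=True, target_modules=["lin0"], init_lora_weights=False)
    model = get_peft_model(base_model, config).eval()

    x = torch.arange(90).reshape(9, 10).float()  # fixed, deterministic input
    with torch.no_grad():
        output = model(x)

    # The stored artifact was generated the same way with PEFT v0.11.1.
    expected = torch.load(artifact_path)
    assert torch.allclose(output, expected, atol=atol, rtol=rtol)
```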

@younesbelkada (Contributor) left a comment


Thanks a lot for adding these regression tests!

@BenjaminBossan BenjaminBossan merged commit 39c60ff into huggingface:main May 27, 2024
14 checks passed
@BenjaminBossan BenjaminBossan deleted the tst-regression-tests-dora-vera-boft-ln_tuning branch May 27, 2024 10:00
BenjaminBossan added a commit that referenced this pull request May 31, 2024
This PR moves all the DoRA functionality into a separate module class.
This is necessary because otherwise the DoRA parameter lives on the
lora.Linear layer as a plain parameter, not as a separate module. Since
the FSDP auto wrap policy operates on the level of modules, not
parameters, there is no way to make the policy wrap the DoRA parameter;
it has to be its own module.
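
As a hedged illustration of the structural change (the class name below is made up, not PEFT's actual one): the magnitude vector moves from a bare nn.Parameter on the LoRA layer into its own small nn.Module, which an FSDP auto wrap policy can then match by module type.

```python
import torch
import torch.nn as nn

class DoraMagnitude(nn.Module):  # illustrative name, not the actual PEFT class
    """Holds the DoRA magnitude vector as its own module so FSDP can wrap it."""

    def __init__(self, out_features: int):
        super().__init__()
        # Previously this lived as a bare nn.Parameter directly on lora.Linear,
        # which a module-based auto wrap policy could not target.
        self.weight = nn.Parameter(torch.ones(out_features))

    def forward(self, weight_norm: torch.Tensor) -> torch.Tensor:
        # Per-output-feature scale: magnitude / ||W + scaling * BA||.
        return self.weight / weight_norm
```

With this layout, the auto wrap policy used for PEFT + FSDP can simply include the magnitude module in the set of module classes to wrap.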

If not for this reason, #1797 would be preferable, since it requires
fewer code changes overall. This PR touches more lines, but the
majority of the diff only moves code around rather than changing its
behavior.

Since we introduce a new submodule, an extra step is required to
ensure that old DoRA state dicts can still be loaded correctly. This
involves a fairly trivial remapping step in
set_peft_model_state_dict. It is covered by the new DoRA
regression DoRA tests introduced in #1792.

Similarly, there is a remapping step involved in
get_peft_model_state_dict to ensure that when new state dicts with DoRA
are saved, they still conform to the old format.
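
The gist of both remappings looks roughly like this (a simplified sketch; the key suffixes and helper names are assumptions and may not match the actual implementation exactly):

```python
def remap_dora_keys_on_load(state_dict: dict) -> dict:
    # Old format: the magnitude was stored as a bare parameter; new format:
    # it sits on a submodule, so the key gains a ".weight" suffix.
    remapped = {}
    for key, value in state_dict.items():
        if "lora_magnitude_vector" in key and not key.endswith(".weight"):
            key = key + ".weight"
        remapped[key] = value
    return remapped

def remap_dora_keys_on_save(state_dict: dict) -> dict:
    # Reverse mapping so that newly saved state dicts keep the old format.
    remapped = {}
    for key, value in state_dict.items():
        if "lora_magnitude_vector" in key and key.endswith(".weight"):
            key = key[: -len(".weight")]
        remapped[key] = value
    return remapped
```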

An additional required change was to make a defensive copy of the base
layer before dequantizing its weight in order to calculate the weight
norm for DoRA. Without this defensive copy, a side effect is triggered
in FSDP that results in

> ValueError: Cannot flatten integer dtype tensors

even though the compute dtype of bnb is correctly set to float.
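
A sketch of the workaround (simplified; the dequantization helper is passed in here rather than naming a specific bnb API):

```python
import copy
import torch

def get_dora_weight_norm_via_copy(base_layer, lora_weight, scaling, dequantize_fn):
    # Deep-copy the quantized base layer so that dequantizing it for the
    # weight-norm computation cannot mutate state that FSDP has already
    # flattened; that side effect is what surfaces as the integer-dtype
    # ValueError above.
    base_layer_copy = copy.deepcopy(base_layer)
    weight = dequantize_fn(base_layer_copy)  # caller supplies the bnb dequantization
    combined = weight.to(lora_weight.dtype) + scaling * lora_weight
    # Column-wise norm of the merged weight, as DoRA requires.
    return torch.linalg.norm(combined, dim=1)
```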

Creating a fully functioning deepcopy currently does not work with
8-bit bnb, but there is already a fix. Once the next bitsandbytes
release is out, 8-bit bnb will be tested and enabled.

While working on this, I also noticed a small bug: dropout was not
correctly applied when using QDoRA. This is now also fixed.
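
To make the dropout fix concrete, the intended behavior is roughly the following (a hedged sketch, not the actual forward code; the weight-norm computation and bias handling are omitted):

```python
import torch

def qdora_forward_sketch(x, base_layer, lora_A, lora_B, lora_dropout,
                         magnitude, weight_norm, scaling):
    result = base_layer(x)            # quantized base forward
    x_d = lora_dropout(x)             # the fix: the DoRA branch sees dropout(x)
    lora_out = lora_B(lora_A(x_d)) * scaling
    # DoRA rescales by magnitude / ||W + scaling * BA|| per output feature.
    mag_norm_scale = (magnitude / weight_norm).view(1, -1)
    return mag_norm_scale * (result + lora_out)
```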

This PR was tested successfully with FSDP and (Q)DoRA using the scripts
in examples/sft/ with a modification to enable DoRA.