AdaLora + bnb not working #1113

Closed
BenjaminBossan opened this issue Nov 10, 2023 · 0 comments · Fixed by #1146
BenjaminBossan commented Nov 10, 2023

System Info

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

The issue is this line:

compute_dtype = lora_A.weight.dtype

In AdaLoRA, lora_A and lora_B are ParameterDicts rather than ModuleDicts, so lora_A[adapter_name].weight.dtype does not exist; it should simply be lora_A[adapter_name].dtype (see the sketch below).
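
For context, a minimal sketch of the difference (adapter name and shapes are made up): regular LoRA layers hold nn.Linear modules in a ModuleDict, while AdaLoRA holds plain nn.Parameter tensors in a ParameterDict.

```python
import torch
import torch.nn as nn

# Regular LoRA: adapters live in a ModuleDict of nn.Linear modules,
# so the indexed entry has a .weight attribute.
lora_A_lora = nn.ModuleDict({"default": nn.Linear(16, 8, bias=False)})
print(lora_A_lora["default"].weight.dtype)  # torch.float32

# AdaLoRA: adapters live in a ParameterDict of nn.Parameter tensors,
# so the indexed entry *is* the tensor and has no .weight attribute.
lora_A_adalora = nn.ParameterDict({"default": nn.Parameter(torch.zeros(8, 16))})
print(lora_A_adalora["default"].dtype)      # torch.float32
# lora_A_adalora["default"].weight.dtype    # would raise AttributeError
```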

Furthermore, using AdaLoRA with 8-bit bnb gives NaNs for me with opt-125m; a rough reproduction sketch follows.
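
The sketch below assumes a CUDA GPU with bitsandbytes installed; the model, rank settings, and prompt are illustrative, not the exact script that produced the NaNs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import AdaLoraConfig, get_peft_model

model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the base model quantized to 8-bit via bitsandbytes
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)

# AdaLoRA config with illustrative rank settings
config = AdaLoraConfig(task_type="CAUSAL_LM", init_r=12, target_r=8, total_step=100)
model = get_peft_model(model, config)

inputs = tokenizer("Hello world", return_tensors="pt").to(model.device)
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)  # came out as NaN in the failing 8-bit setup
```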

Expected behavior

AdaLoRA + bnb should work.

BenjaminBossan changed the title from "AdaLora + 4bit bnb not working" to "AdaLora + bnb not working" on Nov 10, 2023
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this issue Nov 17, 2023
This PR fixes a handful of issues with AdaLora and should resolve huggingface#1113.

Description

1. lora_A.weight.device was called, but for AdaLora, lora_A is an
   nn.Parameter, not an nn.Module, so the weight attribute does not
   exist. lora_A.device is sufficient.
2. For 8bit, an inplace operation failed because it was on a view. Now
   the operation is no longer inplace.
3. The loss term of the model output is not necessarily a torch tensor.
   In the test, it was a dict and did not contain an actual loss.
   Therefore, I added a check to make sure the loss is a torch tensor.
   Is there a better way?

Notes

Running pytest tests/test_gpu_examples.py -k adalora locally (with GPU)
passes. Ideally, someone else can confirm, as normal unit tests won't
catch this.

If this is merged before huggingface#1115, skipping AdaLora tests in that PR can be
removed.
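
For readers skimming the thread, a rough sketch of the shape of these fixes (names, shapes, and control flow are illustrative, not the actual diff in the PR):

```python
import torch
import torch.nn as nn

# 1. AdaLora keeps lora_A / lora_B as nn.Parameter entries, so device and dtype
#    are read from the parameter itself, not from a .weight attribute.
lora_A = nn.ParameterDict({"default": nn.Parameter(torch.zeros(8, 16))})
compute_dtype = lora_A["default"].dtype   # not lora_A["default"].weight.dtype
device = lora_A["default"].device         # not lora_A["default"].weight.device

# 2. In-place ops fail on a view of a leaf tensor that requires grad, so the
#    operation is made out-of-place.
weight = torch.randn(4, 16, requires_grad=True)
view = weight.t()                         # a view of a leaf that requires grad
scaled = view * 0.5                       # out-of-place; view.mul_(0.5) would raise

# 3. The output's loss is not guaranteed to be a tensor (it was a dict in the
#    failing test), so it is only used after an isinstance check.
loss = {"not": "a tensor"}                # stand-in for a non-tensor loss
if isinstance(loss, torch.Tensor):
    print(loss.item())
```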
BenjaminBossan added a commit that referenced this issue Nov 17, 2023
(same commit description as above)