Negative weights on add_weighted_adapter #1907
Comments
For more context on why the weights are scaled like this, please check this discussion: #1155. This should address questions 1-3.
We have not experimented with task forgetting and I'm not sure if merging with a negative weight is a possible solution. If you could explain further what the base model is, what you want it to forget, and what the learned adapters were trained on, I might be able to make some suggestions.
Hello @BenjaminBossan, thank you for your quick response. I think I understand the motivation of the current implementation. However, regarding question 4, I am afraid that I cannot disclose any details. But the idea behind it is, for example: I pre-trained LLaMA with LoRA on a multi-task (task a and task b) dataset. I also have a single-task dataset (task a), and now I want to perform task forgetting via this: task_b = pre_trained + LoRA_1 (task a and b) - LoRA_2 (task a). I don't know if there are any suggestions for this functionality. Thanks!
Thanks for explaining the general idea. If your two LoRA adapters target the same layers and have the same rank, what you could try is to load the state dict of LoRA_1 and subtract the state dict of LoRA_2 manually, then load the new LoRA weights onto the model. This should probably have the effect that you wanted to achieve with the negative weights.
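For concreteness, here is a minimal sketch of that suggestion, assuming both adapters target the same modules with the same rank; the base model name and adapter paths are placeholders and the snippet is untested:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel, load_peft_weights, set_peft_model_state_dict

# Placeholder names; substitute your own base model and adapter checkpoints.
base = AutoModelForCausalLM.from_pretrained("base-model-name")

# Load the two adapter state dicts without attaching them to the model.
sd_1 = load_peft_weights("path/to/lora_1")  # trained on tasks a and b
sd_2 = load_peft_weights("path/to/lora_2")  # trained on task a only

# Subtract LoRA_2 from LoRA_1 key by key; the keys line up when both adapters
# target the same modules with the same rank. Note that this subtracts the
# lora_A/lora_B matrices directly, not the composed B @ A deltas.
diff_sd = {key: sd_1[key] - sd_2[key] for key in sd_1}

# Attach LoRA_1 (for its config), then overwrite its weights with the difference.
model = PeftModel.from_pretrained(base, "path/to/lora_1")
set_peft_model_state_dict(model, diff_sd)
```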
Thanks! It turns out it works to some degree, but it might not be an ideal way to perform such a task.
Great, then I'll wish you luck with your further experiments. If you figure out a way to make forgetting tasks learned by LoRA work well, feel free to share it, maybe we can integrate that in PEFT. For now, I'll close the issue, please re-open if something new comes up.
System Info
python=3.8
peft=0.11.1
Who can help?
@BenjaminBossan @sayakpaul
Information
Tasks
An officially supported task in the examples folder

Reproduction
I would like to apply task-vector arithmetic to LoRA adapters in the following fashion: task = pre_trained + LoRA_1 - LoRA_2
Here is the code that I used, but I got this error message: ValueError: math domain error
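For reference, a call of roughly this shape matches the description above; the base model name and adapter paths are placeholders, and combination_type="linear" is an assumption for this sketch (a path that applies the square-root scaling mentioned in this thread):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholder names; substitute your own base model and adapter checkpoints.
base = AutoModelForCausalLM.from_pretrained("base-model-name")
model = PeftModel.from_pretrained(base, "path/to/lora_1", adapter_name="lora_1")
model.load_adapter("path/to/lora_2", adapter_name="lora_2")

# Negative weight on the second adapter to "subtract" it from the first:
# task = pre_trained + LoRA_1 - LoRA_2
model.add_weighted_adapter(
    adapters=["lora_1", "lora_2"],
    weights=[1.0, -1.0],
    adapter_name="task_vector",
    combination_type="linear",  # assumed for this sketch
)
```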
I believe it is caused by these lines: peft/src/peft/tuners/lora/model.py, lines 791 to 794 at commit 09358aa.
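For illustration, here is a tiny sketch of why a negative weight trips that code path, assuming the per-adapter factor described in the linked discussion (this is not the actual PEFT source):

```python
import math

scaling = 2.0   # e.g. lora_alpha / r
weight = -1.0   # the negative merge weight

try:
    # Roughly what the referenced lines compute per adapter:
    factor = math.sqrt(weight * scaling)
except ValueError as err:
    print(err)  # math domain error: sqrt of a negative number
```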
Expected behavior
After reading the code, I have the following questions. One of them concerns merge_and_unload when there is only one LoRA adapter: it seems like merge_and_unload does not multiply by math.sqrt(1 * target.scaling[adapter]); it only multiplies by the scaling alpha / rank, so the merged weight = original weight + BA * scaling (see peft/src/peft/tuners/lora/layer.py, line 455 at commit 09358aa; a small numeric sketch follows below).
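Here is the small numeric sketch referred to above, illustrating the single-adapter merge with made-up shapes and values (not the PEFT source):

```python
import torch

r, lora_alpha = 8, 16
scaling = lora_alpha / r          # alpha / rank

W = torch.randn(64, 64)           # original weight
A = torch.randn(r, 64)            # lora_A
B = torch.randn(64, r)            # lora_B

# merged weight = original weight + BA * scaling; no sqrt factor is involved,
# unlike the weighting applied in add_weighted_adapter.
merged = W + (B @ A) * scaling
```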
Thanks!