Add support for Task Arithmetics #698
Conversation
This looks very good overall!
Looked over everything except for the notebook and left some comments.
Looks good to me. I just have some small questions.
docs/adapter_composition.md (outdated)
\Phi_{merged} = \sum_{i=0}^{N} \lambda_i \Phi_i
$$

2. `combine_strategy = "lora_linear_only_negate_b"`: Following [Zhang et al. (2023)](https://proceedings.neurips.cc/paper_files/paper/2023/hash/299a08ee712d4752c890938da99a77c6-Abstract-Conference.html), this method only uses negative weights for the B-matrix if the weight is negative:
The `only` in the name is redundant. I would remove it to make it shorter.
I believe it is best to keep it. If we simply call it `lora_linear_negate_b`, it sounds like the B matrix is always negated. But this method means that when a weight is negative, only the B matrix is negated and not the A matrix.
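To make the distinction concrete, here is a minimal sketch of the behaviour described above (a hypothetical helper written purely for illustration, not the library's actual implementation): the A matrices are always combined with the absolute value of the weight, while only the B matrices carry a negative sign.

```python
import torch

def merge_only_negate_b(lora_As, lora_Bs, weights):
    """Illustrative sketch: negative weights are applied to B only, never to A."""
    A_merged = sum(abs(w) * A for w, A in zip(weights, lora_As))
    B_merged = sum(w * B for w, B in zip(weights, lora_Bs))
    return A_merged, B_merged

# Example: "subtracting" the second adapter only flips the sign of its B matrix.
A1, B1 = torch.randn(8, 16), torch.randn(16, 8)
A2, B2 = torch.randn(8, 16), torch.randn(16, 8)
A_merged, B_merged = merge_only_negate_b([A1, A2], [B1, B2], weights=[1.0, -1.0])
```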
Co-authored-by: calpt <calpt@mail.de>
- move adapter merging to its own docs page
- move the `average_head` method from `ModelAdaptersMixin` to `ModelWithHeadsAdaptersMixin`
- lora.py: move the SVD computation into a helper function because it was a bit too lengthy in the `average_adapter` function
- test_lora.py: split test cases
Looks very good to me!
This PR adds support for various task arithmetic options for LoRA. Until now, our library supported averaging only by linearly combining different adapters. This may be insufficient, especially for LoRA — hence, several publications have proposed other ways to perform task arithmetic.
This PR:
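For reference, adapter merging in the library goes through `average_adapter`; the sketch below shows how the `combine_strategy` option discussed here might be used. The adapter names, checkpoint paths, and the exact keyword arguments are illustrative assumptions, not verbatim from this PR.

```python
from adapters import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("roberta-base")

# Placeholder names/paths; assumed to point at two previously trained LoRA adapters.
model.load_adapter("./lora_task_a", load_as="task_a")
model.load_adapter("./lora_task_b", load_as="task_b")

# Merge the two LoRA adapters into a new one. A negative weight "subtracts" a task;
# "lora_linear_only_negate_b" applies negative weights to the B matrices only.
model.average_adapter(
    adapter_name="merged",
    adapter_list=["task_a", "task_b"],
    weights=[1.0, -1.0],
    combine_strategy="lora_linear_only_negate_b",
)
```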