Add default IA3 target modules for Mixtral #1376
Conversation
Hi @arnavgarg1, thanks for the contribution!
I did not notice your PR until now. I made #1380 a few days ago, which adds Mixtral to the LoRA mapping. Would you be happy to convert this PR into one that adds Mixtral to the IA3 mapping instead?
@younesbelkada Yes!
Hello @arnavgarg1, as Younes mentioned, please update the PR to add target modules for Mixtral when using IA3.
@pacman100 @younesbelkada Just updated with IA3 instead! I'm also going to add a separate PR for Phi with IA3 right now.
Thanks a lot!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
No problem!
src/peft/utils/constants.py (Outdated)
@@ -93,6 +93,7 @@ def starcoder_model_postprocess_past_key_value(past_key_values):
     "gpt_bigcode": ["c_attn", "mlp.c_proj"],
     "llama": ["k_proj", "v_proj", "down_proj"],
     "mistral": ["k_proj", "v_proj", "down_proj"],
+    "mixtral": ["k_proj", "v_proj", "w1", "w2", "w3"],
@pacman100 Would this be `k_proj`, `v_proj`, and `w2`?
Yes, the ffn layer should just be `w2`, as per the IA3 paper.
Wanted to see what's left here as next steps @pacman100 @younesbelkada
LGTM! wdyt @pacman100?
Hello @arnavgarg1, thank you for adding IA3 target modules for Mixtral! Please see the comment on the feedforward mapping below; once that's addressed, we can merge this.
src/peft/utils/constants.py (Outdated)
@@ -115,6 +116,7 @@ def starcoder_model_postprocess_past_key_value(past_key_values):
     "gpt_bigcode": ["mlp.c_proj"],
     "llama": ["down_proj"],
     "mistral": ["down_proj"],
+    "mixtral": ["w1", "w2", "w3"],
As discussed above, it should only have `w2`.
Thanks @pacman100! Just updated
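For reference, after addressing the review comments, the Mixtral entries in src/peft/utils/constants.py would presumably read roughly as follows. This is a sketch based on the discussion above, not a verbatim copy of the merged diff:

```python
# Sketch of the updated Mixtral entries (assumed from the discussion; not copied from the merge).
# IA3 scales the attention key/value projections and the second feed-forward projection,
# which in Mixtral's expert MLP is named "w2".
TRANSFORMERS_MODELS_TO_IA3_TARGET_MODULES_MAPPING = {
    # ... existing entries ...
    "mistral": ["k_proj", "v_proj", "down_proj"],
    "mixtral": ["k_proj", "v_proj", "w2"],
}

TRANSFORMERS_MODELS_TO_IA3_FEEDFORWARD_MODULES_MAPPING = {
    # ... existing entries ...
    "mistral": ["down_proj"],
    "mixtral": ["w2"],
}
```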
Is it good to merge?
Thank you @arnavgarg1! ✨
Thanks!
* Add default LoRA target modules for Mixtral
* Add IA3 modules for Mixtral
* Address comments
Here's the Mixtral model architecture with my proposed IA3 target module mapping:
These are the number of trainable parameters:
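The architecture listing and parameter counts from the original description aren't preserved on this page. Below is a minimal sketch of how one might reproduce them with the proposed mapping; the checkpoint name is an assumption, and no specific parameter counts are claimed:

```python
from transformers import AutoModelForCausalLM
from peft import IA3Config, get_peft_model

# Checkpoint name is an assumption; any Mixtral model with the standard module names works.
model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

# List the modules the proposed IA3 mapping would touch: the attention
# key/value projections and the experts' second feed-forward projection.
for name, _ in model.named_modules():
    if name.endswith(("k_proj", "v_proj", "w2")):
        print(name)

# Wrap the model with IA3 using the discussed target/feed-forward modules and
# report the number of trainable parameters.
config = IA3Config(
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "v_proj", "w2"],
    feedforward_modules=["w2"],
)
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
```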