[Add Mixtral] Adds support for the Mixtral MoE #27942
Conversation
src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py
Looks awesome! Thanks for the integration @ArthurZucker @younesbelkada, merge once you're ready and CI is green.
A few documentation things that we can improve on, but let's do that after it has landed in the lib.
Tips:

- The model needs to be converted using the [conversion script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py).
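For readers following along, here is a minimal sketch of loading the output of that conversion; the local path below is a placeholder, not something specified in this PR:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: wherever the conversion script wrote the HF-format checkpoint.
converted_dir = "/path/to/mixtral-hf"

tokenizer = AutoTokenizer.from_pretrained(converted_dir)
model = AutoModelForCausalLM.from_pretrained(converted_dir)
```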
Maybe not relevant once the weights are pushed to the Hub
A small nit on the conversion script!
src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py
Are there any other aux losses apart from the LM loss?

The auxiliary loss can be computed with:
@@ -1246,7 +1241,7 @@ def forward(
        aux_loss = None
        if output_router_logits:
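For context, here is a minimal sketch of the standard Switch-Transformers-style load-balancing auxiliary loss used for MoE routers. It illustrates the idea rather than the exact function merged in this PR, and the tensor shapes are assumptions:

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor, num_experts: int, top_k: int = 2) -> torch.Tensor:
    """Encourage tokens to be spread evenly across experts.

    router_logits: (num_tokens, num_experts), e.g. concatenated over all layers.
    """
    routing_weights = F.softmax(router_logits, dim=-1)                  # P(expert | token)
    _, selected_experts = torch.topk(routing_weights, top_k, dim=-1)    # experts actually chosen
    expert_mask = F.one_hot(selected_experts, num_experts).float()      # (tokens, top_k, experts)

    # Fraction of routing assignments that went to each expert.
    tokens_per_expert = expert_mask.mean(dim=(0, 1))
    # Mean routing probability assigned to each expert.
    router_prob_per_expert = routing_weights.mean(dim=0)

    return num_experts * torch.sum(tokens_per_expert * router_prob_per_expert)
```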
Setting `output_router_logits=True` should automatically add the `aux_loss`.
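As a usage sketch of that behaviour (the tiny config values here are made up for illustration and are nowhere near the real Mixtral-8x7B sizes):

```python
import torch
from transformers import MixtralConfig, MixtralForCausalLM

config = MixtralConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=2,
    num_local_experts=4,
    num_experts_per_tok=2,
    output_router_logits=True,  # return router logits and fold the aux loss into the total loss
)
model = MixtralForCausalLM(config)

input_ids = torch.randint(0, config.vocab_size, (1, 16))
outputs = model(input_ids=input_ids, labels=input_ids)

# With output_router_logits=True, outputs.loss should include the load-balancing term
# (weighted by config.router_aux_loss_coef) on top of the LM loss.
print(outputs.loss, outputs.aux_loss)
```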
* up
* up
* test
* logits ok
* up
* up
* few fixes
* conversion script
* up
* nits
* nits
* update
* nuke
* more updates
* nites
* fix many issues
* nit
* scatter
* nit
* nuke megablocks
* nits
* fix conversion script
* nit
* remove
* nits
* nit
* update
* oupsssss
* change
* nits device
* nits
* fixup
* update
* merge
* add copied from
* fix the copy mentions
* update tests
* more fixes
* nits
* conversion script
* add parts of the readme
* Update tests/models/mixtral/test_modeling_mixtral.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* new test + conversion script
* Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Apply suggestions from code review
* fix
* fix copies
* fix copies
* ooops
* fix config
* Apply suggestions from code review
* fix nits
* nit
* add copies
* add batched tests
* docs
* fix flash attention
* let's add more verbose
* add correct outputs
* support router ouptus
* ignore copies where needed
* fix
* cat list if list is given for now
* nits
* Update docs/source/en/model_doc/mixtral.md
* finish router refactoring
* fix forward
* fix expected values
* nits
* fixup
* fix
* fix bug
* fix
* fix dtype mismatch
* fix
* grrr grrr I support item assignment
* fix CI
* docs
* fixup
* remove some copied form
* fix weird diff
* skip doctest fast on the config and modeling
* mark that is supports flash attention in the doc
* update
* Update src/transformers/models/mixtral/modeling_mixtral.py Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update docs/source/en/model_doc/mixtral.md Co-authored-by: Lysandre Debut <hi@lysand.re>
* revert router logits config issue
* update doc accordingly
* Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py
* nits
* use torch testing asssert close
* fixup
* doc nits

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
What does this PR do?
Adds support for Mixtral, the latest Mixture of Experts (MoE) model from Mistral AI.
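Once the weights are on the Hub, usage should look like any other causal LM in the library. A minimal sketch, assuming the checkpoint is published as `mistralai/Mixtral-8x7B-v0.1`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"  # assumed Hub id for the released weights

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("Mixture of Experts models work by", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```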