Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Add Mixtral] Adds support for the Mixtral MoE #27942

Merged
merged 111 commits into from
Dec 11, 2023
Merged

[Add Mixtral] Adds support for the Mixtral MoE #27942

merged 111 commits into from
Dec 11, 2023

Conversation

ArthurZucker
Copy link
Collaborator

What does this PR do?

Adds the latest MoE model from mistral AI

@ArthurZucker ArthurZucker marked this pull request as ready for review December 11, 2023 09:51
Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome! Thanks for the integration @ArthurZucker @younesbelkada, merge once you're ready and CI is green.

A few documentation things that we can improve on, but let's do that after it has landed in the lib.

docs/source/en/model_doc/mixtral.md Outdated Show resolved Hide resolved
Tips:


- The model needs to be converted using the [conversion script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not relevant once the weights are pushed to the Hub

src/transformers/models/mixtral/modeling_mixtral.py Outdated Show resolved Hide resolved
Copy link
Contributor

@younesbelkada younesbelkada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small nit on the conversion script!

@LysandreJik LysandreJik merged commit accccdd into main Dec 11, 2023
22 checks passed
@LysandreJik LysandreJik deleted the add-mixtral branch December 11, 2023 11:50
@narvind2003
Copy link

Are there any other aux losses apart from the LM loss?

@ArthurZucker
Copy link
Collaborator Author

The auxiliary loss can be computed with output_router_logits = True automatically, other losses like z_loss can be imported from Switch transformers modeling code! Custom losses should be able to use the router_logits returned by the model

@@ -1246,7 +1241,7 @@ def forward(

aux_loss = None
if output_router_logits:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting output_router_logits = True should automatically add the aux_loss

iantbutler01 pushed a commit to BismuthCloud/transformers that referenced this pull request Dec 16, 2023
* up

* up

* test

* logits ok

* up

* up

* few fixes

* conversion script

* up

* nits

* nits

* update

* nuke

* more updates

* nites

* fix many issues

* nit

* scatter

* nit

* nuke megablocks

* nits

* fix conversion script

* nit

* remove

* nits

* nit

* update

* oupsssss

* change

* nits device

* nits

* fixup

* update

* merge

* add copied from

* fix the copy mentions

* update tests

* more fixes

* nits

* conversion script

* add parts of the readme

* Update tests/models/mixtral/test_modeling_mixtral.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* new test + conversion script

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

* fix

* fix copies

* fix copies

* ooops

* fix config

* Apply suggestions from code review

* fix nits

* nit

* add copies

* add batched tests

* docs

* fix flash attention

* let's add more verbose

* add correct outputs

* support router ouptus

* ignore copies where needed

* fix

* cat list if list is given for now

* nits

* Update docs/source/en/model_doc/mixtral.md

* finish router refactoring

* fix forward

* fix expected values

* nits

* fixup

* fix

* fix bug

* fix

* fix dtype mismatch

* fix

* grrr grrr I support item assignment

* fix CI

* docs

* fixup

* remove some copied form

* fix weird diff

* skip doctest fast on the config and modeling

* mark that is supports flash attention in the doc

* update

* Update src/transformers/models/mixtral/modeling_mixtral.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update docs/source/en/model_doc/mixtral.md

Co-authored-by: Lysandre Debut <hi@lysand.re>

* revert router logits config issue

* update doc accordingly

* Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py

* nits

* use torch testing asssert close

* fixup

* doc nits

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
staghado pushed a commit to staghado/transformers that referenced this pull request Jan 15, 2024
* up

* up

* test

* logits ok

* up

* up

* few fixes

* conversion script

* up

* nits

* nits

* update

* nuke

* more updates

* nites

* fix many issues

* nit

* scatter

* nit

* nuke megablocks

* nits

* fix conversion script

* nit

* remove

* nits

* nit

* update

* oupsssss

* change

* nits device

* nits

* fixup

* update

* merge

* add copied from

* fix the copy mentions

* update tests

* more fixes

* nits

* conversion script

* add parts of the readme

* Update tests/models/mixtral/test_modeling_mixtral.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* new test + conversion script

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

* fix

* fix copies

* fix copies

* ooops

* fix config

* Apply suggestions from code review

* fix nits

* nit

* add copies

* add batched tests

* docs

* fix flash attention

* let's add more verbose

* add correct outputs

* support router ouptus

* ignore copies where needed

* fix

* cat list if list is given for now

* nits

* Update docs/source/en/model_doc/mixtral.md

* finish router refactoring

* fix forward

* fix expected values

* nits

* fixup

* fix

* fix bug

* fix

* fix dtype mismatch

* fix

* grrr grrr I support item assignment

* fix CI

* docs

* fixup

* remove some copied form

* fix weird diff

* skip doctest fast on the config and modeling

* mark that is supports flash attention in the doc

* update

* Update src/transformers/models/mixtral/modeling_mixtral.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update docs/source/en/model_doc/mixtral.md

Co-authored-by: Lysandre Debut <hi@lysand.re>

* revert router logits config issue

* update doc accordingly

* Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py

* nits

* use torch testing asssert close

* fixup

* doc nits

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
@xkszltl xkszltl mentioned this pull request Jan 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants