[Add Mixtral] Adds support for the Mixtral MoE #27942
Conversation
src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py
Looks awesome! Thanks for the integration @ArthurZucker @younesbelkada, merge once you're ready and CI is green.
A few documentation things that we can improve on, but let's do that after it has landed in the lib.
Tips:

- The model needs to be converted using the [conversion script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py).
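For readers following along, here is a minimal sketch of loading the output of that conversion; the local path below is a placeholder, not something specified in this PR:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: wherever the conversion script wrote the HF-format checkpoint.
converted_dir = "/path/to/mixtral-hf"

tokenizer = AutoTokenizer.from_pretrained(converted_dir)
model = AutoModelForCausalLM.from_pretrained(converted_dir)
```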
Maybe not relevant once the weights are pushed to the Hub
A small nit on the conversion script!
src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py
Are there any other aux losses apart from the LM loss?

The auxiliary loss can be computed with:
@@ -1246,7 +1241,7 @@ def forward(
        aux_loss = None
        if output_router_logits:
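For context, here is a minimal sketch of the standard Switch-Transformers-style load-balancing auxiliary loss used for MoE routers. It illustrates the idea rather than the exact function merged in this PR, and the tensor shapes are assumptions:

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor, num_experts: int, top_k: int = 2) -> torch.Tensor:
    """Encourage tokens to be spread evenly across experts.

    router_logits: (num_tokens, num_experts), e.g. concatenated over all layers.
    """
    routing_weights = F.softmax(router_logits, dim=-1)                  # P(expert | token)
    _, selected_experts = torch.topk(routing_weights, top_k, dim=-1)    # experts actually chosen
    expert_mask = F.one_hot(selected_experts, num_experts).float()      # (tokens, top_k, experts)

    # Fraction of routing assignments that went to each expert.
    tokens_per_expert = expert_mask.mean(dim=(0, 1))
    # Mean routing probability assigned to each expert.
    router_prob_per_expert = routing_weights.mean(dim=0)

    return num_experts * torch.sum(tokens_per_expert * router_prob_per_expert)
```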
Setting `output_router_logits=True` should automatically add the `aux_loss`.
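As a usage sketch of that behaviour (the tiny config values here are made up for illustration and are nowhere near the real Mixtral-8x7B sizes):

```python
import torch
from transformers import MixtralConfig, MixtralForCausalLM

config = MixtralConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=2,
    num_local_experts=4,
    num_experts_per_tok=2,
    output_router_logits=True,  # return router logits and fold the aux loss into the total loss
)
model = MixtralForCausalLM(config)

input_ids = torch.randint(0, config.vocab_size, (1, 16))
outputs = model(input_ids=input_ids, labels=input_ids)

# With output_router_logits=True, outputs.loss should include the load-balancing term
# (weighted by config.router_aux_loss_coef) on top of the LM loss.
print(outputs.loss, outputs.aux_loss)
```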
* up
* up
* test
* logits ok
* up
* up
* few fixes
* conversion script
* up
* nits
* nits
* update
* nuke
* more updates
* nites
* fix many issues
* nit
* scatter
* nit
* nuke megablocks
* nits
* fix conversion script
* nit
* remove
* nits
* nit
* update
* oupsssss
* change
* nits device
* nits
* fixup
* update
* merge
* add copied from
* fix the copy mentions
* update tests
* more fixes
* nits
* conversion script
* add parts of the readme
* Update tests/models/mixtral/test_modeling_mixtral.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* new test + conversion script
* Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Apply suggestions from code review
* fix
* fix copies
* fix copies
* ooops
* fix config
* Apply suggestions from code review
* fix nits
* nit
* add copies
* add batched tests
* docs
* fix flash attention
* let's add more verbose
* add correct outputs
* support router ouptus
* ignore copies where needed
* fix
* cat list if list is given for now
* nits
* Update docs/source/en/model_doc/mixtral.md
* finish router refactoring
* fix forward
* fix expected values
* nits
* fixup
* fix
* fix bug
* fix
* fix dtype mismatch
* fix
* grrr grrr I support item assignment
* fix CI
* docs
* fixup
* remove some copied form
* fix weird diff
* skip doctest fast on the config and modeling
* mark that is supports flash attention in the doc
* update
* Update src/transformers/models/mixtral/modeling_mixtral.py Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update docs/source/en/model_doc/mixtral.md Co-authored-by: Lysandre Debut <hi@lysand.re>
* revert router logits config issue
* update doc accordingly
* Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py
* nits
* use torch testing asssert close
* fixup
* doc nits

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
What does this PR do?
Adds support for Mixtral, the latest Mixture of Experts (MoE) model from Mistral AI.
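Once the weights are on the Hub, usage should look like any other causal LM in the library. A minimal sketch, assuming the checkpoint is published as `mistralai/Mixtral-8x7B-v0.1`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"  # assumed Hub id for the released weights

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("Mixture of Experts models work by", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```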