Add GPT OSS model from OpenAI #39923
Merged

Conversation
Commits

* Fix attention
* Tensor parallel training: Vocab Parallel embedding (see the embedding sketch after this list)
* …del-addition-openai into add-fast-flash-kernel
* Add flex attention support (see the FlexAttention sketch below)
* Bring chat template back up to speed (see the usage sketch below)
* Update modeling to work with new checkpoints; exposes output_router_logits
* Fix pad/eos/bos; base model maybe one day
* Update telemetry for mxfp4
* Add additional integration tests
* …penai into add-oai
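The "Vocab Parallel embedding" commit names a Megatron-style tensor-parallel technique: each rank stores only a slice of the embedding table, out-of-shard token ids embed to zero, and an all-reduce sums the partial lookups. A minimal single-process sketch of the idea, with the all-reduce left as a comment:

```python
import torch
import torch.nn.functional as F

def vocab_parallel_embedding(input_ids, weight_shard, vocab_start, vocab_end):
    # This rank holds rows [vocab_start, vocab_end) of the embedding table.
    mask = (input_ids < vocab_start) | (input_ids >= vocab_end)
    local_ids = (input_ids - vocab_start).masked_fill(mask, 0)  # keep indices in range
    out = F.embedding(local_ids, weight_shard)
    out = out.masked_fill(mask.unsqueeze(-1), 0.0)  # zero out-of-shard lookups
    # real TP would now call: torch.distributed.all_reduce(out)
    return out

# toy check: a shard covering ids [4, 8) of a 16-token vocab
shard = torch.randn(4, 32)
ids = torch.tensor([[2, 5, 7, 12]])
print(vocab_parallel_embedding(ids, shard, 4, 8).shape)  # torch.Size([1, 4, 32])
```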
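"Add flex attention support" refers to PyTorch's FlexAttention API (available in torch 2.5+, usually wrapped in torch.compile for speed). A rough sketch of calling it with a causal score_mod; the shapes are illustrative only, not taken from the PR:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def causal(score, b, h, q_idx, kv_idx):
    # push scores for future positions to -inf before the softmax
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

# (batch, heads, seq_len, head_dim)
q = torch.randn(1, 2, 16, 8)
k = torch.randn(1, 2, 16, 8)
v = torch.randn(1, 2, 16, 8)

out = flex_attention(q, k, v, score_mod=causal)
print(out.shape)  # torch.Size([1, 2, 16, 8])
```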
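The chat-template and output_router_logits commits describe user-facing pieces of the model addition. As a minimal usage sketch of the finished model: the checkpoint id and generation settings below are assumptions for illustration, not taken from this page.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"  # assumed Hub checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# the chat template added in this PR formats harmony-style messages
messages = [{"role": "user", "content": "Briefly explain MoE routing."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Per the commit list, router logits for the MoE auxiliary loss can also be exposed through the model config (output_router_logits).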
LysandreJik (Member) approved these changes on Aug 5, 2025:

LGTM!
Contributor:

[For maintainers] Suggested jobs to run (before merge): run-slow: auto, gpt_oss, granitemoe, granitemoehybrid, granitemoeshared, jamba

LFG
zaristei pushed a commit to zaristei/transformers that referenced this pull request on Sep 9, 2025:
* fix
* nice
* where i am at
* Bro this works
* Update src/transformers/integrations/tensor_parallel.py
* cleanups
* yups that was breaking
* Update src/transformers/models/openai_moe/modeling_openai_moe.py
* gather on experts and not mlp
* add changes for latest convert branch
* adds options to get output_router_logits from config
* bring chat template + special tokens back into the script.
* initial commit
* update
* working with shards
* add model.safetensors.index.json
* fix
* fix
* mxfp4 flag
* rm print
* Fix PAD/EOS/BOS (huggingface#18)
* fix pad/eos/bos
* base model maybe one day
* add some doc
* special tokens based on harmony.
* add in tokenizer config as well.
* prepare for rebase with main
* Fix for initialize_tensor_parallelism now returning a 4-tuple

  ```
  [rank0]: File "/fsx/edward/work/openai-tsm-examples/examples/generate.py", line 17, in <module>
  [rank0]: model = AutoModelForCausalLM.from_pretrained(
  [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
  [rank0]: return model_class.from_pretrained(
  [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 316, in _wrapper
  [rank0]: return func(*args, **kwargs)
  [rank0]: ^^^^^^^^^^^^^^^^^^^^^
  [rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 4748, in from_pretrained
  [rank0]: tp_plan, device_map, device_mesh = initialize_tensor_parallelism(tp_plan, tp_size=None)
  [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [rank0]: ValueError: too many values to unpack (expected 3)
  ```

* mxfp4
* mxfp4 draft
* fix
* fix import
* draft
* draft impl
* finally working !
* simplify
* add import
* working version
* consider blocks and scales
* device mesh fix
* initial commit
* add working dequant + quant logic
* update
* non nan, gibberish output
* working EP + quantization finally !
* start cleaning
* remove reversing process
* style
* some cleaning
* initial commit
* more cleaning
* more cleaning
* simplify
* more cleaning
* rm duplicated function
* changing tp_plan
* update tp plan check
* add loading attribute
* dequantizing logic
* use subfunctions
* import cleaning
* update_param_name
* adds clamped swiglu
* add clamping to training path
* simplify dequant logic
* update
* Bad merge
* more simplifications & tests
* fix !
* fix registering custom attention
* fix order
* fixes
* some test nits
* nits
* nit
* fix
* Clamp sink logits
* Clean
* Soft-max trick
* Clean up
* p
* fix deepspeed
* update both modeling and modular for cleanup
* contiguous
* update tests
* fix top_k router call
* revert renaming
* test nits
* small fixes for EP
* fix path for our local tests
* update as I should not have broken that!
* fix the loss of mixtral
* revert part of the changes related to router_scores, kernel probably not ready for that!
* deleting a small nit
* update arch
* fix post processing
* update
* running version but not expected output
* moving to cuda
* initial commit
* revert
* erroring when loading on cpu
* updates
* del blocks, scales
* fix
* style
* rm comm
* comment
* add comment
* style
* remove duplicated lines
* Fix minor issue with weight_map conversion script
* fix sampling params
* rename to final name
* update pre-final version of template
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
* fix batched inference
* serve fixes
* swizzle !
* update final chat template by Matt.
* fix responses; pin oai
* simplify
* Thanks Matt for his tireless efforts! Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* fix
* Use ROCm kernels from HUB
* Make kernel modes explicit
* update final chat template by Matt. x2
* Thanks Matt for his tireless efforts! Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
* Fix installation
* Update setup.py Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com>
* allow no content
* fix: update message handling in write_tokenizer function
* Fix template logic for user message role
* last nits for CB and flash_paged!
* there was one bad merge
* fix CB (hardcode for now, it's just using kv groups instead)
* fix
* better fix for device_map
* minor device fix
* Fix flash paged
* updates
* Revert "remove dtensors, not explicit (huggingface#39840)" This reverts commit 6dfd561.
* update
* Revert "remove dtensors, not explicit (huggingface#39840)" This reverts commit 6dfd561.
* fix merge
* fix
* Fix line break when custom model identity
* nits testing
* to locals first and pass sliding window to flash paged
* register modes for MegaBlocksMoeMlp
* add integration test in fixtures -> now update the tests to use it!
* update integration tests
* initial fix
* style and update tests
* fix
* chore(gpt oss): remove mlp_bias from configuration. It was just a leftover.
* stats
* Integration tests
* whoops
* Shouldn't move model
* Ensure assistant messages without thinking always go to "final" channel
* More checks to ensure expected format
* Add pad_token_id to model configuration in write_model function (huggingface#51)
* Add oai fix fast tests (huggingface#59)
* Fix some fast tests
* Force some updates
* Remove unnecessary fixes
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
* reasoning -> Reasoning
* Add additional integration tests
* fixup
* Slight fixes
* align chat template with harmony
* simplify
* Add comment
* torch testing assert close
* torch testing assert close
* torch testing assert close
* torch testing assert close
* torch testing assert close
* torch testing assert close
* Revert fixup
* skip 2 test remove todo
* merge
* padding side should be left for integration tests
* fix modular wrt to changes made to modeling
* style
* isort
* fix copies for the loss
* mmmm

---------

Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: edbeeching <edbeeching@gmail.com>
Co-authored-by: Vaibhavs10 <vaibhavs10@gmail.com>
Co-authored-by: MekkCyber <mekk.cyber@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan@openai.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: joao@huggingface.co <joao@ip-10-53-88-32.ec2.internal>
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Akos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com>
Co-authored-by: Alvaro Moran <alvaro.moran@huggingface.co>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Matt <rocketknight1@gmail.com>
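The traceback embedded in the commit log above shows a 3-way unpack of a helper that started returning four values. A self-contained reproduction of that failure mode and its fix; the helper body is a stand-in, since the real initialize_tensor_parallelism is internal to transformers:

```python
# The return values below are placeholders; the fourth element's name is assumed.
def initialize_tensor_parallelism(tp_plan, tp_size=None):
    device_map, device_mesh = "auto", None  # stand-ins for the real objects
    return tp_plan, device_map, device_mesh, tp_size  # now a 4-tuple

try:
    # old call site expected three values -> ValueError
    tp_plan, device_map, device_mesh = initialize_tensor_parallelism("auto")
except ValueError as err:
    print(err)  # too many values to unpack (expected 3)

# fixed call site unpacks all four values
tp_plan, device_map, device_mesh, tp_size = initialize_tensor_parallelism("auto")
```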
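Several commits above deal with MXFP4 "blocks and scales" dequantization. As a hedged sketch of what block-wise FP4 dequantization generally looks like: the e2m1 code table is standard, but the block size and packing here are assumptions, not the PR's exact layout.

```python
import torch

# e2m1 (FP4) value table: 16 codes covering {±0, ±0.5, ..., ±6}
FP4_VALUES = torch.tensor(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]
)

def dequantize_mxfp4(codes, scale_exponents):
    # codes: (n_blocks, block_size) ints in [0, 16)
    # scale_exponents: (n_blocks,) shared power-of-two exponent per block
    values = FP4_VALUES[codes]  # decode each 4-bit code
    return values * (2.0 ** scale_exponents.float()).unsqueeze(-1)

codes = torch.randint(0, 16, (4, 32))
scales = torch.tensor([-1, 0, 1, 2])
print(dequantize_mxfp4(codes, scales).shape)  # torch.Size([4, 32])
```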
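The "adds clamped swiglu" commit names the activation used in the MLP path. A sketch of one plausible form, with the clamp limit and sigmoid scaling chosen for illustration rather than read from the PR:

```python
import torch

def clamped_swiglu(gate, up, limit=7.0, alpha=1.702):
    # Both branches are clamped before the SiLU-style gating;
    # limit and alpha are illustrative assumptions.
    gate = gate.clamp(max=limit)
    up = up.clamp(min=-limit, max=limit)
    return up * gate * torch.sigmoid(alpha * gate)

x = torch.randn(2, 8) * 10
print(clamped_swiglu(x, torch.randn(2, 8)).shape)  # torch.Size([2, 8])
```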
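"Clamp sink logits" and "Soft-max trick" hint at attention sinks: a learned per-head logit joins the softmax denominator, letting a head assign probability mass to no token at all. A small sketch under those assumptions (shapes and names are illustrative):

```python
import torch

def softmax_with_sink(scores, sink_logit):
    # scores: (batch, heads, q_len, kv_len); sink_logit: (heads,)
    b, h, q, _ = scores.shape
    sink = sink_logit.view(1, h, 1, 1).expand(b, h, q, 1)  # one extra "token"
    probs = torch.softmax(torch.cat([scores, sink], dim=-1), dim=-1)
    return probs[..., :-1]  # drop the sink column; rows may now sum to < 1

scores = torch.randn(1, 4, 5, 5)
print(softmax_with_sink(scores, torch.zeros(4)).sum(-1))  # each row sums to < 1
```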
ADD THE MODEL!!!!!