Fix MXFP4 quantizer to support variable num_local_experts and hidden_size #41795
base: main
Conversation
The quantizer hardcoded 32 experts and 2880 hidden_size in the reshape operations. This caused failures when quantizing models with different numbers of experts (e.g., averaged single-expert models).

Changes:
- Read num_local_experts and hidden_size from model.config
- Use dynamic values in reshape operations instead of hardcoded constants
- Defaults to 32 and 2880 for backward compatibility

This enables quantizing averaged/merged MoE models with fewer experts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
cc @MekkCyber for quantization
run-slow: mxfp4
This comment contains run-slow, running the specified jobs: models: []
Sounds good, thanks!
[For maintainers] Suggested jobs to run (before merge): run-slow: mxfp4
run-slow: mxfp4
This comment contains run-slow, running the specified jobs: models: []
Regarding the tests:
What does this PR do?
This PR replaces the hardcoded values for num_local_experts and hidden_size in MXFP4Config for GPT-OSS-type models. I discovered this when experimenting with non-standard configs of the GPT-OSS architecture, but I'm pretty sure it'll break for openai/gpt-oss-120b as well, since its number of experts differs from the hardcoded value.
The quantizer hardcoded 32 experts and 2880 hidden_size in the reshape operations. This caused failures when quantizing models with different numbers of experts.
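For context, here is a minimal sketch of the kind of call that previously failed. The checkpoint path is a placeholder and the loading arguments are assumptions for illustration, not part of this PR:

```python
from transformers import AutoModelForCausalLM, Mxfp4Config

# Placeholder: a GPT-OSS-style checkpoint whose experts were averaged/merged,
# so its config has num_local_experts != 32.
model_id = "path/to/averaged-gpt-oss"

# Quantize on load with MXFP4. Before this fix, the quantizer's reshape used
# the hardcoded (32 experts, 2880 hidden_size) layout and failed here.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=Mxfp4Config(),
    device_map="auto",
)
```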
Changes:
- Read num_local_experts and hidden_size from model.config
- Use dynamic values in the reshape operations instead of hardcoded constants
- Default to 32 and 2880 for backward compatibility

This enables quantizing averaged/merged MoE models with fewer experts.
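To make the change concrete, here is a hypothetical sketch of the pattern; reshape_expert_blocks and its exact shape arguments are illustrative, not the quantizer's actual code:

```python
import torch

def reshape_expert_blocks(blocks: torch.Tensor, config) -> torch.Tensor:
    """Illustrative only: reshape a packed expert weight tensor using
    config-driven dimensions instead of a hardcoded GPT-OSS-20B layout."""
    # Read from the model config, falling back to the previous constants
    # (32 experts, 2880 hidden_size) for backward compatibility.
    num_local_experts = getattr(config, "num_local_experts", 32)
    hidden_size = getattr(config, "hidden_size", 2880)
    # Before the fix this was effectively blocks.reshape(32, -1, 2880).
    return blocks.reshape(num_local_experts, -1, hidden_size)
```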
Passed all tests that I was able to run locally on 24 GB of VRAM.
Before submitting
- Did you read the contributor guideline, Pull Request section? - yes
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. - I looked and didn't find an issue
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings. - likely not necessary
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.