
Conversation

marksverdhei

What does this PR do?

This PR replaces the hardcoded num_local_experts and hidden_size values in MXFP4Config for GPT-OSS-type models.

I discovered this when experimenting with non-standard configs of the GPT-OSS architecture, but I'm pretty sure it will also break for openai/gpt-oss-120b, since its number of experts differs from the hardcoded value.

The quantizer hardcoded 32 experts and a hidden size of 2880 in its reshape operations, which caused failures when quantizing models with a different number of experts.

Changes:

  • Read num_local_experts and hidden_size from model.config
  • Use dynamic values in reshape operations instead of hardcoded constants
  • Default to 32 and 2880 for backward compatibility

This enables quantizing averaged/merged MoE models with fewer experts; a minimal sketch of the change is shown below.
The PR passed all the tests I was able to run locally on 24 GB of VRAM.
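
For illustration, here is a minimal sketch of the idea (assumed function name and tensor shapes, not the actual transformers quantizer code): read the two values from the model config with the old constants as fallbacks, then use them in the reshape.

```python
import torch

# Minimal sketch with assumed names; not the actual transformers quantizer code.
# Read num_local_experts and hidden_size from the config, falling back to the
# GPT-OSS defaults (32 experts, hidden_size 2880) for backward compatibility.
def reshape_expert_weights(weight: torch.Tensor, config) -> torch.Tensor:
    num_local_experts = getattr(config, "num_local_experts", 32)
    hidden_size = getattr(config, "hidden_size", 2880)
    # Previously the reshape hardcoded 32 and 2880, which broke for models with
    # a different expert count (e.g. gpt-oss-120b or merged single-expert models).
    return weight.reshape(num_local_experts, -1, hidden_size)


# Hypothetical usage with a toy config and tensor:
class ToyConfig:
    num_local_experts = 4
    hidden_size = 64

flat = torch.randn(4 * 3 * 64)
print(reshape_expert_weights(flat, ToyConfig()).shape)  # torch.Size([4, 3, 64])
```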

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - no
  • Did you read the contributor guideline,
    Pull Request section? - yes
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case. - I looked and didn't find an issue
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings. - likely not necessary
  • Did you write any new necessary tests? - no, unsure if needed

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

marksverdhei and others added 2 commits October 22, 2025 14:59
The quantizer hardcoded 32 experts and 2880 hidden_size in the reshape
operations. This caused failures when quantizing models with different
numbers of experts (e.g., averaged single-expert models).

Changes:
- Read num_local_experts and hidden_size from model.config
- Use dynamic values in reshape operations instead of hardcoded constants
- Defaults to 32 and 2880 for backward compatibility

This enables quantizing averaged/merged MoE models with fewer experts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@Rocketknight1
Member

cc @MekkCyber for quantization

@MekkCyber
Contributor

run-slow: mxfp4

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: []
quantizations: ['quantization/mxfp4'] ...

Contributor

@MekkCyber left a comment


Sounds good, thanks!

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: mxfp4

@MekkCyber
Contributor

run-slow: mxfp4

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: []
quantizations: ['quantization/mxfp4'] ...

@marksverdhei
Author

run-slow: mxfp4

Regarding the tests: the run failed with "Executing the custom container implementation failed. Please contact your self hosted runner administrator."
To me it looks like it failed because of issues with the CI infrastructure; I wasn't able to see any logs in the GitHub Actions output.
Otherwise, I'm curious where I'm supposed to find the results of the actual pytest run.

