[Bugfix] fix qwen3 moe fp8 accuracy issue #23031

jinzhen-lin · 2025-08-16T17:31:02Z

Fixes #22881.
The issue was introduced by #22017. The gate layer is initialized with fp8 quantization, but its original weight is bf16.

Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>

github-actions · 2025-08-16T17:31:10Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request addresses an FP8 accuracy issue for MoE models like Qwen3. The root cause was that layers intended to be skipped were being quantized because the configuration key modules_to_not_convert was not being checked. The fix correctly adds a fallback to check for this key if ignored_layers is not found or is empty. This change ensures compatibility with Hugging Face's quantization configuration format and resolves the accuracy problem. The implementation is correct and well-targeted.
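
The fallback described in this review can be sketched as follows. This is a minimal illustration, not vLLM's actual code: the function name `resolve_ignored_layers` and the plain-dict config are assumptions made for the example; only the two key names, `ignored_layers` and `modules_to_not_convert`, come from the review above.

```python
from typing import Any, Optional


def resolve_ignored_layers(quant_config: dict[str, Any]) -> Optional[list[str]]:
    """Return the layers to leave unquantized, or None if none are listed.

    Hypothetical helper: checks `ignored_layers` first, then falls back to
    the Hugging Face-style `modules_to_not_convert` key when the first key
    is missing or empty.
    """
    ignored = quant_config.get("ignored_layers")
    if not ignored:
        # Fallback: the checkpoint may list skipped layers under the HF key.
        ignored = quant_config.get("modules_to_not_convert")
    return ignored or None


# Example config shaped like an HF quantization_config (values are made up).
hf_style = {"quant_method": "fp8", "modules_to_not_convert": ["mlp.gate"]}
print(resolve_ignored_layers(hf_style))
```

With only `ignored_layers` consulted, a config like `hf_style` would yield no skipped layers, so a bf16 gate layer would be treated as fp8, which matches the failure described in the PR.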

yewentao256

Could you also add some lm-eval results to show that the problem has been fixed?

simon-mo

verified locally as well. merging.

simon-mo · 2025-08-17T00:41:06Z

qwen3-30b-fp8

Before

+ curl http://localhost:8000/v1/completions -H 'Content-Type: application/json' -d '{
        "prompt": "What is the capital of France?",
        "max_tokens": 20,
        "temperature": 0
    }'
{"id":"cmpl-f570666360b0494faee6c8390c923980","object":"text_completion","created":1755391260,"model":"/mnt/localdisk/qwen3-30b-fp8","choices":[{"index":0,"text":"!!!!!!!!!!!!!!!!!!!!","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":7,"total_tokens":27,"completion_tokens":20,"prompt_tokens_details":null},"kv_transfer_params":null}
+ echo ''

After

+ curl http://localhost:8000/v1/completions -H 'Content-Type: application/json' -d '{
        "prompt": "What is the capital of France?",
        "max_tokens": 20,
        "temperature": 0
    }'
{"id":"cmpl-9b06470dce2d4f7fb42a946bbb4d2725","object":"text_completion","created":1755391074,"model":"/mnt/localdisk/qwen3-30b-fp8","choices":[{"index":0,"text":" The capital of France is Paris. Paris has been the capital since the 3rd century and is","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":7,"total_tokens":27,"completion_tokens":20,"prompt_tokens_details":null},"kv_transfer_params":null}
+ echo ''
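
The before/after comparison above can be turned into a quick smoke check. The sketch below is an assumption-laden illustration, not part of this PR: `is_degenerate` and `check_completion` are hypothetical helpers, and the "single repeated character" heuristic is chosen only because it matches the `!!!!` symptom reported in #22881. The sample bodies are trimmed versions of the responses shown above.

```python
import json


def is_degenerate(text: str, min_len: int = 8) -> bool:
    """Heuristic: the text is one character repeated at least min_len times."""
    stripped = text.strip()
    return len(stripped) >= min_len and len(set(stripped)) == 1


def check_completion(response_body: str) -> bool:
    """Return True if the first choice of a completions response looks healthy."""
    payload = json.loads(response_body)
    text = payload["choices"][0]["text"]
    return not is_degenerate(text)


# Trimmed copies of the "Before" and "After" responses.
before = '{"choices": [{"index": 0, "text": "!!!!!!!!!!!!!!!!!!!!"}]}'
after = '{"choices": [{"index": 0, "text": " The capital of France is Paris."}]}'

print(check_completion(before))  # False
print(check_completion(after))   # True
```

A check like this only catches the specific failure mode seen here; it does not replace an accuracy evaluation such as the lm-eval run requested in the review.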

optimize fp8 ignored_layers

5e4e4f4

Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>

jinzhen-lin requested review from mgoin, robertgshaw2-redhat, tlrmchlsmth and yewentao256 as code owners August 16, 2025 17:31

mergify bot added the qwen (Related to Qwen models) label Aug 16, 2025

jinzhen-lin mentioned this pull request Aug 16, 2025

[Bug]: Model outputs are always '!!!!!!!!!!!!!!' #22881

Closed

1 task

gemini-code-assist bot reviewed Aug 16, 2025

yewentao256 reviewed Aug 16, 2025

simon-mo approved these changes Aug 17, 2025

mgoin approved these changes Aug 17, 2025

mgoin added the bug (Something isn't working) and ready (ONLY add when PR is ready to merge/full CI is needed) labels Aug 17, 2025

simon-mo merged commit a258ad8 into vllm-project:main Aug 17, 2025
18 of 27 checks passed

simon-mo mentioned this pull request Aug 17, 2025

[Bug]: FlashInfer Sampler is broken on nightly vLLM #23023

Closed

1 task

666even666 pushed a commit to 666even666/vllm that referenced this pull request Aug 18, 2025

[Bugfix] fix qwen3 moe fp8 accuracy issue (vllm-project#23031)

a2b466b

Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Signed-off-by: Yiwen Chen <yiwen66@berkeley.edu>

divakar-amd pushed a commit to divakar-amd/vllm_upstream that referenced this pull request Aug 20, 2025

[Bugfix] fix qwen3 moe fp8 accuracy issue (vllm-project#23031)

7a3b649

Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>

djmmoss pushed a commit to djmmoss/vllm that referenced this pull request Aug 21, 2025

[Bugfix] fix qwen3 moe fp8 accuracy issue (vllm-project#23031)

1330105

Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Signed-off-by: Duncan Moss <djm.moss@gmail.com>

epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025

[Bugfix] fix qwen3 moe fp8 accuracy issue (vllm-project#23031)

97a9e26

Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>

xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025

[Bugfix] fix qwen3 moe fp8 accuracy issue (vllm-project#23031)

8295ab7

Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Signed-off-by: Xiao Yu <xiao.yu@amd.com>

zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025

[Bugfix] fix qwen3 moe fp8 accuracy issue (vllm-project#23031)

52eee46

Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
