[Misc][DP] Guard mxfp4 implementation selection #27484
Conversation
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Code Review
This pull request introduces a guard to prevent incorrect MxFP4 implementation selection in data-parallel setups. The change raises an explicit error when an incompatible MxFP4 backend is selected for the EP batched experts format, improving error handling and producing a more informative message. The review focuses on the correctness and clarity of the added error handling.
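For context, the following is a minimal sketch of the guard pattern the review describes. The backend names (`SM90_FI_MXFP4_BF16`, `TRITON`, `MARLIN`) come from this PR's discussion, but the function name, the compatibility set, and its membership are illustrative assumptions, not vLLM's actual internals:

```python
from enum import Enum, auto


class Mxfp4Backend(Enum):
    # Backend names taken from the PR discussion; the enum itself is a sketch.
    SM90_FI_MXFP4_BF16 = auto()
    TRITON = auto()
    MARLIN = auto()


# Hypothetical set: backends assumed here to support the EP batched experts
# format. The real compatibility matrix lives in vLLM and may differ.
BATCHED_FORMAT_COMPATIBLE = {Mxfp4Backend.TRITON, Mxfp4Backend.MARLIN}


def select_mxfp4_impl(backend: Mxfp4Backend, use_batched_format: bool) -> Mxfp4Backend:
    """Fail fast with a clear message instead of deep in the kernel stack."""
    if use_batched_format and backend not in BATCHED_FORMAT_COMPATIBLE:
        raise ValueError(
            f"MxFP4 backend {backend.name} does not support the EP batched "
            "experts format required by this data-parallel configuration; "
            "select a compatible backend instead."
        )
    return backend


# With DP + EP forcing the batched format, an incompatible explicit choice
# now fails fast at selection time:
# select_mxfp4_impl(Mxfp4Backend.SM90_FI_MXFP4_BF16, use_batched_format=True)
# -> ValueError: MxFP4 backend SM90_FI_MXFP4_BF16 does not support ...
```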
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Nice find!
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Purpose
Guard MxFP4 implementation selection in the case of DP
Machine: H100
Command:
```
VLLM_USE_FLASHINFER_MOE_MXFP4_BF16=1 VLLM_ALL2ALL_BACKEND="deepep_high_throughput" vllm serve openai/gpt-oss-20b --data-parallel-size 2 --enable-expert-parallel --no-enable-prefix-caching --port 9010
```

Error on main: despite explicitly selecting the SM90_FI_MXFP4_BF16 backend via VLLM_USE_FLASHINFER_MOE_MXFP4_BF16, we choose OAITritonExperts (the TRITON backend) for DP and fail somewhere deep in the codebase.

Error on this PR: a much more straightforward error.
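As a rough illustration of why this configuration forces the batched path (assuming, based on the command above, that expert-parallel dispatch via deepep_high_throughput uses the batched experts format), the guard's trigger condition might look like this; the predicate name and logic are assumptions for illustration, not vLLM's actual selection code:

```python
def uses_batched_experts_format(all2all_backend: str, expert_parallel: bool) -> bool:
    # Hypothetical: under expert parallelism, the deepep_high_throughput
    # all2all backend dispatches experts in the batched format, which is what
    # makes the explicitly requested SM90_FI_MXFP4_BF16 implementation an
    # invalid choice in this DP setup.
    return expert_parallel and all2all_backend == "deepep_high_throughput"
```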
Test Plan
and
Test Result
The commands work as expected on H100: the first defaults to the Triton backend and the second uses the Marlin backend.