
Conversation

@varun-sundar-rabindranath (Contributor) commented Oct 24, 2025

Purpose

Guard MxFP4 implementation selection in the case of DP

Machine: H100
Command: VLLM_USE_FLASHINFER_MOE_MXFP4_BF16=1 VLLM_ALL2ALL_BACKEND="deepep_high_throughput" vllm serve openai/gpt-oss-20b --data-parallel-size 2 --enable-expert-parallel --no-enable-prefix-caching --port 9010

Error on main:

(EngineCore_DP0 pid=1017522)     assert quant_config.use_mxfp4_w4a16, "Supports only mxfp4_w4a16"
(EngineCore_DP0 pid=1017522) AssertionError: Supports only mxfp4_w4a16

Here, despite explicitly selecting the SM90_FI_MXFP4_BF16 backend via VLLM_USE_FLASHINFER_MOE_MXFP4_BF16, we choose OAITritonExperts (the TRITON backend) for DP and fail on an unrelated-looking assertion deep in the codebase.

Error on PR:

ERROR 10-24 17:09:17 [core.py:779] NotImplementedError: Incompatible Mxfp4 backend for EP batched experts format

Much more straightforward error.
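
For context, here is a minimal sketch of the kind of guard this change adds. The names (Mxfp4Backend, select_mxfp4_experts, BATCHED_COMPATIBLE) are illustrative, not the actual vLLM symbols, and the compatibility set is an assumption based on the test results below: when the EP batched experts format is in use, an explicitly selected backend that cannot handle it should raise at selection time instead of silently falling back to OAITritonExperts.

```python
# Hypothetical sketch, not the actual vLLM code: validate the MxFP4
# backend against the experts format before constructing any kernels.
from enum import Enum, auto


class Mxfp4Backend(Enum):
    TRITON = auto()              # OAITritonExperts
    MARLIN = auto()              # selected via VLLM_MXFP4_USE_MARLIN=1
    SM90_FI_MXFP4_BF16 = auto()  # FlashInfer BF16 path on H100


# Backends assumed (for this sketch) to support the EP batched experts format.
BATCHED_COMPATIBLE = {Mxfp4Backend.TRITON, Mxfp4Backend.MARLIN}


def select_mxfp4_experts(backend: Mxfp4Backend,
                         use_batched_format: bool) -> Mxfp4Backend:
    """Fail fast on an incompatible backend/format combination."""
    if use_batched_format and backend not in BATCHED_COMPATIBLE:
        raise NotImplementedError(
            "Incompatible Mxfp4 backend for EP batched experts format")
    return backend
```

With a guard like this, the mismatch surfaces at selection time with the message above, rather than as a w4a16 assertion deep inside the Triton path.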

Test Plan

VLLM_ALL2ALL_BACKEND="deepep_high_throughput" vllm serve openai/gpt-oss-20b --data-parallel-size 2 --enable-expert-parallel --no-enable-prefix-caching

and

VLLM_MXFP4_USE_MARLIN=1 VLLM_ALL2ALL_BACKEND="deepep_high_throughput" vllm serve openai/gpt-oss-20b --data-parallel-size 2 --enable-expert-parallel --no-enable-prefix-caching

Test Result

The commands work as expected. On H100, the first defaults to the Triton backend and the second uses the Marlin backend.

Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
@varun-sundar-rabindranath (Contributor, Author) commented:

cc @zyongye @mgoin PTAL! Thanks 🙌

@gemini-code-assist (bot) left a comment:


Code Review

This pull request introduces a guard to prevent incorrect MxFP4 implementation selection in data-parallel setups. The change adds an explicit error to be raised when an incompatible MxFP4 backend is selected for EP batched experts format, improving error handling and providing more informative messages. The review focuses on ensuring the correctness and clarity of the added error handling.

Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
@mgoin (Member) left a comment:


Nice find!

@mgoin enabled auto-merge (squash) October 24, 2025 18:08
@mgoin added labels: bug (Something isn't working), ready (ONLY add when PR is ready to merge/full CI is needed) Oct 24, 2025
@mgoin merged commit 269c4db into vllm-project:main Oct 24, 2025
55 of 56 checks passed
kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Oct 25, 2025
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
rohin-garg pushed a commit to rohin-garg/vllm that referenced this pull request Oct 25, 2025
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>