[Misc][DP] Guard mxfp4 implementation selection #27484
Conversation
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Code Review
This pull request introduces a guard to prevent incorrect MxFP4 implementation selection in data-parallel setups. The change raises an explicit error when an incompatible MxFP4 backend is selected for the EP batched experts format, improving error handling and producing a more informative message. The review focuses on the correctness and clarity of the added error handling.
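For context, the following is a minimal sketch of the guard pattern the review describes. The backend names (`SM90_FI_MXFP4_BF16`, `TRITON`, `MARLIN`) come from this PR's discussion, but the function name, the compatibility set, and its membership are illustrative assumptions, not vLLM's actual internals:

```python
from enum import Enum, auto


class Mxfp4Backend(Enum):
    # Backend names taken from the PR discussion; the enum itself is a sketch.
    SM90_FI_MXFP4_BF16 = auto()
    TRITON = auto()
    MARLIN = auto()


# Hypothetical set: backends assumed here to support the EP batched experts
# format. The real compatibility matrix lives in vLLM and may differ.
BATCHED_FORMAT_COMPATIBLE = {Mxfp4Backend.TRITON, Mxfp4Backend.MARLIN}


def select_mxfp4_impl(backend: Mxfp4Backend, use_batched_format: bool) -> Mxfp4Backend:
    """Fail fast with a clear message instead of deep in the kernel stack."""
    if use_batched_format and backend not in BATCHED_FORMAT_COMPATIBLE:
        raise ValueError(
            f"MxFP4 backend {backend.name} does not support the EP batched "
            "experts format required by this data-parallel configuration; "
            "select a compatible backend instead."
        )
    return backend


# With DP + EP forcing the batched format, an incompatible explicit choice
# now fails fast at selection time:
# select_mxfp4_impl(Mxfp4Backend.SM90_FI_MXFP4_BF16, use_batched_format=True)
# -> ValueError: MxFP4 backend SM90_FI_MXFP4_BF16 does not support ...
```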
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Nice find!
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Purpose
Guard MxFP4 implementation selection in the case of DP
Machine: H100
Command:
```
VLLM_USE_FLASHINFER_MOE_MXFP4_BF16=1 VLLM_ALL2ALL_BACKEND="deepep_high_throughput" vllm serve openai/gpt-oss-20b --data-parallel-size 2 --enable-expert-parallel --no-enable-prefix-caching --port 9010
```

Error on main: despite explicitly selecting the SM90_FI_MXFP4_BF16 backend via VLLM_USE_FLASHINFER_MOE_MXFP4_BF16, we choose OAITritonExperts (the TRITON backend) for DP and fail somewhere deep in the codebase.

Error on this PR: a much more straightforward error.
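As a rough illustration of why this configuration forces the batched path (assuming, based on the command above, that expert-parallel dispatch via deepep_high_throughput uses the batched experts format), the guard's trigger condition might look like this; the predicate name and logic are assumptions for illustration, not vLLM's actual selection code:

```python
def uses_batched_experts_format(all2all_backend: str, expert_parallel: bool) -> bool:
    # Hypothetical: under expert parallelism, the deepep_high_throughput
    # all2all backend dispatches experts in the batched format, which is what
    # makes the explicitly requested SM90_FI_MXFP4_BF16 implementation an
    # invalid choice in this DP setup.
    return expert_parallel and all2all_backend == "deepep_high_throughput"
```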
Test Plan
and
Test Result
The commands work as expected on H100: the first defaults to the Triton backend and the second uses the Marlin backend.