Conversation


@bnellnm bnellnm commented Sep 9, 2025

Purpose

Fix shared expert overlap with the naive all2all backend. Fixes #24530. Previously, the shared experts were run on the dispatched hidden states instead of the original hidden states.
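To illustrate the bug, here is a minimal sketch of the ordering issue. The function and argument names are hypothetical stand-ins, not the actual vLLM modular-kernel API; the point is only which tensor the shared experts consume:

```python
import torch

def moe_forward_buggy(hidden_states, dispatch, routed_experts, shared_experts):
    # BUG: dispatch runs first, so the shared experts see the
    # dispatched (permuted) tokens instead of the original ones.
    dispatched = dispatch(hidden_states)
    shared_out = shared_experts(dispatched)      # wrong input
    routed_out = routed_experts(dispatched)
    return routed_out + shared_out

def moe_forward_fixed(hidden_states, dispatch, routed_experts, shared_experts):
    # FIX: shared experts consume the original hidden states;
    # only the routed experts operate on the dispatched tokens.
    shared_out = shared_experts(hidden_states)   # original input
    dispatched = dispatch(hidden_states)
    routed_out = routed_experts(dispatched)
    return routed_out + shared_out
```

With the naive backend, dispatch reorders tokens across ranks, so feeding its output to the shared experts silently corrupts their input, which matches the accuracy collapse reported in #24530.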

Test Plan

Ran DeepSeek with VLLM_ALL2ALL_BACKEND=naive

Test Result

Works: DeepSeek produces correct output with the naive backend.

cc @yewentao256 , @tlrmchlsmth , @simon-mo

Signed-off-by: Bill Nell <bnell@redhat.com>

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug concerning shared expert overlap when using the naive all-to-all backend. The core of the fix involves reordering the dispatch operation to execute after the shared expert computation, which correctly ensures that shared experts operate on the original hidden_states. Additionally, a do_combine flag has been introduced to the reduce_output function, allowing the combine operation to be skipped for shared expert outputs, which is appropriate since they are not part of the dispatch-combine communication flow. The changes are logical, well-implemented, and include a new assertion for improved robustness.
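The `do_combine` behavior the review describes can be sketched as follows. This is a hypothetical simplification (the dummy `all2all` object and signature are assumptions, not the real vLLM interface), showing why shared-expert outputs skip the combine step:

```python
import torch

def reduce_output(all2all, output: torch.Tensor, do_combine: bool = True) -> torch.Tensor:
    # Routed-expert outputs were produced from dispatched tokens, so
    # they must go through the all-to-all combine to return to their
    # home ranks. Shared-expert outputs never left the rank (they were
    # computed on the original hidden states), so combining them would
    # be incorrect; do_combine=False skips that path.
    if do_combine:
        output = all2all.combine(output)
    return output
```

In the fixed flow, `reduce_output(..., do_combine=True)` is used for the routed-expert result and `do_combine=False` for the shared-expert result, which are then summed.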

@yewentao256 yewentao256 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 9, 2025

@yewentao256 yewentao256 left a comment


|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|| 0.96|±  |0.0197|
|     |       |strict-match    |     5|exact_match|| 0.96|±  |0.0197|

Verified that this fixes the issue, thanks for the work!

@yewentao256 yewentao256 enabled auto-merge (squash) September 9, 2025 23:55
@simon-mo simon-mo merged commit b23fb78 into vllm-project:main Sep 10, 2025
51 of 53 checks passed
@yewentao256 yewentao256 deleted the fix-24530 branch September 10, 2025 14:26
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025


Development

Successfully merging this pull request may close these issues.

[Bug]: R1 accuracy 0 issue when all 2 all kernel is "naive"
