[Bugfix] Fix Shared Expert/Zero expert code in FusedMoE.process_chunk #25698
Conversation
Code Review
This pull request fixes an issue with shared experts and zero experts in FusedMoE.process_chunk. The changes separate the handling of zero-expert outputs from shared-expert outputs, which was previously conflated and led to incorrect behavior; in particular, the tuple returned when shared experts are in use is now handled correctly. However, the newly introduced assertion assert len(final_hidden_states) == 1 is incorrect and will likely trigger runtime assertion errors. I have left a critical comment with a suggested fix for this issue.
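For context, here is a minimal sketch of the distinction the review describes. The names (process_chunk_sketch, experts_output, has_shared_experts) are hypothetical and this is not the actual vLLM FusedMoE code; it only illustrates why the shared-expert tuple must be unpacked separately from the routed/zero-expert path.

```python
# Illustrative sketch only; hypothetical names, not the vLLM
# FusedMoE.process_chunk implementation.
from typing import Union

import torch


def process_chunk_sketch(
    experts_output: Union[torch.Tensor, tuple[torch.Tensor, torch.Tensor]],
    has_shared_experts: bool,
) -> tuple[torch.Tensor, ...]:
    """Keep shared-expert handling separate from the routed/zero-expert path.

    When shared experts are in use, the expert computation is assumed to
    return a tuple (shared_expert_output, routed_expert_output); otherwise a
    single routed tensor comes back. Conflating the two cases (for example,
    asserting that exactly one output is produced) breaks the shared-expert
    path.
    """
    if has_shared_experts:
        # Tuple case: unpack both tensors instead of treating the tuple
        # as a single output.
        assert isinstance(experts_output, tuple) and len(experts_output) == 2
        shared_out, routed_out = experts_output
        return shared_out, routed_out

    # Routed-only (including zero-expert) case: a single tensor is expected.
    assert isinstance(experts_output, torch.Tensor)
    return (experts_output,)
```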
Signed-off-by: Sage Moore <sage@neuralmagic.com>
CC @OftenDream. I'm trying to get set up on a server large enough to run
…vllm-project#25698) Signed-off-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
…#25698) Signed-off-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
…vllm-project#25698) Signed-off-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
#23991 appears to break Deepseek V2 Lite, and presumably other models that use shared experts. This PR should fix this issue.
Test Results
Deepseek V2 Lite
VLLM_ALL2ALL_BACKEND=deepep_low_latency vllm serve --model="deepseek-ai/DeepSeek-V2-Lite" --data-parallel-size 2 --enable-expert-parallel --gpu-memory-utilization 0.75 --compilation-config '{"cudagraph_mode": "full_decode_only"}'

Before
After
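For reference, a minimal sketch of how one might sanity-check the served model before and after the fix, assuming the default OpenAI-compatible server that vllm serve exposes on port 8000 (the prompt and parameters below are arbitrary, not from the original test):

```python
# Illustrative sanity check against a locally running `vllm serve` instance;
# assumes the default OpenAI-compatible endpoint at http://localhost:8000.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "deepseek-ai/DeepSeek-V2-Lite",
        "prompt": "The capital of France is",
        "max_tokens": 16,
        "temperature": 0.0,
    },
    timeout=120,
)
resp.raise_for_status()
# A coherent completion here suggests the shared-expert path is no longer
# producing corrupted output.
print(resp.json()["choices"][0]["text"])
```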
Longcat Flash