Conversation

@SageMoore SageMoore commented Sep 25, 2025

Purpose

PR #23991 appears to break DeepSeek V2 Lite, and presumably other models that use shared experts. This PR fixes that breakage.

Test Results

DeepSeek V2 Lite

VLLM_ALL2ALL_BACKEND=deepep_low_latency vllm serve deepseek-ai/DeepSeek-V2-Lite --data-parallel-size 2 --enable-expert-parallel --gpu-memory-utilization 0.75 --compilation-config '{"cudagraph_mode": "full_decode_only"}'

Before

(EngineCore_DP0 pid=3298771)     process_chunk(chunk_start,
(EngineCore_DP0 pid=3298771)   File "/home/sagemoore/git/nm-vllm/vllm/model_executor/layers/fused_moe/layer.py", line 1943, in process_chunk
(EngineCore_DP0 pid=3298771)     chunk_start:chunk_end, :].copy_(final_hidden_states[1],
(EngineCore_DP0 pid=3298771) IndexError: index 1 is out of bounds for dimension 0 with size 1
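
For context on the failure: with shared experts enabled, the fused-MoE forward pass returns a tuple of (shared-expert output, routed-expert output), while without them it returns a single result, so unconditionally indexing element 1 fails exactly as in the traceback above. A minimal sketch of the distinction (a hypothetical helper, not the actual patch; process_chunk/final_hidden_states are the names from the traceback, and the branching is an assumption based on the review comment below):

import torch

def copy_chunk_output(final_hidden_states, full_output: torch.Tensor,
                      chunk_start: int, chunk_end: int,
                      has_shared_experts: bool) -> None:
    """Hypothetical helper sketching the shared-experts distinction."""
    if has_shared_experts:
        # With shared experts, the forward pass yields a 2-tuple:
        # (shared_expert_out, routed_expert_out); the routed output is
        # the piece that belongs in the chunked output buffer.
        routed_out = final_hidden_states[1]
    else:
        # Without shared experts there is only one result, possibly
        # wrapped in a 1-tuple -- indexing [1] here is the IndexError
        # seen in the traceback above.
        routed_out = (final_hidden_states[0]
                      if isinstance(final_hidden_states, tuple)
                      else final_hidden_states)
    full_output[chunk_start:chunk_end, :].copy_(routed_out, non_blocking=True)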

After

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.3733|±  |0.0280|
|     |       |strict-match    |     5|exact_match|↑  |0.3700|±  |0.0279|

Longcat Flash

No results yet; see the comment below (still getting set up on a server large enough to run meituan-longcat/LongCat-Flash-Chat-FP8).

Signed-off-by: Sage Moore <sage@neuralmagic.com>
@SageMoore SageMoore requested a review from mgoin as a code owner September 25, 2025 18:01

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to fix an issue with shared experts and zero experts in FusedMoE.process_chunk. The changes correctly separate the logic for handling outputs from zero experts and shared experts, which was previously conflated and led to incorrect behavior. Specifically, the code now correctly handles the tuple returned when shared experts are used. However, a new assertion, assert len(final_hidden_states) == 1, is incorrect and will likely trigger runtime assertion errors; I have left a critical comment with a suggested fix.
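
To make the flagged assertion concrete, here is a hypothetical illustration (not the PR's actual diff): with shared experts enabled the layer returns a 2-tuple, so an unconditional length-1 check trips on that path.

import torch

# Hypothetical stand-ins for the two outputs (not the PR's actual code).
shared_expert_out = torch.zeros(4, 8)
routed_expert_out = torch.zeros(4, 8)

# With shared experts enabled, the forward pass returns a 2-tuple,
# so an unconditional length-1 assertion fails at runtime:
final_hidden_states = (shared_expert_out, routed_expert_out)
try:
    assert len(final_hidden_states) == 1
except AssertionError:
    print("length-1 assertion fails when shared experts are enabled")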

Signed-off-by: Sage Moore <sage@neuralmagic.com>
@SageMoore
Contributor Author

CC @OftenDream. I'm trying to get set up on a server large enough to run meituan-longcat/LongCat-Flash-Chat-FP8, but can you take a look at this fix?

@tlrmchlsmth tlrmchlsmth added this to the v0.11.0 milestone Sep 25, 2025
@tlrmchlsmth tlrmchlsmth added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 25, 2025
@tlrmchlsmth tlrmchlsmth enabled auto-merge (squash) September 25, 2025 22:31
@vllm-bot vllm-bot merged commit dfb9af2 into vllm-project:main Sep 26, 2025
41 of 44 checks passed
@SageMoore SageMoore deleted the sage/fix-zero-experts branch September 26, 2025 13:54
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
…vllm-project#25698)

Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
…#25698)

Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…vllm-project#25698)

Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
…vllm-project#25698)

Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…vllm-project#25698)

Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
…vllm-project#25698)

Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…vllm-project#25698)

Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>