[Bugfix] Fix the Eagle3 inference failure issue #4559
base: main
Conversation
Signed-off-by: sunchendd <sunchendong@xfusion.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Code Review
This pull request fixes an inference failure with Eagle3 speculative decoding. The changes primarily involve updating the attention mask generation logic in `get_splitfuse_attn_mask` to handle different scenarios correctly, adjusting the attention state for Eagle3 in the model runner, and modifying how the attention mask is obtained in the Eagle proposer. The core logic of the fix seems sound.

I've identified a critical issue in `vllm_ascend/worker/model_runner_v1.py` where the logic for determining the attention state for Eagle3 speculative decoding could be incorrect, potentially leading to the use of a wrong attention mechanism. I've provided a detailed comment and a suggested fix below.
```python
if self.drafter and self.drafter.name in (SpecDcodeType.EAGLE,
                                          SpecDcodeType.EAGLE3):
    attn_state = AscendAttentionState.ChunkedPrefill
else:
    attn_state = AscendAttentionState.SpecDecoding
```
The logic to determine the attention state for Eagle3 speculative decoding appears to be incorrect. Currently, it sets `attn_state` to `AscendAttentionState.ChunkedPrefill` for Eagle and Eagle3, and `AscendAttentionState.SpecDecoding` for other drafters. However, `ChunkedPrefill` is typically used for prefill stages, not for speculative decoding, which happens after prefill. For speculative decoding, `AscendAttentionState.SpecDecoding` should be used to ensure the correct attention mechanism is applied. Using `ChunkedPrefill` here could lead to incorrect attention masks and potential failures during the decoding phase.
Suggested change:

```diff
 if self.drafter and self.drafter.name in (SpecDcodeType.EAGLE,
                                           SpecDcodeType.EAGLE3):
-    attn_state = AscendAttentionState.ChunkedPrefill
+    attn_state = AscendAttentionState.SpecDecoding
 else:
     attn_state = AscendAttentionState.SpecDecoding
```
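Note that with this suggestion both branches assign the same value, so the conditional no longer affects the result. A minimal runnable sketch of the before/after behavior, using stand-in enums in place of vllm-ascend's real `AscendAttentionState` and `SpecDcodeType` classes (names mirror the diff; the stubs are illustrative, not the actual definitions):

```python
from enum import Enum, auto

# Stand-ins for the vllm_ascend enums referenced in the diff above.
class AscendAttentionState(Enum):
    ChunkedPrefill = auto()
    SpecDecoding = auto()

class SpecDcodeType(Enum):
    EAGLE = auto()
    EAGLE3 = auto()
    MTP = auto()  # hypothetical non-Eagle drafter for the else branch

def attn_state_original(drafter_name: SpecDcodeType) -> AscendAttentionState:
    # Pre-review logic: Eagle/Eagle3 got ChunkedPrefill during decoding.
    if drafter_name in (SpecDcodeType.EAGLE, SpecDcodeType.EAGLE3):
        return AscendAttentionState.ChunkedPrefill
    return AscendAttentionState.SpecDecoding

def attn_state_suggested(drafter_name: SpecDcodeType) -> AscendAttentionState:
    # Reviewer's suggestion: both branches yield SpecDecoding, so the
    # Eagle/Eagle3 check collapses to a single assignment.
    return AscendAttentionState.SpecDecoding

print(attn_state_original(SpecDcodeType.EAGLE3))   # ChunkedPrefill
print(attn_state_suggested(SpecDcodeType.EAGLE3))  # SpecDecoding
```

If the suggestion is taken as-is, the `if`/`else` could simply be replaced by `attn_state = AscendAttentionState.SpecDecoding`.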
What this PR does / why we need it?
Fix the Eagle3 inference failure issue.
error message: "EngineCore encountered an issue. See stack trace (above) for the root cause."
Fixes #4323
How was this patch tested?
```shell
vllm serve /nfs/1_AscendPackage/05_weights_public/Qwen3-32B \
  --served-model-name Qwen3-32B \
  -tp 4 \
  --host "0.0.0.0" \
  --port "8000" \
  --trust-remote-code \
  --speculative-config '{"method":"eagle3","model":"/home/scd/qwen3_32b_eagle3/","num_speculative_tokens":4,"draft_tensor_parallel_size":1}' \
  --max-num-batched-tokens 4096 \
  --max-model-len 4096
```

vLLM version: v0.11.0
vLLM-ascend version: v0.11.0rc2