[Bugfix] Reset all unused positions to prevent out-of-bounds in GatherV3 #1416
Conversation
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##             main    #1416      +/-   ##
==========================================
- Coverage   27.39%   27.21%   -0.19%
==========================================
  Files          56       56
  Lines        6191     6214      +23
==========================================
- Hits         1696     1691       -5
- Misses       4495     4523      +28
==========================================

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Yikun left a comment
Considering that we have been digging into this problem for a very long time, it would be great to have a regression test (UT or e2e test) to catch it and prevent the break from happening again.
Feel free to add the test in a separate PR.
Let's add the UT later once the test framework is strong enough.
Maybe we should re-evaluate the code that generates the various attention masks to identify any conflicts with the padding logic, and add some tests for the masks, since such tests do not exist in vLLM. A rough sketch of what such a test could check is below.
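A regression test along these lines might look roughly like the following sketch. The buffer layout and names are illustrative only (not the runner's actual code); it checks just the invariant that padded slots carry no stale indices that could exceed the mask bounds.

```python
import torch


def test_padded_positions_stay_within_mask():
    # Simulate a previous, larger batch leaving stale position values behind
    # in a persistent positions buffer (values >= mask length).
    max_num_tokens, mask_len = 8, 4
    positions = torch.full((max_num_tokens,), 7, dtype=torch.long)

    # The current step schedules fewer tokens, but the ACL graph pads the
    # batch, so the kernel reads num_input_tokens entries.
    total_num_scheduled_tokens, num_input_tokens = 2, 6
    positions[:total_num_scheduled_tokens] = torch.tensor([0, 1])

    # The behavior under test: every padded slot is cleared so no stale
    # index survives into the gather.
    positions[total_num_scheduled_tokens:num_input_tokens] = 0

    # Every index the gather would use must lie inside the attention mask.
    assert int(positions[:num_input_tokens].max()) < mask_len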
What this PR does / why we need it?
Reset all unused positions in `NPUModelRunner` to prevent out-of-bounds asserts in the `GatherV3` operator.

Currently, in `get_splitfuse_attn_mask`, the `position` tensor may contain values that exceed the dimensions of the attention mask, triggering a `GatherV3` boundary-check failure. These invalid indices originate from stale "dirty" entries left over in `position` by the padding logic of the ACL graph. Specifically, in `_process_reqs`, the variable `num_input_tokens` is always greater than or equal to `total_num_scheduled_tokens`, so any positions not explicitly cleared from a previous batch persist and cause this sporadic error.

Note that in the original vLLM implementation, masks are constructed internally from other arguments, so these lingering values never surface. On the Ascend platform, however, where split-fuse attention requires externally supplied masks, these residual indices become critical and lead to this elusive, hard-to-reproduce failure.

The fix is to explicitly reset (zero out) all unused entries in the `position` tensor before it reaches `GatherV3`, ensuring that every index lies within the valid range of the attention mask.

Closes: #1038
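For illustration, a minimal sketch of that reset, assuming a persistent `positions` buffer and the `total_num_scheduled_tokens` / `num_input_tokens` values described above. The function name is hypothetical; the actual change lives in `NPUModelRunner._process_reqs`, whose buffer handling may differ in detail.

```python
import torch


def reset_unused_positions(positions: torch.Tensor,
                           total_num_scheduled_tokens: int,
                           num_input_tokens: int) -> None:
    """Zero out padded slots so every index handed to GatherV3 stays valid.

    num_input_tokens >= total_num_scheduled_tokens when the ACL graph pads
    the batch; only the leading slots were overwritten with fresh positions,
    so anything beyond them may still hold values from a previous batch.
    """
    if num_input_tokens > total_num_scheduled_tokens:
        positions[total_num_scheduled_tokens:num_input_tokens].zero_()
```

Called before the positions slice is handed to `get_splitfuse_attn_mask`, a reset like this keeps every gathered index inside the externally supplied mask.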
Does this PR introduce any user-facing change?
No
How was this patch tested?