[BugFix] Address PrefillCacheHit state to fix prefix cache accuracy bug #1498

whx-sjtu · 2025-06-28T06:58:27Z

When AscendScheduler is used with prefix cache enabled and chunked prefill disabled, accuracy problems occur because mla_v1 has no branch that handles this scenario. This PR fixes it.

Signed-off-by: whx-sjtu <2952154980@qq.com>
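
To illustrate the kind of missing-branch problem described above, here is a minimal sketch of state-based dispatch. The enum `AttnState`, its member names, and `select_prefill_path` are hypothetical stand-ins chosen for illustration (only the `PrefillCacheHit` state name comes from the PR title). The grouping of the cache-hit state with chunked prefill is an assumption about one plausible fix, not the actual diff of this PR.

```python
from enum import Enum, auto


class AttnState(Enum):
    """Hypothetical attention states; names are illustrative only."""
    PREFILL_NO_CACHE = auto()
    PREFILL_CACHE_HIT = auto()
    CHUNKED_PREFILL = auto()
    DECODE_ONLY = auto()


def select_prefill_path(state: AttnState) -> str:
    """Return a label for the code path a given state should take.

    Without an explicit branch for PREFILL_CACHE_HIT, a request whose
    prefix is already cached could silently fall into a path that ignores
    the cached context, which is the class of bug this sketch models.
    """
    if state is AttnState.PREFILL_NO_CACHE:
        return "full_prefill"
    # Assumption: a cache hit must attend to cached context, much like
    # chunked prefill does, so both share one path in this sketch.
    if state in (AttnState.PREFILL_CACHE_HIT, AttnState.CHUNKED_PREFILL):
        return "prefill_with_cached_context"
    if state is AttnState.DECODE_ONLY:
        return "decode"
    # Fail loudly on unknown states instead of picking a default path.
    raise ValueError(f"unhandled attention state: {state}")
```

Raising on an unhandled state, rather than falling through to a default, turns a silent accuracy regression into an immediate error.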

codecov · 2025-06-28T07:13:53Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 31.65%. Comparing base (c30ddb8) to head (e6e7178).
⚠️ Report is 585 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1498      +/-   ##
==========================================
+ Coverage   27.39%   31.65%   +4.25%     
==========================================
  Files          56       60       +4     
  Lines        6191     6638     +447     
==========================================
+ Hits         1696     2101     +405     
- Misses       4495     4537      +42

| Flag | Coverage Δ |
|---|---|
| unittests | `31.65% <ø> (+4.25%)` ⬆️ |


Yikun · 2025-06-28T07:25:11Z

Thanks for fixing it. How can we prevent this precision regression from happening again? Also, please fill in the commit message.

MengqingCao · 2025-06-28T10:23:55Z

Add e2e test in #1505

…uler (#1505)

### What this PR does / why we need it?
Add test for chunked prefill and prefix cache on v1/AscendScheduler

Covered scenarios:
- `Qwen/Qwen3-0.6B-Base` and `deepseek-ai/DeepSeek-V2-Lite-Chat` --- multicard CI time increased by 19 min
  - `V1 + default scheduler` vs `V1 + default scheduler + enable prefix cache`
  - `V1 + Ascend scheduler` vs `V1 + Ascend scheduler + enable prefix cache` vs `V1 + Ascend scheduler + enable prefix cache + enable chunked prefill`
- `Qwen/Qwen3-0.6B-Base` --- singlecard CI time increased by 8 min
  - `V1 + Ascend scheduler` vs `V1 + Ascend scheduler + enable chunked prefill`

should rebase after #1498 and #1446

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with new added test.

Signed-off-by: MengqingCao <cmq0113@163.com>
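
The comparison pattern behind the scenarios listed above (run the same prompts under a baseline configuration and under each variant, then require identical outputs) can be sketched as follows. This is a hedged illustration only: the dictionary keys, `compare_against_baseline`, and the stub `fake_generate` are hypothetical names, and the stub stands in for a real model run, which the actual tests in #1505 perform on hardware.

```python
from typing import Callable, Dict, List

Config = Dict[str, bool]

# Hypothetical flag names; the real tests may configure these differently.
BASELINE: Config = {"prefix_cache": False, "chunked_prefill": False}
VARIANTS: List[Config] = [
    {"prefix_cache": True, "chunked_prefill": False},
    {"prefix_cache": True, "chunked_prefill": True},
]


def compare_against_baseline(
    generate: Callable[[List[str], Config], List[str]],
    prompts: List[str],
    baseline: Config,
    variants: List[Config],
) -> List[Config]:
    """Return every variant whose outputs differ from the baseline's."""
    expected = generate(prompts, baseline)
    return [cfg for cfg in variants if generate(prompts, cfg) != expected]


def fake_generate(prompts: List[str], config: Config) -> List[str]:
    """Stub for a model call: deterministic and independent of config."""
    return [p.upper() for p in prompts]


mismatches = compare_against_baseline(
    fake_generate, ["hello", "prefix cache"], BASELINE, VARIANTS
)
```

With a correct backend, `mismatches` is empty. A regression like the one fixed in #1498 would show up as the variant with prefix cache enabled and chunked prefill disabled appearing in the returned list.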

fix prefix cache accuracy bug

e6e7178

Signed-off-by: whx-sjtu <2952154980@qq.com>

Yikun added the no-test label Jun 28, 2025

MengqingCao mentioned this pull request Jun 28, 2025

[CI/UT] Add test for chunk prefill and prefix cache on v1/AscendScheduler #1505

Merged

Yikun changed the title ~~[BugFix] Fix prefix cache accuracy bug~~ [BugFix] Address PrefillCacheHit state to fix prefix cache accuracy bug Jun 29, 2025

Yikun added the ready read for review label Jun 29, 2025

Yikun mentioned this pull request Jun 29, 2025

[V0.9.1][BugFix] Address PrefillCacheHit state to fix prefix cache accuracy bug #1492

Merged

wangxiyuan approved these changes Jun 30, 2025

wangxiyuan merged commit f286265 into vllm-project:main Jun 30, 2025
33 checks passed

whx-sjtu deleted the fix_prefix_cache_main branch October 20, 2025 11:49
