[Core] Disable the chunked prefill feature in Non-MLA LLMs #2894

rjg-lyh · 2025-09-12T10:05:24Z (edited by Yikun)

What this PR does / why we need it?

This PR disables the chunked prefill feature for Non-MLA models by default, because the operators that support this feature currently perform suboptimally. Non-MLA models are routed through the Ascend scheduler, which has chunked prefill off by default; the feature remains available only if the user explicitly enables chunked prefill in ascend_scheduler_config.
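The policy described above can be sketched as a small function. This is a hypothetical illustration, not the code in `vllm_ascend/platform.py`: `apply_non_mla_policy`, its parameters, and the use of `types.SimpleNamespace` as a stand-in for `ascend_scheduler_config` are assumptions. Only `ascend_scheduler_config.enabled`, `enable_chunked_prefill`, and the MLA/non-MLA split come from this PR and its follow-ups.

```python
import logging
from types import SimpleNamespace
from typing import Any, Optional

logger = logging.getLogger("non_mla_chunked_prefill_sketch")


def apply_non_mla_policy(is_mla: bool, ascend_scheduler_config: Any) -> Optional[bool]:
    """Hypothetical helper: return whether chunked prefill stays enabled.

    Returns None for MLA models, which this policy leaves untouched.
    """
    if is_mla:
        return None
    # Route non-MLA models through AscendScheduler, which keeps chunked
    # prefill off unless the user turned it on explicitly.
    ascend_scheduler_config.enabled = True
    user_enabled = bool(getattr(ascend_scheduler_config, "enable_chunked_prefill", False))
    if user_enabled:
        logger.warning("Chunked prefill is enabled in ascend_scheduler_config; "
                       "its operators may currently degrade performance.")
    return user_enabled


# Usage: a non-MLA model with default settings ends up without chunked prefill.
cfg = SimpleNamespace(enabled=False)
print(apply_non_mla_policy(is_mla=False, ascend_scheduler_config=cfg), cfg.enabled)  # prints: False True
```

Returning `None` for MLA models keeps "this policy does not apply" distinct from "chunked prefill is off".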

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI passed with newly added and existing tests.

Related: #2659

vLLM version: main
vLLM main: vllm-project/vllm@d21a36f

github-actions · 2025-09-12T10:05:32Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

gemini-code-assist

Code Review

This pull request aims to disable chunked prefill for Non-MLA models by enabling the AscendScheduler, which has this feature off by default. The overall logic is sound. I've provided two high-severity comments. The first addresses a misleading log message that could cause confusion. The second suggests a refactoring to simplify a verbose and potentially brittle boolean check, improving code clarity and robustness.

gemini-code-assist · 2025-09-12T10:06:59Z

vllm_ascend/platform.py

```diff
+                "Non-MLA LLMs forcibly disable the chunked prefill feature,"
+                "as the performance of operators supporting this feature "
+                "functionality is currently suboptimal.")
```


The log message is misleading. It states that chunked prefill is "forcibly disabled", but the code only enables a scheduler that has it disabled by default and warns the user if they have explicitly enabled it. This can be confusing for users. I suggest a more accurate and concise message.

Suggested change

```diff
-"Non-MLA LLMs forcibly disable the chunked prefill feature,"
-"as the performance of operators supporting this feature "
-"functionality is currently suboptimal.")
+"For Non-MLA models, chunked prefill is disabled by default for performance reasons."
```

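A further detail in the quoted message, independent of its wording: Python concatenates adjacent string literals verbatim at compile time without inserting a separator, so a fragment that does not end in a space runs straight into the next one. The warning text in the next diff follows the same pattern. The minimal demonstration below reuses the fragments from the original diff; the `fixed` variant only adds the missing space and is one possible fix, not the change made in this PR.

```python
# Adjacent string literals are joined verbatim, with no separator.
original = ("Non-MLA LLMs forcibly disable the chunked prefill feature,"
            "as the performance of operators supporting this feature "
            "functionality is currently suboptimal.")

# One possible fix: end each fragment with the space it needs.
fixed = ("Non-MLA LLMs forcibly disable the chunked prefill feature, "
         "as the performance of operators supporting this feature "
         "functionality is currently suboptimal.")

print(original)  # contains "feature,as": the space after the comma was never written
print(fixed)
```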
gemini-code-assist · 2025-09-12T10:06:59Z

vllm_ascend/platform.py

```diff
+                chunked_prefill_enabled_in_ascend_scheduler = False
+                if hasattr(ascend_scheduler_config, "enable_chunked_prefill") and \
+                    ascend_scheduler_config.enable_chunked_prefill == True:
+                    chunked_prefill_enabled_in_ascend_scheduler = True
+                    logger.warning(
+                        "Chunked prefill feature is enabled in ascend_scheduler,"
+                        "but note that the operator supporting this feature "
+                        "would lead to performance degradation.")
```


The logic to check if chunked prefill is enabled is verbose and uses a brittle == True comparison, which can lead to unexpected behavior if the configuration value is not a strict boolean. This can be simplified and made more robust by using getattr and a direct boolean check, which improves readability and correctness.

```python
chunked_prefill_enabled_in_ascend_scheduler = getattr(
    ascend_scheduler_config, "enable_chunked_prefill", False)
if chunked_prefill_enabled_in_ascend_scheduler:
    logger.warning(
        "Chunked prefill feature is enabled in ascend_scheduler,"
        "but note that the operator supporting this feature "
        "would lead to performance degradation.")
```
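To see where the two checks agree and where they differ, here is a small comparison that uses `types.SimpleNamespace` as a stand-in for `ascend_scheduler_config` (an assumption; the real config class is not shown in this review). Both forms give the same answer for a missing attribute and for real booleans. They diverge only for non-boolean values such as the string `"true"`, which `== True` rejects and a truthiness check accepts. Truthiness also accepts `"false"`, so neither form validates the type on its own.

```python
from types import SimpleNamespace


def check_original(cfg) -> bool:
    # Pattern from the diff: hasattr guard plus an explicit == True comparison.
    return hasattr(cfg, "enable_chunked_prefill") and cfg.enable_chunked_prefill == True  # noqa: E712


def check_suggested(cfg) -> bool:
    # Pattern from the review: getattr with a False default, then a truthiness test.
    return bool(getattr(cfg, "enable_chunked_prefill", False))


cases = {
    "missing": SimpleNamespace(),
    "true": SimpleNamespace(enable_chunked_prefill=True),
    "false": SimpleNamespace(enable_chunked_prefill=False),
    "string": SimpleNamespace(enable_chunked_prefill="true"),
}
for name, cfg in cases.items():
    print(name, check_original(cfg), check_suggested(cfg))
```

If strict typing matters, an `isinstance(value, bool)` check would make the intent explicit; that is a design option, not something proposed in this review.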

Signed-off-by: rjg-lyh <1318825571@qq.com>

codecov · 2025-09-12T11:01:44Z

Codecov Report

❌ Patch coverage is 83.33333% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.97%. Comparing base (1bbb20e) to head (65013d9).
⚠️ Report is 31 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| vllm_ascend/platform.py | 83.33% | 2 Missing ⚠️ |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #2894      +/-   ##
==========================================
+ Coverage   74.76%   74.97%   +0.20%
==========================================
  Files         150      154       +4
  Lines       20891    21308     +417
==========================================
+ Hits        15620    15976     +356
- Misses       5271     5332      +61
```

| Flag | Coverage Δ |
| --- | --- |
| unittests | `74.97% <83.33%> (+0.20%)` ⬆️ |
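The figures in this report are internally consistent, which a few lines of arithmetic confirm: hits plus misses equal total lines, and a patch coverage of 83.33% with 2 missing lines implies 12 changed lines, 10 of them covered. The percentages match rounding down to two decimals (rounding to nearest would give 74.77% for the base and 74.98% for the head). Rounding down is assumed here to be Codecov's `coverage.round` default; the project's own Codecov configuration is not shown.

```python
import math


def pct_down(numerator: int, denominator: int) -> float:
    """Percentage rounded down to two decimals (assumed Codecov-style rounding)."""
    return math.floor(numerator / denominator * 10000) / 100


base_hits, base_misses = 15620, 5271
head_hits, head_misses = 15976, 5332

base_lines = base_hits + base_misses  # 20891 in the report
head_lines = head_hits + head_misses  # 21308 in the report

base_cov = pct_down(base_hits, base_lines)
head_cov = pct_down(head_hits, head_lines)

# Delta computed from the unrounded ratios, then rounded down.
delta = math.floor((head_hits / head_lines - base_hits / base_lines) * 10000) / 100

# Patch coverage: 2 missing lines at 83.33% implies 12 changed lines, 10 covered.
patch_lines, patch_missing = 12, 2
patch_cov = (patch_lines - patch_missing) / patch_lines * 100

print(base_lines, head_lines, base_cov, head_cov, delta, round(patch_cov, 5))
```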

Flags with carried forward coverage won't be shown.


Yikun · 2025-09-12T15:23:19Z

The round-2 perf test shows a performance regression, which is very weird, but compared to v0.10.1rc1 it's OK, so merge this first.

…ect#2894) ### What this PR does / why we need it? This PR enforces the forcible disabling of the chunked prefill feature in Non-MLA models, as the performance of operators supporting this functionality is currently suboptimal. Unless the user has enabled chunked prefill in the ascend_scheduler_config, we would allow this feature. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed with new added/existing test. Related: vllm-project#2659 - vLLM version: main - vLLM main: vllm-project/vllm@d21a36f Signed-off-by: rjg-lyh <1318825571@qq.com> Signed-off-by: offline0806 <z00858301@china.huawei.com>

### What this PR does / why we need it? PR #2894 made ascend_scheduler_config.enabled always `True` for non-MLA models. When `ascend_scheduler_config.enabled=True`, `AscendScheduler` (a subclass of `Scheduler`) is always initialized, but enabling async_scheduling requires vLLM to initialize `AsyncScheduler`, so async_scheduling could not be enabled. ### Does this PR introduce _any_ user-facing change? Not related. ### How was this patch tested? When the user sets `async_scheduling`, it means the user doesn't want to use `AscendScheduler`, so we shouldn't set `ascend_scheduler_config.enabled = True`. - vLLM version: v0.10.2 - vLLM main: vllm-project/vllm@f225ea7 Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
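The follow-up fix amounts to one extra condition: force the Ascend scheduler for non-MLA models only when the user has not asked for async scheduling. A hypothetical sketch; `should_force_ascend_scheduler` and its boolean parameters are illustrative names, not the actual vllm-ascend code.

```python
def should_force_ascend_scheduler(is_mla: bool, async_scheduling: bool) -> bool:
    """Hypothetical guard combining PR #2894 with the async-scheduling fix."""
    # AscendScheduler subclasses vLLM's Scheduler, so forcing it would stop
    # vLLM from selecting AsyncScheduler; respect an explicit async request.
    if async_scheduling:
        return False
    # Otherwise keep the #2894 behaviour: non-MLA models use AscendScheduler.
    return not is_mla


for is_mla in (False, True):
    for async_scheduling in (False, True):
        print(is_mla, async_scheduling, should_force_ascend_scheduler(is_mla, async_scheduling))
```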

### What this PR does / why we need it? This PR reverts the changes introduced in PR #2894. Initially, due to performance issues with the older version of the chunked prefill ops, the default behavior was to use the Ascend scheduler to disable the chunked prefill feature. However, with the improved performance of the new chunked prefill ops, this interception strategy has been removed. This change also aligns with the community's default configuration behavior. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed with newly added and existing tests. - vLLM version: v0.11.0 - vLLM main: vllm-project/vllm@83f478b Signed-off-by: rjg-lyh <1318825571@qq.com>

…4094) ### What this PR does / why we need it? Cherry-pick #3967 from the main branch. This PR reverts the changes introduced in PR #2894. Initially, due to performance issues with the older version of the chunked prefill ops, the default behavior was to use the Ascend scheduler to disable the chunked prefill feature. However, with the improved performance of the new chunked prefill ops, this interception strategy has been removed. This change also aligns with the community's default configuration behavior. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed with newly added and existing tests. Signed-off-by: rjg-lyh <1318825571@qq.com>

github-actions bot added the module:core label Sep 12, 2025

gemini-code-assist bot reviewed Sep 12, 2025

rjg-lyh force-pushed the pr-bugfix-mask branch 3 times, most recently from fc1d9a3 to a6fd68c, September 12, 2025 10:36

[main][bugfix] disable the chunked prefill feature in Non-MLA LLMs

65013d9

Signed-off-by: rjg-lyh <1318825571@qq.com>

rjg-lyh force-pushed the pr-bugfix-mask branch from a6fd68c to 65013d9, September 12, 2025 10:44

github-actions bot added the module:tests label Sep 12, 2025

rjg-lyh added the ready and ready-for-test labels Sep 12, 2025

Yikun added the accuracy-test and ready-for-test labels and removed the ready-for-test label Sep 12, 2025

Yikun mentioned this pull request Sep 12, 2025

[Release]: Release checklist for v0.10.2rc1 #2859

Closed

42 tasks

wangxiyuan approved these changes Sep 12, 2025

Yikun changed the title ~~[main][bugfix] disable the chunked prefill feature in Non-MLA LLMs~~ [Core] Disable the chunked prefill feature in Non-MLA LLMs Sep 12, 2025

Yikun merged commit 585a494 into vllm-project:main Sep 12, 2025
60 of 67 checks passed

Yikun mentioned this pull request Sep 20, 2025

[Bug]: Remove outofdate commits to improve perf test #3051

Open

Ronald1995 mentioned this pull request Sep 23, 2025

fix error async_scheduler can't be enabled #3127

Merged

rjg-lyh mentioned this pull request Nov 4, 2025

[Core] Restore scheduling logic under default configuration #3967

Merged

rjg-lyh mentioned this pull request Nov 10, 2025

[V0.11.0][Core] Restore scheduling logic under default configuration #4094

Merged
