
Conversation

Collaborator

@rjg-lyh rjg-lyh commented Sep 12, 2025

What this PR does / why we need it?

This PR forcibly disables the chunked prefill feature in Non-MLA models, as the performance of the operators supporting this functionality is currently suboptimal. Only if the user has explicitly enabled chunked prefill in the ascend_scheduler_config do we allow this feature.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI passed with newly added and existing tests.

Related: #2659
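
For illustration only: below is a minimal, self-contained sketch of the gating logic this description refers to, reconstructed from the code snippets quoted in the review comments further down. AscendSchedulerConfig and apply_non_mla_policy are hypothetical stand-ins; the actual change lives in vllm_ascend/platform.py and is structured differently.

    import logging
    from dataclasses import dataclass

    logger = logging.getLogger(__name__)

    @dataclass
    class AscendSchedulerConfig:
        """Hypothetical stand-in for the real ascend_scheduler_config."""
        enabled: bool = False
        enable_chunked_prefill: bool = False

    def apply_non_mla_policy(use_mla: bool, cfg: AscendSchedulerConfig) -> bool:
        """Return whether chunked prefill stays enabled for this model."""
        if use_mla:
            return True  # MLA models are not affected by this PR
        # Route scheduling through AscendScheduler, which keeps chunked
        # prefill off by default.
        cfg.enabled = True
        logger.warning(
            "Non-MLA LLMs forcibly disable the chunked prefill feature, "
            "as the performance of operators supporting this "
            "functionality is currently suboptimal.")
        if cfg.enable_chunked_prefill:
            # The user explicitly opted in via ascend_scheduler_config.
            logger.warning(
                "Chunked prefill feature is enabled in ascend_scheduler, "
                "but note that the operator supporting this feature "
                "would lead to performance degradation.")
            return True
        return False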

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to disable chunked prefill for Non-MLA models by enabling the AscendScheduler, which has this feature off by default. The overall logic is sound. I've provided two high-severity comments. The first addresses a misleading log message that could cause confusion. The second suggests a refactoring to simplify a verbose and potentially brittle boolean check, improving code clarity and robustness.

Comment on lines +137 to +139
"Non-MLA LLMs forcibly disable the chunked prefill feature,"
"as the performance of operators supporting this feature "
"functionality is currently suboptimal.")

high

The log message is misleading. It states that chunked prefill is "forcibly disabled", but the code only enables a scheduler that has it disabled by default and warns the user if they have explicitly enabled it. This can be confusing for users. I suggest a more accurate and concise message.

Suggested change
"Non-MLA LLMs forcibly disable the chunked prefill feature,"
"as the performance of operators supporting this feature "
"functionality is currently suboptimal.")
"For Non-MLA models, chunked prefill is disabled by default for performance reasons."

Comment on lines 147 to 152
                chunked_prefill_enabled_in_ascend_scheduler = False
                if hasattr(ascend_scheduler_config, "enable_chunked_prefill") and \
                    ascend_scheduler_config.enable_chunked_prefill == True:
                    chunked_prefill_enabled_in_ascend_scheduler = True
                    logger.warning(
                        "Chunked prefill feature is enabled in ascend_scheduler,"
                        "but note that the operator supporting this feature "
                        "would lead to performance degradation.")

high

The logic to check if chunked prefill is enabled is verbose and uses a brittle == True comparison, which can lead to unexpected behavior if the configuration value is not a strict boolean. This can be simplified and made more robust by using getattr and a direct boolean check, which improves readability and correctness.

                chunked_prefill_enabled_in_ascend_scheduler = getattr(
                    ascend_scheduler_config, "enable_chunked_prefill", False)
                if chunked_prefill_enabled_in_ascend_scheduler:
                    logger.warning(
                        "Chunked prefill feature is enabled in ascend_scheduler,"
                        "but note that the operator supporting this feature "
                        "would lead to performance degradation.")

@rjg-lyh rjg-lyh force-pushed the pr-bugfix-mask branch 3 times, most recently from fc1d9a3 to a6fd68c on September 12, 2025 10:36
@rjg-lyh rjg-lyh added the ready (read for review) and ready-for-test (start test by label for PR) labels Sep 12, 2025
@Yikun Yikun added the accuracy-test (enable all accuracy test for PR) and ready-for-test (start test by label for PR) labels and removed the ready-for-test label Sep 12, 2025

codecov bot commented Sep 12, 2025

Codecov Report

❌ Patch coverage is 83.33333% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.97%. Comparing base (1bbb20e) to head (65013d9).
⚠️ Report is 31 commits behind head on main.

Files with missing lines Patch % Lines
vllm_ascend/platform.py 83.33% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2894      +/-   ##
==========================================
+ Coverage   74.76%   74.97%   +0.20%     
==========================================
  Files         150      154       +4     
  Lines       20891    21308     +417     
==========================================
+ Hits        15620    15976     +356     
- Misses       5271     5332      +61     
Flag Coverage Δ
unittests 74.97% <83.33%> (+0.20%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@Yikun Yikun changed the title [main][bugfix] disable the chunked prefill feature in Non-MLA LLMs [Core] Disable the chunked prefill feature in Non-MLA LLMs Sep 12, 2025
@Yikun Yikun merged commit 585a494 into vllm-project:main Sep 12, 2025
60 of 67 checks passed
Collaborator

Yikun commented Sep 12, 2025

[screenshot: performance comparison]

The round-2 perf run shows a regression, which is very weird, but compared to v0.10.1rc1 it's OK, so merging this first.

offline893 pushed a commit to offline893/vllm-ascend that referenced this pull request Sep 16, 2025
…ect#2894)

### What this PR does / why we need it?
This PR forcibly disables the chunked prefill feature in Non-MLA
models, as the performance of the operators supporting this
functionality is currently suboptimal. Only if the user has explicitly
enabled chunked prefill in the ascend_scheduler_config do we allow this
feature.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with newly added and existing tests.

Related: vllm-project#2659

- vLLM version: main
- vLLM main:
vllm-project/vllm@d21a36f

Signed-off-by: rjg-lyh <1318825571@qq.com>
Signed-off-by: offline0806 <z00858301@china.huawei.com>
wangxiaoteng888 pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Sep 25, 2025
…ect#2894)

### What this PR does / why we need it?
This PR forcibly disables the chunked prefill feature in Non-MLA
models, as the performance of the operators supporting this
functionality is currently suboptimal. Only if the user has explicitly
enabled chunked prefill in the ascend_scheduler_config do we allow this
feature.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with newly added and existing tests.

Related: vllm-project#2659

- vLLM version: main
- vLLM main:
vllm-project/vllm@d21a36f

Signed-off-by: rjg-lyh <1318825571@qq.com>
wangxiyuan pushed a commit that referenced this pull request Sep 26, 2025
### What this PR does / why we need it?
PR #2894 makes `ascend_scheduler_config.enabled` always `True` for
non-MLA models. When `ascend_scheduler_config.enabled=True`, vLLM will
always initialize `AscendScheduler`, which is a subclass of `Scheduler`;
but when we enable async_scheduling, vLLM needs to initialize
`AsyncScheduler`, so async_scheduling can't be enabled.

### Does this PR introduce _any_ user-facing change?
not-related

### How was this patch tested?
When the user sets `async_scheduling`, it means they don't want to use
`AscendScheduler`, so we shouldn't set `ascend_scheduler_config.enabled
= True`.

- vLLM version: v0.10.2
- vLLM main:
vllm-project/vllm@f225ea7

Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
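
A hedged sketch of the guard this follow-up describes; should_force_ascend_scheduler is a hypothetical helper, not the actual vllm-ascend code, which applies the same condition inside its platform configuration hook.

    def should_force_ascend_scheduler(use_mla: bool, async_scheduling: bool) -> bool:
        """Decide whether to set ascend_scheduler_config.enabled = True."""
        if async_scheduling:
            # The user asked for vLLM's AsyncScheduler; forcing AscendScheduler
            # here would prevent async scheduling from ever being enabled.
            return False
        return not use_mla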
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025
…ect#2894)

### What this PR does / why we need it?
This PR forcibly disables the chunked prefill feature in Non-MLA
models, as the performance of the operators supporting this
functionality is currently suboptimal. Only if the user has explicitly
enabled chunked prefill in the ascend_scheduler_config do we allow this
feature.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with newly added and existing tests.

Related: vllm-project#2659

- vLLM version: main
- vLLM main:
vllm-project/vllm@d21a36f

Signed-off-by: rjg-lyh <1318825571@qq.com>
huangdong2022 pushed a commit to huangdong2022/vllm-ascend that referenced this pull request Sep 26, 2025
### What this PR does / why we need it?
PR vllm-project#2894 makes `ascend_scheduler_config.enabled` always `True` for
non-MLA models. When `ascend_scheduler_config.enabled=True`, vLLM will
always initialize `AscendScheduler`, which is a subclass of `Scheduler`;
but when we enable async_scheduling, vLLM needs to initialize
`AsyncScheduler`, so async_scheduling can't be enabled.

### Does this PR introduce _any_ user-facing change?
not-related

### How was this patch tested?
When the user sets `async_scheduling`, it means they don't want to use
`AscendScheduler`, so we shouldn't set `ascend_scheduler_config.enabled
= True`.

- vLLM version: v0.10.2
- vLLM main:
vllm-project/vllm@f225ea7

Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Signed-off-by: huangdong2022 <huangdong51@huawei.com>
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
…ect#2894)

### What this PR does / why we need it?
This PR forcibly disables the chunked prefill feature in Non-MLA
models, as the performance of the operators supporting this
functionality is currently suboptimal. Only if the user has explicitly
enabled chunked prefill in the ascend_scheduler_config do we allow this
feature.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with newly added and existing tests.

Related: vllm-project#2659

- vLLM version: main
- vLLM main:
vllm-project/vllm@d21a36f

Signed-off-by: rjg-lyh <1318825571@qq.com>
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
### What this PR does / why we need it?
PR vllm-project#2894 makes `ascend_scheduler_config.enabled` always `True` for
non-MLA models. When `ascend_scheduler_config.enabled=True`, vLLM will
always initialize `AscendScheduler`, which is a subclass of `Scheduler`;
but when we enable async_scheduling, vLLM needs to initialize
`AsyncScheduler`, so async_scheduling can't be enabled.

### Does this PR introduce _any_ user-facing change?
not-related

### How was this patch tested?
When the user sets `async_scheduling`, it means they don't want to use
`AscendScheduler`, so we shouldn't set `ascend_scheduler_config.enabled
= True`.

- vLLM version: v0.10.2
- vLLM main:
vllm-project/vllm@f225ea7

Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
wangxiyuan pushed a commit that referenced this pull request Nov 10, 2025
### What this PR does / why we need it?
This PR reverts the changes introduced in PR #2894. Initially, due to
performance issues with the older version of the chunked prefill ops,
the default behavior was to use the Ascend scheduler to disable the
chunked prefill feature. However, with the improvements in the
performance of the new chunked prefill ops, this interception strategy
has been removed. This change also aligns with the community's default
configuration behavior.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with newly added and existing tests.

- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@83f478b

Signed-off-by: rjg-lyh <1318825571@qq.com>
rjg-lyh added a commit to rjg-lyh/vllm-ascend that referenced this pull request Nov 10, 2025
…llm-project#3967)

This PR reverts the changes introduced in PR vllm-project#2894. Initially, due to
performance issues with the older version of the chunked prefill ops,
the default behavior was to use the Ascend scheduler to disable the
chunked prefill feature. However, with the improvements in the
performance of the new chunked prefill ops, this interception strategy
has been removed. This change also aligns with the community's default
configuration behavior.

No.

CI passed with newly added and existing tests.

- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@83f478b

Signed-off-by: rjg-lyh <1318825571@qq.com>
wangxiyuan pushed a commit that referenced this pull request Nov 10, 2025
…4094)

### What this PR does / why we need it?
Cherry-pick #3967 from main branch. This PR reverts the changes
introduced in PR #2894. Initially, due to performance issues with the
older version of the chunked prefill ops, the default behavior was to
use the Ascend scheduler to disable the chunked prefill feature.
However, with the improvements in the performance of the new chunked
prefill ops, this interception strategy has been removed. This change
also aligns with the community's default configuration behavior.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with newly added and existing tests.

Signed-off-by: rjg-lyh <1318825571@qq.com>
