-
Notifications
You must be signed in to change notification settings - Fork 548
[Core] Disable the chunked prefill feature in Non-MLA LLMs #2894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request aims to disable chunked prefill for Non-MLA models by enabling the AscendScheduler, which has this feature off by default. The overall logic is sound. I've provided two high-severity comments. The first addresses a misleading log message that could cause confusion. The second suggests a refactoring to simplify a verbose and potentially brittle boolean check, improving code clarity and robustness.
| "Non-MLA LLMs forcibly disable the chunked prefill feature," | ||
| "as the performance of operators supporting this feature " | ||
| "functionality is currently suboptimal.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log message is misleading. It states that chunked prefill is "forcibly disabled", but the code only enables a scheduler that has it disabled by default and warns the user if they have explicitly enabled it. This can be confusing for users. I suggest a more accurate and concise message.
| "Non-MLA LLMs forcibly disable the chunked prefill feature," | |
| "as the performance of operators supporting this feature " | |
| "functionality is currently suboptimal.") | |
| "For Non-MLA models, chunked prefill is disabled by default for performance reasons." |
vllm_ascend/platform.py
Outdated
| chunked_prefill_enabled_in_ascend_scheduler = False | ||
| if hasattr(ascend_scheduler_config, "enable_chunked_prefill") and \ | ||
| ascend_scheduler_config.enable_chunked_prefill == True: | ||
| chunked_prefill_enabled_in_ascend_scheduler = True | ||
| logger.warning( | ||
| "Chunked prefill feature is enabled in ascend_scheduler," | ||
| "but note that the operator supporting this feature " | ||
| "would lead to performance degradation.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic to check if chunked prefill is enabled is verbose and uses a brittle == True comparison, which can lead to unexpected behavior if the configuration value is not a strict boolean. This can be simplified and made more robust by using getattr and a direct boolean check, which improves readability and correctness.
chunked_prefill_enabled_in_ascend_scheduler = getattr(
ascend_scheduler_config, "enable_chunked_prefill", False)
if chunked_prefill_enabled_in_ascend_scheduler:
logger.warning(
"Chunked prefill feature is enabled in ascend_scheduler,"
"but note that the operator supporting this feature "
"would lead to performance degradation.")fc1d9a3 to
a6fd68c
Compare
Signed-off-by: rjg-lyh <1318825571@qq.com>
a6fd68c to
65013d9
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2894 +/- ##
==========================================
+ Coverage 74.76% 74.97% +0.20%
==========================================
Files 150 154 +4
Lines 20891 21308 +417
==========================================
+ Hits 15620 15976 +356
- Misses 5271 5332 +61
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…ect#2894) ### What this PR does / why we need it? This PR enforces the forcible disabling of the chunked prefill feature in Non-MLA models, as the performance of operators supporting this functionality is currently suboptimal. Unless the user has enabled chunked prefill in the ascend_scheduler_config, we would allow this feature. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed with new added/existing test. Related: vllm-project#2659 - vLLM version: main - vLLM main: vllm-project/vllm@d21a36f Signed-off-by: rjg-lyh <1318825571@qq.com> Signed-off-by: offline0806 <z00858301@china.huawei.com>
…ect#2894) ### What this PR does / why we need it? This PR enforces the forcible disabling of the chunked prefill feature in Non-MLA models, as the performance of operators supporting this functionality is currently suboptimal. Unless the user has enabled chunked prefill in the ascend_scheduler_config, we would allow this feature. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed with new added/existing test. Related: vllm-project#2659 - vLLM version: main - vLLM main: vllm-project/vllm@d21a36f Signed-off-by: rjg-lyh <1318825571@qq.com>
### What this PR does / why we need it? PR #2894 make ascend_scheduler_config.enabled always be `True` for non-mla models,when `ascend_scheduler_config.enabled=True `, it will always initialize `AscendScheduler` which is a subclass of `Scheduler`, but when we enbale async_scheduling,we need to initialize `AsyncScheduler` in vllm, this will make async_scheduling can't be enabled. ### Does this PR introduce _any_ user-facing change? not-related ### How was this patch tested? when user set `async_scheduling`, it means user don't want to use `AscendScheduler`, so we shouldn't set `ascend_scheduler_config.enabled = True` - vLLM version: v0.10.2 - vLLM main: vllm-project/vllm@f225ea7 Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
…ect#2894) ### What this PR does / why we need it? This PR enforces the forcible disabling of the chunked prefill feature in Non-MLA models, as the performance of operators supporting this functionality is currently suboptimal. Unless the user has enabled chunked prefill in the ascend_scheduler_config, we would allow this feature. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed with new added/existing test. Related: vllm-project#2659 - vLLM version: main - vLLM main: vllm-project/vllm@d21a36f Signed-off-by: rjg-lyh <1318825571@qq.com>
### What this PR does / why we need it? PR vllm-project#2894 make ascend_scheduler_config.enabled always be `True` for non-mla models,when `ascend_scheduler_config.enabled=True `, it will always initialize `AscendScheduler` which is a subclass of `Scheduler`, but when we enbale async_scheduling,we need to initialize `AsyncScheduler` in vllm, this will make async_scheduling can't be enabled. ### Does this PR introduce _any_ user-facing change? not-related ### How was this patch tested? when user set `async_scheduling`, it means user don't want to use `AscendScheduler`, so we shouldn't set `ascend_scheduler_config.enabled = True` - vLLM version: v0.10.2 - vLLM main: vllm-project/vllm@f225ea7 Signed-off-by: Ronald1995 <ronaldautomobile@163.com> Signed-off-by: huangdong2022 <huangdong51@huawei.com>
…ect#2894) ### What this PR does / why we need it? This PR enforces the forcible disabling of the chunked prefill feature in Non-MLA models, as the performance of operators supporting this functionality is currently suboptimal. Unless the user has enabled chunked prefill in the ascend_scheduler_config, we would allow this feature. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed with new added/existing test. Related: vllm-project#2659 - vLLM version: main - vLLM main: vllm-project/vllm@d21a36f Signed-off-by: rjg-lyh <1318825571@qq.com>
### What this PR does / why we need it? PR vllm-project#2894 make ascend_scheduler_config.enabled always be `True` for non-mla models,when `ascend_scheduler_config.enabled=True `, it will always initialize `AscendScheduler` which is a subclass of `Scheduler`, but when we enbale async_scheduling,we need to initialize `AsyncScheduler` in vllm, this will make async_scheduling can't be enabled. ### Does this PR introduce _any_ user-facing change? not-related ### How was this patch tested? when user set `async_scheduling`, it means user don't want to use `AscendScheduler`, so we shouldn't set `ascend_scheduler_config.enabled = True` - vLLM version: v0.10.2 - vLLM main: vllm-project/vllm@f225ea7 Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
### What this PR does / why we need it? This PR reverts the changes introduced in PR #2894 Initially, due to performance issues with the older version of the chunked prefill ops, the default behavior was to use the Ascend scheduler to disable the chunked prefill feature. However, with the improvements in the performance of the new chunked prefill ops, this interception strategy has been removed. This change also aligns with the community's default configuration behavior. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed with new added/existing test. - vLLM version: v0.11.0 - vLLM main: vllm-project/vllm@83f478b Signed-off-by: rjg-lyh <1318825571@qq.com>
…llm-project#3967) This PR reverts the changes introduced in PR vllm-project#2894 Initially, due to performance issues with the older version of the chunked prefill ops, the default behavior was to use the Ascend scheduler to disable the chunked prefill feature. However, with the improvements in the performance of the new chunked prefill ops, this interception strategy has been removed. This change also aligns with the community's default configuration behavior. No. CI passed with new added/existing test. - vLLM version: v0.11.0 - vLLM main: vllm-project/vllm@83f478b Signed-off-by: rjg-lyh <1318825571@qq.com>
…4094) ### What this PR does / why we need it? Cherry-pick #3967 from main branch. This PR reverts the changes introduced in PR #2894 Initially, due to performance issues with the older version of the chunked prefill ops, the default behavior was to use the Ascend scheduler to disable the chunked prefill feature. However, with the improvements in the performance of the new chunked prefill ops, this interception strategy has been removed. This change also aligns with the community's default configuration behavior. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI passed with new added/existing test. Signed-off-by: rjg-lyh <1318825571@qq.com>

What this PR does / why we need it?
This PR enforces the forcible disabling of the chunked prefill feature in Non-MLA models, as the performance of operators supporting this functionality is currently suboptimal. Unless the user has enabled chunked prefill in the ascend_scheduler_config, we would allow this feature.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
CI passed with new added/existing test.
Related: #2659