[v0.9.1][bugfix] disable the chunked prefill feature in Non-MLA LLMs #2659
Conversation
Code Review
This pull request aims to disable the chunked prefill feature for Non-MLA models due to performance issues. The changes correctly modify the configuration to disable this feature. However, there is a critical bug in the implementation that will cause a NameError when running with Non-MLA models on the v1 engine. I've provided a suggestion to fix this issue.
vllm_ascend/platform.py
Outdated
```python
vllm_config.scheduler_config.chunked_prefill_enabled = False
if envs.VLLM_USE_V1:
    ascend_config.ascend_scheduler_config.enabled = True
    ascend_scheduler_config["enable_chunked_prefill"] = False
```
The variable ascend_scheduler_config is not defined at this point, which will lead to a NameError when this code path is executed. It seems you intended to modify the ascend_scheduler_config object within the ascend_config. The correct way to do this would be to access it via ascend_config.ascend_scheduler_config.
Suggested change:
```diff
- ascend_scheduler_config["enable_chunked_prefill"] = False
+ ascend_config.ascend_scheduler_config.enable_chunked_prefill = False
```
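For context, a minimal sketch of the intended behaviour once the suggestion is applied. The wrapper function and the model_uses_mla / use_v1 parameters are illustrative assumptions for this sketch, not the actual names in vllm_ascend/platform.py:

```python
def _maybe_disable_chunked_prefill(vllm_config, ascend_config,
                                   model_uses_mla: bool, use_v1: bool) -> None:
    """Sketch only: force chunked prefill off for non-MLA models."""
    if model_uses_mla:
        return
    # Chunked prefill operators are currently slow on non-MLA models.
    vllm_config.scheduler_config.chunked_prefill_enabled = False
    if use_v1:
        # On the v1 engine, fall back to the ascend scheduler and override
        # any user-supplied enable_chunked_prefill flag.
        ascend_config.ascend_scheduler_config.enabled = True
        ascend_config.ascend_scheduler_config.enable_chunked_prefill = False
```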
Current 0.9.1-dev latest commit: 5926225. Case 1: git revert 9dc23b6, then test.
There is still a stable 0.5 ms perf regression compared with your PR and the revert of #2326.
### What this PR does / why we need it?
This PR enforces the forcible disabling of the chunked prefill feature in Non-MLA models, as the performance of operators supporting this functionality is currently suboptimal. The feature is allowed only when the user has explicitly enabled chunked prefill in the ascend_scheduler_config.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI passed with newly added and existing tests.
Related: #2659
- vLLM version: main
- vLLM main: vllm-project/vllm@d21a36f
Signed-off-by: rjg-lyh <1318825571@qq.com>
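As a hypothetical illustration of that opt-in path, a user could re-enable chunked prefill through additional_config roughly as follows; the exact option names should be checked against the vllm-ascend documentation, and the model name is only a placeholder:

```python
from vllm import LLM

# Hypothetical opt-in: enable the ascend scheduler and explicitly turn
# chunked prefill back on for a non-MLA model. Option names are illustrative.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    additional_config={
        "ascend_scheduler_config": {
            "enabled": True,
            "enable_chunked_prefill": True,
        },
    },
)
```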

What this PR does / why we need it?
This PR enforces the forcible disabling of the chunked prefill feature in Non-MLA models, as the performance of operators supporting this functionality is currently suboptimal.
At the same time, in engine v1 mode, the ascend scheduler is forcibly enabled, and the enable_chunked_prefill specified by the user in additional_config is disabled.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
CI passed with newly added and existing tests.