
Conversation

@rjg-lyh (Collaborator) commented Aug 30, 2025

What this PR does / why we need it?

This PR forcibly disables the chunked prefill feature for Non-MLA models, because the operators that currently back this feature perform poorly.
At the same time, in engine v1 mode the Ascend scheduler is forcibly enabled, and the enable_chunked_prefill option that the user specified in additional_config is disabled.
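
For illustration, a minimal sketch of this override. The hook name, the `use_mla` flag, and the `use_v1` parameter are assumptions, not the exact code of this PR; the `vllm_config`/`ascend_config` attribute names come from the reviewed diff further down.

```python
# Hedged sketch only: not the exact implementation in this PR.
def disable_chunked_prefill_for_non_mla(vllm_config, ascend_config, use_v1: bool):
    # `use_mla` is an assumed flag on the model config for this sketch.
    if getattr(vllm_config.model_config, "use_mla", False):
        return  # MLA models keep whatever the user configured.
    # Chunked-prefill operators perform poorly on Non-MLA models, so force
    # the feature off regardless of the user's setting.
    vllm_config.scheduler_config.chunked_prefill_enabled = False
    if use_v1:
        # On the v1 engine, fall back to the Ascend scheduler and drop the
        # user's enable_chunked_prefill request from additional_config.
        ascend_config.ascend_scheduler_config.enabled = True
        ascend_config.ascend_scheduler_config.enable_chunked_prefill = False
```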

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI passed with newly added and existing tests.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to disable the chunked prefill feature for Non-MLA models due to performance issues. The changes correctly modify the configuration to disable this feature. However, there is a critical bug in the implementation that will cause a NameError when running with Non-MLA models on the v1 engine. I've provided a suggestion to fix this issue.

```python
vllm_config.scheduler_config.chunked_prefill_enabled = False
if envs.VLLM_USE_V1:
    ascend_config.ascend_scheduler_config.enabled = True
    ascend_scheduler_config["enable_chunked_prefill"] = False
```

critical

The variable ascend_scheduler_config is not defined at this point, which will lead to a NameError when this code path is executed. It seems you intended to modify the ascend_scheduler_config object within the ascend_config. The correct way to do this would be to access it via ascend_config.ascend_scheduler_config.

Suggested change
ascend_scheduler_config["enable_chunked_prefill"] = False
ascend_config.ascend_scheduler_config.enable_chunked_prefill = False
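
Applied in context, the corrected branch would read roughly as follows (a sketch; only the attribute access changes, matching the suggestion above):

```python
vllm_config.scheduler_config.chunked_prefill_enabled = False
if envs.VLLM_USE_V1:
    ascend_config.ascend_scheduler_config.enabled = True
    # Set the attribute on the config object instead of indexing an
    # undefined local dict, which avoids the NameError.
    ascend_config.ascend_scheduler_config.enable_chunked_prefill = False
```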

@rjg-lyh force-pushed the pr-not-mla-ascend-scheduler branch 4 times, most recently from 4263288 to 1d6e568 on August 30, 2025 12:57
@rjg-lyh changed the title from "[main][bugfix] disable the chunked prefill feature in Non-MLA models" to "[v0.9.1][bugfix] disable the chunked prefill feature in Non-MLA models" on Aug 30, 2025
@rjg-lyh force-pushed the pr-not-mla-ascend-scheduler branch from 1d6e568 to 4628665 on August 30, 2025 13:26
@rjg-lyh force-pushed the pr-not-mla-ascend-scheduler branch from 4628665 to 1a5343e on September 1, 2025 01:18
@rjg-lyh changed the title from "[v0.9.1][bugfix] disable the chunked prefill feature in Non-MLA models" to "[v0.9.1][bugfix] disable the chunked prefill feature in Non-MLA LLMs" on Sep 1, 2025
@rjg-lyh force-pushed the pr-not-mla-ascend-scheduler branch 4 times, most recently from e34ddcb to 0523844 on September 1, 2025 08:12
@github-actions bot added the documentation (Improvements or additions to documentation) and module:tests labels on Sep 1, 2025
@rjg-lyh force-pushed the pr-not-mla-ascend-scheduler branch 13 times, most recently from 1c6d26e to 0b2f01c on September 2, 2025 13:17
@rjg-lyh force-pushed the pr-not-mla-ascend-scheduler branch from 0b2f01c to 065d2f0 on September 2, 2025 14:15
@rjg-lyh force-pushed the pr-not-mla-ascend-scheduler branch from 065d2f0 to 85098c1 on September 2, 2025 15:53
@Yikun (Collaborator) commented Sep 3, 2025

Current 0.9.1-dev latest commit: 5926225

- Case 1: git revert 9dc23b6, test
- Case 2: apply #2659, test (green part: latency +0.5 ms)

[benchmark comparison screenshot]

There is still a stable 0.5 ms perf regression when comparing your PR against reverting #2326.

```bash
# Start vLLM V1
export MODEL=Qwen/Qwen3-8B
VLLM_USE_V1=1 VLLM_USE_MODELSCOPE=true python3 -m vllm.entrypoints.openai.api_server --model $MODEL \
    --tensor-parallel-size 1 --swap-space 16 --disable-log-stats \
    --disable-log-requests --load-format dummy

# Benchmark
docker exec -it yikun-091 bash
export MODEL=Qwen/Qwen3-8B
export VLLM_USE_MODELSCOPE=true
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
pip install -r /vllm-workspace/vllm-ascend/benchmarks/requirements-bench.txt
python3 /vllm-workspace/vllm/benchmarks/benchmark_serving.py --model $MODEL --dataset-name random \
    --random-input-len 200 --num-prompts 200 --request-rate 1 \
    --save-result --result-dir ./
```

@wangxiyuan merged commit 47eaf62 into vllm-project:v0.9.1-dev on Sep 3, 2025
17 checks passed
Yikun pushed a commit that referenced this pull request Sep 12, 2025
### What this PR does / why we need it?
This PR forcibly disables the chunked prefill feature in Non-MLA models,
because the operators supporting this functionality currently perform
poorly. The feature is only left enabled if the user has explicitly
enabled chunked prefill in the ascend_scheduler_config.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with new added/existing test.

Related: #2659

- vLLM version: main
- vLLM main:
vllm-project/vllm@d21a36f

Signed-off-by: rjg-lyh <1318825571@qq.com>
offline893 pushed a commit to offline893/vllm-ascend that referenced this pull request on Sep 16, 2025
wangxiaoteng888 pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request on Sep 25, 2025
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request on Sep 26, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request on Oct 21, 2025
Labels: documentation (Improvements or additions to documentation), module:core, module:tests