
Conversation

@rjg-lyh (Collaborator) commented Aug 30, 2025

What this PR does / why we need it?

This PR forcibly disables the chunked prefill feature for Non-MLA models, because the operators that currently back this feature perform poorly.
At the same time, in engine v1 mode the Ascend scheduler is forcibly enabled, and the enable_chunked_prefill option that the user specified in additional_config is disabled.
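
For illustration, a minimal sketch of this override. The hook name, the `use_mla` flag, and the `use_v1` parameter are assumptions, not the exact code of this PR; the `vllm_config`/`ascend_config` attribute names come from the reviewed diff further down.

```python
# Hedged sketch only: not the exact implementation in this PR.
def disable_chunked_prefill_for_non_mla(vllm_config, ascend_config, use_v1: bool):
    # `use_mla` is an assumed flag on the model config for this sketch.
    if getattr(vllm_config.model_config, "use_mla", False):
        return  # MLA models keep whatever the user configured.
    # Chunked-prefill operators perform poorly on Non-MLA models, so force
    # the feature off regardless of the user's setting.
    vllm_config.scheduler_config.chunked_prefill_enabled = False
    if use_v1:
        # On the v1 engine, fall back to the Ascend scheduler and drop the
        # user's enable_chunked_prefill request from additional_config.
        ascend_config.ascend_scheduler_config.enabled = True
        ascend_config.ascend_scheduler_config.enable_chunked_prefill = False
```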

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI passed with newly added and existing tests.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to disable the chunked prefill feature for Non-MLA models due to performance issues. The changes correctly modify the configuration to disable this feature. However, there is a critical bug in the implementation that will cause a NameError when running with Non-MLA models on the v1 engine. I've provided a suggestion to fix this issue.

```python
vllm_config.scheduler_config.chunked_prefill_enabled = False
if envs.VLLM_USE_V1:
    ascend_config.ascend_scheduler_config.enabled = True
    ascend_scheduler_config["enable_chunked_prefill"] = False
```

critical

The variable ascend_scheduler_config is not defined at this point, which will lead to a NameError when this code path is executed. It seems you intended to modify the ascend_scheduler_config object within the ascend_config. The correct way to do this would be to access it via ascend_config.ascend_scheduler_config.

Suggested change
ascend_scheduler_config["enable_chunked_prefill"] = False
ascend_config.ascend_scheduler_config.enable_chunked_prefill = False
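
Applied in context, the corrected branch would read roughly as follows (a sketch; only the attribute access changes, matching the suggestion above):

```python
vllm_config.scheduler_config.chunked_prefill_enabled = False
if envs.VLLM_USE_V1:
    ascend_config.ascend_scheduler_config.enabled = True
    # Set the attribute on the config object instead of indexing an
    # undefined local dict, which avoids the NameError.
    ascend_config.ascend_scheduler_config.enable_chunked_prefill = False
```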

@rjg-lyh force-pushed the pr-not-mla-ascend-scheduler branch 4 times, most recently from 4263288 to 1d6e568 on August 30, 2025 12:57
@rjg-lyh changed the title from "[main][bugfix] disable the chunked prefill feature in Non-MLA models" to "[v0.9.1][bugfix] disable the chunked prefill feature in Non-MLA models" on Aug 30, 2025
@rjg-lyh force-pushed the pr-not-mla-ascend-scheduler branch from 1d6e568 to 4628665 on August 30, 2025 13:26
@rjg-lyh force-pushed the pr-not-mla-ascend-scheduler branch from 4628665 to 1a5343e on September 1, 2025 01:18
@rjg-lyh changed the title from "[v0.9.1][bugfix] disable the chunked prefill feature in Non-MLA models" to "[v0.9.1][bugfix] disable the chunked prefill feature in Non-MLA LLMs" on Sep 1, 2025
@rjg-lyh force-pushed the pr-not-mla-ascend-scheduler branch 4 times, most recently from e34ddcb to 0523844 on September 1, 2025 08:12
@github-actions bot added the documentation (Improvements or additions to documentation) and module:tests labels on Sep 1, 2025
@rjg-lyh force-pushed the pr-not-mla-ascend-scheduler branch 13 times, most recently from 1c6d26e to 0b2f01c on September 2, 2025 13:17
@rjg-lyh force-pushed the pr-not-mla-ascend-scheduler branch from 0b2f01c to 065d2f0 on September 2, 2025 14:15
@rjg-lyh force-pushed the pr-not-mla-ascend-scheduler branch from 065d2f0 to 85098c1 on September 2, 2025 15:53
@Yikun (Collaborator) commented Sep 3, 2025

Current 0.9.1-dev latest commit: 5926225

- Case 1: git revert 9dc23b6, test
- Case 2: apply #2659, test (green part: latency +0.5 ms)

[benchmark comparison screenshot]

There is still a stable 0.5 ms perf regression when comparing your PR against reverting #2326.

```bash
# Start vLLM V1
export MODEL=Qwen/Qwen3-8B
VLLM_USE_V1=1 VLLM_USE_MODELSCOPE=true python3 -m vllm.entrypoints.openai.api_server --model $MODEL \
    --tensor-parallel-size 1 --swap-space 16 --disable-log-stats \
    --disable-log-requests --load-format dummy

# Benchmark
docker exec -it yikun-091 bash
export MODEL=Qwen/Qwen3-8B
export VLLM_USE_MODELSCOPE=true
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
pip install -r /vllm-workspace/vllm-ascend/benchmarks/requirements-bench.txt
python3 /vllm-workspace/vllm/benchmarks/benchmark_serving.py --model $MODEL --dataset-name random \
    --random-input-len 200 --num-prompts 200 --request-rate 1 \
    --save-result --result-dir ./
```

@wangxiyuan merged commit 47eaf62 into vllm-project:v0.9.1-dev on Sep 3, 2025
17 checks passed
Yikun pushed a commit that referenced this pull request Sep 12, 2025
### What this PR does / why we need it?
This PR forcibly disables the chunked prefill feature in Non-MLA models,
because the operators supporting this functionality currently perform
poorly. The feature is only left enabled if the user has explicitly
enabled chunked prefill in the ascend_scheduler_config.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed with new added/existing test.

Related: #2659

- vLLM version: main
- vLLM main:
vllm-project/vllm@d21a36f

Signed-off-by: rjg-lyh <1318825571@qq.com>
offline893 pushed a commit to offline893/vllm-ascend that referenced this pull request on Sep 16, 2025
wangxiaoteng888 pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request on Sep 25, 2025
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request on Sep 26, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request on Oct 21, 2025
Labels: documentation (Improvements or additions to documentation), module:core, module:tests