[CI] enable chunked prefill by default #4569
base: main
Conversation
55385aa to e2e4c4a
Code Review
This pull request enables chunked prefill by default in the test suite by changing the default value of enable_chunked_prefill in VllmRunner. This is accompanied by un-skipping a prefix caching test, presumably now fixed, and cleaning up a redundant parameter in another test.
My main concern is regarding the handling of chunked prefill for MLA models. The change enables chunked prefill for all models in tests, but there's an unused configuration parameter chunked_prefill_for_mla that suggests this behavior should be conditional for MLA models. This inconsistency could be confusing and lead to unexpected behavior. Please see my detailed comment on this.
Otherwise, the changes look consistent with the goal of enabling chunked prefill in CI.
     tensor_parallel_size: int = 1,
     block_size: int = 16,
-    enable_chunked_prefill: bool = False,
+    enable_chunked_prefill: bool = True,
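For tests that still need the legacy non-chunked path after this default flip, the flag can be passed explicitly. A minimal sketch, assuming the usual context-manager usage of VllmRunner from the test conftest; the import path, model name, and prompt are placeholders, and generate_greedy is assumed to behave like the vLLM test helper of the same name:

```python
# Hypothetical opt-out: a test that still wants chunked prefill disabled.
from tests.conftest import VllmRunner  # import path is an assumption


def test_generate_without_chunked_prefill():
    prompts = ["Hello, my name is"]
    # Explicitly override the new default (enable_chunked_prefill=True).
    with VllmRunner("Qwen/Qwen2.5-0.5B-Instruct",
                    enable_chunked_prefill=False) as runner:
        outputs = runner.generate_greedy(prompts, max_tokens=16)
    assert len(outputs) == len(prompts)
```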
Enabling enable_chunked_prefill by default will affect MLA (Multi-head Latent Attention) models in tests. The logic in the _build_attn_state function of vllm_ascend/worker/model_runner_v1.py will now set the attention state to ChunkedPrefill for all models, including MLA models.
However, I found a configuration chunked_prefill_for_mla in vllm_ascend/ascend_config.py that seems intended to control this behavior for MLA models, but it is currently unused. This is confusing and could lead to unexpected behavior for developers trying to configure this feature.
If chunked prefill is now fully supported for MLA models and should be on by default, please consider removing the unused chunked_prefill_for_mla configuration to avoid confusion.
If chunked prefill for MLA is experimental or should be opt-in, the logic in _build_attn_state should be updated to respect this configuration. For example:
# In vllm_ascend/worker/model_runner_v1.py, _build_attn_state
...
elif self.scheduler_config.enable_chunked_prefill:
    if self.vllm_config.model_config.use_mla and not self.ascend_config.chunked_prefill_for_mla:
        attn_state = AscendAttentionState.PrefillCacheHit
    else:
        attn_state = AscendAttentionState.ChunkedPrefill
...

Given this PR changes the default for all tests, clarifying the intended behavior for MLA models is important for maintainability.
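If the opt-in route is chosen instead, the user-facing side might look like the sketch below. It assumes chunked_prefill_for_mla is read as a top-level key of vLLM's additional_config (the way other ascend_config fields are parsed) and that additional_config is forwarded through LLM/EngineArgs; the model name is only a placeholder for an MLA model.

```python
# Sketch: explicitly opting an MLA model into chunked prefill.
# The additional_config key is an assumption; check vllm_ascend/ascend_config.py
# for how chunked_prefill_for_mla is actually parsed.
from vllm import LLM

llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",  # placeholder MLA model
    enable_chunked_prefill=True,           # scheduler-level switch
    additional_config={"chunked_prefill_for_mla": True},
)
print(llm.generate(["Hello"])[0].outputs[0].text)
```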
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Set enable_chunked_prefill to True for e2e tests by default to keep the same behavior as vLLM.
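With the flipped default, an e2e test no longer has to pass the flag to match vLLM's behavior. A minimal sketch, under the same VllmRunner assumptions as above (import path, model, and prompt are placeholders):

```python
# After this PR, chunked prefill is on unless a test turns it off explicitly.
from tests.conftest import VllmRunner  # import path is an assumption


def test_generate_with_default_chunked_prefill():
    with VllmRunner("Qwen/Qwen2.5-0.5B-Instruct") as runner:
        outputs = runner.generate_greedy(["The capital of France is"], max_tokens=8)
    assert outputs
```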