Conversation

hl475 (Contributor) commented Oct 1, 2025

Purpose

The recent changes in #25075 break two tests in tests/models/language/pooling/test_auto_prefix_cache_support.py (https://buildkite.com/vllm/ci/builds/33031/steps/canvas?sid=01999dee-5c62-422e-9c31-054b1091dc6b).

Per the suggestion, the breaking change from the previous PR is this predicate in https://github.com/vllm-project/vllm/blob/main/vllm/config/vllm.py#L399-L402:

```python
if (self.scheduler_config.chunked_prefill_enabled is False
        or disable_chunked_prefill_reasons):
```

This PR changes that predicate to also take self.cache_config.enable_prefix_caching into account.
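
For reference, a minimal sketch of the adjusted condition (taken from the snippet discussed later in this thread; the merged diff is authoritative):

```python
# Only treat "chunked prefill is off" as a reason to fall back when the user
# has not explicitly requested prefix caching.
apc_requested = (self.cache_config is not None
                 and self.cache_config.enable_prefix_caching)
if (disable_chunked_prefill_reasons
        or (self.scheduler_config.chunked_prefill_enabled is False
            and not apc_requested)):
    ...
```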

Test Plan

```bash
pytest -v -s tests/models/language/pooling/test_auto_prefix_cache_support.py
pytest -v -s tests/v1/core/test_scheduler.py
```

Test Result

Both tests succeed.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

hl475 marked this pull request as ready for review on October 1, 2025 22:29
DarkLight1337 (Member) commented

cc @noooop can you review this?

noooop (Collaborator) commented Oct 2, 2025

> cc @noooop can you review this?

Sorry. I’m currently traveling and can only run the test locally when I return in a few days.

DarkLight1337 (Member) commented

> This is due to the recent V1 config finalization, which ties prefix caching (APC) to the scheduler’s chunked prefill (CP).

For reference, which PR is this referring to?

noooop (Collaborator) commented Oct 2, 2025

> > This is due to the recent V1 config finalization, which ties prefix caching (APC) to the scheduler’s chunked prefill (CP).
>
> For reference, which PR is this referring to?

Should these models have enable_prefix_caching=True by default? Start from #20930.

I’ll need some time to catch up.

I think it would be best to fix this default behavior.

noooop (Collaborator) commented Oct 2, 2025

https://github.com/vllm-project/vllm/blob/main/vllm/config/vllm.py

Did one of these three PRs break it?

hl475 (Contributor, Author) commented Oct 2, 2025

@noooop @DarkLight1337

My guess is e23cacd#diff-bee6813076031d3ca1edc903c1b02b81e4676519afc562ce3fefe37f20c7b650, which changes some logic around the chunked_prefill_enabled config.

DarkLight1337 (Member) commented Oct 2, 2025

The enable_chunked_prefill flag should be enabled by default for these models. I think we should fix the previous PR to avoid this regression instead of updating the tests.

hl475 force-pushed the fix_test_auto_prefix_cache_support branch from d1fa2f2 to 426d0e0 on October 2, 2025 18:12
hl475 marked this pull request as draft on October 2, 2025 18:12
The mergify bot added the v1 label on Oct 2, 2025
hl475 changed the title from "[CI Failure] fix_test_auto_prefix_cache_support" to "Add a helper function _finalize_cp_apc in VllmConfig to finalize CP/APC coherently" on Oct 2, 2025
noooop (Collaborator) commented Oct 3, 2025

> before the line `if self.enable_chunked_prefill is None:`, so it will not go into the if branch on L1576

Was it also self.enable_chunked_prefill = False before #25075 here?

hl475 (Contributor, Author) commented Oct 3, 2025

> > before the line `if self.enable_chunked_prefill is None:`, so it will not go into the if branch on L1576
>
> Was it also self.enable_chunked_prefill = False before #25075 here?

Yeah, just checked with commit 2e1b8bc2 (right before commit e23cacda3, i.e. #25075), and I am seeing the same:

(Pdb) self.enable_chunked_prefill
False

noooop (Collaborator) commented Oct 3, 2025

> > > before the line `if self.enable_chunked_prefill is None:`, so it will not go into the if branch on L1576
> >
> > Was it also self.enable_chunked_prefill = False before #25075 here?
>
> Yeah, just checked with commit 2e1b8bc2 (right before commit e23cacda3, i.e. #25075), and I am seeing the same:
>
>     (Pdb) self.enable_chunked_prefill
>     False

Will the test pass at this point?

hl475 (Contributor, Author) commented Oct 3, 2025

> > > > before the line `if self.enable_chunked_prefill is None:`, so it will not go into the if branch on L1576
> > >
> > > Was it also self.enable_chunked_prefill = False before #25075 here?
> >
> > Yeah, just checked with commit 2e1b8bc2 (right before commit e23cacda3, i.e. #25075), and I am seeing the same:
> >
> >     (Pdb) self.enable_chunked_prefill
> >     False
>
> Will the test pass at this point?

Yes!

noooop (Collaborator) commented Oct 3, 2025

> > > > > before the line `if self.enable_chunked_prefill is None:`, so it will not go into the if branch on L1576
> > > >
> > > > Was it also self.enable_chunked_prefill = False before #25075 here?
> > >
> > > Yeah, just checked with commit 2e1b8bc2 (right before commit e23cacda3, i.e. #25075), and I am seeing the same:
> > >
> > >     (Pdb) self.enable_chunked_prefill
> > >     False
> >
> > Will the test pass at this point?
>
> Yes!

I can’t think of why it would work.

╮(╯_╰)╭

We’ve got to refactor this piece of logic.

hl475 (Contributor, Author) commented Oct 3, 2025

@noooop

I am thinking of an alternative that does not introduce this complex helper function. We already identified the root-cause PR, and the key change from that PR that breaks the two model tests is:

```python
if (self.scheduler_config.chunked_prefill_enabled is False
        or disable_chunked_prefill_reasons):
```

(before that PR, it was just `if disable_chunked_prefill_reasons:`). I am thinking maybe I can update the condition to something like:

```python
apc_requested = (self.cache_config is not None
                 and self.cache_config.enable_prefix_caching)
if (
    disable_chunked_prefill_reasons
    or (self.scheduler_config.chunked_prefill_enabled is False and not apc_requested)
):
```

I think the tests will pass, but I am not sure which way is more appropriate. What do you think?
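
To spell out the intended behavior of that condition, here is a tiny self-contained check (the falls_back helper is hypothetical, written just for this comment, not code from the PR):

```python
def falls_back(disable_reasons, chunked_prefill_enabled, apc_requested):
    """Proposed predicate: True means chunked prefill/prefix caching get disabled."""
    return bool(disable_reasons) or (chunked_prefill_enabled is False
                                     and not apc_requested)

# An explicit prefix-caching request keeps APC alive even when chunked
# prefill resolved to False (the case the two pooling tests exercise).
assert falls_back([], False, apc_requested=True) is False
# Without an explicit APC request, the old fallback behavior is preserved.
assert falls_back([], False, apc_requested=False) is True
# Any recorded disable reason still wins.
assert falls_back(["pooling model"], True, apc_requested=True) is True
```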

noooop (Collaborator) commented Oct 3, 2025

> without introducing this complex helper function
>
> I am thinking maybe I can update the condition to something like:
>
> ```python
> apc_requested = (self.cache_config is not None
>                  and self.cache_config.enable_prefix_caching)
> if (
>     disable_chunked_prefill_reasons
>     or (self.scheduler_config.chunked_prefill_enabled is False and not apc_requested)
> ):
> ```
>
> I think the tests will pass, but I am not sure which way is more appropriate. What do you think?

I prefer this approach. The helper function feels too complex.

hl475 force-pushed the fix_test_auto_prefix_cache_support branch from 426d0e0 to 306b678 on October 3, 2025 09:54
hl475 changed the title from "Add a helper function _finalize_cp_apc in VllmConfig to finalize CP/APC coherently" back to "[CI Failure] fix_test_auto_prefix_cache_support" on Oct 3, 2025
hl475 (Contributor, Author) commented Oct 3, 2025

Thanks @noooop and @DarkLight1337 !

I updated this PR again based on the discussion. Please take another look when you have a chance.

```python
apc_requested = (self.cache_config is not None
                 and self.cache_config.enable_prefix_caching)
if (disable_chunked_prefill_reasons
        or (self.scheduler_config.enable_chunked_prefill is False
```
noooop (Collaborator) commented on this diff hunk, Oct 3, 2025

Honestly, since I can only look at the code on my phone, I have a ton of questions about this part of the logic.

LGTM if @DarkLight1337 approves. Or, I’ll give it a closer look when I’m back on the 8th.

DarkLight1337 (Member) left a comment

Let's unblock the CI first. @noooop can clean it up later if needed

DarkLight1337 enabled auto-merge (squash) on October 3, 2025 14:28
The github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Oct 3, 2025
noooop (Collaborator) commented Oct 3, 2025

@maxdebayser

Sorry, I’m currently traveling and can only look at the code on my phone.

Please look into this issue. The logic for enabling chunked prefill in the pooling models and the encoder-decoder models is somewhat coupled. As more and more PRs modify this logic, it has become a complete mess.

The strangest thing is that self.enable_chunked_prefill == False on line 1575 (maybe unrelated).

vllm/vllm/engine/arg_utils.py

Lines 1549 to 1580 in 47b9339

```python
if model_config.runner_type != "pooling":
    self.enable_chunked_prefill = True
    # TODO: When prefix caching supports prompt embeds inputs, this
    # check can be removed.
    if (self.enable_prompt_embeds
            and self.enable_prefix_caching is not False):
        logger.warning(
            "--enable-prompt-embeds and --enable-prefix-caching "
            "are not supported together in V1. Prefix caching has "
            "been disabled.")
        self.enable_prefix_caching = False
    if self.enable_prefix_caching is None:
        self.enable_prefix_caching = True
else:
    pooling_type = model_config.pooler_config.pooling_type
    is_causal = getattr(model_config.hf_config, "is_causal", True)
    incremental_prefill_supported = (pooling_type is not None
                                     and pooling_type.lower() == "last"
                                     and is_causal)
    action = "Enabling" if \
        incremental_prefill_supported else "Disabling"
    if self.enable_chunked_prefill is None:
        self.enable_chunked_prefill = incremental_prefill_supported
        logger.info("(%s) chunked prefill by default", action)
    if self.enable_prefix_caching is None:
        self.enable_prefix_caching = incremental_prefill_supported
        logger.info("(%s) prefix caching by default", action)
```

The function is named “Set Default Argument,” but the value may come from elsewhere, which just makes the logic more confusing.
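
For illustration only, a minimal, hypothetical sketch of what pulling these pooling-model defaults into a single pure helper could look like (the function name, signature, and return type are invented for this sketch and are not vLLM API):

```python
from typing import NamedTuple, Optional


class CpApcDefaults(NamedTuple):
    enable_chunked_prefill: bool
    enable_prefix_caching: bool


def resolve_pooling_defaults(
        pooling_type: Optional[str],
        is_causal: bool,
        enable_chunked_prefill: Optional[bool],
        enable_prefix_caching: Optional[bool]) -> CpApcDefaults:
    """Resolve CP/APC defaults for pooling models in one place.

    Mirrors the current arg_utils.py branch: incremental prefill is only
    supported for causal models with "last"-token pooling, and explicit
    user choices (non-None values) are never overridden.
    """
    incremental_prefill_supported = (pooling_type is not None
                                     and pooling_type.lower() == "last"
                                     and is_causal)
    if enable_chunked_prefill is None:
        enable_chunked_prefill = incremental_prefill_supported
    if enable_prefix_caching is None:
        enable_prefix_caching = incremental_prefill_supported
    return CpApcDefaults(enable_chunked_prefill, enable_prefix_caching)
```

Keeping the decision in one place would at least make it obvious where these flags were last written.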


I feel the next release is just around the corner, so we need to fix this issue ASAP.

noooop (Collaborator) commented Oct 4, 2025

@hl475

The current fix breaks the encoder-decoder models. Let’s find a way to fix it.

Signed-off-by: Huamin Li <3ericli@gmail.com>
auto-merge was automatically disabled October 4, 2025 06:37

Head branch was pushed to by a user without write access

hl475 force-pushed the fix_test_auto_prefix_cache_support branch from 306b678 to ce0cf9b on October 4, 2025 06:37
hl475 (Contributor, Author) commented Oct 4, 2025

> @hl475
>
> The current fix breaks the encoder-decoder models. Let’s find a way to fix it.

Thanks! I just submitted a new version. Let's wait for the CI.

noooop (Collaborator) commented Oct 4, 2025

The test_full_cudagraph.py failure is not related to this PR.

Thanks for the fix!

cc @DarkLight1337

vllm-bot merged commit 7d6b033 into vllm-project:main on Oct 4, 2025
43 of 45 checks passed
tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025
Signed-off-by: Huamin Li <3ericli@gmail.com>
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
karan pushed a commit to karan/vllm that referenced this pull request Oct 6, 2025
Signed-off-by: Huamin Li <3ericli@gmail.com>
Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com>
southfreebird pushed a commit to southfreebird/vllm that referenced this pull request Oct 7, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Huamin Li <3ericli@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Huamin Li <3ericli@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>