Conversation

hl475 (Contributor) commented Oct 1, 2025

Purpose

The recent changes in #25075 break two tests in tests/models/language/pooling/test_auto_prefix_cache_support.py (https://buildkite.com/vllm/ci/builds/33031/steps/canvas?sid=01999dee-5c62-422e-9c31-054b1091dc6b).

Per the suggestion, the breaking change from the previous PR is this predicate in https://github.com/vllm-project/vllm/blob/main/vllm/config/vllm.py#L399-L402:

```python
if (self.scheduler_config.chunked_prefill_enabled is False
        or disable_chunked_prefill_reasons):
```

This PR changes that predicate to also take self.cache_config.enable_prefix_caching into account.
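
For reference, a minimal sketch of the adjusted condition (taken from the snippet discussed later in this thread; the merged diff is authoritative):

```python
# Only treat "chunked prefill is off" as a reason to fall back when the user
# has not explicitly requested prefix caching.
apc_requested = (self.cache_config is not None
                 and self.cache_config.enable_prefix_caching)
if (disable_chunked_prefill_reasons
        or (self.scheduler_config.chunked_prefill_enabled is False
            and not apc_requested)):
    ...
```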

Test Plan

```bash
pytest -v -s tests/models/language/pooling/test_auto_prefix_cache_support.py
pytest -v -s tests/v1/core/test_scheduler.py
```

Test Result

Both tests succeed.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

hl475 marked this pull request as ready for review on October 1, 2025 22:29
DarkLight1337 (Member) commented

cc @noooop can you review this?

noooop (Collaborator) commented Oct 2, 2025

> cc @noooop can you review this?

Sorry. I’m currently traveling and can only run the test locally when I return in a few days.

DarkLight1337 (Member) commented

> This is due to the recent V1 config finalization, which ties prefix caching (APC) to the scheduler’s chunked prefill (CP).

For reference, which PR is this referring to?

noooop (Collaborator) commented Oct 2, 2025

> > This is due to the recent V1 config finalization, which ties prefix caching (APC) to the scheduler’s chunked prefill (CP).
>
> For reference, which PR is this referring to?

Should these models have enable_prefix_caching=True by default? Start from #20930.

I’ll need some time to catch up.

I think it would be best to fix this default behavior.

noooop (Collaborator) commented Oct 2, 2025

https://github.com/vllm-project/vllm/blob/main/vllm/config/vllm.py

Did one of these three PRs break it?

hl475 (Contributor, Author) commented Oct 2, 2025

@noooop @DarkLight1337

My guess is e23cacd#diff-bee6813076031d3ca1edc903c1b02b81e4676519afc562ce3fefe37f20c7b650, which changes some logic around the chunked_prefill_enabled config.

DarkLight1337 (Member) commented Oct 2, 2025

The enable_chunked_prefill flag should be enabled by default for these models. I think we should fix the previous PR to avoid this regression instead of updating the tests.

hl475 force-pushed the fix_test_auto_prefix_cache_support branch from d1fa2f2 to 426d0e0 on October 2, 2025 18:12
hl475 marked this pull request as draft on October 2, 2025 18:12
The mergify bot added the v1 label on Oct 2, 2025
hl475 changed the title from "[CI Failure] fix_test_auto_prefix_cache_support" to "Add a helper function _finalize_cp_apc in VllmConfig to finalize CP/APC coherently" on Oct 2, 2025
noooop (Collaborator) commented Oct 3, 2025

> before the line `if self.enable_chunked_prefill is None:`, so it will not go into the if branch on L1576

Was it also self.enable_chunked_prefill = False before #25075 here?

hl475 (Contributor, Author) commented Oct 3, 2025

> > before the line `if self.enable_chunked_prefill is None:`, so it will not go into the if branch on L1576
>
> Was it also self.enable_chunked_prefill = False before #25075 here?

Yeah, just checked with commit 2e1b8bc2 (right before commit e23cacda3, i.e. #25075), and I am seeing the same:

(Pdb) self.enable_chunked_prefill
False

noooop (Collaborator) commented Oct 3, 2025

> > > before the line `if self.enable_chunked_prefill is None:`, so it will not go into the if branch on L1576
> >
> > Was it also self.enable_chunked_prefill = False before #25075 here?
>
> Yeah, just checked with commit 2e1b8bc2 (right before commit e23cacda3, i.e. #25075), and I am seeing the same:
>
>     (Pdb) self.enable_chunked_prefill
>     False

Will the test pass at this point?

hl475 (Contributor, Author) commented Oct 3, 2025

> > > > before the line `if self.enable_chunked_prefill is None:`, so it will not go into the if branch on L1576
> > >
> > > Was it also self.enable_chunked_prefill = False before #25075 here?
> >
> > Yeah, just checked with commit 2e1b8bc2 (right before commit e23cacda3, i.e. #25075), and I am seeing the same:
> >
> >     (Pdb) self.enable_chunked_prefill
> >     False
>
> Will the test pass at this point?

Yes!

noooop (Collaborator) commented Oct 3, 2025

> > > > > before the line `if self.enable_chunked_prefill is None:`, so it will not go into the if branch on L1576
> > > >
> > > > Was it also self.enable_chunked_prefill = False before #25075 here?
> > >
> > > Yeah, just checked with commit 2e1b8bc2 (right before commit e23cacda3, i.e. #25075), and I am seeing the same:
> > >
> > >     (Pdb) self.enable_chunked_prefill
> > >     False
> >
> > Will the test pass at this point?
>
> Yes!

I can’t think of why it would work.

╮(╯_╰)╭

We’ve got to refactor this piece of logic.

hl475 (Contributor, Author) commented Oct 3, 2025

@noooop

I am thinking of an alternative that does not introduce this complex helper function. We already identified the root-cause PR, and the key change from that PR that breaks the two model tests is:

```python
if (self.scheduler_config.chunked_prefill_enabled is False
        or disable_chunked_prefill_reasons):
```

(before that PR, it was just `if disable_chunked_prefill_reasons:`). I am thinking maybe I can update the condition to something like:

```python
apc_requested = (self.cache_config is not None
                 and self.cache_config.enable_prefix_caching)
if (
    disable_chunked_prefill_reasons
    or (self.scheduler_config.chunked_prefill_enabled is False and not apc_requested)
):
```

I think the tests will pass, but I am not sure which way is more appropriate. What do you think?
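
To spell out the intended behavior of that condition, here is a tiny self-contained check (the falls_back helper is hypothetical, written just for this comment, not code from the PR):

```python
def falls_back(disable_reasons, chunked_prefill_enabled, apc_requested):
    """Proposed predicate: True means chunked prefill/prefix caching get disabled."""
    return bool(disable_reasons) or (chunked_prefill_enabled is False
                                     and not apc_requested)

# An explicit prefix-caching request keeps APC alive even when chunked
# prefill resolved to False (the case the two pooling tests exercise).
assert falls_back([], False, apc_requested=True) is False
# Without an explicit APC request, the old fallback behavior is preserved.
assert falls_back([], False, apc_requested=False) is True
# Any recorded disable reason still wins.
assert falls_back(["pooling model"], True, apc_requested=True) is True
```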

noooop (Collaborator) commented Oct 3, 2025

> without introducing this complex helper function
>
> I am thinking maybe I can update the condition to something like:
>
> ```python
> apc_requested = (self.cache_config is not None
>                  and self.cache_config.enable_prefix_caching)
> if (
>     disable_chunked_prefill_reasons
>     or (self.scheduler_config.chunked_prefill_enabled is False and not apc_requested)
> ):
> ```
>
> I think the tests will pass, but I am not sure which way is more appropriate. What do you think?

I prefer this approach. The helper function feels too complex.

hl475 force-pushed the fix_test_auto_prefix_cache_support branch from 426d0e0 to 306b678 on October 3, 2025 09:54
hl475 changed the title from "Add a helper function _finalize_cp_apc in VllmConfig to finalize CP/APC coherently" back to "[CI Failure] fix_test_auto_prefix_cache_support" on Oct 3, 2025
hl475 (Contributor, Author) commented Oct 3, 2025

Thanks @noooop and @DarkLight1337 !

I updated this PR again based on the discussion. Please take another look when you have a chance.

```python
apc_requested = (self.cache_config is not None
                 and self.cache_config.enable_prefix_caching)
if (disable_chunked_prefill_reasons
        or (self.scheduler_config.enable_chunked_prefill is False
```
noooop (Collaborator) commented on this diff hunk, Oct 3, 2025

Honestly, since I can only look at the code on my phone, I have a ton of questions about this part of the logic.

LGTM if @DarkLight1337 approves. Or, I’ll give it a closer look when I’m back on the 8th.

DarkLight1337 (Member) left a comment

Let's unblock the CI first. @noooop can clean it up later if needed

DarkLight1337 enabled auto-merge (squash) on October 3, 2025 14:28
The github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Oct 3, 2025
noooop (Collaborator) commented Oct 3, 2025

@maxdebayser

Sorry, I’m currently traveling and can only look at the code on my phone.

Please look into this issue. The logic for enabling chunked prefill in the pooling models and the encoder-decoder models is somewhat coupled. As more and more PRs modify this logic, it has become a complete mess.

The strangest thing is that self.enable_chunked_prefill == False on line 1575 (maybe unrelated).

vllm/vllm/engine/arg_utils.py

Lines 1549 to 1580 in 47b9339

```python
if model_config.runner_type != "pooling":
    self.enable_chunked_prefill = True
    # TODO: When prefix caching supports prompt embeds inputs, this
    # check can be removed.
    if (self.enable_prompt_embeds
            and self.enable_prefix_caching is not False):
        logger.warning(
            "--enable-prompt-embeds and --enable-prefix-caching "
            "are not supported together in V1. Prefix caching has "
            "been disabled.")
        self.enable_prefix_caching = False
    if self.enable_prefix_caching is None:
        self.enable_prefix_caching = True
else:
    pooling_type = model_config.pooler_config.pooling_type
    is_causal = getattr(model_config.hf_config, "is_causal", True)
    incremental_prefill_supported = (pooling_type is not None
                                     and pooling_type.lower() == "last"
                                     and is_causal)
    action = "Enabling" if \
        incremental_prefill_supported else "Disabling"
    if self.enable_chunked_prefill is None:
        self.enable_chunked_prefill = incremental_prefill_supported
        logger.info("(%s) chunked prefill by default", action)
    if self.enable_prefix_caching is None:
        self.enable_prefix_caching = incremental_prefill_supported
        logger.info("(%s) prefix caching by default", action)
```

The function is named “Set Default Argument,” but the value may come from elsewhere, which just makes the logic more confusing.
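
For illustration only, a minimal, hypothetical sketch of what pulling these pooling-model defaults into a single pure helper could look like (the function name, signature, and return type are invented for this sketch and are not vLLM API):

```python
from typing import NamedTuple, Optional


class CpApcDefaults(NamedTuple):
    enable_chunked_prefill: bool
    enable_prefix_caching: bool


def resolve_pooling_defaults(
        pooling_type: Optional[str],
        is_causal: bool,
        enable_chunked_prefill: Optional[bool],
        enable_prefix_caching: Optional[bool]) -> CpApcDefaults:
    """Resolve CP/APC defaults for pooling models in one place.

    Mirrors the current arg_utils.py branch: incremental prefill is only
    supported for causal models with "last"-token pooling, and explicit
    user choices (non-None values) are never overridden.
    """
    incremental_prefill_supported = (pooling_type is not None
                                     and pooling_type.lower() == "last"
                                     and is_causal)
    if enable_chunked_prefill is None:
        enable_chunked_prefill = incremental_prefill_supported
    if enable_prefix_caching is None:
        enable_prefix_caching = incremental_prefill_supported
    return CpApcDefaults(enable_chunked_prefill, enable_prefix_caching)
```

Keeping the decision in one place would at least make it obvious where these flags were last written.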


I feel the next release is just around the corner, so we need to fix this issue ASAP.

noooop (Collaborator) commented Oct 4, 2025

@hl475

The current fix breaks the encoder-decoder models. Let’s find a way to fix it.

Signed-off-by: Huamin Li <3ericli@gmail.com>
auto-merge was automatically disabled October 4, 2025 06:37

Head branch was pushed to by a user without write access

hl475 force-pushed the fix_test_auto_prefix_cache_support branch from 306b678 to ce0cf9b on October 4, 2025 06:37
hl475 (Contributor, Author) commented Oct 4, 2025

> @hl475
>
> The current fix breaks the encoder-decoder models. Let’s find a way to fix it.

Thanks! I just submitted a new version. Let's wait for the CI.

noooop (Collaborator) commented Oct 4, 2025

The test_full_cudagraph.py failure is not related to this PR.

Thanks for the fix!

cc @DarkLight1337

vllm-bot merged commit 7d6b033 into vllm-project:main on Oct 4, 2025
43 of 45 checks passed
tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025
Signed-off-by: Huamin Li <3ericli@gmail.com>
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
karan pushed a commit to karan/vllm that referenced this pull request Oct 6, 2025
Signed-off-by: Huamin Li <3ericli@gmail.com>
Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com>
southfreebird pushed a commit to southfreebird/vllm that referenced this pull request Oct 7, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Huamin Li <3ericli@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Huamin Li <3ericli@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>