[CI Failure] fix_test_auto_prefix_cache_support #26053
Conversation
cc @noooop can you review this?
Sorry. I'm currently traveling and can only run the test locally when I return in a few days.
For reference, which PR is this referring to?
Should these models have enable_prefix_caching=True by default? Start from #20930. I'll need some time to catch up. I think it would be best to fix this default behavior.
https://github.com/vllm-project/vllm/blob/main/vllm/config/vllm.py Did one of these three PRs break it?
My guess would be e23cacd#diff-bee6813076031d3ca1edc903c1b02b81e4676519afc562ce3fefe37f20c7b650, which changes some logic around the chunked_prefill_enabled config.
The
d1fa2f2 to 426d0e0
Was it also self.enable_chunked_prefill = False before #25075 here?
Yeah, just checked with commit
Will the test pass at this point?
Yes!
I can't think of why it would work. ╮(╯_╰)╭ We've got to refactor this piece of logic.
I'm thinking of some other alternative without introducing this complex helper function, as we already identified the root-cause PR, and the key change from that PR which breaks the two model tests is from (before that PR, it is just I think the test will pass, but I'm not sure which way is more appropriate. What do you think?
I prefer the approach below. The helper function feels too complex.
426d0e0 to 306b678
Thanks @noooop and @DarkLight1337! I updated this PR again based on the discussion. Please take another look when you have a chance.
```python
apc_requested = (self.cache_config is not None
                 and self.cache_config.enable_prefix_caching)
if (disable_chunked_prefill_reasons
        or (self.scheduler_config.enable_chunked_prefill is False
```
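To make the truncated snippet above concrete, here is one plausible reading of the predicate as a self-contained sketch. The dataclasses and the should_disable_prefix_caching helper below are simplified stand-ins invented for illustration, not vLLM's real config objects, and the exact semantics are an assumption based on this thread: when chunked prefill is off for some reason, prefix caching is only force-disabled if the user did not explicitly request it.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CacheConfig:
    enable_prefix_caching: bool = False

@dataclass
class SchedulerConfig:
    # None means the user left the flag unset (tri-state, not plain bool).
    enable_chunked_prefill: Optional[bool] = None

def should_disable_prefix_caching(cache_config: Optional[CacheConfig],
                                  scheduler_config: SchedulerConfig,
                                  disable_chunked_prefill_reasons: list) -> bool:
    # apc_requested mirrors the snippet above: the user explicitly asked
    # for automatic prefix caching.
    apc_requested = (cache_config is not None
                     and cache_config.enable_prefix_caching)
    # If chunked prefill is disabled for some reason, only turn prefix
    # caching off when the user did not explicitly request it.
    if (disable_chunked_prefill_reasons
            or scheduler_config.enable_chunked_prefill is False):
        return not apc_requested
    return False
```

Under this reading, the pooling-model tests pass again because an explicit enable_prefix_caching=True survives even when chunked prefill ends up disabled.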
Honestly, since I can only look at the code on my phone, I have a ton of questions about this part of the logic.
LGTM if @DarkLight1337 approves. Or, I’ll give it a closer look when I’m back on the 8th.
Let's unblock the CI first. @noooop can clean it up later if needed
@maxdebayser Sorry, I'm currently traveling and can only look at the code on my phone. Please look into this issue. The logic for enabling chunked prefill in the pooling model and the encoder-decoder model is somewhat coupled. As more and more PRs modify this logic, it has become a complete mess. The strangest thing is that self.enable_chunked_prefill == False is on line 1575 (maybe unrelated). Lines 1549 to 1580 in 47b9339
The function is named "Set Default Argument," but the value may come from elsewhere, which just makes the logic more confusing. I feel the next release is just around the corner, so we need to fix this issue ASAP.
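The "set default argument" confusion described above is essentially a tri-state flag problem. As a hypothetical illustration (not vLLM's actual code), distinguishing "unset" (None) from an explicit True/False keeps later config passes from silently overriding an explicit user choice. The names resolve_enable_prefix_caching and is_pooling_model are invented for this sketch, and defaulting pooling models to True is an assumption for illustration only.

```python
from typing import Optional

def resolve_enable_prefix_caching(user_value: Optional[bool],
                                  is_pooling_model: bool) -> bool:
    # An explicit user choice (True or False) always wins; only None
    # means "the user never set this flag".
    if user_value is not None:
        return user_value
    # Compute a model-dependent default only when the flag was unset
    # (the pooling-model default here is an assumption for illustration).
    return is_pooling_model
```

With a plain bool default instead of None, a later pass cannot tell "user set False" apart from "nobody set it", which is exactly the kind of ambiguity this thread is wrestling with.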
The current fix broke the encoder-decoder model. Let's find a way to fix it.
Signed-off-by: Huamin Li <3ericli@gmail.com>
Head branch was pushed to by a user without write access
306b678 to ce0cf9b
Thanks! Just submitted a new version. Let's wait for CI.
The test_full_cudagraph.py failure is not related to this PR. Thanks for the fix!
Signed-off-by: Huamin Li <3ericli@gmail.com>
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
Recent changes from #25075 break two tests in tests/models/language/pooling/test_auto_prefix_cache_support.py (https://buildkite.com/vllm/ci/builds/33031/steps/canvas?sid=01999dee-5c62-422e-9c31-054b1091dc6b). Per suggestion, the breaking change from the previous PR is https://github.com/vllm-project/vllm/blob/main/vllm/config/vllm.py#L399-L402 or
This PR changes this predicate by considering self.cache_config.enable_prefix_caching as well.
Test Plan
Test Result
Both succeed.