[Core] Fix abrupt request abort #18485
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
This pull request has merge conflicts that must be resolved before it can be merged.

This pull request has merge conflicts that must be resolved before it can be merged.
@NickLucche Does #18829 fix this bug?

Quickly tried to run it with my setup but it's crashing.

I didn't try. I'm asking because #18829 (comment) by @Abatom says that PR can fix this problem.

I will address it ASAP and try to cover your P/D separation scheme.
Thanks |
I've updated the PR so that it should be compatible with your connector as well as with the latest changes.
@NickLucche I don't think this really has anything to do with the connector or the P/D lifecycle; it's a general problem with aborting a request before it has been scheduled. It's just that we only hit that path when a connector is in use. I actually agree with @Abatom and I think it would make sense to change the KVCacheManager. I have pushed my own version of this here: main...njhill:vllm:fix-pd-abort (I also moved the single cache group assertion to the constructor).

I am still not 100% sure about changing the interface. And yeah, this has nothing to do with P/D, my bad; it just pops up more easily given the higher latency.
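For readers following along: the question here is essentially whether the block lookup should keep raising for request IDs it has never seen, or quietly return an empty result. A minimal sketch of the two behaviours under discussion, using placeholder names rather than the real vLLM classes:

```python
# Placeholder classes, not the real vLLM KVCacheManager: they only contrast
# the two lookup semantics being debated above.


class StrictBlockLookup:
    """Raises KeyError for a request that was never allocated blocks."""

    def __init__(self) -> None:
        self.req_to_blocks: dict[str, list[int]] = {}

    def get_block_ids(self, request_id: str) -> list[int]:
        return self.req_to_blocks[request_id]  # KeyError if never scheduled


class LenientBlockLookup:
    """Returns an empty list for a request that was never allocated blocks."""

    def __init__(self) -> None:
        self.req_to_blocks: dict[str, list[int]] = {}

    def get_block_ids(self, request_id: str) -> list[int]:
        return self.req_to_blocks.get(request_id, [])


assert LenientBlockLookup().get_block_ids("aborted-before-scheduling") == []
```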
Thanks @NickLucche, I pushed my update as you agreed to offline :) I've also now opened a related PR #19223 which I think helps shrink our P/D abort race condition window; it relies on the behaviour we've kept here of always notifying the connector when requests are finished (it sounds similar to what @Abatom needs it for).
Thanks @NickLucche!
Signed-off-by: nicklucche <nlucches@redhat.com>
Signed-off-by: nicklucche <nlucches@redhat.com>
Signed-off-by: nicklucche <nlucches@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
I've rebased in light of the recent hybrid allocator changes, PTAL when you get the chance @njhill
This test
vllm/v1/core/kv_cache_coordinator.py (outdated)

```python
manager.req_to_blocks[request_id]
for manager in self.single_type_managers
if request_id in manager.req_to_blocks
```
@NickLucche I think we'll want to actually return a list of empty lists in this case. I didn't update it yet since we might as well do it after rebasing on the other fix.
Suggested change:

```diff
-manager.req_to_blocks[request_id]
-for manager in self.single_type_managers
-if request_id in manager.req_to_blocks
+manager.req_to_blocks.get(request_id) or []
+for manager in self.single_type_managers
```
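To make the intent of the suggestion concrete: with the hybrid allocator there are several single-type managers (presumably one per KV cache group), and callers expect one block list per group, so an unknown request should map to a tuple of empty lists rather than having its entries filtered out. A simplified, hedged sketch, where only the expression mirrors the suggested diff and the surrounding names are placeholders:

```python
# Illustrative stand-in for the coordinator logic; `req_to_blocks` and
# `single_type_managers` mirror names from the diff, the rest is placeholder.

from dataclasses import dataclass, field


@dataclass
class _SingleTypeManager:
    req_to_blocks: dict[str, list[int]] = field(default_factory=dict)


def get_blocks(single_type_managers: tuple[_SingleTypeManager, ...],
               request_id: str) -> tuple[list[int], ...]:
    # One entry per manager (i.e. per KV cache group), even for requests
    # aborted before any blocks were allocated: those get an empty list.
    return tuple(
        manager.req_to_blocks.get(request_id) or []
        for manager in single_type_managers
    )


managers = (_SingleTypeManager(), _SingleTypeManager())  # e.g. two groups
assert get_blocks(managers, "never-scheduled") == ([], [])
```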
Signed-off-by: Nick Hill <nhill@redhat.com>
There's currently an issue when a KV connector is in use and requests are aborted before they have been scheduled (and so before any blocks have been allocated in the KV cache manager). The scheduler's `_connector_finished` method is invoked and it calls `kv_cache_manager.get_block_ids(request_id)`, which raises an exception because the KV cache manager doesn't recognize the request. We do still want to invoke the connector in this case, so that it can perform external cleanup if needed, but it makes sense to pass it empty blocks.
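As a rough illustration of the failure mode described above (these are simplified stand-ins, not the real scheduler, connector, or KV cache manager interfaces):

```python
# Simplified stand-ins for the interaction described above; signatures and
# class names are illustrative only.


class DummyKVCacheManager:
    def __init__(self) -> None:
        self._req_to_blocks: dict[str, list[int]] = {}

    def get_block_ids(self, request_id: str) -> list[int]:
        # Pre-fix behaviour: a request aborted before scheduling was never
        # added to the mapping, so this lookup raises.
        return self._req_to_blocks[request_id]


class DummyConnector:
    def request_finished(self, request_id: str, block_ids: list[int]) -> None:
        # External cleanup would happen here; it should run even when no
        # blocks were ever allocated for the request.
        print(f"cleanup {request_id}: {len(block_ids)} blocks")


def connector_finished(kv: DummyKVCacheManager, conn: DummyConnector,
                       request_id: str) -> None:
    conn.request_finished(request_id, kv.get_block_ids(request_id))


try:
    connector_finished(DummyKVCacheManager(), DummyConnector(), "aborted-early")
except KeyError as exc:
    print(f"crash reproduced: KeyError({exc})")
```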
It's not always 100% guaranteed due to timing, but this setup makes it pretty reproducible:
simulate_abort.py
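The attached script isn't reproduced here; as a purely hypothetical sketch, a client that fires a request at an OpenAI-compatible vLLM endpoint and cancels it almost immediately (so the abort can land before scheduling) might look something like this. The URL, model name, and timing are placeholders:

```python
# Hypothetical sketch only, not the attached simulate_abort.py. It sends a
# completion request and aborts it client-side almost immediately, so the
# server may observe the disconnect before the request is scheduled.

import asyncio

import aiohttp

URL = "http://localhost:8000/v1/completions"   # placeholder endpoint
PAYLOAD = {"model": "my-model", "prompt": "Hello", "max_tokens": 64}


async def _send(session: aiohttp.ClientSession) -> None:
    async with session.post(URL, json=PAYLOAD) as resp:
        await resp.read()


async def fire_and_abort() -> None:
    async with aiohttp.ClientSession() as session:
        task = asyncio.create_task(_send(session))
        await asyncio.sleep(0.01)   # let the request reach the server
        task.cancel()               # abort before it is (likely) scheduled
        try:
            await task
        except asyncio.CancelledError:
            print("request aborted client-side")


if __name__ == "__main__":
    asyncio.run(fire_and_abort())
```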
The client gets a 500 and D (the decode instance) crashes with the following logs:
The following is a straightforward fix for the issue; let me know if you see a better solution.
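The PR's actual diff isn't shown above, but the general shape of such a fix, hedged and with illustrative names, is simply to tolerate the unknown request and still notify the connector with empty block IDs:

```python
# Hedged sketch of the post-fix behaviour with illustrative names; not the
# PR's actual code. The connector is still notified for requests aborted
# before scheduling, just with empty block IDs.


def connector_finished(req_to_blocks: dict[str, list[int]],
                       notify_connector, request_id: str) -> None:
    # Unknown request (aborted before any blocks were allocated) -> pass
    # empty blocks instead of raising, so external cleanup still runs.
    block_ids = req_to_blocks.get(request_id, [])
    notify_connector(request_id, block_ids)


connector_finished({}, lambda rid, blocks: print(rid, blocks), "aborted-early")
# prints: aborted-early []
```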