-
-
Notifications
You must be signed in to change notification settings - Fork 11.2k
[Bugfix] Fix _reqs_to_process leak on abort
#26012
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request addresses a memory leak in the NixlConnector where request IDs of aborted requests were not being removed from the _reqs_to_process set on the worker. The fix introduces a new mechanism, reqs_not_processed, to explicitly communicate aborted requests from the scheduler to the worker, ensuring they are cleaned up correctly. The changes are well-targeted, and the logic appears sound. A new unit test has been added that effectively reproduces the bug scenario and verifies the fix. The implementation is clean and correctly resolves the identified leak.
|
@NickLucche as discussed, seeing metadata Here's a WIP commit - markmc@8c0a6bd Several FIXMEs to deal with there yet, but I'm reasonably convinced now that this is doable and is simpler? lmkwyt |
In a prefill instance, we need to free KV blocks that have not been fetched after a timeout. See vllm-project#20139. In vllm-project#26012, we're trying to deal with corner cases involved with doing this request timeout tracking on the worker side. This PR proposes moving all of this to the scheduler side, hopefully making the logic simpler. Note the expiry timer is switched back to monotonic time because the timestamp is no longer sent across process boundaries. Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Iterated some more and submitted as #26172 |
njhill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @NickLucche! LGTM
I agree with @markmc that it's getting a bit complicated with all of the sets/dicts, I feel like we could consolidate these into a smaller number of dicts.
I'll try to look at @markmc's simplification proposal PR soon too.
In a prefill instance, we need to free KV blocks that have not been fetched after a timeout. See vllm-project#20139. In vllm-project#26012, we're trying to deal with corner cases involved with doing this request timeout tracking on the worker side. This PR proposes moving all of this to the scheduler side, hopefully making the logic simpler. Note the expiry timer is switched back to monotonic time because the timestamp is no longer sent across process boundaries. Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
In trying to (rather artificially) reproduce #25067 I hit this: This looks like another |
In a prefill instance, we need to free KV blocks that have not been fetched after a timeout. See vllm-project#20139. In vllm-project#26012, we're trying to deal with corner cases involved with doing this request timeout tracking on the worker side. This PR proposes moving all of this to the scheduler side, hopefully making the logic simpler. Note the expiry timer is switched back to monotonic time because the timestamp is no longer sent across process boundaries. Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
I think we're missing this? |
Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com>
@markmc what commit were you running when you hit this? I can't find the version of the code which has
I don't think so... |
Yeah, sorry. It's commit 59f30d0 with this extra debugging:
The rare The way I see it:
|
Signed-off-by: NickLucche <nlucches@redhat.com>
We have observed a rare scenario with AsyncLLM where a client disconnect triggers an abort request after the request has finished, but before AsyncLLM has processed the request output. See vllm-project#26012, vllm-project#25067, vllm-project#25844, and llm-d/llm-d#187. Without the fix, the unit test fails with: ``` logger.warning( "Releasing expired KV blocks for request %s which were " "retrieved by %d decode worker(s) within %d seconds.", req_id, count, envs.VLLM_NIXL_ABORT_REQUEST_TIMEOUT, ) > self._reqs_to_process.remove(req_id) E KeyError: '0' vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py:1238: KeyError ``` Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Submitted as #26351 |
Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Fix a small leak that can happen whenever a request is aborted while prefilling on P, causing its id to not be removed from the set.
I've added a unit test to clarify the scenario.
The test should fail on main with