Conversation

NickLucche commented Oct 1, 2025

Fix a small leak that can happen whenever a request is aborted while prefilling on P (the prefill instance), causing its id to never be removed from the worker's _reqs_to_process set.
I've added a unit test to clarify the scenario.
The test should fail on main with:

```
>       assert req.request_id not in connector.connector_worker._reqs_to_process
E       AssertionError: assert 'id-1' not in {'id-1'}
E        +  where 'id-1' = <vllm.v1.request.Request object at 0x7f58fc128a10>.request_id
E        +  and   {'id-1'} = <tests.v1.kv_connector.unit.test_nixl_connector.FakeNixlConnectorWorker object at 0x7f58fc1289b0>._reqs_to_process
E        +    where <tests.v1.kv_connector.unit.test_nixl_connector.FakeNixlConnectorWorker object at 0x7f58fc1289b0> = <vllm.distributed.kv_transfer.kv_connector.v1.nixl_connector.NixlConnector object at 0x7f5901481760>.connector_worker
```

Signed-off-by: NickLucche <nlucches@redhat.com>

gemini-code-assist bot left a comment

Code Review

This pull request addresses a memory leak in the NixlConnector where request IDs of aborted requests were not being removed from the _reqs_to_process set on the worker. The fix introduces a new mechanism, reqs_not_processed, to explicitly communicate aborted requests from the scheduler to the worker, ensuring they are cleaned up correctly. The changes are well-targeted, and the logic appears sound. A new unit test has been added that effectively reproduces the bug scenario and verifies the fix. The implementation is clean and correctly resolves the identified leak.
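
In code, that mechanism amounts to roughly the following. This is a simplified sketch, not the actual connector API: the class names, method names, and signatures are illustrative, and the real code carries considerably more state.

```python
# Simplified sketch of the fix (illustrative names; the real implementation
# lives in vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py).

class SchedulerSideSketch:
    def __init__(self) -> None:
        # Ids of requests aborted before the worker ever processed them.
        self.reqs_not_processed: set[str] = set()

    def on_abort_during_prefill(self, req_id: str) -> None:
        # No "finished sending" notification will ever arrive for this
        # request, so the worker must be told to forget it explicitly.
        self.reqs_not_processed.add(req_id)


class WorkerSideSketch:
    def __init__(self) -> None:
        self._reqs_to_process: set[str] = set()

    def bind_connector_metadata(self, metadata: SchedulerSideSketch) -> None:
        for req_id in metadata.reqs_not_processed:
            # discard() rather than remove(): the id may already be gone.
            self._reqs_to_process.discard(req_id)
```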

markmc commented Oct 2, 2025

@NickLucche as discussed, seeing metadata reqs_to_send, reqs_in_batch, reqs_not_processed, etc. made me wonder whether we can simplify this greatly by dealing with the expiry timeout on the scheduler side

Here's a WIP commit - markmc@8c0a6bd

Several FIXMEs to deal with there yet, but I'm reasonably convinced now that this is doable and is simpler? Let me know what you think.

markmc added a commit to markmc/vllm that referenced this pull request Oct 3, 2025
In a prefill instance, we need to free KV blocks that have
not been fetched after a timeout. See vllm-project#20139.

In vllm-project#26012, we're trying to deal with corner cases involved
with doing this request timeout tracking on the worker side.
This PR proposes moving all of this to the scheduler side,
hopefully making the logic simpler.

Note the expiry timer is switched back to monotonic time
because the timestamp is no longer sent across process
boundaries.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>

markmc commented Oct 3, 2025

> @NickLucche as discussed, seeing metadata reqs_to_send, reqs_in_batch, reqs_not_processed, etc. made me wonder whether we can simplify this greatly by dealing with the expiry timeout on the scheduler side
>
> Here's a WIP commit - markmc@8c0a6bd
>
> Several FIXMEs to deal with there yet, but I'm reasonably convinced now that this is doable and is simpler? Let me know what you think.

Iterated some more and submitted as #26172
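
As a rough illustration of why scheduler-side tracking allows a monotonic clock (a hypothetical sketch only, not the code in markmc@8c0a6bd or #26172): the deadline is created and checked in the same process, so it never has to be serialized as a wall-clock timestamp.

```python
import time

# Hypothetical sketch of scheduler-side expiry tracking; the real proposal
# differs in its details.
class SchedulerExpiryTracker:
    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        # Insertion-ordered dict: oldest deadline first, so scans exit early.
        self._expiry: dict[str, float] = {}

    def on_prefill_done(self, req_id: str) -> None:
        # monotonic() is safe because the deadline never crosses a process
        # boundary when tracking stays on the scheduler side.
        self._expiry[req_id] = time.monotonic() + self.timeout_s

    def pop_expired(self) -> list[str]:
        now = time.monotonic()
        expired = []
        for req_id, deadline in self._expiry.items():
            if deadline > now:
                break  # later insertions expire even later
            expired.append(req_id)
        for req_id in expired:
            del self._expiry[req_id]  # caller frees the KV blocks
        return expired
```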

njhill left a comment

Thanks @NickLucche! LGTM

I agree with @markmc that it's getting a bit complicated with all of the sets/dicts; I feel like we could consolidate these into a smaller number of dicts.

I'll try to look at @markmc's simplification proposal PR soon too.

njhill added the ready label Oct 3, 2025
njhill enabled auto-merge (squash) October 3, 2025 22:32
njhill merged commit 2a6dc67 into vllm-project:main Oct 4, 2025
50 checks passed
markmc added a commit to markmc/vllm that referenced this pull request Oct 4, 2025
tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>

markmc commented Oct 6, 2025

In trying to (rather artificially) reproduce #25067 I hit this:

```
(Worker pid=1288566) WARNING 10-06 07:51:24 [distributed/.../v1/nixl_connector.py:1233] Releasing expired KV blocks for request cmpl-f83a7b2e-0b55-4796-94f4-d4ad700df5b9-0 which were retrieved by 0 decode worker(s) within 480 seconds.
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706] WorkerProc hit an exception.
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706] Traceback (most recent call last):
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/v1/executor/multiproc_executor.py", line 701, in worker_busy_loop
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     output = func(*args, **kwargs)
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]              ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm-venv/lib64/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     return func(*args, **kwargs)
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/v1/worker/gpu_worker.py", line 473, in execute_model
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     output = self.model_runner.execute_model(scheduler_output, intermediate_tensors)
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm-venv/lib64/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     return func(*args, **kwargs)
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/v1/worker/gpu_model_runner.py", line 2558, in execute_model
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     with (
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/usr/lib64/python3.12/contextlib.py", line 144, in __exit__
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     next(self.gen)
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/v1/worker/kv_connector_model_runner_mixin.py", line 132, in _get_kv_connector_output
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     kv_connector.get_finished(scheduler_output.finished_req_ids)
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 235, in get_finished
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     return self.connector_worker.get_finished()
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 1240, in get_finished
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     self._reqs_to_process.remove(req_id)
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706] KeyError: 'cmpl-f83a7b2e-0b55-4796-94f4-d4ad700df5b9-0'
```

This looks like another KeyError-on-abort-after-finish scenario
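
A plausible reconstruction of the interleaving, as a hypothetical sketch (the request id, timestamp, and exact call sites are stand-ins):

```python
# Hypothetical replay of the failing sequence; names mirror the connector.
reqs_to_process: set[str] = {"req-1"}
reqs_to_send: dict[str, float] = {}

# 1. Prefill finishes: the expiry timer is armed while the id is tracked.
reqs_to_send["req-1"] = 480.0

# 2. A late abort arrives: the reqs_not_processed path discards the id from
#    reqs_to_process but leaves reqs_to_send untouched.
reqs_to_process.discard("req-1")

# 3. The expiry timer fires: the code releases the blocks, then assumes the
#    id is still tracked.
del reqs_to_send["req-1"]
reqs_to_process.remove("req-1")  # raises KeyError: 'req-1'
```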

markmc added a commit to markmc/vllm that referenced this pull request Oct 6, 2025

markmc commented Oct 6, 2025

I think we're missing this?

```diff
        for req_id in metadata.reqs_not_processed:
+           del self._reqs_to_send[req_id]
            self._reqs_to_process.discard(req_id)
```

karan pushed a commit to karan/vllm that referenced this pull request Oct 6, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com>

njhill commented Oct 6, 2025

> In trying to (rather artificially) reproduce #25067 I hit this:
>
> ```
> (Worker pid=1288566) WARNING 10-06 07:51:24 [distributed/.../v1/nixl_connector.py:1233] Releasing expired KV blocks for request cmpl-f83a7b2e-0b55-4796-94f4-d4ad700df5b9-0 which were retrieved by 0 decode worker(s) within 480 seconds.
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706] WorkerProc hit an exception.
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706] Traceback (most recent call last):
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/v1/executor/multiproc_executor.py", line 701, in worker_busy_loop
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     output = func(*args, **kwargs)
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]              ^^^^^^^^^^^^^^^^^^^^^
> ...
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 235, in get_finished
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     return self.connector_worker.get_finished()
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 1240, in get_finished
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     self._reqs_to_process.remove(req_id)
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706] KeyError: 'cmpl-f83a7b2e-0b55-4796-94f4-d4ad700df5b9-0'
> ```
>
> This looks like another KeyError-on-abort-after-finish scenario

@markmc what commit were you running when you hit this? I can't find the version of the code which has self._reqs_to_process.remove(req_id) on line 1240 of nixl_connector.py.

> I think we're missing this?
>
> ```diff
>         for req_id in metadata.reqs_not_processed:
> +           del self._reqs_to_send[req_id]
>             self._reqs_to_process.discard(req_id)
> ```

I don't think so... reqs_not_processed are the ones which haven't been / won't be added to _reqs_to_send, which is why we need to clean them up here. Any other requests will be added to _reqs_to_send and need to remain there until the timeout (or nixl notification, whichever comes sooner...).

markmc commented Oct 7, 2025

> In trying to (rather artificially) reproduce #25067 I hit this:
>
> ```
> (Worker pid=1288566) WARNING 10-06 07:51:24 [distributed/.../v1/nixl_connector.py:1233] Releasing expired KV blocks for request cmpl-f83a7b2e-0b55-4796-94f4-d4ad700df5b9-0 which were retrieved by 0 decode worker(s) within 480 seconds.
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706] WorkerProc hit an exception.
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706] Traceback (most recent call last):
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/v1/executor/multiproc_executor.py", line 701, in worker_busy_loop
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     output = func(*args, **kwargs)
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]              ^^^^^^^^^^^^^^^^^^^^^
> ...
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 235, in get_finished
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     return self.connector_worker.get_finished()
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 1240, in get_finished
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     self._reqs_to_process.remove(req_id)
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706] KeyError: 'cmpl-f83a7b2e-0b55-4796-94f4-d4ad700df5b9-0'
> ```
>
> This looks like another KeyError-on-abort-after-finish scenario
>
> @markmc what commit were you running when you hit this? I can't find the version of the code which has self._reqs_to_process.remove(req_id) on line 1240 of nixl_connector.py.

Yeah, sorry. It's commit 59f30d0 with this extra debugging:

```diff
--- a/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py
+++ b/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py
@@ -1222,6 +1222,12 @@ class NixlConnectorWorker:
 
         # Handle timeout to avoid stranding blocks on remote.
         now = time.perf_counter()
+        if self._reqs_to_send:
+            logger.debug(
+                "Checking expiry of %d requests: %s",
+                len(self._reqs_to_send),
+                self._reqs_to_send.keys(),
+            )
         while self._reqs_to_send:
             req_id, expires = next(iter(self._reqs_to_send.items()))
             # Sorted dict, oldest requests are put first so we can exit early.
```

> I think we're missing this?
>
> ```diff
>         for req_id in metadata.reqs_not_processed:
> +           del self._reqs_to_send[req_id]
>             self._reqs_to_process.discard(req_id)
> ```
>
> I don't think so... reqs_not_processed are the ones which haven't been / won't be added to _reqs_to_send, which is why we need to clean them up here. Any other requests will be added to _reqs_to_send and need to remain there until the timeout (or nixl notification, whichever comes sooner...).

The rare abort-after-finish scenario means the request could be in reqs_to_send at this point ... so this is really what I meant:

```diff
        for req_id in metadata.reqs_not_processed:
+           self._reqs_to_send.pop(req_id, None)
            self._reqs_to_process.discard(req_id)
```

The way I see it (summarized in the sketch after the lists below):

reqs_to_process:

  • added when a remote decode request is scheduled in a batch
  • deleted when we have received finished notifications
  • deleted when the expiry timer fires
  • deleted when a request is aborted

reqs_to_send:

  • reqs added when prefill finishes and we set an expiry timer, only if already in reqs_to_process
  • deleted when we have received finished notifications
  • deleted when the expiry timer fires
  • (currently missing) deleted when a request is aborted
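
A sketch of that bookkeeping in one place (hypothetical and simplified; the real NixlConnectorWorker spreads this across several code paths, and pop/discard make every deletion path idempotent):

```python
# Hypothetical consolidation of the lifecycle above.

class WorkerBookkeepingSketch:
    def __init__(self) -> None:
        self._reqs_to_process: set[str] = set()    # remote-decode reqs seen in a batch
        self._reqs_to_send: dict[str, float] = {}  # req_id -> expiry timestamp

    def on_scheduled_in_batch(self, req_id: str) -> None:
        self._reqs_to_process.add(req_id)

    def on_prefill_finished(self, req_id: str, expires: float) -> None:
        # Arm the expiry timer only for requests still being tracked.
        if req_id in self._reqs_to_process:
            self._reqs_to_send[req_id] = expires

    def _forget(self, req_id: str) -> None:
        # Shared cleanup; tolerates ids already removed by another path.
        self._reqs_to_process.discard(req_id)
        self._reqs_to_send.pop(req_id, None)

    def on_finished_notification(self, req_id: str) -> None:
        self._forget(req_id)

    def on_expired(self, req_id: str) -> None:
        self._forget(req_id)

    def on_aborted(self, req_id: str) -> None:
        self._forget(req_id)  # the (currently missing) reqs_to_send cleanup
```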

southfreebird pushed a commit to southfreebird/vllm that referenced this pull request Oct 7, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>
markmc added a commit to markmc/vllm that referenced this pull request Oct 7, 2025
We have observed a rare scenario with AsyncLLM where a client disconnect
triggers an abort request after the request has finished, but before
AsyncLLM has processed the request output.

See vllm-project#26012, vllm-project#25067, vllm-project#25844, and llm-d/llm-d#187.

Without the fix, the unit test fails with:

```
            logger.warning(
                "Releasing expired KV blocks for request %s which were "
                "retrieved by %d decode worker(s) within %d seconds.",
                req_id,
                count,
                envs.VLLM_NIXL_ABORT_REQUEST_TIMEOUT,
            )
>           self._reqs_to_process.remove(req_id)
E           KeyError: '0'

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py:1238: KeyError
```

Signed-off-by: Mark McLoughlin <markmc@redhat.com>

markmc commented Oct 7, 2025

> The rare abort-after-finish scenario means the request could be in reqs_to_send at this point ... so this is really what I meant:
>
> ```diff
>         for req_id in metadata.reqs_not_processed:
> +           self._reqs_to_send.pop(req_id, None)
>             self._reqs_to_process.discard(req_id)
> ```

Submitted as #26351

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>