Conversation

NickLucche commented Oct 1, 2025

Fix a small leak that can happen whenever a request is aborted while prefilling on P (the prefill instance), causing its id to never be removed from the worker's _reqs_to_process set.
I've added a unit test to clarify the scenario.
The test should fail on main with:

```
>       assert req.request_id not in connector.connector_worker._reqs_to_process
E       AssertionError: assert 'id-1' not in {'id-1'}
E        +  where 'id-1' = <vllm.v1.request.Request object at 0x7f58fc128a10>.request_id
E        +  and   {'id-1'} = <tests.v1.kv_connector.unit.test_nixl_connector.FakeNixlConnectorWorker object at 0x7f58fc1289b0>._reqs_to_process
E        +    where <tests.v1.kv_connector.unit.test_nixl_connector.FakeNixlConnectorWorker object at 0x7f58fc1289b0> = <vllm.distributed.kv_transfer.kv_connector.v1.nixl_connector.NixlConnector object at 0x7f5901481760>.connector_worker
```

Signed-off-by: NickLucche <nlucches@redhat.com>

gemini-code-assist bot left a comment

Code Review

This pull request addresses a memory leak in the NixlConnector where request IDs of aborted requests were not being removed from the _reqs_to_process set on the worker. The fix introduces a new mechanism, reqs_not_processed, to explicitly communicate aborted requests from the scheduler to the worker, ensuring they are cleaned up correctly. The changes are well-targeted, and the logic appears sound. A new unit test has been added that effectively reproduces the bug scenario and verifies the fix. The implementation is clean and correctly resolves the identified leak.
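
In code, that mechanism amounts to roughly the following. This is a simplified sketch, not the actual connector API: the class names, method names, and signatures are illustrative, and the real code carries considerably more state.

```python
# Simplified sketch of the fix (illustrative names; the real implementation
# lives in vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py).

class SchedulerSideSketch:
    def __init__(self) -> None:
        # Ids of requests aborted before the worker ever processed them.
        self.reqs_not_processed: set[str] = set()

    def on_abort_during_prefill(self, req_id: str) -> None:
        # No "finished sending" notification will ever arrive for this
        # request, so the worker must be told to forget it explicitly.
        self.reqs_not_processed.add(req_id)


class WorkerSideSketch:
    def __init__(self) -> None:
        self._reqs_to_process: set[str] = set()

    def bind_connector_metadata(self, metadata: SchedulerSideSketch) -> None:
        for req_id in metadata.reqs_not_processed:
            # discard() rather than remove(): the id may already be gone.
            self._reqs_to_process.discard(req_id)
```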

markmc commented Oct 2, 2025

@NickLucche as discussed, seeing metadata reqs_to_send, reqs_in_batch, reqs_not_processed, etc. made me wonder whether we can simplify this greatly by dealing with the expiry timeout on the scheduler side

Here's a WIP commit - markmc@8c0a6bd

Several FIXMEs to deal with there yet, but I'm reasonably convinced now that this is doable and is simpler? Let me know what you think.

markmc added a commit to markmc/vllm that referenced this pull request Oct 3, 2025
In a prefill instance, we need to free KV blocks that have
not been fetched after a timeout. See vllm-project#20139.

In vllm-project#26012, we're trying to deal with corner cases involved
with doing this request timeout tracking on the worker side.
This PR proposes moving all of this to the scheduler side,
hopefully making the logic simpler.

Note the expiry timer is switched back to monotonic time
because the timestamp is no longer sent across process
boundaries.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>

markmc commented Oct 3, 2025

> @NickLucche as discussed, seeing metadata reqs_to_send, reqs_in_batch, reqs_not_processed, etc. made me wonder whether we can simplify this greatly by dealing with the expiry timeout on the scheduler side
>
> Here's a WIP commit - markmc@8c0a6bd
>
> Several FIXMEs to deal with there yet, but I'm reasonably convinced now that this is doable and is simpler? Let me know what you think.

Iterated some more and submitted as #26172
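
As a rough illustration of why scheduler-side tracking allows a monotonic clock (a hypothetical sketch only, not the code in markmc@8c0a6bd or #26172): the deadline is created and checked in the same process, so it never has to be serialized as a wall-clock timestamp.

```python
import time

# Hypothetical sketch of scheduler-side expiry tracking; the real proposal
# differs in its details.
class SchedulerExpiryTracker:
    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        # Insertion-ordered dict: oldest deadline first, so scans exit early.
        self._expiry: dict[str, float] = {}

    def on_prefill_done(self, req_id: str) -> None:
        # monotonic() is safe because the deadline never crosses a process
        # boundary when tracking stays on the scheduler side.
        self._expiry[req_id] = time.monotonic() + self.timeout_s

    def pop_expired(self) -> list[str]:
        now = time.monotonic()
        expired = []
        for req_id, deadline in self._expiry.items():
            if deadline > now:
                break  # later insertions expire even later
            expired.append(req_id)
        for req_id in expired:
            del self._expiry[req_id]  # caller frees the KV blocks
        return expired
```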

njhill left a comment

Thanks @NickLucche! LGTM

I agree with @markmc that it's getting a bit complicated with all of the sets/dicts; I feel like we could consolidate these into a smaller number of dicts.

I'll try to look at @markmc's simplification proposal PR soon too.

njhill added the ready label Oct 3, 2025
njhill enabled auto-merge (squash) October 3, 2025 22:32
njhill merged commit 2a6dc67 into vllm-project:main Oct 4, 2025
50 checks passed
markmc added a commit to markmc/vllm that referenced this pull request Oct 4, 2025
tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>

markmc commented Oct 6, 2025

In trying to (rather artificially) reproduce #25067 I hit this:

```
(Worker pid=1288566) WARNING 10-06 07:51:24 [distributed/.../v1/nixl_connector.py:1233] Releasing expired KV blocks for request cmpl-f83a7b2e-0b55-4796-94f4-d4ad700df5b9-0 which were retrieved by 0 decode worker(s) within 480 seconds.
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706] WorkerProc hit an exception.
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706] Traceback (most recent call last):
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/v1/executor/multiproc_executor.py", line 701, in worker_busy_loop
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     output = func(*args, **kwargs)
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]              ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm-venv/lib64/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     return func(*args, **kwargs)
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/v1/worker/gpu_worker.py", line 473, in execute_model
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     output = self.model_runner.execute_model(scheduler_output, intermediate_tensors)
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm-venv/lib64/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     return func(*args, **kwargs)
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/v1/worker/gpu_model_runner.py", line 2558, in execute_model
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     with (
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/usr/lib64/python3.12/contextlib.py", line 144, in __exit__
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     next(self.gen)
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/v1/worker/kv_connector_model_runner_mixin.py", line 132, in _get_kv_connector_output
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     kv_connector.get_finished(scheduler_output.finished_req_ids)
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 235, in get_finished
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     return self.connector_worker.get_finished()
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 1240, in get_finished
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     self._reqs_to_process.remove(req_id)
(Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706] KeyError: 'cmpl-f83a7b2e-0b55-4796-94f4-d4ad700df5b9-0'
```

This looks like another KeyError-on-abort-after-finish scenario
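
A plausible reconstruction of the interleaving, as a hypothetical sketch (the request id, timestamp, and exact call sites are stand-ins):

```python
# Hypothetical replay of the failing sequence; names mirror the connector.
reqs_to_process: set[str] = {"req-1"}
reqs_to_send: dict[str, float] = {}

# 1. Prefill finishes: the expiry timer is armed while the id is tracked.
reqs_to_send["req-1"] = 480.0

# 2. A late abort arrives: the reqs_not_processed path discards the id from
#    reqs_to_process but leaves reqs_to_send untouched.
reqs_to_process.discard("req-1")

# 3. The expiry timer fires: the code releases the blocks, then assumes the
#    id is still tracked.
del reqs_to_send["req-1"]
reqs_to_process.remove("req-1")  # raises KeyError: 'req-1'
```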

markmc added a commit to markmc/vllm that referenced this pull request Oct 6, 2025

markmc commented Oct 6, 2025

I think we're missing this?

```diff
        for req_id in metadata.reqs_not_processed:
+           del self._reqs_to_send[req_id]
            self._reqs_to_process.discard(req_id)
```

karan pushed a commit to karan/vllm that referenced this pull request Oct 6, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com>

njhill commented Oct 6, 2025

> In trying to (rather artificially) reproduce #25067 I hit this:
>
> ```
> (Worker pid=1288566) WARNING 10-06 07:51:24 [distributed/.../v1/nixl_connector.py:1233] Releasing expired KV blocks for request cmpl-f83a7b2e-0b55-4796-94f4-d4ad700df5b9-0 which were retrieved by 0 decode worker(s) within 480 seconds.
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706] WorkerProc hit an exception.
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706] Traceback (most recent call last):
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/v1/executor/multiproc_executor.py", line 701, in worker_busy_loop
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     output = func(*args, **kwargs)
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]              ^^^^^^^^^^^^^^^^^^^^^
> ...
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 235, in get_finished
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     return self.connector_worker.get_finished()
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 1240, in get_finished
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     self._reqs_to_process.remove(req_id)
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706] KeyError: 'cmpl-f83a7b2e-0b55-4796-94f4-d4ad700df5b9-0'
> ```
>
> This looks like another KeyError-on-abort-after-finish scenario

@markmc what commit were you running when you hit this? I can't find the version of the code which has self._reqs_to_process.remove(req_id) on line 1240 of nixl_connector.py.

> I think we're missing this?
>
> ```diff
>         for req_id in metadata.reqs_not_processed:
> +           del self._reqs_to_send[req_id]
>             self._reqs_to_process.discard(req_id)
> ```

I don't think so... reqs_not_processed are the ones which haven't been / won't be added to _reqs_to_send, which is why we need to clean them up here. Any other requests will be added to _reqs_to_send and need to remain there until the timeout (or nixl notification, whichever comes sooner...).

markmc commented Oct 7, 2025

> In trying to (rather artificially) reproduce #25067 I hit this:
>
> ```
> (Worker pid=1288566) WARNING 10-06 07:51:24 [distributed/.../v1/nixl_connector.py:1233] Releasing expired KV blocks for request cmpl-f83a7b2e-0b55-4796-94f4-d4ad700df5b9-0 which were retrieved by 0 decode worker(s) within 480 seconds.
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706] WorkerProc hit an exception.
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706] Traceback (most recent call last):
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/v1/executor/multiproc_executor.py", line 701, in worker_busy_loop
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     output = func(*args, **kwargs)
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]              ^^^^^^^^^^^^^^^^^^^^^
> ...
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 235, in get_finished
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     return self.connector_worker.get_finished()
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]   File "/home/markmc/vllm-project/vllm/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 1240, in get_finished
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706]     self._reqs_to_process.remove(req_id)
> (Worker pid=1288566) ERROR 10-06 07:51:24 [v1/executor/multiproc_executor.py:706] KeyError: 'cmpl-f83a7b2e-0b55-4796-94f4-d4ad700df5b9-0'
> ```
>
> This looks like another KeyError-on-abort-after-finish scenario
>
> @markmc what commit were you running when you hit this? I can't find the version of the code which has self._reqs_to_process.remove(req_id) on line 1240 of nixl_connector.py.

Yeah, sorry. It's commit 59f30d0 with this extra debugging:

```diff
--- a/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py
+++ b/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py
@@ -1222,6 +1222,12 @@ class NixlConnectorWorker:
 
         # Handle timeout to avoid stranding blocks on remote.
         now = time.perf_counter()
+        if self._reqs_to_send:
+            logger.debug(
+                "Checking expiry of %d requests: %s",
+                len(self._reqs_to_send),
+                self._reqs_to_send.keys(),
+            )
         while self._reqs_to_send:
             req_id, expires = next(iter(self._reqs_to_send.items()))
             # Sorted dict, oldest requests are put first so we can exit early.
```

> I think we're missing this?
>
> ```diff
>         for req_id in metadata.reqs_not_processed:
> +           del self._reqs_to_send[req_id]
>             self._reqs_to_process.discard(req_id)
> ```
>
> I don't think so... reqs_not_processed are the ones which haven't been / won't be added to _reqs_to_send, which is why we need to clean them up here. Any other requests will be added to _reqs_to_send and need to remain there until the timeout (or nixl notification, whichever comes sooner...).

The rare abort-after-finish scenario means the request could be in reqs_to_send at this point ... so this is really what I meant:

```diff
        for req_id in metadata.reqs_not_processed:
+           self._reqs_to_send.pop(req_id, None)
            self._reqs_to_process.discard(req_id)
```

The way I see it (summarized in the sketch after the lists below):

reqs_to_process:

  • added when a remote decode request is scheduled in a batch
  • deleted when we have received finished notifications
  • deleted when the expiry timer fires
  • deleted when a request is aborted

reqs_to_send:

  • reqs added when prefill finishes and we set an expiry timer, only if already in reqs_to_process
  • deleted when we have received finished notifications
  • deleted when the expiry timer fires
  • (currently missing) deleted when a request is aborted
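
A sketch of that bookkeeping in one place (hypothetical and simplified; the real NixlConnectorWorker spreads this across several code paths, and pop/discard make every deletion path idempotent):

```python
# Hypothetical consolidation of the lifecycle above.

class WorkerBookkeepingSketch:
    def __init__(self) -> None:
        self._reqs_to_process: set[str] = set()    # remote-decode reqs seen in a batch
        self._reqs_to_send: dict[str, float] = {}  # req_id -> expiry timestamp

    def on_scheduled_in_batch(self, req_id: str) -> None:
        self._reqs_to_process.add(req_id)

    def on_prefill_finished(self, req_id: str, expires: float) -> None:
        # Arm the expiry timer only for requests still being tracked.
        if req_id in self._reqs_to_process:
            self._reqs_to_send[req_id] = expires

    def _forget(self, req_id: str) -> None:
        # Shared cleanup; tolerates ids already removed by another path.
        self._reqs_to_process.discard(req_id)
        self._reqs_to_send.pop(req_id, None)

    def on_finished_notification(self, req_id: str) -> None:
        self._forget(req_id)

    def on_expired(self, req_id: str) -> None:
        self._forget(req_id)

    def on_aborted(self, req_id: str) -> None:
        self._forget(req_id)  # the (currently missing) reqs_to_send cleanup
```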

southfreebird pushed a commit to southfreebird/vllm that referenced this pull request Oct 7, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>
markmc added a commit to markmc/vllm that referenced this pull request Oct 7, 2025
We have observed a rare scenario with AsyncLLM where a client disconnect
triggers an abort request after the request has finished, but before
AsyncLLM has processed the request output.

See vllm-project#26012, vllm-project#25067, vllm-project#25844, and llm-d/llm-d#187.

Without the fix, the unit test fails with:

```
            logger.warning(
                "Releasing expired KV blocks for request %s which were "
                "retrieved by %d decode worker(s) within %d seconds.",
                req_id,
                count,
                envs.VLLM_NIXL_ABORT_REQUEST_TIMEOUT,
            )
>           self._reqs_to_process.remove(req_id)
E           KeyError: '0'

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py:1238: KeyError
```

Signed-off-by: Mark McLoughlin <markmc@redhat.com>

markmc commented Oct 7, 2025

> The rare abort-after-finish scenario means the request could be in reqs_to_send at this point ... so this is really what I meant:
>
> ```diff
>         for req_id in metadata.reqs_not_processed:
> +           self._reqs_to_send.pop(req_id, None)
>             self._reqs_to_process.discard(req_id)
> ```

Submitted as #26351

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Signed-off-by: NickLucche <nlucches@redhat.com>