[NIXL] refactor scheduler->worker request state synchronization #26172

markmc · 2025-10-03T14:37:42Z

In a prefill instance, we need to free KV blocks that have not been fetched after a timeout. See #20139.

In #26012, we're trying to deal with corner cases involved with doing this request timeout tracking on the worker side. This PR proposes refactoring the scheduler->worker request state synchronization to use a SCHEDULED/FINISHED/ABORTED enum rather than in_batch, to_send, and not_processed.

Note the expiry timer is switched back to monotonic time because the timestamp is no longer sent across process boundaries.
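
For readers skimming the thread, here is a minimal sketch of the enum-based synchronization described above. It is not the code in this PR: `ReqState`, `WorkerTracking`, and the two set names are hypothetical, and the real connector metadata carries more than a request ID. It only illustrates how a single per-request state can replace several parallel collections.

```python
import enum
from dataclasses import dataclass, field


class ReqState(enum.Enum):
    """Hypothetical per-request state sent from scheduler to worker."""
    SCHEDULED = enum.auto()
    FINISHED = enum.auto()
    ABORTED = enum.auto()


@dataclass
class WorkerTracking:
    """Hypothetical worker-side bookkeeping driven by state updates."""
    in_flight: set[str] = field(default_factory=set)
    awaiting_fetch: set[str] = field(default_factory=set)

    def apply(self, updates: dict[str, ReqState]) -> None:
        """Apply one batch of scheduler->worker state updates."""
        for req_id, state in updates.items():
            if state is ReqState.SCHEDULED:
                # Request is part of the current batch.
                self.in_flight.add(req_id)
            elif state is ReqState.FINISHED:
                # Prefill is done; blocks now wait for the consumer to fetch them.
                self.in_flight.discard(req_id)
                self.awaiting_fetch.add(req_id)
            elif state is ReqState.ABORTED:
                # Forget the request everywhere so nothing leaks.
                self.in_flight.discard(req_id)
                self.awaiting_fetch.discard(req_id)
```

With one state per request, an abort is handled in a single branch, rather than by checking membership in several collections.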

gemini-code-assist

Code Review

This pull request refactors the send timeout tracking for KV block transfers in prefill instances by moving the logic from the worker side to the scheduler side. This simplifies the logic and better handles corner cases. The changes are well-structured and align with the stated goal. However, I've found a critical issue related to a type mismatch that will cause a runtime error. A bytes object is added to a set[str] in the worker, and then the scheduler attempts to decode it, which will fail. I've provided suggestions to fix this by ensuring type consistency across the worker-scheduler interface.
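
The failing lines are not reproduced in this thread, so the following is only a generic sketch of the hazard the review describes: request IDs crossing the worker/scheduler interface as a mix of `bytes` and `str`. `normalize_req_id` is a hypothetical helper, not a function from the connector. The idea is to decode once, at the boundary, so that everything behind it can honestly be typed `set[str]`.

```python
from typing import Union


def normalize_req_id(raw: Union[bytes, str]) -> str:
    """Return a request ID as str, decoding only if it arrived as bytes.

    Hypothetical helper: in Python 3, str has no .decode() method, and a
    bytes ID never compares equal to the same ID as str, so mixing the two
    in one set silently breaks lookups or raises AttributeError later.
    """
    if isinstance(raw, bytes):
        return raw.decode("utf-8")
    return raw


# Usage: normalize as soon as a notification payload is received.
finished_sending: set[str] = set()
finished_sending.add(normalize_req_id(b"req-1"))
finished_sending.add(normalize_req_id("req-1"))  # same entry, no duplicate
```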

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

markmc

Some comments on what I'm not fully happy with yet

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

mergify · 2025-10-06T13:17:11Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @markmc.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

NickLucche

Nice work @markmc !

Personally the major thing that seems a bit off to me is moving the heterogeneous-related stuff from Worker->Scheduler. I was hoping we would be able to keep that transparent to the Scheduler, and let the worker handle that as well as the nixl_notifs, which I believe are meant to stay within "nixl agent's reach" (i.e. Worker-side).
This is also leading to having to keep consumer_count in a dataclass, which I believe is making the original changes from last week a bit more complicated.

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

tests/v1/kv_connector/unit/test_nixl_connector.py

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

markmc · 2025-10-07T14:36:28Z

> Nice work @markmc !
>
> Personally the major thing that seems a bit off to me is moving the heterogeneous-related stuff from Worker->Scheduler. I was hoping we would be able to keep that transparent to the Scheduler,

It remains hidden from the scheduler itself ... but yes, it's on the scheduler side of the NIXL connector logic

> and let the worker handle that as well as the nixl_notifs, which I believe are meant to stay within "nixl agent's reach" (i.e. Worker-side).

Nothing jumps out at me as a reason for this being a significant violation of the scheduler/worker split in the connector - the scheduler side is naturally a sort of coordinator, so it being aware that it needs to wait for multiple sent notifications ... doesn't seem to break any significant encapsulation?

Something needs to keep track of the number of notifications, and in the case of an abort, that counter needs to be freed/deleted ... if the worker is that something, then we can't avoid notifying the worker about the abort

> This is also leading to having to keep consumer_count in a dataclass, which I believe is making the original changes from last week a bit more complicated.

For each request, we have a timeout and a consumer count ... I used a tuple initially, but it was a bit gross. It could be two dicts. I dunno ...

The changes are more significant, because I've refactored and introduced ReqsNeedSendTracker, but I'd see the end result as simpler and better encapsulated, rather than more complicated?
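
To make the "timeout plus consumer count per request" point concrete, here is a rough sketch of that kind of tracker. It is not the `ReqsNeedSendTracker` from this PR, whose interface is not shown in the thread; `PendingSend`, `SendTracker`, and all method names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PendingSend:
    """Hypothetical per-request record: one deadline, one counter."""
    deadline: float           # when unfetched blocks may be freed
    consumers_remaining: int  # notifications still expected


class SendTracker:
    """Hypothetical stand-in for a tracker of requests awaiting fetch."""

    def __init__(self) -> None:
        self._pending: dict[str, PendingSend] = {}

    def add(self, req_id: str, deadline: float, consumer_count: int) -> None:
        self._pending[req_id] = PendingSend(deadline, consumer_count)

    def on_notification(self, req_id: str) -> bool:
        """Record one consumer notification; True once all have arrived."""
        entry = self._pending.get(req_id)
        if entry is None:
            return False  # unknown, aborted, or already expired
        entry.consumers_remaining -= 1
        if entry.consumers_remaining > 0:
            return False
        del self._pending[req_id]
        return True

    def abort(self, req_id: str) -> None:
        """Drop the deadline and the counter together."""
        self._pending.pop(req_id, None)

    def pop_expired(self, now: float) -> list[str]:
        """Remove and return requests whose deadline has passed."""
        expired = [r for r, e in self._pending.items() if e.deadline <= now]
        for req_id in expired:
            del self._pending[req_id]
        return expired
```

Keeping both values in one record means an abort or an expiry removes them in one step, which is the leak-avoidance property discussed above. Two dicts would work too, but each removal path would then have to remember both.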

markmc · 2025-10-08T11:22:19Z

Looking at moving the consumer notification count stuff back to the worker ... my conclusion is that all of the complexity here comes from having to synchronize the state of requests between the worker and scheduler side, and we will need to do that if the consumer count is tracked by the worker ... so just moving the timeout doesn't achieve anything

So I tried some minor refactoring of the state synchronization

[NIXL] Refactor scheduler->worker request state synchronization

Use a SCHEDULED/FINISHED/ABORTED enum rather than in_batch, to_send, and not_processed.

Also move the expiry timestamp calculation to the worker side so we're not sending timestamps across process boundaries.
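
A small sketch of why computing the deadline on the worker matters. Python documents the reference point of `time.monotonic()` as undefined, so only differences between readings are meaningful, and a deadline is safest when it is created and checked in the same process. `SendExpiry` and the 120-second default below are assumptions for illustration, not names or values from the connector.

```python
import time
from typing import Callable


class SendExpiry:
    """Hypothetical worker-local expiry timer for unfetched KV blocks."""

    def __init__(self, timeout_s: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock  # injectable so tests can use a fake clock
        self._deadlines: dict[str, float] = {}

    def start(self, req_id: str) -> None:
        # The deadline is computed here, in the process that will check it.
        self._deadlines[req_id] = self._clock() + self._timeout_s

    def cancel(self, req_id: str) -> None:
        self._deadlines.pop(req_id, None)

    def pop_expired(self) -> list[str]:
        """Remove and return the IDs whose deadline has passed."""
        now = self._clock()
        expired = [r for r, d in self._deadlines.items() if d <= now]
        for req_id in expired:
            del self._deadlines[req_id]
        return expired
```

In this sketch the scheduler would only need to say that a request finished; it never has to put a timestamp in the message.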

mergify · 2025-10-10T13:36:47Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @markmc.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2025-10-16T14:03:22Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @markmc.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Use a SCHEDULED/FINISHED/ABORTED enum rather than in_batch, to_send, and not_processed. Also move the expiry timestamp calculation to the worker side so we're not sending timestamps across process boundaries.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>

markmc requested review from ApostaC and NickLucche as code owners October 3, 2025 14:37

mergify bot added the kv-connector label Oct 3, 2025

gemini-code-assist bot reviewed Oct 3, 2025

markmc commented Oct 3, 2025

markmc mentioned this pull request Oct 3, 2025

[Bugfix] Fix _reqs_to_process leak on abort #26012

Merged

markmc commented Oct 3, 2025

markmc commented Oct 3, 2025

markmc force-pushed the nixl-rework-prefill-send-expiry branch from f26afba to b5796cc Compare October 4, 2025 12:46

mergify bot added the v1 label Oct 4, 2025

mergify bot added the needs-rebase label Oct 6, 2025

markmc force-pushed the nixl-rework-prefill-send-expiry branch from b5796cc to 9894745 Compare October 6, 2025 13:42

mergify bot removed the needs-rebase label Oct 6, 2025

NickLucche reviewed Oct 7, 2025

markmc force-pushed the nixl-rework-prefill-send-expiry branch from 9894745 to d384771 Compare October 8, 2025 11:20

markmc changed the title ~~NIXL: re-work send timeout tracking on prefill side~~ NIXL: refactor scheduler->worker request state synchronization Oct 8, 2025

markmc changed the title ~~NIXL: refactor scheduler->worker request state synchronization~~ [NIXL] refactor scheduler->worker request state synchronization Oct 8, 2025

mergify bot added the needs-rebase label Oct 10, 2025

markmc force-pushed the nixl-rework-prefill-send-expiry branch from d384771 to 63c07ee Compare October 10, 2025 14:09

mergify bot removed the needs-rebase label Oct 10, 2025

markmc added the ready label Oct 16, 2025

mergify bot added the needs-rebase label Oct 16, 2025

markmc force-pushed the nixl-rework-prefill-send-expiry branch from 63c07ee to 9d47e78 Compare October 22, 2025 15:44

mergify bot removed the needs-rebase label Oct 22, 2025

markmc requested review from NickLucche and removed request for ApostaC October 22, 2025 18:58

markmc mentioned this pull request Nov 10, 2025

[Core] Send kv events from worker side to scheduler side #28309

Open

5 tasks
