Member

@markmc markmc commented Oct 3, 2025

In a prefill instance, we need to free KV blocks that have not been fetched after a timeout. See #20139.

In #26012, we're trying to deal with corner cases involved with doing this request timeout tracking on the worker side. This PR proposes refactoring the scheduler->worker request state synchronization to use a SCHEDULED/FINISHED/ABORTED enum rather than in_batch, to_send, and not_processed.

Note the expiry timer is switched back to monotonic time because the timestamp is no longer sent across process boundaries.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the send timeout tracking for KV block transfers in prefill instances by moving the logic from the worker side to the scheduler side. This simplifies the logic and better handles corner cases. The changes are well-structured and align with the stated goal. However, I've found a critical issue related to a type mismatch that will cause a runtime error. A bytes object is added to a set[str] in the worker, and then the scheduler attempts to decode it, which will fail. I've provided suggestions to fix this by ensuring type consistency across the worker-scheduler interface.

Member Author

@markmc markmc left a comment

Some comments on what I'm not fully happy with yet

@markmc markmc force-pushed the nixl-rework-prefill-send-expiry branch from f26afba to b5796cc Compare October 4, 2025 12:46
@mergify mergify bot added the v1 label Oct 4, 2025

mergify bot commented Oct 6, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @markmc.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 6, 2025
@markmc markmc force-pushed the nixl-rework-prefill-send-expiry branch from b5796cc to 9894745 Compare October 6, 2025 13:42
@mergify mergify bot removed the needs-rebase label Oct 6, 2025
Collaborator

@NickLucche NickLucche left a comment

Nice work @markmc !

Personally, the major thing that seems a bit off to me is moving the heterogeneous-related stuff from Worker->Scheduler. I was hoping we would be able to keep that transparent from the Scheduler, and let the worker handle that as well as the nixl_notifs, which I believe are meant to stay within "nixl agent's reach" (i.e. Worker-side).
This is also leading to having to keep consumer_count in a dataclass, which I believe is making the original changes from last week a bit more complicated.

Member Author

markmc commented Oct 7, 2025

> Nice work @markmc !
>
> Personally the major thing that seems a bit off to me is moving the heterogeneous-related stuff from Worker->Scheduler. I was hoping we would be able to keep that transparent from the Scheduler,

It remains hidden from the scheduler itself ... but yes, it's on the scheduler side of the NIXL connector logic

> and let the worker handle that as well as the nixl_notifs, which I believe are meant to stay within "nixl agent's reach" (ie Worker-side).

Nothing jumps out at me as a reason for this being a significant violation of the scheduler/worker split in the connector. The scheduler side is naturally a sort of coordinator, so it being aware that it needs to wait for multiple sent notifications ... doesn't seem to break any significant encapsulation?

Something needs to keep track of the number of notifications, and in the case of an abort, that counter needs to be freed/deleted ... if the worker is that something, then we can't avoid notifying the worker about the abort

> This is also leading to having to keep consumer_count in a dataclass, which I believe is making the original changes from last week a bit more complicated.

For each request, we have a timeout and a consumer count ... I used a tuple initially, but it was a bit gross. It could be two dicts. I dunno ...

The changes are more significant, because I've refactored and introduced ReqsNeedSendTracker, but I'd see the end result as simpler and better encapsulated, rather than more complicated?
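
To make the bookkeeping discussed above concrete, here is a minimal sketch of a per-request tracker that pairs an expiry deadline with a count of outstanding consumer notifications. It is not the `ReqsNeedSendTracker` from this PR: the names `SendState` and `SendTracker`, the injectable `clock` parameter, and all method names are hypothetical and chosen only for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SendState:
    """Hypothetical per-request record: one deadline, one counter."""
    deadline: float      # monotonic time after which the blocks may be freed
    consumers_left: int  # notifications still expected from consumers


@dataclass
class SendTracker:
    """Hypothetical tracker; not the PR's actual ReqsNeedSendTracker."""
    timeout_s: float
    clock: Callable[[], float] = time.monotonic
    _reqs: dict[str, SendState] = field(default_factory=dict, init=False, repr=False)

    def add(self, req_id: str, consumer_count: int) -> None:
        """Start tracking a request whose blocks are waiting to be fetched."""
        deadline = self.clock() + self.timeout_s
        self._reqs[req_id] = SendState(deadline, consumer_count)

    def on_notification(self, req_id: str) -> bool:
        """Record one consumer notification; True once all have arrived."""
        state = self._reqs.get(req_id)
        if state is None:
            return False  # unknown, already expired, or aborted
        state.consumers_left -= 1
        if state.consumers_left > 0:
            return False
        del self._reqs[req_id]
        return True

    def abort(self, req_id: str) -> None:
        """Drop the counter and deadline together when a request is aborted."""
        self._reqs.pop(req_id, None)

    def pop_expired(self) -> list[str]:
        """Remove and return the requests whose deadline has passed."""
        now = self.clock()
        expired = [r for r, s in self._reqs.items() if s.deadline <= now]
        for r in expired:
            del self._reqs[r]
        return expired
```

Keeping the deadline and the counter in one record means an abort or an expiry removes both in a single step, which is the argument for a dataclass over a tuple or two parallel dicts.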

@markmc markmc force-pushed the nixl-rework-prefill-send-expiry branch from 9894745 to d384771 Compare October 8, 2025 11:20
Member Author

markmc commented Oct 8, 2025

I looked at moving the consumer notification count stuff back to the worker. My conclusion is that all of the complexity here comes from having to synchronize the state of requests between the worker and scheduler sides, and we would still need to do that if the consumer count were tracked by the worker, so just moving the timeout doesn't achieve anything.

So I tried some minor refactoring of the state synchronization

[NIXL] Refactor scheduler->worker request state synchronization

Use a SCHEDULED/FINISHED/ABORTED enum rather than in_batch, to_send, and not_processed.

Also move the expiry timestamp calculation to the worker side so we're not sending timestamps across process boundaries.
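
The shape of this refactoring can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `ReqState` and `WorkerSendState` names, the message shape (a dict of request ID to state), and the meaning attached to each state are hypothetical. The sketch assumes SCHEDULED means the request is in the current batch, FINISHED means its blocks are waiting to be fetched so the expiry clock starts, and ABORTED means all tracking is dropped.

```python
import enum
import time
from typing import Callable


class ReqState(enum.Enum):
    """Hypothetical states sent from the scheduler side to the worker side."""
    SCHEDULED = enum.auto()
    FINISHED = enum.auto()
    ABORTED = enum.auto()


class WorkerSendState:
    """Hypothetical worker-side view of request state."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.in_flight: set[str] = set()
        self.deadlines: dict[str, float] = {}

    def apply(self, updates: dict[str, ReqState]) -> None:
        """Apply one batch of state updates received from the scheduler."""
        for req_id, state in updates.items():
            if state is ReqState.SCHEDULED:
                self.in_flight.add(req_id)
            elif state is ReqState.FINISHED:
                self.in_flight.discard(req_id)
                # The deadline is computed here, in the receiving process,
                # so no timestamp has to cross the process boundary.
                self.deadlines[req_id] = self.clock() + self.timeout_s
            elif state is ReqState.ABORTED:
                self.in_flight.discard(req_id)
                self.deadlines.pop(req_id, None)

    def expired(self) -> list[str]:
        """Return the requests whose local deadline has passed."""
        now = self.clock()
        return [r for r, d in self.deadlines.items() if d <= now]
```

Because only the state crosses the process boundary in this sketch, the worker can compute deadlines with `time.monotonic()`, whose values are only meaningful as differences between calls and so should not be compared across processes.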

@markmc markmc changed the title NIXL: re-work send timeout tracking on prefill side NIXL: refactor scheduler->worker request state synchronization Oct 8, 2025
@markmc markmc changed the title NIXL: refactor scheduler->worker request state synchronization [NIXL] refactor scheduler->worker request state synchronization Oct 8, 2025

mergify bot commented Oct 10, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @markmc.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 10, 2025
@markmc markmc force-pushed the nixl-rework-prefill-send-expiry branch from d384771 to 63c07ee Compare October 10, 2025 14:09
@mergify mergify bot removed the needs-rebase label Oct 10, 2025
@markmc markmc added the ready label Oct 16, 2025

mergify bot commented Oct 16, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @markmc.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 16, 2025
Use a SCHEDULED/FINISHED/ABORTED enum rather than in_batch, to_send, and
not_processed.

Also move the expiry timestamp calculation to the worker side
so we're not sending timestamps across process boundaries.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
@markmc markmc force-pushed the nixl-rework-prefill-send-expiry branch from 63c07ee to 9d47e78 Compare October 22, 2025 15:44
@mergify mergify bot removed the needs-rebase label Oct 22, 2025
@markmc markmc requested review from NickLucche and removed request for ApostaC October 22, 2025 18:58

Labels

kv-connector, ready, v1

2 participants