@ptovam
Contributor

@ptovam ptovam commented Oct 19, 2025

Purpose

Extends the existing /reset_prefix_cache endpoint with an optional reset_external flag.
When set, both the local (GPU) and external (connector) prefix caches are reset.

This allows clearing connector-side caches dynamically without restarting the server, which is useful for benchmarking and testing.
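
As a rough illustration of how a client might use the flag, here is a minimal sketch that only builds the request. The helper name `build_reset_request`, the base URL, the use of POST, and passing `reset_external` as a query parameter are all assumptions made for this example; they are not confirmed by the description above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_reset_request(base_url: str, reset_external: bool = False) -> Request:
    """Build (but do not send) a request to /reset_prefix_cache.

    Hypothetical helper: the transport details below are assumptions.
    """
    url = f"{base_url.rstrip('/')}/reset_prefix_cache"
    if reset_external:
        # Assumption: the flag is passed as a query parameter.
        url += "?" + urlencode({"reset_external": "true"})
    # Assumption: the endpoint accepts a POST with an empty body.
    return Request(url, data=b"", method="POST")
```

Sending the request would then be a matter of passing the result to `urllib.request.urlopen` against a running server. Without the flag, the request targets only the local (GPU) prefix cache, as before.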

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new API to reset the KV connector cache, which is a useful feature for testing and benchmarking. The implementation is well-structured, adding the reset_cache method to the base connector class and implementing it through the stack for both synchronous and asynchronous paths.

My main feedback is about a bug in the synchronous path where the boolean return value indicating the success of the cache reset is not propagated up to the LLM class. This results in LLM.reset_connector_cache having an incorrect return type hint and behavior. I've provided detailed comments and suggestions to fix this issue across the affected files. The asynchronous path used by the API server correctly handles this by design, as it doesn't check the return status.

Overall, the changes are good, and with the suggested fix, the new API will be more robust.

@markmc
Member

markmc commented Oct 20, 2025

Is there a compelling reason to add a new endpoint? It would seem reasonable to piggy-back on /reset_prefix_cache ?

@ptovam
Contributor Author

ptovam commented Oct 20, 2025

> Is there a compelling reason to add a new endpoint? It would seem reasonable to piggy-back on /reset_prefix_cache ?

Good point, there's no strict need for a separate endpoint.
The goal was mainly to keep the option to reset only the GPU prefix cache, or both GPU and connector caches together.
Extending /reset_prefix_cache with an optional flag would work just as well if that makes more sense.

@ptovam ptovam force-pushed the reset_connector_cache branch from 4b86b33 to ae370d8 on October 20, 2025 16:29
@ptovam
Contributor Author

ptovam commented Oct 20, 2025

> Is there a compelling reason to add a new endpoint? It would seem reasonable to piggy-back on /reset_prefix_cache ?

Updated to extend /reset_prefix_cache with an optional reset_external flag, replacing the separate endpoint.

Member

@markmc markmc left a comment

lgtm

@ptovam ptovam force-pushed the reset_connector_cache branch from ae370d8 to dc9168f on October 21, 2025 06:43
@ptovam ptovam changed the title from "[KVConnector][Feature] Add API to reset KV connector cache" to "[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache" on Oct 21, 2025
@markmc markmc added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Oct 21, 2025
@DarkLight1337
Member

cc @NickLucche

Collaborator

@NickLucche NickLucche left a comment

lgtm

Collaborator

@NickLucche NickLucche left a comment

Actually, sorry, one comment on the interface for the multiconnector case

        Returns:
            bool: True if the cache was successfully reset, False otherwise.
        """
        return False
Collaborator

This return value could be optional, to implement the suggestion below.

Contributor Author

What do you mean?

Collaborator

Use Optional[bool] (or bool | None) as the return type so that the all(..) check works correctly: connectors that do NOT implement the method return None and can be ruled out.

Contributor Author

Why treat connectors that haven’t implemented this as successful?
It could confuse users who aren’t aware of the implementation details.

Collaborator

Connectors that have not implemented it will return None, which is neither success nor failure.
If you do any(..), a single successful sub-connector will result in the multiconnector reporting success.

What I would expect semantically is that the multiconnector only returns True when all sub-connectors that implement the interface return True (hence the None return value is needed to distinguish connectors that do not implement it from those that implement it and yet fail).
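
The aggregation described above could be sketched roughly as follows. The function name `aggregate_reset_results` is hypothetical, and returning None when no sub-connector implements the reset is an assumption of this sketch, not something settled in the discussion.

```python
from typing import Iterable, Optional


def aggregate_reset_results(results: Iterable[Optional[bool]]) -> Optional[bool]:
    """Combine per-sub-connector reset results (hypothetical helper).

    None  -> the sub-connector does not implement reset (ignored)
    True  -> the sub-connector reset its cache
    False -> the sub-connector implements reset but failed
    """
    implemented = [r for r in results if r is not None]
    if not implemented:
        # Assumption: with nothing implemented, report "not implemented".
        return None
    # Succeed only if every implementing sub-connector succeeded.
    return all(implemented)
```

With results `[True, False, None]`, `any(..)` over the raw values would report success, whereas this helper returns False because one implementing sub-connector failed.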

Contributor Author

Got it - I’ve pushed a commit with that change, thanks for clarifying.

@ptovam ptovam force-pushed the reset_connector_cache branch 2 times, most recently from 1dd3ae5 to 26292d1 on October 23, 2025 09:04
@ptovam
Contributor Author

ptovam commented Oct 23, 2025

Once the prefix cache metrics (#26245) are merged, we should reset the metric too.

Edit: Done

@ptovam ptovam force-pushed the reset_connector_cache branch from 26292d1 to e8045dc on October 23, 2025 11:50
@mergify

mergify bot commented Oct 28, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ptovam.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 28, 2025
@ptovam ptovam force-pushed the reset_connector_cache branch from e8045dc to cbd7b3d on October 29, 2025 09:19
@mergify mergify bot removed the needs-rebase label Oct 29, 2025
@mergify

mergify bot commented Oct 30, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ptovam.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 30, 2025
…efix_cache

Signed-off-by: tovam <tovam@pliops.com>
Signed-off-by: tovam <tovam@pliops.com>
Signed-off-by: tovam <tovam@pliops.com>
@ptovam ptovam force-pushed the reset_connector_cache branch from cbd7b3d to fce5383 on October 30, 2025 07:30
@mergify mergify bot removed the needs-rebase label Oct 30, 2025
@ptovam
Contributor Author

ptovam commented Nov 6, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a feature to reset the external KV connector cache via the /reset_prefix_cache endpoint, controlled by a new reset_external flag. The changes are well-propagated from the API layer down to the scheduler and connectors. My review focuses on the implementation of the reset logic, where I've identified two high-severity issues related to short-circuiting evaluation that could prevent all relevant caches from being reset. I've provided suggestions to ensure the reset operations are always fully executed as intended.
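
To illustrate the kind of short-circuiting problem described here, the following is a generic sketch, not the code under review. The function names and the idea of passing reset callables are assumptions made for the example.

```python
from typing import Callable, Iterable


def reset_all_short_circuit(resets: Iterable[Callable[[], bool]]) -> bool:
    # all() over a generator stops at the first False result,
    # so later resets are never attempted.
    return all(reset() for reset in resets)


def reset_all(resets: Iterable[Callable[[], bool]]) -> bool:
    # Run every reset first; building the list never short-circuits.
    results = [reset() for reset in resets]
    # Only then combine the outcomes.
    return all(results)
```

The same effect appears with `a() and b()`: if `a()` returns False, `b()` is skipped. Evaluating each call into a variable before combining them avoids that.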

ptovam and others added 3 commits November 9, 2025 13:36
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Tova Movshovitz <tovam@pliops.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Tova Movshovitz <tovam@pliops.com>
Labels

frontend, kv-connector, ready (ONLY add when PR is ready to merge/full CI is needed), v1

4 participants