-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache #27170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new API to reset the KV connector cache, which is a useful feature for testing and benchmarking. The implementation is well-structured, adding the reset_cache method to the base connector class and implementing it through the stack for both synchronous and asynchronous paths.
My main feedback is about a bug in the synchronous path where the boolean return value indicating the success of the cache reset is not propagated up to the LLM class. This results in LLM.reset_connector_cache having an incorrect return type hint and behavior. I've provided detailed comments and suggestions to fix this issue across the affected files. The asynchronous path used by the API server correctly handles this by design, as it doesn't check the return status.
Overall, the changes are good, and with the suggested fix, the new API will be more robust.
|
Is there a compelling reason to add a new endpoint? It would seem reasonable to piggy-back on |
Good point, there's no strict need for a separate endpoint. |
4b86b33 to
ae370d8
Compare
Updated to extend |
markmc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
ae370d8 to
dc9168f
Compare
|
cc @NickLucche |
NickLucche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
NickLucche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, sorry, one comment on the interface for the multiconnector case
vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py
Outdated
Show resolved
Hide resolved
| Returns: | ||
| bool: True if the cache was successfully reset, False otherwise. | ||
| """ | ||
| return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be optional to implement the suggestion below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional[bool] or bool | None as return type, to allow for the all(..) check to work fine (you rule-out the connectors that do NOT implement it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why treat connectors that haven’t implemented this as successful?
It could confuse users who aren’t aware of the implementation details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
connectors that have not implemented it will return None which is neither.
If you do any(..), one sub-connector being successful will result in multiconnector returning successful transfers.
What I would expect semantically is that multiconnector only returns true when all sub-connectors that implement the interface return True (hence None return value is needed to discern those that have implemented and yet fail).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it - I’ve pushed a commit with that change, thanks for clarifying.
1dd3ae5 to
26292d1
Compare
|
Once the prefix cache metrics get merged - #26245, we should reset the metric too. Edit: Done |
26292d1 to
e8045dc
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
e8045dc to
cbd7b3d
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
…efix_cache Signed-off-by: tovam <tovam@pliops.com>
Signed-off-by: tovam <tovam@pliops.com>
Signed-off-by: tovam <tovam@pliops.com>
Signed-off-by: tovam <tovam@pliops.com>
cbd7b3d to
fce5383
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a feature to reset the external KV connector cache via the /reset_prefix_cache endpoint, controlled by a new reset_external flag. The changes are well-propagated from the API layer down to the scheduler and connectors. My review focuses on the implementation of the reset logic, where I've identified two high-severity issues related to short-circuiting evaluation that could prevent all relevant caches from being reset. I've provided suggestions to ensure the reset operations are always fully executed as intended.
vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py
Outdated
Show resolved
Hide resolved
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com>
Purpose
Extends the existing
/reset_prefix_cacheendpoint with an optionalreset_externalflag.When set, both the local (GPU) and external (connector) prefix caches are reset.
This allows clearing connector-side caches dynamically without restarting the server -useful for benchmarking and testing.