[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache #27170

ptovam · 2025-10-19T10:44:37Z

Purpose

Extends the existing /reset_prefix_cache endpoint with an optional reset_external flag.
When set, both the local (GPU) and external (connector) prefix caches are reset.

This allows clearing connector-side caches dynamically without restarting the server, which is useful for benchmarking and testing.
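A client could trigger the combined reset with a plain POST. The sketch below only builds the request and assumes that reset_external is passed as a query parameter; the helper name build_reset_request is hypothetical, and the exact way the server parses the flag should be checked against the merged api_server.py.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_reset_request(base_url: str, reset_external: bool = False) -> Request:
    """Hypothetical helper: build a POST request for /reset_prefix_cache.

    Assumption: reset_external is sent as a query parameter; when it is
    omitted, only the local (GPU) prefix cache is reset.
    """
    url = base_url.rstrip("/") + "/reset_prefix_cache"
    if reset_external:
        url += "?" + urlencode({"reset_external": "true"})
    return Request(url, method="POST")
```

Passing the result to urllib.request.urlopen would send it to a running server; omitting the flag keeps the previous GPU-only behavior.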

gemini-code-assist

Code Review

This pull request introduces a new API to reset the KV connector cache, which is a useful feature for testing and benchmarking. The implementation is well-structured, adding the reset_cache method to the base connector class and implementing it through the stack for both synchronous and asynchronous paths.

My main feedback is about a bug in the synchronous path where the boolean return value indicating the success of the cache reset is not propagated up to the LLM class. This results in LLM.reset_connector_cache having an incorrect return type hint and behavior. I've provided detailed comments and suggestions to fix this issue across the affected files. The asynchronous path used by the API server correctly handles this by design, as it doesn't check the return status.

Overall, the changes are good, and with the suggested fix, the new API will be more robust.

vllm/entrypoints/llm.py

vllm/v1/engine/llm_engine.py

vllm/v1/engine/core.py

vllm/v1/engine/core_client.py
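The synchronous-path bug described above comes down to one layer dropping a return value. The sketch below uses hypothetical, heavily simplified classes (not vLLM's real EngineCore, client, or LLM) to show how a discarded result surfaces as None at the top, and how returning it at every layer fixes the type hint and behavior.

```python
from typing import Optional


class EngineCore:
    """Hypothetical stand-in for the engine core; returns a fixed reset result."""

    def __init__(self, reset_ok: bool = True) -> None:
        self.reset_ok = reset_ok

    def reset_connector_cache(self) -> bool:
        return self.reset_ok


class BuggyClient:
    """Drops the core's return value, so callers always see None."""

    def __init__(self, core: EngineCore) -> None:
        self.core = core

    def reset_connector_cache(self) -> None:
        self.core.reset_connector_cache()  # result is discarded here


class FixedClient:
    """Returns the core's result so it reaches the top-level API."""

    def __init__(self, core: EngineCore) -> None:
        self.core = core

    def reset_connector_cache(self) -> bool:
        return self.core.reset_connector_cache()


class LLM:
    """Hypothetical top-level wrapper that forwards whatever the client returns."""

    def __init__(self, client) -> None:
        self.client = client

    def reset_connector_cache(self) -> Optional[bool]:
        return self.client.reset_connector_cache()
```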

markmc · 2025-10-20T08:26:53Z

Is there a compelling reason to add a new endpoint? It would seem reasonable to piggy-back on /reset_prefix_cache?

ptovam · 2025-10-20T09:08:53Z

> Is there a compelling reason to add a new endpoint? It would seem reasonable to piggy-back on /reset_prefix_cache?

Good point, there's no strict need for a separate endpoint.
The goal was mainly to keep the option to reset only the GPU prefix cache, or both GPU and connector caches together.
Extending /reset_prefix_cache with an optional flag would work just as well if that makes more sense.

ptovam · 2025-10-20T16:35:27Z

> Is there a compelling reason to add a new endpoint? It would seem reasonable to piggy-back on /reset_prefix_cache?

Updated to extend /reset_prefix_cache with an optional reset_external flag, replacing the separate endpoint.
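Folding the feature into the existing endpoint keeps both options from the earlier comment: reset only the GPU prefix cache, or reset it together with the connector cache. A minimal server-side sketch, assuming the flag arrives as a query parameter that defaults to False; FakeEngine and handle_reset_prefix_cache are hypothetical names, not vLLM's handler.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeEngine:
    """Hypothetical engine stand-in that records which caches it was asked to reset."""

    calls: List[str] = field(default_factory=list)

    def reset_prefix_cache(self, reset_external: bool = False) -> bool:
        self.calls.append("gpu")  # the local (GPU) prefix cache is always reset
        if reset_external:
            self.calls.append("connector")  # the external cache only on request
        return True


def handle_reset_prefix_cache(engine: FakeEngine, query: Dict[str, str]) -> int:
    """Hypothetical handler: parse the optional flag and forward it to the engine."""
    # Assumption: a missing flag means False, so existing callers keep the
    # GPU-only behavior of /reset_prefix_cache.
    raw = query.get("reset_external", "false")
    reset_external = raw.strip().lower() in ("1", "true", "yes")
    engine.reset_prefix_cache(reset_external=reset_external)
    return 200
```

Defaulting the flag to False is what makes this a backward-compatible extension rather than a behavior change.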

markmc

lgtm

vllm/entrypoints/openai/api_server.py

DarkLight1337 · 2025-10-22T13:19:36Z

cc @NickLucche

NickLucche

lgtm

NickLucche

Actually, sorry, one comment on the interface for the multiconnector case

vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py

NickLucche · 2025-10-22T13:28:23Z

vllm/distributed/kv_transfer/kv_connector/v1/base.py

+        Returns:
+            bool: True if the cache was successfully reset, False otherwise.
+        """
+        return False


This could return an optional value, to implement the suggestion below.

Use Optional[bool] (or bool | None) as the return type, so that the all(...) check works correctly: it lets you rule out the connectors that do NOT implement it.

Why treat connectors that haven’t implemented this as successful?
It could confuse users who aren’t aware of the implementation details.

Connectors that have not implemented it will return None, which is neither True nor False.
If you use any(...), a single successful sub-connector will make the multiconnector report success.

What I would expect semantically is that the multiconnector only returns True when all sub-connectors that implement the interface return True (hence the None return value is needed to distinguish connectors that do not implement it from those that implement it and fail).

Got it, I’ve pushed a commit with that change. Thanks for clarifying.
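The semantics agreed in this thread (the base class returns None by default, and the multi-connector returns True only if every sub-connector that implements reset returns True) can be sketched as below. This is a minimal illustration with hypothetical class names, not vLLM's actual connector code; in particular, returning None when no sub-connector implements reset is an assumption.

```python
from typing import Iterable, List, Optional


class KVConnectorBase:
    def reset_cache(self) -> Optional[bool]:
        # Default: not implemented. None means "no answer", distinct from False.
        return None


class OkConnector(KVConnectorBase):
    def reset_cache(self) -> Optional[bool]:
        return True


class FailingConnector(KVConnectorBase):
    def reset_cache(self) -> Optional[bool]:
        return False


class MultiConnector(KVConnectorBase):
    def __init__(self, connectors: Iterable[KVConnectorBase]) -> None:
        self._connectors: List[KVConnectorBase] = list(connectors)

    def reset_cache(self) -> Optional[bool]:
        # A list (not a generator) so every sub-connector is attempted.
        results = [c.reset_cache() for c in self._connectors]
        implemented = [r for r in results if r is not None]
        if not implemented:
            return None  # assumption: no sub-connector implements reset
        # True only if every implementing sub-connector succeeded.
        return all(implemented)
```

With any(...) instead of all(...), the OkConnector/FailingConnector pair would wrongly report success, which is the concern raised above.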

ptovam · 2025-10-23T09:19:16Z

Once the prefix cache metrics from #26245 are merged, we should reset the metric too.

Edit: Done

mergify · 2025-10-28T22:41:00Z

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ptovam.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2025-10-30T04:14:06Z

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ptovam.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

ptovam · 2025-11-06T09:58:46Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a feature to reset the external KV connector cache via the /reset_prefix_cache endpoint, controlled by a new reset_external flag. The changes are well-propagated from the API layer down to the scheduler and connectors. My review focuses on the implementation of the reset logic, where I've identified two high-severity issues related to short-circuiting evaluation that could prevent all relevant caches from being reset. I've provided suggestions to ensure the reset operations are always fully executed as intended.

vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py

vllm/v1/core/sched/scheduler.py
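The short-circuiting issue flagged in this review can be illustrated in isolation. In the sketch below (hypothetical names, not the code in multi_connector.py or scheduler.py), all() over a generator stops at the first False, and an expression like reset_local() and reset_external() skips the second call when the first fails. Evaluating every reset before combining the results avoids both problems, which matches the intent of the later commits that attempt every connector reset and reset the local and connector caches independently.

```python
from typing import Callable, List, Sequence

Reset = Callable[[], bool]


def make_reset(name: str, ok: bool, log: List[str]) -> Reset:
    """Hypothetical reset hook that records that it ran and returns `ok`."""
    def reset() -> bool:
        log.append(name)
        return ok
    return reset


def reset_all_short_circuit(resets: Sequence[Reset]) -> bool:
    # Pitfall: all() over a generator stops calling resets at the first False.
    return all(r() for r in resets)


def reset_all_eager(resets: Sequence[Reset]) -> bool:
    # Fix: run every reset first, then aggregate the results.
    results = [r() for r in resets]
    return all(results)


def reset_local_and_external(reset_local: Reset, reset_external: Reset) -> bool:
    # Writing `reset_local() and reset_external()` would skip the external
    # reset whenever the local one fails; call both, then combine.
    local_ok = reset_local()
    external_ok = reset_external()
    return local_ok and external_ok
```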

ptovam requested review from ApostaC, NickLucche, WoosukKwon, aarnphm, alexm-redhat, chaunceyjiang, comaniac, heheda12345, njhill, robertgshaw2-redhat and ywang96 as code owners October 19, 2025 10:44

mergify bot added frontend v1 kv-connector labels Oct 19, 2025

gemini-code-assist bot reviewed Oct 19, 2025

ptovam force-pushed the reset_connector_cache branch from 4b86b33 to ae370d8 October 20, 2025 16:29

markmc approved these changes Oct 21, 2025

vllm/entrypoints/openai/api_server.py (outdated, resolved)

ptovam force-pushed the reset_connector_cache branch from ae370d8 to dc9168f October 21, 2025 06:43

ptovam changed the title from "[KVConnector][Feature] Add API to reset KV connector cache" to "[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache" Oct 21, 2025

markmc added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 21, 2025

NickLucche approved these changes Oct 22, 2025

NickLucche requested changes Oct 22, 2025

ptovam force-pushed the reset_connector_cache branch 2 times, most recently from 1dd3ae5 to 26292d1 October 23, 2025 09:04

ptovam force-pushed the reset_connector_cache branch from 26292d1 to e8045dc October 23, 2025 11:50

mergify bot added the needs-rebase label Oct 28, 2025

ptovam force-pushed the reset_connector_cache branch from e8045dc to cbd7b3d October 29, 2025 09:19

mergify bot removed the needs-rebase label Oct 29, 2025

mergify bot added the needs-rebase label Oct 30, 2025

ptovam added 4 commits October 30, 2025 09:29

[KVConnector][Feature] Support KV connector cache reset via /reset_prefix_cache

f94cb15

Signed-off-by: tovam <tovam@pliops.com>

Use all() to ensure all connector cache resets succeed

a00cb91

Signed-off-by: tovam <tovam@pliops.com>

Make reset_cache optionally return None for unimplemented connectors

aee273c

Signed-off-by: tovam <tovam@pliops.com>

Reset prefix cache metric

fce5383

Signed-off-by: tovam <tovam@pliops.com>

ptovam force-pushed the reset_connector_cache branch from cbd7b3d to fce5383 October 30, 2025 07:30

mergify bot removed the needs-rebase label Oct 30, 2025

gemini-code-assist bot reviewed Nov 6, 2025

vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py (outdated, resolved)

vllm/v1/core/sched/scheduler.py (outdated, resolved)

ptovam and others added 3 commits November 9, 2025 13:36

multi-connector: ensure all connectors are attempted to be reset

cac8f27

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com>

ensure local and connector caches are reset independently

cc88d3c

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Tova Movshovitz <tovam@pliops.com>

Merge branch 'main' into reset_connector_cache

395dd1a
