
Conversation

@sdavidbd
Contributor

@sdavidbd sdavidbd commented May 8, 2025

vLLM recently introduced the KVConnectorBase_V1 API (#15960) which provides a standardized interface for integrating external KV cache offload, sharing, and transfer solutions - such as those used in prefill-decode (P-D) disaggregation. A concrete example of this is the LMCache connector (#16625).

To avoid duplicating boilerplate code across multiple vendor-specific connectors that primarily delegate to external packages, this PR adds a GenericKVConnector. This connector delegates all KVConnectorBase_V1 abstract methods to an external implementation provided at runtime. This approach enables vendors to encapsulate their custom logic in their own packages without requiring changes to the vLLM codebase.

For example, the current LMCache integration can switch from using a dedicated connector class to GenericKVConnector as follows:

KVTransferConfig(
    kv_connector="GenericKVConnector",
    kv_role="kv_both",
    kv_connector_extra_config={
        GenericKVConnector.ExtraConfigKeys.CONNECTOR_IMPL_MODULE_PATH:
            "lmcache.integration.vllm.vllm_v1_adapter",
        GenericKVConnector.ExtraConfigKeys.CONNECTOR_IMPL_CLASS_NAME:
            "LMCacheConnectorV1Impl",
         
        # Additional LMCache-specific config entries can be added here
        #"lmcache_config_key": "value",
    },
)

This design simplifies integration while keeping the core vLLM codebase clean and extensible.
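
To make the delegation idea concrete, here is a minimal plain-Python sketch of the pattern (GenericDelegator and the stdlib Counter stand-in are illustrative names, not the actual vLLM classes):

```python
import importlib


class GenericDelegator:
    """Illustrative sketch of the delegation idea (not the actual vLLM
    GenericKVConnector): load an implementation class named in config at
    runtime and forward every call to it."""

    def __init__(self, module_path: str, class_name: str, *args, **kwargs):
        module = importlib.import_module(module_path)
        impl_cls = getattr(module, class_name)
        self._impl = impl_cls(*args, **kwargs)

    def __getattr__(self, name):
        # Invoked only for attributes not found on the wrapper itself,
        # so every connector method resolves to the loaded implementation.
        return getattr(self._impl, name)


# Demo with a stdlib class standing in for a vendor implementation:
connector = GenericDelegator("collections", "Counter")
connector.update(["kv", "kv", "cache"])
print(connector.most_common(1))  # [('kv', 2)]
```

In the real connector, the module path and class name would come from `kv_connector_extra_config`, as in the `KVTransferConfig` example above.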

@github-actions

github-actions bot commented May 8, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@sdavidbd sdavidbd changed the title Add GenericKVConnector to delegate KVConnectorBase_V1 API [P/D][V1] Add generic KV Connector for delegation to external implementations May 8, 2025
@mgoin mgoin requested a review from robertgshaw2-redhat May 8, 2025 16:20
@njhill
Member

njhill commented May 8, 2025

@sdavidbd I don't really follow what this buys you. If there is an external implementation of the V1 connector API, then you can use it directly. If the external library doesn't implement the vLLM connector API directly, then a custom adapter implementation is needed regardless?

@sdavidbd
Contributor Author

sdavidbd commented May 8, 2025

@sdavidbd I don't really follow what this buys you. If there is an external implementation of the V1 connector API, then you can use it directly. If the external library doesn't implement the vLLM connector API directly, then a custom adapter implementation is needed regardless?

Thanks Nick! Using an external implementation of KVConnectorBase_V1 still requires upstreaming a concrete class into the vLLM codebase, one that simply wraps and delegates to the actual logic in the external package (like the current LMCacheConnectorV1).

The goal of GenericKVConnector is to avoid this duplication by providing a reusable delegating connector that dynamically loads and invokes an implementation defined in the external library. This allows vendors to ship their connector logic entirely within their own packages, enabling users to integrate with vLLM without needing to upstream or maintain a dedicated wrapper in the vLLM repo.

@njhill
Member

njhill commented May 11, 2025

Thanks @sdavidbd. But I still don't see why this is needed. You can just call the below before starting vllm and then reference it directly in the passed KVTransferConfig.

KVConnectorFactory.register_connector(
    "MyConnector",
    "myconnector.module.path",
    "MyConnectorClass")

@da-x

da-x commented May 12, 2025

@njhill I think the main issue is getting the call to KVConnectorFactory.register_connector to happen in the first place in a standalone vLLM execution. If you are using vLLM as a library, then you are right, but otherwise @sdavidbd's change helps.

For standalone vLLM execution, you may wind up writing wrappers such as this:

#!/usr/bin/env python

import re
import sys

from vllm.entrypoints.cli.main import main
# Just import this so it gets registered as a plugin
from vua.vllm.kv_connector_v1 import VUAStorageConnector_V1

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main())

And not as a setuptools entrypoint, otherwise it breaks. I've attempted other ways, and in each of them the problem was that the multiprocessing in vllm/engine/multiprocessing was re-executing vLLM's main module entrypoint without performing the import that registers the KV connector.

@mergify

mergify bot commented May 12, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @sdavidbd.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 12, 2025
@liuzijing2014
Collaborator

@sdavidbd @da-x @njhill We are seeing a similar need: we have an external KV connector implementation (a vendor-internal implementation that we can't upstream via a PR).

KVConnectorFactory.register_connector(
    "MyConnector",
    "myconnector.module.path",
    "MyConnectorClass")

This wouldn't work if we are using TP > 1, because different GPU workers run in their own processes.

Another proposal: could we pass the KV connector class as a parameter to the core engine, just like we do with the model executor?

self.engine_core = core_client_class(
    vllm_config=vllm_config,
    executor_class=executor_class,
    log_stats=self.log_stats,
)

@njhill
Member

njhill commented May 12, 2025

Thanks @da-x @liuzijing2014, I see what you mean. Yes I'm still not sure a wrapper implementation is necessarily the way to solve this... hopefully we can follow a similar plugin mechanism that's used for other things like @liuzijing2014 said. See https://docs.vllm.ai/en/latest/design/plugin_system.html
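
Following the linked plugin docs, a vendor package could in principle register its connector through a setuptools entry point, so that the registration code runs in every vLLM process, including spawned TP workers. A hypothetical packaging fragment might look like this (the `vllm.general_plugins` group name and all `myconnector` names are assumptions based on the plugin docs, not verified here):

```toml
# pyproject.toml of the hypothetical vendor package
[project.entry-points."vllm.general_plugins"]
register_my_connector = "myconnector.plugin:register_connector"
```

Here `myconnector.plugin:register_connector` would be a zero-argument function performing the KVConnectorFactory.register_connector call shown earlier; because entry points are discovered at process startup, the registration would not depend on any wrapper script's imports.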

Delegates all abstract methods to a dynamically loaded external implementation

Signed-off-by: David Ben-David <davidb@pliops.com>
@sdavidbd sdavidbd force-pushed the feature/generic-kv-connector branch from 84204d5 to b3fb374 Compare May 12, 2025 21:08
@mergify mergify bot removed the needs-rebase label May 12, 2025
@liuzijing2014
Collaborator

KVConnectorFactory.register_connector(
    "MyConnector",
    "myconnector.module.path",
    "MyConnectorClass")

#18028: @da-x @njhill @sdavidbd, let us know what you think. It should be a more lightweight approach.

@sdavidbd
Contributor Author

sdavidbd commented May 14, 2025

@njhill @da-x @liuzijing2014 Thanks everyone for the helpful and thoughtful feedback — much appreciated!

Indeed, the need for a delegating connector like GenericKVConnector felt a bit awkward in retrospect. Your comments helped me realize that requiring any explicit registration step is actually redundant. A cleaner approach is to dynamically load the external KV connector implementation directly from config, without needing to register or wrap it at all.

This avoids the need to upstream boilerplate delegating modules for each vendor implementation, and keeps the vLLM codebase cleaner and more extensible.
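
The no-registration approach can be sketched as resolving a fully qualified class name straight from the config string (the helper name below is illustrative, not the actual implementation in the follow-up PR):

```python
import importlib


def load_class(qualified_name: str):
    """Illustrative helper: resolve 'pkg.module.ClassName' to the class
    object, so a connector named in KVTransferConfig can be instantiated
    directly, with no registration step and no wrapper class."""
    module_path, _, class_name = qualified_name.rpartition(".")
    return getattr(importlib.import_module(module_path), class_name)


# Stdlib class standing in for an external connector implementation:
cls = load_class("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```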

I've opened a new PR that takes this improved direction: #18142

Looking forward to your thoughts on the updated approach!

@mergify

mergify bot commented May 14, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @sdavidbd.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 14, 2025
@sdavidbd sdavidbd closed this May 18, 2025
@sdavidbd
Contributor Author

Closing in favor of #18142
