[P/D][V1] Add generic KV Connector for delegation to external implementations #17840
Conversation
@sdavidbd I don't really follow what this buys you. If there is an external implementation of the V1 connector API then you can use it directly. If the external library doesn't implement the vLLM connector API directly, then a custom adapter implementation is needed regardless?
Thanks Nick! Using an external implementation of KVConnectorBase_V1 directly means each vendor has to ship and wire in its own connector class, which duplicates boilerplate across integrations. The goal of GenericKVConnector is to avoid that: it delegates all connector methods to an implementation loaded at runtime, so vendors can keep their logic in their own packages without changes to the vLLM codebase.
Thanks @sdavidbd. But I still don't see why this is needed. You can just call the below before starting vLLM and then reference it directly in the passed KVTransferConfig:

    KVConnectorFactory.register_connector(
        "MyConnector",
        "myconnector.module.path",
        "MyConnectorClass")
@njhill I think the main issue is getting the call to KVConnectorFactory.register_connector to happen before vLLM starts. For standalone vLLM execution, you may wind up writing wrappers such as this:

    #!/usr/bin/env python
    import re
    import sys

    from vllm.entrypoints.cli.main import main
    # Just import this so it gets registered as a plugin
    from vua.vllm.kv_connector_v1 import VUAStorageConnector_V1

    if __name__ == '__main__':
        sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
        sys.exit(main())

And not as a setuptools entrypoint, otherwise it breaks; I've attempted other ways, and in each of them the problem was the multithreading usage in vLLM.
This pull request has merge conflicts that must be resolved before it can be merged.
@sdavidbd @da-x @njhill We are seeing a similar need: we have an external KV connector implementation (a vendor-internal one that we can't upstream via a PR). The wrapper approach wouldn't work for us with TP > 1, because the different GPU workers run in their own processes. Another proposal: do you think we could just pass the KV connector class as a parameter to the core engine, like we already do with the model executor? (See vllm/v1/engine/async_llm.py, lines 118 to 122 at 289199f.)
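To illustrate that proposal only (the kv_connector_cls parameter and the vendor package below are hypothetical and do not exist in vLLM), it could look roughly like:

    from vllm.engine.arg_utils import AsyncEngineArgs
    from vllm.v1.engine.async_llm import AsyncLLM
    # Hypothetical vendor-internal connector class
    from my_vendor_pkg.connector import MyVendorConnector

    engine_args = AsyncEngineArgs(model="facebook/opt-125m")

    # Hypothetical API sketch: inject the connector class the same way the
    # executor class is injected today. kv_connector_cls is NOT an existing
    # parameter; it only illustrates the proposal.
    engine = AsyncLLM.from_engine_args(engine_args, kv_connector_cls=MyVendorConnector)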
Thanks @da-x @liuzijing2014, I see what you mean. Yes, I'm still not sure a wrapper implementation is necessarily the way to solve this... hopefully we can follow a similar plugin mechanism to the one used for other things, like @liuzijing2014 said. See https://docs.vllm.ai/en/latest/design/plugin_system.html
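As a rough sketch of that direction (the package, function, and connector names below are hypothetical, not from this PR), a vendor package could expose a registration function under the vllm.general_plugins entry-point group described in that doc, so it runs in every vLLM process, including the GPU workers:

    # my_vendor_pkg/__init__.py (hypothetical vendor package)
    #
    # In the package metadata this function would be declared as an entry
    # point in the "vllm.general_plugins" group, e.g. in pyproject.toml:
    #   [project.entry-points."vllm.general_plugins"]
    #   my_vendor_connector = "my_vendor_pkg:register"

    def register():
        # vLLM invokes general plugins in each process it starts, so the
        # connector also gets registered inside the worker processes.
        from vllm.distributed.kv_transfer.kv_connector.factory import (
            KVConnectorFactory)
        KVConnectorFactory.register_connector(
            "MyVendorConnector",
            "my_vendor_pkg.connector",
            "MyVendorConnector")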
Force-pushed 84204d5 to b3fb374: "Delegates all abstract methods to a dynamically loaded external implementation" (Signed-off-by: David Ben-David <davidb@pliops.com>)
@njhill @da-x @liuzijing2014 Thanks everyone for the helpful and thoughtful feedback — much appreciated! Indeed, the need for a delegating connector like GenericKVConnector felt a bit awkward in retrospect. Your comments helped me realize that requiring any explicit registration step is actually redundant. A cleaner approach is to dynamically load the external KV connector implementation directly from config, without needing to register or wrap it at all. This avoids the need to upstream boilerplate delegating modules for each vendor implementation, and keeps the vLLM codebase cleaner and more extensible. I've opened a new PR that takes this improved direction: #18142. Looking forward to your thoughts on the updated approach!
This pull request has merge conflicts that must be resolved before it can be merged.
Closing in favor of #18142
vLLM recently introduced the KVConnectorBase_V1 API (#15960), which provides a standardized interface for integrating external KV cache offload, sharing, and transfer solutions, such as those used in prefill/decode (P/D) disaggregation. A concrete example of this is the LMCache connector (#16625).
To avoid duplicating boilerplate code across multiple vendor-specific connectors that primarily delegate to external packages, this PR adds a GenericKVConnector. This connector delegates all KVConnectorBase_V1 abstract methods to an external implementation provided at runtime. This approach enables vendors to encapsulate their custom logic in their own packages without requiring changes to the vLLM codebase.
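As a rough sketch of the delegation idea (not the actual code from this PR; the config key names and the forwarding style are illustrative), the connector essentially loads the class named in the config and forwards calls to it:

    import importlib

    class GenericKVConnectorSketch:
        """Illustrative only: load an external connector implementation named
        in the KV transfer config and forward all connector calls to it."""

        def __init__(self, vllm_config, role):
            extra = vllm_config.kv_transfer_config.kv_connector_extra_config
            module = importlib.import_module(extra["module_path"])
            impl_cls = getattr(module, extra["class_name"])
            self._impl = impl_cls(vllm_config, role)

        def __getattr__(self, name):
            # Forward every KVConnectorBase_V1 method (e.g. start_load_kv,
            # wait_for_save, build_connector_meta) to the external impl.
            return getattr(self._impl, name)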
For example, the current LMCache integration can switch from using a dedicated connector class to GenericKVConnector as follows:
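A sketch of what that configuration could look like (the extra-config keys and the LMCache module/class names are assumptions for illustration, not necessarily what this PR or LMCache actually use):

    from vllm.config import KVTransferConfig

    # Illustrative only: point GenericKVConnector at an external implementation.
    kv_transfer_config = KVTransferConfig(
        kv_connector="GenericKVConnector",
        kv_role="kv_both",
        kv_connector_extra_config={
            # Assumed keys and LMCache paths, shown for illustration.
            "module_path": "lmcache.integration.vllm.vllm_v1_adapter",
            "class_name": "LMCacheConnectorV1Impl",
        },
    )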
This design simplifies integration while keeping the core vLLM codebase clean and extensible.