fix: re-enable deduplication #2343
Caution: Review failed. Failed to post review comments.
Walkthrough
This change introduces a distributed Key-Value Block Manager (KVBM) system with deep integration into vLLM, including a new multi-stage Dockerfile, Rust and Python bindings, distributed leader-worker block management, connector protocols, and comprehensive testing and documentation. The update refactors block data handling for locality, adds connector APIs, and implements vLLM protocol adapters.
Sequence Diagram(s)
sequenceDiagram
participant User
participant PythonApp
participant KvbmCacheManager
participant RustKVBM
participant DistributedLeader
participant DistributedWorker
User->>PythonApp: Issues inference request
PythonApp->>KvbmCacheManager: create_slot(request)
KvbmCacheManager->>RustKVBM: Allocate blocks, manage slot
RustKVBM->>DistributedLeader: (If leader) Setup, synchronize workers
RustKVBM->>DistributedWorker: (If worker) Register, handle block transfer
PythonApp->>KvbmCacheManager: allocate_slots(request, tokens)
KvbmCacheManager->>RustKVBM: Block allocation, onboarding, transfer
RustKVBM->>DistributedWorker: Transfer blocks if needed
DistributedWorker-->>RustKVBM: Transfer complete
KvbmCacheManager-->>PythonApp: Blocks ready for inference
PythonApp-->>User: Returns inference result
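The call order in the diagram (create a slot for a request, then allocate blocks as tokens arrive) can be illustrated with a toy, single-process model. This is only a sketch under stated assumptions: `ToyCacheManager`, `Slot`, and the simplified signatures that take a request ID and a token count are hypothetical and are not the KVBM or vLLM API. It omits the leader, the workers, and all block transfer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Slot:
    """Hypothetical per-request bookkeeping: which blocks the request owns."""
    request_id: str
    block_ids: List[int] = field(default_factory=list)
    num_tokens: int = 0


class ToyCacheManager:
    """Toy stand-in for a cache manager; not the real KvbmCacheManager."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self._free: List[int] = list(range(num_blocks))
        self._slots: Dict[str, Slot] = {}

    def create_slot(self, request_id: str) -> Slot:
        # Step 1 in the diagram: register the request before allocating.
        if request_id in self._slots:
            raise ValueError(f"slot already exists for {request_id}")
        slot = Slot(request_id)
        self._slots[request_id] = slot
        return slot

    def allocate_slots(self, request_id: str, num_new_tokens: int) -> List[int]:
        # Step 2 in the diagram: grow the slot to cover the new tokens.
        slot = self._slots[request_id]
        total = slot.num_tokens + num_new_tokens
        blocks_needed = -(-total // self.block_size) - len(slot.block_ids)
        if blocks_needed > len(self._free):
            raise RuntimeError("out of free blocks")
        new_blocks = [self._free.pop(0) for _ in range(blocks_needed)]
        slot.block_ids.extend(new_blocks)
        slot.num_tokens = total
        return new_blocks
```

With 16-token blocks, a request that first brings 20 tokens receives two blocks, and later tokens only trigger a new allocation once the last block is full.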
vLLM's block manager does not deduplicate blocks.
We allowed duplication while we were overriding the vLLM device block manager.
Moving forward with the connector API, we want to deduplicate on host and disk. In the future, we will revisit how we interact with the native block manager, or whether we replace it.
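To make the idea of deduplication on host and disk concrete, here is a minimal sketch that assumes blocks are identified by a hash of their tokens chained with the parent block's hash, so identical prefixes map to the same key. The names `block_hash`, `DedupPool`, and `offload` are hypothetical and exist only for illustration. The PR does not state how KVBM keys its blocks, so the hashing scheme is an assumption.

```python
import hashlib
from typing import Dict, List, Optional, Sequence, Tuple


def block_hash(parent: Optional[str], tokens: Sequence[int]) -> str:
    """Hash a block's tokens together with its parent's hash (assumed scheme)."""
    h = hashlib.sha256()
    h.update((parent or "").encode())
    h.update(b"|")
    h.update(",".join(str(t) for t in tokens).encode())
    return h.hexdigest()


class DedupPool:
    """Toy host/disk tier that stores one copy per distinct block hash."""

    def __init__(self) -> None:
        self._blocks: Dict[str, Tuple[int, ...]] = {}
        self._refs: Dict[str, int] = {}

    def register(self, parent: Optional[str], tokens: Sequence[int]) -> str:
        key = block_hash(parent, tokens)
        if key not in self._blocks:
            # First time this content is seen: store it.
            self._blocks[key] = tuple(tokens)
        # Duplicates only bump the reference count.
        self._refs[key] = self._refs.get(key, 0) + 1
        return key

    def stored_blocks(self) -> int:
        return len(self._blocks)

    def ref_count(self, key: str) -> int:
        return self._refs.get(key, 0)


def offload(pool: DedupPool, tokens: Sequence[int], block_size: int) -> List[str]:
    """Split a sequence into full blocks and register each one in order."""
    keys: List[str] = []
    parent: Optional[str] = None
    for start in range(0, len(tokens) - block_size + 1, block_size):
        parent = pool.register(parent, tokens[start:start + block_size])
        keys.append(parent)
    return keys
```

Two sequences that share their first block but differ afterwards end up with three stored blocks instead of four, and the shared block has a reference count of two.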
Summary by CodeRabbit
End-users can now leverage distributed KV cache management for large language models, with improved reliability, extensibility, and documentation.