connector/250801 #2244
Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
…n/connector-250731
Caution: Review failed. Failed to post review comments.
Walkthrough
This update introduces a comprehensive distributed key-value block manager (KVBM) system for language model serving, featuring a leader-worker architecture, locality-aware block abstractions, and integration with vLLM for efficient KV cache management. Major changes include new Rust modules for block management, transfer, controller, and connector protocols, as well as extensive Python bindings and vLLM integration layers. Documentation and detailed test plans are provided, and legacy block manager tests are replaced with new KVBM-focused tests.
Sequence Diagram(s)
sequenceDiagram
participant User
participant vLLM
participant Python KVBM Integration
participant Rust KVBM Leader/Worker
participant Controller/Scheduler
User->>vLLM: Submit inference request
vLLM->>Python KVBM Integration: Request KV cache blocks
Python KVBM Integration->>Rust KVBM Leader/Worker: Allocate/check blocks
Rust KVBM Leader/Worker->>Controller/Scheduler: (If needed) Schedule/coordinate transfer
Controller/Scheduler-->>Rust KVBM Leader/Worker: Grant/deny transfer
Rust KVBM Leader/Worker-->>Python KVBM Integration: Return block info
Python KVBM Integration-->>vLLM: Provide block IDs/handles
vLLM-->>User: Return model output
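The flow in the diagram above can be sketched as a small, self-contained Python model. This is only an illustration of the message order (request blocks, optionally ask the controller to grant a transfer, return block IDs); it is not the PR's code. Every name here (`Controller`, `Leader`, `get_kv_blocks`, `max_transfers`, `needs_transfer`) is hypothetical, and the grant/deny policy is an invented placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Controller:
    """Hypothetical controller/scheduler: grants or denies block transfers."""
    max_transfers: int = 2  # made-up budget, purely for illustration
    granted: int = 0

    def request_transfer(self, num_blocks: int) -> bool:
        # Deny the transfer if it would exceed the illustrative budget.
        if self.granted + num_blocks > self.max_transfers:
            return False
        self.granted += num_blocks
        return True


@dataclass
class Leader:
    """Hypothetical stand-in for the Rust KVBM leader/worker side."""
    controller: Controller
    free_blocks: List[int] = field(default_factory=lambda: [0, 1, 2, 3])

    def allocate(self, num_blocks: int, needs_transfer: bool = False) -> Optional[List[int]]:
        # Not enough free blocks: nothing to hand out.
        if num_blocks > len(self.free_blocks):
            return None
        # Only involve the controller when a transfer is actually needed.
        if needs_transfer and not self.controller.request_transfer(num_blocks):
            return None
        ids = self.free_blocks[:num_blocks]
        self.free_blocks = self.free_blocks[num_blocks:]
        return ids


def get_kv_blocks(leader: Leader, num_blocks: int, needs_transfer: bool = False) -> List[int]:
    """Hypothetical integration-layer call made on behalf of vLLM."""
    ids = leader.allocate(num_blocks, needs_transfer)
    if ids is None:
        raise RuntimeError("KV blocks unavailable (no capacity or transfer denied)")
    return ids
```

In this sketch the integration layer surfaces a denied transfer as an exception; a real integration might instead queue or retry the request.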
Estimated code review effort
🎯 5 (Critical) | ⏱️ ~120+ minutes
Complexity: This PR is extremely complex, introducing a distributed block manager system, new abstractions for locality, controller and connector protocols, extensive Python bindings, vLLM integration, new test plans, and documentation. The changes span core Rust, Python, Docker, and test code, requiring careful, multi-domain review.
Signed-off-by: Neal Vaidya <nealv@nvidia.com> Signed-off-by: Pavithra Vijayakrishnan <160681768+pvijayakrish@users.noreply.github.com> Signed-off-by: Yan Ru Pei <yanrpei@gmail.com> Signed-off-by: Ryan McCormick <mccormick.codes@gmail.com> Signed-off-by: Anant Sharma <anants@nvidia.com> Signed-off-by: Neelay Shah <neelays@nvidia.com> Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com> Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com> Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com> Signed-off-by: Biswa Panda <biswa.panda@gmail.com> Signed-off-by: Kapil Arya <kapila@nvidia.com> Signed-off-by: Jacky <18255193+kthui@users.noreply.github.com> Co-authored-by: Neal Vaidya <nealv@nvidia.com> Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com> Co-authored-by: ptarasiewiczNV <104908264+ptarasiewiczNV@users.noreply.github.com> Co-authored-by: Keiven C <213854356+keivenchang@users.noreply.github.com> Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com> Co-authored-by: Ryan Olson <rolson@nvidia.com> Co-authored-by: Yan Ru Pei <yanrpei@gmail.com> Co-authored-by: Graham King <grahamk@nvidia.com> Co-authored-by: Kapil Arya <kapil.arya.17@gmail.com> Co-authored-by: atchernych <atchernych@nvidia.com> Co-authored-by: Ryan McCormick <rmccormick@nvidia.com> Co-authored-by: Neelay Shah <neelays@nvidia.com> Co-authored-by: zaristei <zaristei@nvidia.com> Co-authored-by: Biswa Panda <biswa.panda@gmail.com> Co-authored-by: heisenberglit <85603888+heisenberglit@users.noreply.github.com> Co-authored-by: Pavithra Vijayakrishnan <160681768+pvijayakrish@users.noreply.github.com> Co-authored-by: J Wyman <jwyman@nvidia.com> Co-authored-by: tanmayv25 <tanmay2592@gmail.com> Co-authored-by: Paul Hendricks <phendricks@nvidia.com> Co-authored-by: Anant Sharma <anants@nvidia.com> Co-authored-by: julienmancuso <161955438+julienmancuso@users.noreply.github.com> Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com> Co-authored-by: 
hongkuan <hongkuanz@nvidia.com> Co-authored-by: mohammedabdulwahhab <furkhan324@berkeley.edu> Co-authored-by: Hongkuan Zhou <tedzhouhk@gmail.com> Co-authored-by: Tanmay Verma <tanmayv@nvidia.com> Co-authored-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com> Co-authored-by: alec-flowers <aflowers@nvidia.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Jhao-Ting Chen <jtchen0528@gmail.com> Co-authored-by: Jacky <18255193+kthui@users.noreply.github.com> Co-authored-by: Anish <80174047+athreesh@users.noreply.github.com> Co-authored-by: saurabh-nvidia <saurabha@nvidia.com> Co-authored-by: Ubuntu <saurabha@saurabha-cpu.l5mxjxajs0be3dwlje3wrdx5ie.xx.internal.cloudapp.net> Co-authored-by: Harrison Saturley-Hall <454891+saturley-hall@users.noreply.github.com> Co-authored-by: Kapil Arya <kapila@nvidia.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com> Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com> Co-authored-by: KrishnanPrash <140860868+KrishnanPrash@users.noreply.github.com> Co-authored-by: Kris Hung <krish@nvidia.com> Co-authored-by: jthomson04 <jwillthomson19@gmail.com>