ziqifan617 commented Aug 2, 2025

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Introduced a distributed Key-Value Block Manager (KVBM) with a leader-worker architecture for vLLM integration, supporting asynchronous block transfer, multi-tier storage (device, host, disk), and advanced block lifecycle management.
    • Added Python and Rust APIs for KVBM cache management, distributed runtime, and connector interfaces for vLLM.
    • Implemented detailed controller and scheduling interfaces for block pool management, including reset and status operations.
    • Added new block layout strategies, including per-layer storage, and locality abstractions for flexible memory management.
    • Added documentation and guides for running KVBM in vLLM, the block manager lifecycle, and distributed messaging.
  • Bug Fixes

    • Improved error handling and validation in block manager initialization, block allocation, and transfer operations.
  • Documentation

    • Added comprehensive guides, README files, and detailed test plans for new distributed features and block management workflows.
  • Tests

    • Introduced new test suites for KVBM cache manager, vLLM integration, block manager lifecycle, and distributed operations.
    • Removed legacy BlockManager test suite in favor of new distributed and cache manager tests.
  • Refactor

    • Major refactoring of block manager internals for locality, block data abstraction, and async initialization.
    • Simplified and generalized block traits, storage, and transfer logic.
    • Modularized codebase with clearer separation of distributed, controller, and connector logic.
  • Chores

    • Updated Docker and build scripts to support KVBM and multi-architecture builds.
    • Added new dependencies and feature flags for Rust and Python integration.

End-users now have access to distributed KV cache management, advanced block offloading, and vLLM integration with improved reliability, scalability, and observability.
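The per-layer storage strategy listed above changes how a block's bytes are addressed. As a rough sketch only, the Python below contrasts a single contiguous allocation with a layer-separate one in which each layer owns its own buffer. The dimension order ([block][layer][K/V][token][element]), `LayoutDims`, and both offset functions are hypothetical illustrations, not KVBM's layout code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutDims:
    """Hypothetical KV block dimensions; field names are illustrative only."""

    num_blocks: int
    num_layers: int
    outer_dim: int    # e.g. 2 when K and V are stored as separate slabs
    page_size: int    # tokens per block
    inner_dim: int    # elements per token per layer (num_heads * head_dim)
    dtype_bytes: int  # bytes per element, e.g. 2 for fp16/bf16

    @property
    def region_bytes(self) -> int:
        # One (block, layer, outer) region holds every token of a single K or V slab.
        return self.page_size * self.inner_dim * self.dtype_bytes


def _check(d: LayoutDims, block: int, layer: int, outer: int) -> None:
    # Reject indices outside the configured dimensions.
    if not (0 <= block < d.num_blocks and 0 <= layer < d.num_layers
            and 0 <= outer < d.outer_dim):
        raise IndexError("block/layer/outer index out of range")


def contiguous_offset(d: LayoutDims, block: int, layer: int, outer: int) -> int:
    """Byte offset into ONE allocation ordered [block][layer][outer][token][elem]."""
    _check(d, block, layer, outer)
    region_index = (block * d.num_layers + layer) * d.outer_dim + outer
    return region_index * d.region_bytes


def layer_separate_offset(d: LayoutDims, block: int, layer: int,
                          outer: int) -> tuple[int, int]:
    """(allocation index, byte offset) when each layer owns an allocation
    ordered [block][outer][token][elem]."""
    _check(d, block, layer, outer)
    region_index = block * d.outer_dim + outer
    return layer, region_index * d.region_bytes
```

For example, with `LayoutDims(num_blocks=4, num_layers=2, outer_dim=2, page_size=16, inner_dim=128, dtype_bytes=2)` each region is 4096 bytes; `contiguous_offset(d, 1, 1, 0)` returns 24576, while `layer_separate_offset(d, 1, 1, 0)` returns `(1, 8192)`. The contiguous form keeps a whole block in one span, which suits single-copy block transfers, whereas the layer-separate form fits engines that keep one KV tensor per attention layer.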

nealvaidya and others added 30 commits July 21, 2025 15:59

  • Signed-off-by: Neal Vaidya <nealv@nvidia.com>
  • …utedRuntime, Namespace, Components, and Endpoint (#2008)
    Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
    Co-authored-by: Ryan Olson <rolson@nvidia.com>
    Co-authored-by: Graham King <grahamk@nvidia.com>
    Signed-off-by: Pavithra Vijayakrishnan <160681768+pvijayakrish@users.noreply.github.com>
  • …cheduler (#2071)
    Signed-off-by: Yan Ru Pei <yanrpei@gmail.com>
  • …)
    Signed-off-by: Ryan McCormick <mccormick.codes@gmail.com>
    Co-authored-by: tanmayv25 <tanmay2592@gmail.com>
ziqifan617 marked this pull request as draft August 2, 2025 01:09
ziqifan617 changed the base branch from main to ryan/connector-250801 August 2, 2025 01:10
ziqifan617 marked this pull request as ready for review August 2, 2025 01:10
ziqifan617 changed the title feat: add KVBM vLLM integration merge with main Aug 2, 2025
coderabbitai bot commented Aug 2, 2025

Caution

Review failed

Failed to post review comments.

Walkthrough

This update introduces a comprehensive distributed key-value block manager (KVBM) integration for vLLM, spanning major Rust and Python codebases, Docker build infrastructure, and extensive documentation. It adds new block locality abstractions, distributed leader-worker protocols, controller interfaces, advanced block layouts, and a full Python binding and test suite for vLLM KV cache management, replacing legacy block manager APIs.

Changes

Cohort / File(s) Change Summary
Docker & Build Integration
container/Dockerfile.kvbm, container/build.sh, container/run.sh
Adds a multi-stage Dockerfile and build/run script logic for KVBM, including CUDA, Rust, Python, UCX, NATS, etcd, and Prometheus setup, with architecture-specific handling and local/CI/runtime stages.
Python Bindings: Block Manager & vLLM Integration
lib/bindings/python/rust/llm/block_manager.rs, lib/bindings/python/rust/llm/block_manager/block.rs, lib/bindings/python/rust/llm/block_manager/block_list.rs, lib/bindings/python/rust/llm/block_manager/controller.rs, lib/bindings/python/rust/llm/block_manager/distributed.rs, lib/bindings/python/rust/llm/block_manager/distributed/leader.rs, lib/bindings/python/rust/llm/block_manager/distributed/utils.rs, lib/bindings/python/rust/llm/block_manager/distributed/worker.rs, lib/bindings/python/rust/llm/block_manager/dlpack.rs, lib/bindings/python/rust/llm/block_manager/layer.rs, lib/bindings/python/rust/llm/block_manager/vllm.rs, lib/bindings/python/rust/llm/block_manager/vllm/block_list.rs, lib/bindings/python/rust/llm/block_manager/vllm/connector.rs, lib/bindings/python/rust/llm/block_manager/vllm/connector/leader.rs, lib/bindings/python/rust/llm/block_manager/vllm/connector/worker.rs, lib/bindings/python/rust/llm/block_manager/vllm/request.rs, lib/bindings/python/rust/llm/block_manager/vllm/slot.rs, lib/bindings/python/rust/llm/block_manager/vllm/slot_manager_test_plan.md, lib/bindings/python/rust/llm/block_manager/vllm/slot_test_plan.md
Major refactor and extension: introduces distributed leader/worker bindings, controller client, vLLM cache manager, slot/block management, block state APIs, test plans, and Rust-Python interface for distributed KV cache management.
Python: vLLM Integration & API
lib/bindings/python/src/dynamo/llm/vllm_integration/__init__.py, lib/bindings/python/src/dynamo/llm/vllm_integration/connector/__init__.py, lib/bindings/python/src/dynamo/llm/vllm_integration/connector/dynamo_connector.py, lib/bindings/python/src/dynamo/llm/vllm_integration/connector_leader.py, lib/bindings/python/src/dynamo/llm/vllm_integration/connector_worker.py, lib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_manager.py, lib/bindings/python/src/dynamo/llm/vllm_integration/kv_cache_utils.py, lib/bindings/python/src/dynamo/llm/vllm_integration/rust.py
Adds Python modules/classes for vLLM KV cache manager, connector (leader/worker), metadata, block utilities, and dynamic Rust binding loading; defines vLLM protocol integration and API surface.
Python: Core & Import Surface
lib/bindings/python/src/dynamo/_core.pyi, lib/bindings/python/src/dynamo/llm/__init__.py
Declares new types (KvbmCacheManager, KvbmRequest, KvbmLeader, KvbmWorker) in type stubs and imports, expanding the Python API.
Python: Test Suite
lib/bindings/python/tests/test_kvbm.py, lib/bindings/python/tests/test_kvbm_vllm_integration.py, lib/bindings/python/tests/test_block_manager.py
Adds new async and unit tests for KVBM and vLLM integration, covering allocation, freeing, error cases, and cache behaviors; removes legacy BlockManager tests.
Rust: Block Manager Core & Locality Abstractions
lib/llm/src/block_manager.rs, lib/llm/src/block_manager/block.rs, lib/llm/src/block_manager/block/data.rs, lib/llm/src/block_manager/block/data/local.rs, lib/llm/src/block_manager/block/data/logical.rs, lib/llm/src/block_manager/block/data/logical/distributed_leader_worker.rs, lib/llm/src/block_manager/block/data/logical/null.rs, lib/llm/src/block_manager/block/data/view.rs, lib/llm/src/block_manager/block/factory.rs, lib/llm/src/block_manager/block/factory/local.rs, lib/llm/src/block_manager/block/factory/logical.rs, lib/llm/src/block_manager/block/locality.rs, lib/llm/src/block_manager/block/state.rs, lib/llm/src/block_manager/block/transfer.rs, lib/llm/src/block_manager/block/transfer/cuda.rs, lib/llm/src/block_manager/block/transfer/memcpy.rs, lib/llm/src/block_manager/block/transfer/nixl.rs, lib/llm/src/block_manager/config.rs, lib/llm/src/block_manager/layout.rs, lib/llm/src/block_manager/layout/nixl.rs, lib/llm/src/block_manager/layout/utils.rs
Refactors block manager core: introduces locality abstraction, logical resources, new block layouts (LayerSeparate), async block manager initialization, transfer logic, and validation utilities; updates config and serialization.
Rust: Distributed KVBM System
lib/llm/src/block_manager/distributed.rs, lib/llm/src/block_manager/distributed/leader.rs, lib/llm/src/block_manager/distributed/transfer.rs, lib/llm/src/block_manager/distributed/utils.rs, lib/llm/src/block_manager/distributed/worker.rs, lib/llm/src/block_manager/distributed/zmq.rs
Implements distributed leader-worker system for block management: ZMQ-based messaging, leader/worker configs and synchronization, transfer handler, protocol utilities, and async test harness for distributed block operations.
Rust: Block Manager Controller
lib/llm/src/block_manager/controller.rs, lib/llm/src/block_manager/controller/client.rs, lib/llm/src/block_manager/controller/handler.rs
Adds controller module for block pool management: control messages, status queries, pool/block reset, client and handler implementations, and async engine integration.
Rust: Block Manager Connector & Scheduler
lib/llm/src/block_manager/connector.rs, lib/llm/src/block_manager/connector/protocol.rs, lib/llm/src/block_manager/connector/scheduler.rs
Introduces connector and scheduler: slot management, transfer protocol, error types, async scheduling, slot state machine, and test coverage for transfer coordination.
Rust: Documentation
lib/llm/src/block_manager.md, lib/llm/src/block_manager/distributed/README.md
Adds Markdown docs describing the block lifecycle, the offload manager, the distributed message system, and test plans for slot/block management.
Cargo & Dependency Management
lib/bindings/python/Cargo.toml, lib/llm/Cargo.toml
Adds new dependencies (derive-getters, uuid, rstest, futures-util, tmq) and features for Python bindings and block manager crates.
Guides & User Documentation
docs/guides/run_kvbm_in_vllm.md
Adds user guide for running KVBM in vLLM, including setup, environment, and API usage.
Python: Minor/Support Changes
lib/bindings/python/rust/lib.rs, lib/bindings/python/rust/llm.rs
Updates public API surface for distributed runtime and block manager, including conditional compilation and new methods.
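The controller cohort above describes control messages, status queries, and pool/block resets. The following Python sketch shows one way such a handler could dispatch messages to per-tier pools (device, host, disk). `CacheLevel`, `Status`, `ResetPool`, `ResetBlocks`, `ToyPool`, and `ControllerHandler` are hypothetical names, and the rule that a pool cannot be reset while blocks are in use is an assumption, not behaviour confirmed by this PR.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Union


class CacheLevel(Enum):
    DEVICE = "device"
    HOST = "host"
    DISK = "disk"


@dataclass(frozen=True)
class Status:
    level: CacheLevel


@dataclass(frozen=True)
class ResetPool:
    level: CacheLevel


@dataclass(frozen=True)
class ResetBlocks:
    level: CacheLevel
    sequence_hashes: tuple[int, ...]


ControlMessage = Union[Status, ResetPool, ResetBlocks]


@dataclass
class ToyPool:
    """Stand-in pool: tracks registered (reusable) blocks by sequence hash."""
    capacity: int
    registered: set[int] = field(default_factory=set)
    in_use: int = 0


class ControllerHandler:
    """Hypothetical handler that dispatches control messages to per-tier pools."""

    def __init__(self, pools: dict[CacheLevel, ToyPool]) -> None:
        self.pools = pools

    def handle(self, msg: ControlMessage) -> dict:
        pool = self.pools.get(msg.level)
        if pool is None:
            raise KeyError(f"no pool configured for level {msg.level.value!r}")
        if isinstance(msg, Status):
            return {"level": msg.level.value, "capacity": pool.capacity,
                    "registered": len(pool.registered), "in_use": pool.in_use}
        if isinstance(msg, ResetPool):
            # Assumption: refuse to reset while any block is still held by a request.
            if pool.in_use:
                raise RuntimeError("cannot reset a pool while blocks are in use")
            cleared = len(pool.registered)
            pool.registered.clear()
            return {"reset_blocks": cleared}
        if isinstance(msg, ResetBlocks):
            # Drop only the requested blocks; report hashes that were not registered.
            found = [h for h in msg.sequence_hashes if h in pool.registered]
            pool.registered.difference_update(found)
            return {"reset_blocks": len(found),
                    "not_found": len(msg.sequence_hashes) - len(found)}
        raise TypeError(f"unknown control message: {msg!r}")
```

For instance, with `ControllerHandler({CacheLevel.HOST: ToyPool(capacity=8, registered={1, 2, 3})})`, a `Status(CacheLevel.HOST)` query reports 3 registered blocks, `ResetBlocks(CacheLevel.HOST, (2, 99))` returns `{"reset_blocks": 1, "not_found": 1}`, and a following `ResetPool(CacheLevel.HOST)` clears the remaining 2.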

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant PythonApp
    participant KvbmCacheManager
    participant RustBlockManager
    participant DistributedLeader
    participant DistributedWorker

    User->>PythonApp: Issues KV cache operation (e.g., allocate, free)
    PythonApp->>KvbmCacheManager: Calls cache manager API
    KvbmCacheManager->>RustBlockManager: FFI call for block management
    RustBlockManager->>DistributedLeader: (if leader) Initiate distributed protocol
    RustBlockManager->>DistributedWorker: (if worker) Participate in protocol
    DistributedLeader-->>DistributedWorker: Synchronize via ZMQ/etcd
    DistributedWorker-->>RustBlockManager: Handle block transfer/ack
    RustBlockManager-->>KvbmCacheManager: Return result
    KvbmCacheManager-->>PythonApp: Return result
    PythonApp-->>User: Operation complete
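The diagram shows the leader and workers synchronizing via ZMQ/etcd. The sketch below models only the broadcast-and-acknowledge step, with `asyncio` queues standing in for ZMQ sockets: the leader sends one transfer request to every worker and completes only when every rank has acknowledged it, or raises on timeout. `TransferRequest`, `Ack`, `worker`, `leader_transfer`, and `demo` are hypothetical names; this is not the PR's wire protocol.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransferRequest:
    request_id: int
    block_ids: tuple[int, ...]


@dataclass(frozen=True)
class Ack:
    request_id: int
    worker_rank: int


async def worker(rank: int, inbox: asyncio.Queue, acks: asyncio.Queue,
                 copied: dict[int, list[int]]) -> None:
    """Hypothetical worker loop: handle each request, then acknowledge it."""
    while True:
        req: Optional[TransferRequest] = await inbox.get()
        if req is None:  # shutdown sentinel
            return
        # Stand-in for the real block copy this worker would perform.
        copied.setdefault(rank, []).extend(req.block_ids)
        await acks.put(Ack(req.request_id, rank))


async def leader_transfer(req: TransferRequest, inboxes: list[asyncio.Queue],
                          acks: asyncio.Queue, timeout: float = 1.0) -> list[int]:
    """Broadcast one request to every worker and wait until all ranks have acked."""
    for inbox in inboxes:
        inbox.put_nowait(req)
    acked: set[int] = set()

    async def collect() -> None:
        while len(acked) < len(inboxes):
            ack = await acks.get()
            if ack.request_id == req.request_id:  # ignore acks for other requests
                acked.add(ack.worker_rank)

    # Raise asyncio.TimeoutError instead of hanging if a worker never answers.
    await asyncio.wait_for(collect(), timeout)
    return sorted(acked)


async def demo(num_workers: int = 2):
    inboxes = [asyncio.Queue() for _ in range(num_workers)]
    acks: asyncio.Queue = asyncio.Queue()
    copied: dict[int, list[int]] = {}
    tasks = [asyncio.create_task(worker(r, inboxes[r], acks, copied))
             for r in range(num_workers)]
    ranks = await leader_transfer(TransferRequest(7, (0, 1, 2)), inboxes, acks)
    for inbox in inboxes:
        inbox.put_nowait(None)  # tell each worker to exit
    await asyncio.gather(*tasks)
    return ranks, copied
```

Running `ranks, copied = asyncio.run(demo())` yields `ranks == [0, 1]`, with each worker having recorded blocks `[0, 1, 2]`. Bounding the acknowledgement wait with a timeout is a design choice: a missing worker surfaces as an error instead of a hang.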

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

  • Complexity: High. This PR introduces a distributed system, new abstractions, protocol layers, controller interfaces, and a full Python/Rust API, with extensive test and documentation changes.
  • Scope: Broad. It touches many core and peripheral files, including build infrastructure, Rust and Python APIs, and user/test documentation.
  • Volume: Very large. The Changes table alone lists about 75 files, and the diff runs to thousands of lines.

Possibly related PRs

  • ai-dynamo/dynamo#1141: Adds asynchronous Python bindings and a new Layer class to the block manager, with refactoring and tests. Related as a precursor to the broader distributed KVBM and vLLM integration in this PR.
  • ai-dynamo/dynamo#1965: Adds environment variable configuration for leader-worker heartbeat timeout in Dockerfile.kvbm and build scripts. Related as both PRs modify the KVBM Docker and build environment.

Poem

In a warren deep, where code does sprawl,
The rabbits built a cache for all—
With leaders, workers, blocks that leap,
Across the disk and RAM so deep.
Now vLLM and KVBM unite,
Distributed dreams take flight!
🐇✨

—A rabbit, delighting in distributed bytes

ryanolson merged commit 585a026 into ryan/connector-250801 Aug 2, 2025
9 of 15 checks passed
ryanolson deleted the ziqi/connector-250801 branch August 2, 2025 19:37
coderabbitai bot mentioned this pull request Aug 19, 2025