@richardhuo-nv
Contributor

@richardhuo-nv richardhuo-nv commented Jul 16, 2025

Overview:

Add an env var to set the leader-worker ping timeout.

Details:

Allocating a large amount of memory for the KV cache significantly increases the leader-worker ping time.
Add an env var so the timeout can be adjusted dynamically.
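
A minimal sketch of the idea, not the code from this PR: read the timeout from the environment and fall back to a default when the variable is unset or invalid. The variable name `KVBM_LEADER_WORKER_PING_TIMEOUT_SECS` and the 120-second default are hypothetical, chosen only for illustration.

```python
import math
import os

# Hypothetical name and default; the PR text does not state the real ones.
PING_TIMEOUT_ENV = "KVBM_LEADER_WORKER_PING_TIMEOUT_SECS"
DEFAULT_PING_TIMEOUT_SECS = 120.0


def ping_timeout_secs(env=None):
    """Return the leader-worker ping timeout in seconds.

    Falls back to the default when the variable is unset, empty,
    unparsable, non-finite, or not strictly positive.
    """
    env = os.environ if env is None else env
    raw = env.get(PING_TIMEOUT_ENV)
    if raw is None or not raw.strip():
        return DEFAULT_PING_TIMEOUT_SECS
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_PING_TIMEOUT_SECS
    if not math.isfinite(value) or value <= 0:
        return DEFAULT_PING_TIMEOUT_SECS
    return value
```

With this shape, a deployment with a very large KV cache could export a larger value before launching the leader and workers, while existing setups keep the default.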

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

New Features

  • Introduced distributed Key-Value Block Manager (KVBM) with leader-worker sharding, supporting CUDA, host, and disk block management.
  • Added vLLM integration for KVBM, including a Python cache manager and utilities for block allocation and retrieval.
  • Implemented new block layout types, including layer-separated layouts and logical locality for distributed environments.
  • Added support for Torch tensor-backed device memory and improved disk storage handling.

Bug Fixes

  • Improved error handling and logging for distributed block transfers and runtime pipeline errors.
  • Fixed block registration and duplication handling in block pools.

Documentation

  • Added comprehensive guides and READMEs for running KVBM in vLLM, distributed block manager architecture, and detailed test plans.
  • Updated workspace and analysis paths for enhanced Python development experience.

Tests

  • Added new unit and integration tests for KVBM, vLLM integration, and distributed block manager workflows.
  • Removed legacy BlockManager tests and replaced with updated coverage for new features.

Refactor

  • Generalized block manager and offload manager to support locality abstraction.
  • Refactored block pool APIs to include locality and duplication settings.
  • Modularized layout, storage, and block data management for extensibility.

Chores

  • Updated build scripts and Dockerfiles for KVBM support and multi-architecture compatibility.
  • Enhanced test exclusion patterns for improved CI performance.

@copy-pr-bot

copy-pr-bot bot commented Jul 16, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the fix label Jul 16, 2025
@richardhuo-nv richardhuo-nv changed the base branch from main to jthomson04/kvbm-vllm-fixes July 16, 2025 17:27
@jthomson04
Contributor

@richardhuo-nv I'd imagine the intent here is to target kvbm-vllm-fixes?

@richardhuo-nv
Contributor Author

> @richardhuo-nv I'd imagine the intent here is to target kvbm-vllm-fixes?

Oh, yeah. Just changed the merge base

@coderabbitai
Contributor

coderabbitai bot commented Jul 16, 2025

Caution

Review failed

Failed to post review comments.

Walkthrough

This update introduces a distributed Key-Value Block Manager (KVBM) with leader-worker architecture, integrating it with vLLM and Python bindings. It adds multi-stage Docker support, new Rust modules for distributed block management, block locality abstraction, advanced block layouts, and comprehensive test suites and documentation. Python interfaces and Rust-Python bindings are extensively refactored to support distributed workflows and vLLM integration.

Changes

| File(s) / Path(s) | Change Summary |
| --- | --- |
| container/Dockerfile.kvbm, container/build.sh, container/run.sh | Add multi-stage Docker build for KVBM, extend build/run scripts for KVBM support and GDS mounting logic. |
| docs/guides/run_kvbm_in_vllm.md | New guide for running KVBM in vLLM with setup and usage instructions. |
| dynamo.code-workspace | Add Python analysis paths for new bindings and vLLM integration. |
| lib/bindings/python/Cargo.toml, lib/llm/Cargo.toml | Update default features, add dependencies, and dev-dependencies for block-manager and distributed features. |
| lib/bindings/python/rust/llm.rs, lib/bindings/python/rust/llm/block_manager.rs | Refactor Python Rust bindings: conditional block_manager module, distributed leader-worker resource management, simplified BlockManager interface, add KvbmWorker/Leader, remove old block allocation methods. |
| lib/bindings/python/rust/llm/block_manager/block.rs, block_list.rs, dlpack.rs, layer.rs | Refactor block types for locality, update DLPack handling, streamline dtype usage, and improve block data access. |
| lib/bindings/python/rust/llm/block_manager/distributed.rs, leader.rs, utils.rs, worker.rs | Add distributed leader and worker Python bindings, environment-driven config, async initialization, and resource management. |
| lib/bindings/python/rust/llm/block_manager/vllm.rs, block_list.rs, request.rs, slot.rs | Add vLLM cache manager Rust bindings, block list abstractions, slot management, and request hashing. |
| lib/bindings/python/rust/llm/block_manager/vllm/slot_manager_test_plan.md, slot_test_plan.md | Add detailed test plans for slot and slot manager block management. |
| lib/bindings/python/src/dynamo/_core.pyi, lib/bindings/python/src/dynamo/llm/__init__.py | Add KvbmCacheManager, KvbmRequest, KvbmLeader, KvbmWorker to Python interface. |
| lib/bindings/python/src/dynamo/llm/vllm_integration/ (multiple new files) | Implement Python vLLM integration: KvbmCacheManager, cache utils, Rust loader, and module stubs. |
| lib/bindings/python/tests/test_kvbm.py, test_kvbm_vllm_integration.py | Add new pytest-based tests for KVBM and vLLM integration. |
| lib/bindings/python/tests/test_block_manager.py | Remove legacy BlockManager test suite. |
| lib/llm/src/block_manager/ (multiple files: rs, md, block/, offload/, pool/, distributed/, etc.) | Major refactor: add locality abstraction, distributed leader-worker modules, new block layouts (LayerSeparate), offload/onboard logic, async initialization, block pool duplication control, atomic counters, and comprehensive tests. |
| lib/llm/src/tokens.rs | Add block size tracking, token range extraction, reset, and From<Vec> for Tokens. |
| lib/runtime/src/pipeline/network/ingress/push_endpoint.rs, push_handler.rs | Conditional error logging for debug/release builds. |
| pyproject.toml | Update pytest ignore-glob to skip vllm_integration files. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Docker
    participant PythonApp
    participant RustKVBM
    participant Leader
    participant Worker
    participant vLLM

    User->>Docker: Build and run KVBM-enabled container
    Docker->>PythonApp: Start Python environment with KVBM/vLLM integration
    PythonApp->>RustKVBM: Initialize BlockManager (optionally with Leader/Worker)
    RustKVBM->>Leader: (If leader) Setup distributed coordination/barrier
    RustKVBM->>Worker: (If worker) Register with leader, setup block pools
    PythonApp->>vLLM: Serve model with KvbmCacheManager as KV cache
    vLLM->>PythonApp: Request cache blocks for inference
    PythonApp->>RustKVBM: Allocate/query/free blocks, update slots
    RustKVBM->>Leader/Worker: Coordinate block transfers (as needed)
    User->>PythonApp: Query model, validate cache behavior
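
The leader/worker registration steps in the diagram can be sketched as a barrier with a timeout. This is an illustrative toy only: it substitutes an in-process `threading.Condition` for the real coordination mechanism, and the class and method names (`LeaderBarrier`, `register`, `wait_for_workers`) are hypothetical rather than taken from the KVBM bindings.

```python
import threading


class LeaderBarrier:
    """Toy leader-side barrier: wait until all workers have registered."""

    def __init__(self, num_workers):
        self._expected = num_workers
        self._registered = set()
        self._cond = threading.Condition()

    def register(self, worker_id):
        """Called by a worker once it is ready (e.g. block pools set up)."""
        with self._cond:
            self._registered.add(worker_id)
            self._cond.notify_all()

    def wait_for_workers(self, timeout_secs):
        """Block until every worker registered or the timeout elapses.

        Returns True if all workers registered in time, False otherwise.
        """
        with self._cond:
            return self._cond.wait_for(
                lambda: len(self._registered) >= self._expected,
                timeout=timeout_secs,
            )
```

If worker setup takes longer as the KV cache grows, a fixed `timeout_secs` would make `wait_for_workers` return False even though the workers are healthy, which is why a configurable timeout is useful here.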

Possibly related PRs

  • ai-dynamo/dynamo#1429: Adds distributed leader-worker barrier utilities using etcd, foundational for the distributed KVBM leader-worker coordination in this PR.
  • ai-dynamo/dynamo#1093: Restructures block registration in the block manager, directly related to the registration and duplication handling changes here.
  • ai-dynamo/dynamo#1462: Adds support for new block layouts (LayerSeparate), which is foundational for the layout abstractions and block pool changes in this update.

Poem

In the warren of code, a new path unfurled,
Distributed bunnies now hop ‘round the world.
Leaders and workers, with blocks in their paws,
Cache and offload with nary a pause.
Python and Rust, in harmony spun—
KVBM’s journey has only begun!
🐇✨


@richardhuo-nv richardhuo-nv merged commit bd1b8b3 into jthomson04/kvbm-vllm-fixes Jul 16, 2025
5 of 10 checks passed
@richardhuo-nv richardhuo-nv deleted the rihuo/add_dynamic_timeout branch July 16, 2025 19:37
This was referenced Aug 2, 2025