fix: add an env var to setup the leader-worker heartbeat timeout #1965

richardhuo-nv · 2025-07-16T17:26:52Z

Overview:

Add an environment variable to set the leader-worker ping timeout.

Details:

Allocating a large amount of memory for the KV cache significantly increases the leader-worker ping time.
Add an environment variable so the timeout can be adjusted without rebuilding.
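As a rough illustration of the idea (not the code in this PR), the sketch below reads a timeout from an environment variable and falls back to a default when the variable is unset or invalid. The variable name `LEADER_WORKER_PING_TIMEOUT_SECS`, the 120-second default, and both function names are assumptions made for this example; the actual names and values live in `leader.rs`.

```rust
use std::env;
use std::time::Duration;

/// Hypothetical variable name, chosen for illustration only.
const TIMEOUT_ENV_VAR: &str = "LEADER_WORKER_PING_TIMEOUT_SECS";
/// Hypothetical fallback used when the variable is unset or invalid.
const DEFAULT_TIMEOUT_SECS: u64 = 120;

/// Turn a raw string into a timeout. Falls back to the default when the
/// value is missing, is not an unsigned integer, or is zero.
fn parse_timeout(raw: Option<&str>) -> Duration {
    let secs = raw
        .and_then(|s| s.trim().parse::<u64>().ok())
        .filter(|&s| s > 0)
        .unwrap_or(DEFAULT_TIMEOUT_SECS);
    Duration::from_secs(secs)
}

/// Read the timeout from the process environment.
fn ping_timeout_from_env() -> Duration {
    parse_timeout(env::var(TIMEOUT_ENV_VAR).ok().as_deref())
}
```

Keeping the parsing in a pure function makes the fallback behavior easy to test without mutating the process environment. A deployment with a very large KV cache would then export a larger value before starting the leader.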

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

Summary by CodeRabbit

New Features

Introduced distributed Key-Value Block Manager (KVBM) with leader-worker sharding, supporting CUDA, host, and disk block management.
Added vLLM integration for KVBM, including a Python cache manager and utilities for block allocation and retrieval.
Implemented new block layout types, including layer-separated layouts and logical locality for distributed environments.
Added support for Torch tensor-backed device memory and improved disk storage handling.

Bug Fixes

Improved error handling and logging for distributed block transfers and runtime pipeline errors.
Fixed block registration and duplication handling in block pools.

Documentation

Added comprehensive guides and READMEs for running KVBM in vLLM, distributed block manager architecture, and detailed test plans.
Updated workspace and analysis paths for enhanced Python development experience.

Tests

Added new unit and integration tests for KVBM, vLLM integration, and distributed block manager workflows.
Removed legacy BlockManager tests and replaced with updated coverage for new features.

Refactor

Generalized block manager and offload manager to support locality abstraction.
Refactored block pool APIs to include locality and duplication settings.
Modularized layout, storage, and block data management for extensibility.

Chores

Updated build scripts and Dockerfiles for KVBM support and multi-architecture compatibility.
Enhanced test exclusion patterns for improved CI performance.

copy-pr-bot · 2025-07-16T17:26:56Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

jthomson04 · 2025-07-16T17:28:45Z

@richardhuo-nv I'd imagine the intent here is to target kvbm-vllm-fixes?

richardhuo-nv · 2025-07-16T17:29:23Z

> @richardhuo-nv I'd imagine the intent here is to target kvbm-vllm-fixes?

Oh, yeah. Just changed the merge base.

coderabbitai · 2025-07-16T17:35:30Z

Caution

Review failed

Failed to post review comments.

Walkthrough

This update introduces a distributed Key-Value Block Manager (KVBM) with leader-worker architecture, integrating it with vLLM and Python bindings. It adds multi-stage Docker support, new Rust modules for distributed block management, block locality abstraction, advanced block layouts, and comprehensive test suites and documentation. Python interfaces and Rust-Python bindings are extensively refactored to support distributed workflows and vLLM integration.

Changes

| File(s) / Path(s) | Change Summary |
| --- | --- |
| `container/Dockerfile.kvbm`, `container/build.sh`, `container/run.sh` | Add multi-stage Docker build for KVBM, extend build/run scripts for KVBM support and GDS mounting logic. |
| `docs/guides/run_kvbm_in_vllm.md` | New guide for running KVBM in vLLM with setup and usage instructions. |
| `dynamo.code-workspace` | Add Python analysis paths for new bindings and vLLM integration. |
| `lib/bindings/python/Cargo.toml`, `lib/llm/Cargo.toml` | Update default features, add dependencies, and dev-dependencies for block-manager and distributed features. |
| `lib/bindings/python/rust/llm.rs`, `lib/bindings/python/rust/llm/block_manager.rs` | Refactor Python Rust bindings: conditional block_manager module, distributed leader-worker resource management, simplified BlockManager interface, add KvbmWorker/Leader, remove old block allocation methods. |
| `lib/bindings/python/rust/llm/block_manager/block.rs`, `block_list.rs`, `dlpack.rs`, `layer.rs` | Refactor block types for locality, update DLPack handling, streamline dtype usage, and improve block data access. |
| `lib/bindings/python/rust/llm/block_manager/distributed.rs`, `leader.rs`, `utils.rs`, `worker.rs` | Add distributed leader and worker Python bindings, environment-driven config, async initialization, and resource management. |
| `lib/bindings/python/rust/llm/block_manager/vllm.rs`, `block_list.rs`, `request.rs`, `slot.rs` | Add vLLM cache manager Rust bindings, block list abstractions, slot management, and request hashing. |
| `lib/bindings/python/rust/llm/block_manager/vllm/slot_manager_test_plan.md`, `slot_test_plan.md` | Add detailed test plans for slot and slot manager block management. |
| `lib/bindings/python/src/dynamo/_core.pyi`, `lib/bindings/python/src/dynamo/llm/__init__.py` | Add KvbmCacheManager, KvbmRequest, KvbmLeader, KvbmWorker to Python interface. |
| `lib/bindings/python/src/dynamo/llm/vllm_integration/` (multiple new files) | Implement Python vLLM integration: KvbmCacheManager, cache utils, Rust loader, and module stubs. |
| `lib/bindings/python/tests/test_kvbm.py`, `test_kvbm_vllm_integration.py` | Add new pytest-based tests for KVBM and vLLM integration. |
| `lib/bindings/python/tests/test_block_manager.py` | Remove legacy BlockManager test suite. |
| `lib/llm/src/block_manager/` (multiple files: rs, md, block/, offload/, pool/, distributed/, etc.) | Major refactor: add locality abstraction, distributed leader-worker modules, new block layouts (LayerSeparate), offload/onboard logic, async initialization, block pool duplication control, atomic counters, and comprehensive tests. |
| `lib/llm/src/tokens.rs` | Add block size tracking, token range extraction, reset, and From<Vec> for Tokens. |
| `lib/runtime/src/pipeline/network/ingress/push_endpoint.rs`, `push_handler.rs` | Conditional error logging for debug/release builds. |
| `pyproject.toml` | Update pytest ignore-glob to skip vllm_integration files. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Docker
    participant PythonApp
    participant RustKVBM
    participant Leader
    participant Worker
    participant vLLM

    User->>Docker: Build and run KVBM-enabled container
    Docker->>PythonApp: Start Python environment with KVBM/vLLM integration
    PythonApp->>RustKVBM: Initialize BlockManager (optionally with Leader/Worker)
    RustKVBM->>Leader: (If leader) Setup distributed coordination/barrier
    RustKVBM->>Worker: (If worker) Register with leader, setup block pools
    PythonApp->>vLLM: Serve model with KvbmCacheManager as KV cache
    vLLM->>PythonApp: Request cache blocks for inference
    PythonApp->>RustKVBM: Allocate/query/free blocks, update slots
    RustKVBM->>Leader/Worker: Coordinate block transfers (as needed)
    User->>PythonApp: Query model, validate cache behavior
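To make the coordination step in the diagram concrete, here is a minimal sketch of a leader waiting for worker check-ins under a deadline, which is where a configurable timeout matters. It uses an in-process `std::sync::mpsc` channel and threads as a stand-in for the real transport (the related PR list mentions etcd-based barriers). The function names `wait_for_workers` and `spawn_workers` are hypothetical and do not come from the codebase.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Wait until `expected` workers have checked in or the deadline passes.
/// Returns the worker ids in arrival order, or a message describing the failure.
fn wait_for_workers(
    rx: &mpsc::Receiver<usize>,
    expected: usize,
    timeout: Duration,
) -> Result<Vec<usize>, String> {
    let deadline = Instant::now() + timeout;
    let mut seen = Vec::with_capacity(expected);
    while seen.len() < expected {
        // Time left on the overall deadline; zero once it has passed.
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(id) => seen.push(id),
            Err(RecvTimeoutError::Timeout) => {
                return Err(format!(
                    "timed out: {} of {} workers checked in",
                    seen.len(),
                    expected
                ));
            }
            Err(RecvTimeoutError::Disconnected) => {
                return Err(format!(
                    "workers disconnected: {} of {} checked in",
                    seen.len(),
                    expected
                ));
            }
        }
    }
    Ok(seen)
}

/// Spawn `n` simulated workers that each check in after `delay`.
/// A long delay stands in for slow startup, e.g. a large KV cache allocation.
fn spawn_workers(n: usize, delay: Duration) -> mpsc::Receiver<usize> {
    let (tx, rx) = mpsc::channel();
    for id in 0..n {
        let tx = tx.clone();
        thread::spawn(move || {
            thread::sleep(delay);
            // The leader may have given up already; ignore a failed send.
            let _ = tx.send(id);
        });
    }
    rx
}
```

With a fixed short timeout, slow workers cause the leader to give up even though they would have checked in eventually; passing a larger `timeout` lets the same code tolerate longer startup.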

Possibly related PRs

ai-dynamo/dynamo#1429: Adds distributed leader-worker barrier utilities using etcd, foundational for the distributed KVBM leader-worker coordination in this PR.
ai-dynamo/dynamo#1093: Restructures block registration in the block manager, directly related to the registration and duplication handling changes here.
ai-dynamo/dynamo#1462: Adds support for new block layouts (LayerSeparate), which is foundational for the layout abstractions and block pool changes in this update.

Poem

In the warren of code, a new path unfurled,
Distributed bunnies now hop ‘round the world.
Leaders and workers, with blocks in their paws,
Cache and offload with nary a pause.
Python and Rust, in harmony spun—
KVBM’s journey has only begun!
🐇✨

lib/llm/src/block_manager/distributed/leader.rs

dynamic timeout

62753c3

richardhuo-nv requested review from a team, GuanLuo, PeaBrane, alec-flowers, biswapanda, grahamking, ishandhanani, jthomson04, kkranen, nnshah1, paulhendricks, piotrm-nvidia, ptarasiewiczNV, rmccorm4, ryanolson, tanmayv25, tedzhouhk and tmonty12 as code owners July 16, 2025 17:26

github-actions bot added the fix label Jul 16, 2025

pull-request-size bot added the size/XXL label Jul 16, 2025

richardhuo-nv changed the base branch from main to jthomson04/kvbm-vllm-fixes July 16, 2025 17:27

richardhuo-nv requested a review from oandreeva-nv as a code owner July 16, 2025 17:27

pull-request-size bot added size/S and removed size/XXL labels Jul 16, 2025

jthomson04 requested changes Jul 16, 2025

lib/llm/src/block_manager/distributed/leader.rs (outdated, resolved)

lib/llm/src/block_manager/distributed/leader.rs (outdated, resolved)

richardhuo-nv added 2 commits July 16, 2025 11:20

resolve commnets

00cc5b0

resolve commnets

0d7d155

jthomson04 approved these changes Jul 16, 2025

richardhuo-nv merged commit bd1b8b3 into jthomson04/kvbm-vllm-fixes Jul 16, 2025
5 of 10 checks passed

richardhuo-nv deleted the rihuo/add_dynamic_timeout branch July 16, 2025 19:37

coderabbitai bot mentioned this pull request Jul 23, 2025

feat: standalone connector api #2076

Closed

This was referenced Aug 2, 2025

merge with main #2252

Merged

fix: re-enable deduplication #2343

Merged

coderabbitai bot mentioned this pull request Aug 22, 2025

feat: KServe gRPC support #2638

Merged
