feat: expose estimated kv cache hit in dynamo-run #1246

tedzhouhk · 2025-05-28T18:24:34Z

expose estimated kv cache hit in dynamo-run when using kv router
update vllm_v0 example's disagg router to use the estimated kv cache hit
fix frontends in all examples not parsing common arguments
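For context, here is a minimal sketch of how a disaggregated router might use the estimate. The router reports the hit in KV blocks, so the worker converts it to tokens using the block size and counts only the uncached remainder of the prompt toward the local-vs-remote prefill decision. The function names and the `max_local_prefill_length` / `max_prefill_queue_size` thresholds are hypothetical illustrations, not the vllm_v0 example's actual API.

```python
from typing import Optional


def estimated_prefix_hit_tokens(num_blocks: Optional[int], block_size: int) -> int:
    """Convert the router's block-level estimate into a token count.

    A missing estimate (None, e.g. when the KV router is not in use) is
    treated as a zero-length hit.
    """
    if num_blocks is None:
        return 0
    return num_blocks * block_size


def should_prefill_remotely(
    prompt_length: int,
    prefix_hit_blocks: Optional[int],
    block_size: int,
    queue_size: int,
    max_local_prefill_length: int,
    max_prefill_queue_size: int,
) -> bool:
    """Hypothetical rule: prefill remotely only when the uncached part of
    the prompt is long and the remote prefill queue still has room."""
    hit_tokens = estimated_prefix_hit_tokens(prefix_hit_blocks, block_size)
    # Only tokens that are not already cached need prefilling; clamp at zero
    # in case the estimate covers the whole prompt.
    uncached = max(prompt_length - hit_tokens, 0)
    return uncached > max_local_prefill_length and queue_size < max_prefill_queue_size
```

For example, with a block size of 16, a 1000-token prompt with an estimate of 60 blocks leaves 40 uncached tokens, so under a 512-token local limit it stays local; without an estimate, all 1000 tokens count and the request goes remote, provided the queue has room.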

Summary by CodeRabbit

New Features
- Added support for propagating the estimated prefix hit length in requests for improved key-value routing.
- Centralized and streamlined configuration management, allowing components to inherit common parameters more easily.
Improvements
- Configuration files now use shared common parameters, reducing duplication and simplifying updates.
- Enhanced logging in frontend components for better visibility into configuration and runtime operations.
- Simplified frontend configuration loading across multiple components.
Bug Fixes
- Ensured optional fields are handled safely in request processing and routing.

copy-pr-bot · 2025-05-28T18:24:37Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


coderabbitai · 2025-05-28T18:24:42Z

Walkthrough

The changes introduce a mechanism for services to inherit and merge common configuration parameters from a centralized "Common" section, streamlining configuration management across Python and YAML files. Rust components are updated to propagate and utilize a new estimated_prefix_hit_num_blocks field for requests, enabling more accurate routing and prefill logic in key-value aware systems. Logging and code clarity are improved in several Python components.

Changes

Files/Groups	Change Summary
deploy/sdk/src/dynamo/sdk/lib/config.py	Added class constants for common config keys; introduced `get_parsed_config` for merged configs; unified logic.
examples/llm/components/frontend.py, examples/sglang/components/frontend.py, examples/tensorrt_llm/components/frontend.py, examples/vllm_v1/components/frontend.py	Simplified `Frontend` class initialization to use `get_parsed_config` for loading merged frontend config.
examples/vllm_v0/components/frontend.py	Used `get_parsed_config` for frontend config; added logging for loaded config and subprocess command.
examples/vllm_v0/components/worker.py	Updated prefill logic to use estimated prefix hit length; improved logging.
examples/vllm_v0/utils/protocol.py	Added optional `estimated_prefix_hit_num_blocks` field to `PreprocessedRequest`.
examples/vllm_v0/configs/agg_kv.yaml, examples/vllm_v0/configs/disagg_kv.yaml	Centralized `router` parameter in `Common`; updated `common-configs` lists to include `router`.
lib/llm/src/kv_router.rs	`find_best_match` now returns overlap amount; propagated to request objects; updated method signatures.
lib/llm/src/preprocessor.rs	Sets `estimated_prefix_hit_num_blocks` to `None` in backend request builder.
lib/llm/src/protocols/common/preprocessor.rs	Added optional `estimated_prefix_hit_num_blocks` field to `PreprocessedRequest` struct.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Config as ServiceConfig
    participant YAML as YAML Config
    participant Frontend as Frontend Class

    Frontend->>Config: get_parsed_config("Frontend")
    Config->>YAML: Load Common and Frontend sections
    Config->>Config: Merge common-configs into Frontend config
    Config-->>Frontend: Return merged config dict
    Frontend->>Frontend: Initialize FrontendConfig with merged config
```

```mermaid
sequenceDiagram
    participant Router as KvRouter
    participant Worker as KvPushRouter
    participant Req as BackendInput

    Router->>Router: find_best_match(tokens)
    Router-->>Worker: (instance_id, overlap_amount)
    Worker->>Req: Set estimated_prefix_hit_num_blocks = overlap_amount
    Worker->>Router: direct(instance_id, updated Req)
```
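The second diagram's data flow can be modeled in a few lines of Python. This is an illustrative sketch only: the real `find_best_match` lives in Rust in `lib/llm/src/kv_router.rs`, and its cost function is not shown in this PR, so the "most overlapping blocks wins" rule and the `route` helper below are assumptions. What the sketch is meant to show is that the overlap, measured in blocks, travels with the request as `estimated_prefix_hit_num_blocks`.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass
class PreprocessedRequest:
    token_ids: List[int]
    # None until a KV-aware router fills it in.
    estimated_prefix_hit_num_blocks: Optional[int] = None


def find_best_match(overlaps: Dict[int, int]) -> Tuple[int, int]:
    """Pick the worker with the most cached prefix blocks (assumed rule).

    `overlaps` maps instance id -> number of matching prefix blocks and must
    be non-empty. Ties go to the lowest id so the choice is deterministic.
    """
    instance_id = min(overlaps, key=lambda wid: (-overlaps[wid], wid))
    return instance_id, overlaps[instance_id]


def route(
    request: PreprocessedRequest, overlaps: Dict[int, int]
) -> Tuple[int, PreprocessedRequest]:
    instance_id, overlap_amount = find_best_match(overlaps)
    # Build a new request carrying the estimate; the original is left as-is.
    updated = replace(request, estimated_prefix_hit_num_blocks=overlap_amount)
    return instance_id, updated
```

Returning a rebuilt request rather than mutating the input mirrors the "reconstructs the request" step the review describes, and it keeps callers that only need the worker id (like the one that discards the overlap) unaffected.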

Poem

In the warren of configs, we bunnies delight,
Merging common carrots, making settings light.
Now with overlap counted, our routes are so neat—
Each hop and nibble, a configuration treat!
From Rust to YAML, and Python in between,
Our code hops forward, robust and clean.
🥕✨

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c376619 and f9dc844.

📒 Files selected for processing (1)

lib/llm/src/kv_router.rs (4 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

lib/llm/src/kv_router.rs

⏰ Context from checks skipped due to timeout of 90000ms (4)

GitHub Check: pre-merge-rust (lib/runtime/examples)
GitHub Check: pre-merge-rust (lib/bindings/python)
GitHub Check: Build and Test - vllm
GitHub Check: pre-merge-rust (.)


coderabbitai

Actionable comments posted: 3

🔭 Outside diff range comments (1)

lib/llm/src/kv_router.rs (1)
134-149: ⚠️ Potential issue

Fix formatting issues flagged by cargo fmt.

The rustfmt check in the pipeline flags this code block.

Please run cargo fmt to fix the formatting issues:
```bash
#!/bin/bash
# Reformat the Rust code in place, then verify nothing is left to reformat
cargo fmt
cargo fmt --check
```
The logic for returning both worker ID and overlap amount is sound and aligns with the PR objective to expose KV cache hit estimation.

🧰 Tools

🪛 GitHub Actions: Rust pre-merge checks

[error] 144-147: cargo fmt check failed due to formatting differences. Code block needs to be reformatted to comply with rustfmt style.

🧹 Nitpick comments (4)

deploy/sdk/src/dynamo/sdk/lib/config.py (1)
54-79: Well-implemented configuration merging logic.

The method correctly handles the configuration inheritance pattern:

- Returns an empty dict for missing services
- Excludes ServiceArgs appropriately
- Only applies common configs when not overridden
- Cleans up the common-configs key from the final result
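The four behaviors above can be sketched as a standalone function over a plain dict. This is a hypothetical re-implementation for illustration, not the SDK's `ServiceConfig.get_parsed_config`; the key names `Common`, `common-configs`, and `ServiceArgs` come from this PR, while the function name, constant names, and dict layout are assumptions.

```python
from typing import Any, Dict

# Assumed key names, taken from the YAML configs discussed in this PR.
COMMON_SECTION = "Common"
COMMON_CONFIGS_KEY = "common-configs"
SERVICE_ARGS_KEY = "ServiceArgs"


def merge_common_configs(
    configs: Dict[str, Dict[str, Any]], service_name: str
) -> Dict[str, Any]:
    """Return one service's config with its listed common keys filled in."""
    service = configs.get(service_name)
    if service is None:
        # A service with no section gets an empty config, not an error.
        return {}
    common = configs.get(COMMON_SECTION, {})
    # Shallow copy without ServiceArgs, so the input dict is never mutated.
    merged = {k: v for k, v in service.items() if k != SERVICE_ARGS_KEY}
    # pop() both reads the inheritance list and drops it from the result.
    for key in merged.pop(COMMON_CONFIGS_KEY, []):
        # A value set on the service itself wins over the Common value.
        if key in common and key not in merged:
            merged[key] = common[key]
    return merged
```

With a `Common` section holding `router: kv` and a `Frontend` section whose `common-configs` lists `router`, the merged Frontend dict gains `"router": "kv"` unless Frontend sets its own value.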

Consider adding type hints and docstring parameter documentation:
```diff
+from typing import Dict, Any
+
 @classmethod
-def get_parsed_config(cls, service_name):
+def get_parsed_config(cls, service_name: str) -> Dict[str, Any]:
-    """Get parsed config for a service with common configs applied, returned as dict"""
+    """Get parsed config for a service with common configs applied, returned as dict
+
+    Args:
+        service_name: Name of the service to get config for
+
+    Returns:
+        Dictionary containing merged service and common configurations
+    """
```
examples/vllm_v0/components/frontend.py (1)
71-71: Consider using info level instead of warning for configuration logging.

The configuration logging is helpful for debugging, but using warning level for normal configuration output may not be appropriate. Consider using info level instead.
```diff
-        logger.warning(f"Frontend config: {self.frontend_config}")
+        logger.info(f"Frontend config: {self.frontend_config}")
```
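A small sketch of info-level startup logging, assuming the frontend builds its subprocess command as an argument list. The command name and flag below are placeholders, not the example's real invocation. `shlex.join` (Python 3.8+) renders the list as a copy-pasteable shell string, and `%s`-style arguments defer string formatting until a handler actually emits the record.

```python
import logging
import shlex
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


def build_command(frontend_config: Dict[str, Any]) -> List[str]:
    # Hypothetical command line; the real frontend's binary and flags differ.
    return ["my-http-server", "--port", str(frontend_config["port"])]


def log_startup(frontend_config: Dict[str, Any]) -> str:
    # Normal configuration output belongs at INFO, not WARNING.
    logger.info("Frontend config: %s", frontend_config)
    rendered = shlex.join(build_command(frontend_config))
    logger.info("Running command: %s", rendered)
    return rendered
```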
examples/vllm_v0/configs/agg_kv.yaml (1)
35-35: Fix missing newline at end of file.

The configuration change looks good, but there's a missing newline character at the end of the file as flagged by YAMLlint.
```diff
   common-configs: [model, block-size, max-model-len, router]
+
```
🧰 Tools

🪛 YAMLlint (1.37.1)

[error] 35-35: no new line character at the end of file

(new-line-at-end-of-file)
lib/llm/src/kv_router.rs (1)
26-26: Remove redundant tracing import.

The `use tracing;` statement is redundant: since the Rust 2018 edition, external crates are in scope by path, so the `tracing` calls used throughout the file (e.g., lines 100, 107, 128) resolve without it.
```diff
-use tracing;
```

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b1e089 and 7f5a1b0.

📒 Files selected for processing (13)

deploy/sdk/src/dynamo/sdk/lib/config.py (4 hunks)
examples/llm/components/frontend.py (1 hunks)
examples/sglang/components/frontend.py (1 hunks)
examples/tensorrt_llm/components/frontend.py (1 hunks)
examples/vllm_v0/components/frontend.py (2 hunks)
examples/vllm_v0/components/worker.py (2 hunks)
examples/vllm_v0/configs/agg_kv.yaml (2 hunks)
examples/vllm_v0/configs/disagg_kv.yaml (2 hunks)
examples/vllm_v0/utils/protocol.py (1 hunks)
examples/vllm_v1/components/frontend.py (1 hunks)
lib/llm/src/kv_router.rs (5 hunks)
lib/llm/src/preprocessor.rs (1 hunks)
lib/llm/src/protocols/common/preprocessor.rs (1 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (4)

examples/vllm_v1/components/frontend.py (2)
- examples/tensorrt_llm/components/frontend.py (1): FrontendConfig (42-46)
- deploy/sdk/src/dynamo/sdk/lib/config.py (2): ServiceConfig (23-132), get_parsed_config (55-79)

examples/sglang/components/frontend.py (6)
- examples/vllm_v0/components/frontend.py (1): FrontendConfig (42-49)
- examples/llm/components/frontend.py (1): FrontendConfig (43-48)
- examples/tensorrt_llm/components/frontend.py (1): FrontendConfig (42-46)
- examples/vllm_v1/components/frontend.py (1): FrontendConfig (42-47)
- deploy/sdk/src/dynamo/sdk/lib/config.py (2): ServiceConfig (23-132), get_parsed_config (55-79)
- deploy/sdk/src/dynamo/sdk/core/protocol/interface.py (1): ServiceConfig (37-40)

examples/tensorrt_llm/components/frontend.py (2)
- examples/llm/components/frontend.py (1): FrontendConfig (43-48)
- deploy/sdk/src/dynamo/sdk/lib/config.py (2): ServiceConfig (23-132), get_parsed_config (55-79)

examples/llm/components/frontend.py (3)
- examples/tensorrt_llm/components/frontend.py (1): FrontendConfig (42-46)
- deploy/sdk/src/dynamo/sdk/lib/config.py (2): ServiceConfig (23-132), get_parsed_config (55-79)
- deploy/sdk/src/dynamo/sdk/core/protocol/interface.py (1): ServiceConfig (37-40)

🪛 YAMLlint (1.37.1)

examples/vllm_v0/configs/agg_kv.yaml

[error] 35-35: no new line character at the end of file

(new-line-at-end-of-file)

🪛 GitHub Actions: Rust pre-merge checks

lib/llm/src/kv_router.rs

[error] 144-147: cargo fmt check failed due to formatting differences. Code block needs to be reformatted to comply with rustfmt style.

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: Build and Test - vllm

🔇 Additional comments (18)

deploy/sdk/src/dynamo/sdk/lib/config.py (2)

27-28: LGTM! Good use of class constants.

Using class-level constants improves maintainability and reduces the risk of typos in string literals.

99-99: Good refactoring to use class constants.

Replacing hardcoded strings with class constants improves consistency and maintainability.

Also applies to: 120-121

examples/vllm_v0/utils/protocol.py (1)

55-55: LGTM! Clean addition of optional field.

The new field follows the existing pattern with proper typing and sensible default value.
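On the wire, the Rust side declares this field as `Option<u32>`, which serde serializes as JSON `null` when unset unless `skip_serializing_if` is configured (the PR does not say which). A minimal sketch of tolerant parsing on the Python side, using a stdlib dataclass in place of the example's actual model class (an assumption), so that both an absent key and `null` map to `None`:

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class PreprocessedRequest:
    token_ids: List[int]
    # Filled in by the KV router; None when no estimate is available.
    estimated_prefix_hit_num_blocks: Optional[int] = None

    @classmethod
    def from_json(cls, payload: str) -> "PreprocessedRequest":
        data: Dict[str, Any] = json.loads(payload)
        # A missing key and an explicit null both become None.
        hit = data.get("estimated_prefix_hit_num_blocks")
        # The Rust type is u32, so anything present must be a non-negative int.
        if hit is not None and (not isinstance(hit, int) or hit < 0):
            raise ValueError(f"invalid estimated_prefix_hit_num_blocks: {hit!r}")
        return cls(token_ids=list(data["token_ids"]), estimated_prefix_hit_num_blocks=hit)
```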

lib/llm/src/preprocessor.rs (1)

180-180: LGTM! Consistent with protocol definition.

Setting the field to None aligns with the Python protocol definition and maintains consistency across the codebase.

examples/sglang/components/frontend.py (1)

63-63: LGTM! Clean configuration loading refactoring.

The change from a two-step configuration loading process to using ServiceConfig.get_parsed_config("Frontend") directly is a good improvement. This approach:

- Simplifies the code by eliminating the intermediate variable
- Leverages the new method that handles common configuration merging automatically
- Maintains consistency with similar changes across other frontend components

examples/llm/components/frontend.py (1)

67-67: LGTM! Consistent configuration refactoring.

This change aligns with the configuration loading improvements seen across other frontend components. The direct use of ServiceConfig.get_parsed_config("Frontend") streamlines the initialization while maintaining the same functional behavior.

examples/tensorrt_llm/components/frontend.py (1)

64-64: LGTM! Consistent configuration loading improvement.

This change follows the same beneficial refactoring pattern applied across all frontend components, using ServiceConfig.get_parsed_config("Frontend") to streamline configuration loading while maintaining functionality.

lib/llm/src/protocols/common/preprocessor.rs (1)

50-53: LGTM! Well-designed field addition for KV-aware routing.

The new estimated_prefix_hit_num_blocks field is properly implemented:

- Uses Option<u32> to represent optional estimated block count
- Includes #[builder(default)] annotation for optional builder pattern support
- Has clear, descriptive documentation explaining its purpose in KV-aware routing
- Follows existing struct patterns and naming conventions
- The u32 type is appropriate for representing block counts

This addition supports the PR objective of exposing estimated KV cache hit metrics.

examples/vllm_v1/components/frontend.py (1)

65-65: LGTM! Configuration loading simplified effectively.

The change to use ServiceConfig.get_parsed_config("Frontend") simplifies the configuration loading process and automatically merges common configurations. This aligns well with the centralized configuration management pattern introduced in the ServiceConfig class.

examples/vllm_v0/components/frontend.py (2)

66-68: LGTM! Configuration loading improved.

The change to use ServiceConfig.get_parsed_config("Frontend") is consistent with the centralized configuration management pattern and properly handles common configurations.

93-93: Good addition for debugging visibility.

The command logging provides valuable debugging information and helps with traceability of the subprocess execution.

examples/vllm_v0/configs/agg_kv.yaml (2)

19-19: LGTM! Router parameter properly centralized.

Adding the router: kv parameter to the Common section enables key-value aware routing functionality and aligns with the PR objectives.

25-25: Good configuration inheritance setup.

Adding router to the Frontend's common-configs list enables it to inherit the centralized router configuration from the Common section.

examples/vllm_v0/configs/disagg_kv.yaml (3)

20-20: LGTM! Consistent router configuration.

The router parameter is properly added to the Common section, maintaining consistency with the agg_kv.yaml configuration file.

26-26: Good configuration inheritance for Frontend.

Adding router to the Frontend's common-configs enables proper inheritance of the router configuration from the Common section.

38-38: Proper configuration inheritance for VllmWorker.

The addition of router to VllmWorker's common-configs list ensures it inherits the centralized router configuration, supporting the key-value aware routing functionality.

lib/llm/src/kv_router.rs (2)

160-160: LGTM! Correct handling of the updated return type.

The destructuring correctly extracts the worker_id while discarding the overlap amount, maintaining the expected behavior for this usage context.

194-200: Excellent implementation of KV cache hit estimation propagation.

This implementation correctly:

- Extracts both instance_id and overlap_amount from the KV router
- Updates the backend input with the estimated prefix hit blocks
- Properly reconstructs the request with the updated information

This aligns perfectly with the PR objective to expose estimated KV cache hit metrics in dynamo-run.


grahamking

Approving Rust changes.

biswapanda

SDK-side changes look good.

tedzhouhk added 5 commits May 27, 2025 15:33

- stage (80d200c)
- stage (7978735)
- fix frontend args (d3b3664)
- working (9bb63d7)
- update logging (f57cc24)

tedzhouhk requested review from biswapanda, grahamking, hhzhang16, hutm, ishandhanani, julienmancuso, nnshah1, piotrm-nvidia, ptarasiewiczNV, ryanolson and tanmayv25 as code owners May 28, 2025 18:24

tedzhouhk requested review from a team, GuanLuo, PeaBrane, alec-flowers, jthomson04, kkranen, mohammedabdulwahhab, oandreeva-nv, paulhendricks, rmccorm4, sshchoi and tmonty12 as code owners May 28, 2025 18:24

pull-request-size bot added the size/L label May 28, 2025

github-actions bot added the feat label May 28, 2025

- pc (7f5a1b0)

coderabbitai bot reviewed May 28, 2025


examples/vllm_v0/components/worker.py (resolved)

examples/vllm_v0/components/worker.py (outdated, resolved)

examples/vllm_v0/components/worker.py (resolved)

tedzhouhk and others added 4 commits May 28, 2025 12:00

- Apply suggestions from code review (684043f)
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
  Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
- remove unused import (ad6a406)
- Merge branch 'hzhou/dynamorun-kv-ratio' of https://github.com/ai-dynamo/dynamo into hzhou/dynamorun-kv-ratio (c376619)
- cargo fmt (f9dc844)

grahamking approved these changes May 28, 2025


biswapanda approved these changes May 29, 2025


tedzhouhk merged commit c9eb6a8 into main May 29, 2025
10 checks passed

tedzhouhk deleted the hzhou/dynamorun-kv-ratio branch May 29, 2025 17:06

coderabbitai bot mentioned this pull request May 30, 2025

chore: Remove PreprocessedRequest alias BackendInput #1307

Merged

coderabbitai bot mentioned this pull request Aug 14, 2025

feat: skip router when worker id is pre-determined #2450

Merged
