Skip to content

Conversation

@tedzhouhk
Copy link
Contributor

@tedzhouhk tedzhouhk commented May 28, 2025

  • expose estimated kv cache hit in dynamo-run when using kv router
  • update vllm_v0 example's disagg router to use the estimated kv cache hit
  • fix all example's frontend doesn't parse common argument

Summary by CodeRabbit

  • New Features
    • Added support for propagating the estimated prefix hit length in requests for improved key-value routing.
    • Centralized and streamlined configuration management, allowing components to inherit common parameters more easily.
  • Improvements
    • Configuration files now use shared common parameters, reducing duplication and simplifying updates.
    • Enhanced logging in frontend components for better visibility into configuration and runtime operations.
    • Simplified frontend configuration loading across multiple components.
  • Bug Fixes
    • Ensured optional fields are handled safely in request processing and routing.

@copy-pr-bot
Copy link

copy-pr-bot bot commented May 28, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented May 28, 2025

Walkthrough

The changes introduce a mechanism for services to inherit and merge common configuration parameters from a centralized "Common" section, streamlining configuration management across Python and YAML files. Rust components are updated to propagate and utilize a new estimated_prefix_hit_num_blocks field for requests, enabling more accurate routing and prefill logic in key-value aware systems. Logging and code clarity are improved in several Python components.

Changes

Files/Groups Change Summary
deploy/sdk/src/dynamo/sdk/lib/config.py Added class constants for common config keys; introduced get_parsed_config for merged configs; unified logic.
examples/llm/components/frontend.py
examples/sglang/components/frontend.py
examples/tensorrt_llm/components/frontend.py
examples/vllm_v1/components/frontend.py
Simplified Frontend class initialization to use get_parsed_config for loading merged frontend config.
examples/vllm_v0/components/frontend.py Used get_parsed_config for frontend config; added logging for loaded config and subprocess command.
examples/vllm_v0/components/worker.py Updated prefill logic to use estimated prefix hit length; improved logging.
examples/vllm_v0/utils/protocol.py Added optional estimated_prefix_hit_num_blocks field to PreprocessedRequest.
examples/vllm_v0/configs/agg_kv.yaml
examples/vllm_v0/configs/disagg_kv.yaml
Centralized router parameter in Common; updated common-configs lists to include router.
lib/llm/src/kv_router.rs find_best_match now returns overlap amount; propagated to request objects; updated method signatures.
lib/llm/src/preprocessor.rs Sets estimated_prefix_hit_num_blocks to None in backend request builder.
lib/llm/src/protocols/common/preprocessor.rs Added optional estimated_prefix_hit_num_blocks field to PreprocessedRequest struct.

Sequence Diagram(s)

sequenceDiagram
    participant Config as ServiceConfig
    participant YAML as YAML Config
    participant Frontend as Frontend Class

    Frontend->>Config: get_parsed_config("Frontend")
    Config->>YAML: Load Common and Frontend sections
    Config->>Config: Merge common-configs into Frontend config
    Config-->>Frontend: Return merged config dict
    Frontend->>Frontend: Initialize FrontendConfig with merged config
Loading
sequenceDiagram
    participant Router as KvRouter
    participant Worker as KvPushRouter
    participant Req as BackendInput

    Router->>Router: find_best_match(tokens)
    Router-->>Worker: (instance_id, overlap_amount)
    Worker->>Req: Set estimated_prefix_hit_num_blocks = overlap_amount
    Worker->>Router: direct(instance_id, updated Req)
Loading

Poem

In the warren of configs, we bunnies delight,
Merging common carrots, making settings light.
Now with overlap counted, our routes are so neat—
Each hop and nibble, a configuration treat!
From Rust to YAML, and Python in between,
Our code hops forward, robust and clean.
🥕✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c376619 and f9dc844.

📒 Files selected for processing (1)
  • lib/llm/src/kv_router.rs (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/llm/src/kv_router.rs
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: Build and Test - vllm
  • GitHub Check: pre-merge-rust (.)
✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added the feat label May 28, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🔭 Outside diff range comments (1)
lib/llm/src/kv_router.rs (1)

134-149: ⚠️ Potential issue

Fix formatting issues flagged by cargo fmt.

The pipeline indicates formatting issues in this code block. Additionally, the method implementation looks correct for exposing overlap amount information.

Please run cargo fmt to fix the formatting issues:

#!/bin/bash
# Fix formatting issues in the Rust code
cargo fmt --check

The logic for returning both worker ID and overlap amount is sound and aligns with the PR objective to expose KV cache hit estimation.

🧰 Tools
🪛 GitHub Actions: Rust pre-merge checks

[error] 144-147: cargo fmt check failed due to formatting differences. Code block needs to be reformatted to comply with rustfmt style.

🧹 Nitpick comments (4)
deploy/sdk/src/dynamo/sdk/lib/config.py (1)

54-79: Well-implemented configuration merging logic.

The method correctly handles the configuration inheritance pattern:

  • Returns empty dict for missing services
  • Excludes ServiceArgs appropriately
  • Only applies common configs when not overridden
  • Cleans up the common-configs key from final result

Consider adding type hints and docstring parameter documentation:

+from typing import Dict, Any
+
 @classmethod
-def get_parsed_config(cls, service_name):
+def get_parsed_config(cls, service_name: str) -> Dict[str, Any]:
-    """Get parsed config for a service with common configs applied, returned as dict"""
+    """Get parsed config for a service with common configs applied, returned as dict
+    
+    Args:
+        service_name: Name of the service to get config for
+        
+    Returns:
+        Dictionary containing merged service and common configurations
+    """
examples/vllm_v0/components/frontend.py (1)

71-71: Consider using info level instead of warning for configuration logging.

The configuration logging is helpful for debugging, but using warning level for normal configuration output may not be appropriate. Consider using info level instead.

-        logger.warning(f"Frontend config: {self.frontend_config}")
+        logger.info(f"Frontend config: {self.frontend_config}")
examples/vllm_v0/configs/agg_kv.yaml (1)

35-35: Fix missing newline at end of file.

The configuration change looks good, but there's a missing newline character at the end of the file as flagged by YAMLlint.

   common-configs: [model, block-size, max-model-len, router]
+
🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 35-35: no new line character at the end of file

(new-line-at-end-of-file)

lib/llm/src/kv_router.rs (1)

26-26: Remove redundant tracing import.

The use tracing; statement appears redundant since tracing functionality is already available and used throughout the file (e.g., lines 100, 107, 128).

-use tracing;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b1e089 and 7f5a1b0.

📒 Files selected for processing (13)
  • deploy/sdk/src/dynamo/sdk/lib/config.py (4 hunks)
  • examples/llm/components/frontend.py (1 hunks)
  • examples/sglang/components/frontend.py (1 hunks)
  • examples/tensorrt_llm/components/frontend.py (1 hunks)
  • examples/vllm_v0/components/frontend.py (2 hunks)
  • examples/vllm_v0/components/worker.py (2 hunks)
  • examples/vllm_v0/configs/agg_kv.yaml (2 hunks)
  • examples/vllm_v0/configs/disagg_kv.yaml (2 hunks)
  • examples/vllm_v0/utils/protocol.py (1 hunks)
  • examples/vllm_v1/components/frontend.py (1 hunks)
  • lib/llm/src/kv_router.rs (5 hunks)
  • lib/llm/src/preprocessor.rs (1 hunks)
  • lib/llm/src/protocols/common/preprocessor.rs (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
examples/vllm_v1/components/frontend.py (2)
examples/tensorrt_llm/components/frontend.py (1)
  • FrontendConfig (42-46)
deploy/sdk/src/dynamo/sdk/lib/config.py (2)
  • ServiceConfig (23-132)
  • get_parsed_config (55-79)
examples/sglang/components/frontend.py (6)
examples/vllm_v0/components/frontend.py (1)
  • FrontendConfig (42-49)
examples/llm/components/frontend.py (1)
  • FrontendConfig (43-48)
examples/tensorrt_llm/components/frontend.py (1)
  • FrontendConfig (42-46)
examples/vllm_v1/components/frontend.py (1)
  • FrontendConfig (42-47)
deploy/sdk/src/dynamo/sdk/lib/config.py (2)
  • ServiceConfig (23-132)
  • get_parsed_config (55-79)
deploy/sdk/src/dynamo/sdk/core/protocol/interface.py (1)
  • ServiceConfig (37-40)
examples/tensorrt_llm/components/frontend.py (2)
examples/llm/components/frontend.py (1)
  • FrontendConfig (43-48)
deploy/sdk/src/dynamo/sdk/lib/config.py (2)
  • ServiceConfig (23-132)
  • get_parsed_config (55-79)
examples/llm/components/frontend.py (3)
examples/tensorrt_llm/components/frontend.py (1)
  • FrontendConfig (42-46)
deploy/sdk/src/dynamo/sdk/lib/config.py (2)
  • ServiceConfig (23-132)
  • get_parsed_config (55-79)
deploy/sdk/src/dynamo/sdk/core/protocol/interface.py (1)
  • ServiceConfig (37-40)
🪛 YAMLlint (1.37.1)
examples/vllm_v0/configs/agg_kv.yaml

[error] 35-35: no new line character at the end of file

(new-line-at-end-of-file)

🪛 GitHub Actions: Rust pre-merge checks
lib/llm/src/kv_router.rs

[error] 144-147: cargo fmt check failed due to formatting differences. Code block needs to be reformatted to comply with rustfmt style.

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (18)
deploy/sdk/src/dynamo/sdk/lib/config.py (2)

27-28: LGTM! Good use of class constants.

Using class-level constants improves maintainability and reduces the risk of typos in string literals.


99-99: Good refactoring to use class constants.

Replacing hardcoded strings with class constants improves consistency and maintainability.

Also applies to: 120-121

examples/vllm_v0/utils/protocol.py (1)

55-55: LGTM! Clean addition of optional field.

The new field follows the existing pattern with proper typing and sensible default value.

lib/llm/src/preprocessor.rs (1)

180-180: LGTM! Consistent with protocol definition.

Setting the field to None aligns with the Python protocol definition and maintains consistency across the codebase.

examples/sglang/components/frontend.py (1)

63-63: LGTM! Clean configuration loading refactoring.

The change from a two-step configuration loading process to using ServiceConfig.get_parsed_config("Frontend") directly is a good improvement. This approach:

  • Simplifies the code by eliminating the intermediate variable
  • Leverages the new method that handles common configuration merging automatically
  • Maintains consistency with similar changes across other frontend components
examples/llm/components/frontend.py (1)

67-67: LGTM! Consistent configuration refactoring.

This change aligns with the configuration loading improvements seen across other frontend components. The direct use of ServiceConfig.get_parsed_config("Frontend") streamlines the initialization while maintaining the same functional behavior.

examples/tensorrt_llm/components/frontend.py (1)

64-64: LGTM! Consistent configuration loading improvement.

This change follows the same beneficial refactoring pattern applied across all frontend components, using ServiceConfig.get_parsed_config("Frontend") to streamline configuration loading while maintaining functionality.

lib/llm/src/protocols/common/preprocessor.rs (1)

50-53: LGTM! Well-designed field addition for KV-aware routing.

The new estimated_prefix_hit_num_blocks field is properly implemented:

  • Uses Option<u32> to represent optional estimated block count
  • Includes #[builder(default)] annotation for optional builder pattern support
  • Has clear, descriptive documentation explaining its purpose in KV-aware routing
  • Follows existing struct patterns and naming conventions
  • The u32 type is appropriate for representing block counts

This addition supports the PR objective of exposing estimated KV cache hit metrics.

examples/vllm_v1/components/frontend.py (1)

65-65: LGTM! Configuration loading simplified effectively.

The change to use ServiceConfig.get_parsed_config("Frontend") simplifies the configuration loading process and automatically merges common configurations. This aligns well with the centralized configuration management pattern introduced in the ServiceConfig class.

examples/vllm_v0/components/frontend.py (2)

66-68: LGTM! Configuration loading improved.

The change to use ServiceConfig.get_parsed_config("Frontend") is consistent with the centralized configuration management pattern and properly handles common configurations.


93-93: Good addition for debugging visibility.

The command logging provides valuable debugging information and helps with traceability of the subprocess execution.

examples/vllm_v0/configs/agg_kv.yaml (2)

19-19: LGTM! Router parameter properly centralized.

Adding the router: kv parameter to the Common section enables key-value aware routing functionality and aligns with the PR objectives.


25-25: Good configuration inheritance setup.

Adding router to the Frontend's common-configs list enables it to inherit the centralized router configuration from the Common section.

examples/vllm_v0/configs/disagg_kv.yaml (3)

20-20: LGTM! Consistent router configuration.

The router parameter is properly added to the Common section, maintaining consistency with the agg_kv.yaml configuration file.


26-26: Good configuration inheritance for Frontend.

Adding router to the Frontend's common-configs enables proper inheritance of the router configuration from the Common section.


38-38: Proper configuration inheritance for VllmWorker.

The addition of router to VllmWorker's common-configs list ensures it inherits the centralized router configuration, supporting the key-value aware routing functionality.

lib/llm/src/kv_router.rs (2)

160-160: LGTM! Correct handling of the updated return type.

The destructuring correctly extracts the worker_id while discarding the overlap amount, maintaining the expected behavior for this usage context.


194-200: Excellent implementation of KV cache hit estimation propagation.

This implementation correctly:

  • Extracts both instance_id and overlap_amount from the KV router
  • Updates the backend input with the estimated prefix hit blocks
  • Properly reconstructs the request with the updated information

This aligns perfectly with the PR objective to expose estimated KV cache hit metrics in dynamo-run.

tedzhouhk and others added 4 commits May 28, 2025 12:00
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
Copy link
Contributor

@grahamking grahamking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving Rust changes.

Copy link
Contributor

@biswapanda biswapanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sdk side changes looks good

@tedzhouhk tedzhouhk merged commit c9eb6a8 into main May 29, 2025
10 checks passed
@tedzhouhk tedzhouhk deleted the hzhou/dynamorun-kv-ratio branch May 29, 2025 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants