-
Notifications
You must be signed in to change notification settings - Fork 679
feat: expose estimated kv cache hit in dynamo-run #1246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe changes introduce a mechanism for services to inherit and merge common configuration parameters from a centralized "Common" section, streamlining configuration management across Python and YAML files. Rust components are updated to propagate and utilize a new Changes
Sequence Diagram(s)sequenceDiagram
participant Config as ServiceConfig
participant YAML as YAML Config
participant Frontend as Frontend Class
Frontend->>Config: get_parsed_config("Frontend")
Config->>YAML: Load Common and Frontend sections
Config->>Config: Merge common-configs into Frontend config
Config-->>Frontend: Return merged config dict
Frontend->>Frontend: Initialize FrontendConfig with merged config
sequenceDiagram
participant Router as KvRouter
participant Worker as KvPushRouter
participant Req as BackendInput
Router->>Router: find_best_match(tokens)
Router-->>Worker: (instance_id, overlap_amount)
Worker->>Req: Set estimated_prefix_hit_num_blocks = overlap_amount
Worker->>Router: direct(instance_id, updated Req)
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (4)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🔭 Outside diff range comments (1)
lib/llm/src/kv_router.rs (1)
134-149:⚠️ Potential issueFix formatting issues flagged by cargo fmt.
The pipeline indicates formatting issues in this code block. Additionally, the method implementation looks correct for exposing overlap amount information.
Please run
cargo fmtto fix the formatting issues:#!/bin/bash # Fix formatting issues in the Rust code cargo fmt --checkThe logic for returning both worker ID and overlap amount is sound and aligns with the PR objective to expose KV cache hit estimation.
🧰 Tools
🪛 GitHub Actions: Rust pre-merge checks
[error] 144-147: cargo fmt check failed due to formatting differences. Code block needs to be reformatted to comply with rustfmt style.
🧹 Nitpick comments (4)
deploy/sdk/src/dynamo/sdk/lib/config.py (1)
54-79: Well-implemented configuration merging logic.The method correctly handles the configuration inheritance pattern:
- Returns empty dict for missing services
- Excludes
ServiceArgsappropriately- Only applies common configs when not overridden
- Cleans up the
common-configskey from final resultConsider adding type hints and docstring parameter documentation:
+from typing import Dict, Any + @classmethod -def get_parsed_config(cls, service_name): +def get_parsed_config(cls, service_name: str) -> Dict[str, Any]: - """Get parsed config for a service with common configs applied, returned as dict""" + """Get parsed config for a service with common configs applied, returned as dict + + Args: + service_name: Name of the service to get config for + + Returns: + Dictionary containing merged service and common configurations + """examples/vllm_v0/components/frontend.py (1)
71-71: Consider using info level instead of warning for configuration logging.The configuration logging is helpful for debugging, but using
warninglevel for normal configuration output may not be appropriate. Consider usinginfolevel instead.- logger.warning(f"Frontend config: {self.frontend_config}") + logger.info(f"Frontend config: {self.frontend_config}")examples/vllm_v0/configs/agg_kv.yaml (1)
35-35: Fix missing newline at end of file.The configuration change looks good, but there's a missing newline character at the end of the file as flagged by YAMLlint.
common-configs: [model, block-size, max-model-len, router] +🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 35-35: no new line character at the end of file
(new-line-at-end-of-file)
lib/llm/src/kv_router.rs (1)
26-26: Remove redundant tracing import.The
use tracing;statement appears redundant since tracing functionality is already available and used throughout the file (e.g., lines 100, 107, 128).-use tracing;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
deploy/sdk/src/dynamo/sdk/lib/config.py(4 hunks)examples/llm/components/frontend.py(1 hunks)examples/sglang/components/frontend.py(1 hunks)examples/tensorrt_llm/components/frontend.py(1 hunks)examples/vllm_v0/components/frontend.py(2 hunks)examples/vllm_v0/components/worker.py(2 hunks)examples/vllm_v0/configs/agg_kv.yaml(2 hunks)examples/vllm_v0/configs/disagg_kv.yaml(2 hunks)examples/vllm_v0/utils/protocol.py(1 hunks)examples/vllm_v1/components/frontend.py(1 hunks)lib/llm/src/kv_router.rs(5 hunks)lib/llm/src/preprocessor.rs(1 hunks)lib/llm/src/protocols/common/preprocessor.rs(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
examples/vllm_v1/components/frontend.py (2)
examples/tensorrt_llm/components/frontend.py (1)
FrontendConfig(42-46)deploy/sdk/src/dynamo/sdk/lib/config.py (2)
ServiceConfig(23-132)get_parsed_config(55-79)
examples/sglang/components/frontend.py (6)
examples/vllm_v0/components/frontend.py (1)
FrontendConfig(42-49)examples/llm/components/frontend.py (1)
FrontendConfig(43-48)examples/tensorrt_llm/components/frontend.py (1)
FrontendConfig(42-46)examples/vllm_v1/components/frontend.py (1)
FrontendConfig(42-47)deploy/sdk/src/dynamo/sdk/lib/config.py (2)
ServiceConfig(23-132)get_parsed_config(55-79)deploy/sdk/src/dynamo/sdk/core/protocol/interface.py (1)
ServiceConfig(37-40)
examples/tensorrt_llm/components/frontend.py (2)
examples/llm/components/frontend.py (1)
FrontendConfig(43-48)deploy/sdk/src/dynamo/sdk/lib/config.py (2)
ServiceConfig(23-132)get_parsed_config(55-79)
examples/llm/components/frontend.py (3)
examples/tensorrt_llm/components/frontend.py (1)
FrontendConfig(42-46)deploy/sdk/src/dynamo/sdk/lib/config.py (2)
ServiceConfig(23-132)get_parsed_config(55-79)deploy/sdk/src/dynamo/sdk/core/protocol/interface.py (1)
ServiceConfig(37-40)
🪛 YAMLlint (1.37.1)
examples/vllm_v0/configs/agg_kv.yaml
[error] 35-35: no new line character at the end of file
(new-line-at-end-of-file)
🪛 GitHub Actions: Rust pre-merge checks
lib/llm/src/kv_router.rs
[error] 144-147: cargo fmt check failed due to formatting differences. Code block needs to be reformatted to comply with rustfmt style.
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (18)
deploy/sdk/src/dynamo/sdk/lib/config.py (2)
27-28: LGTM! Good use of class constants.Using class-level constants improves maintainability and reduces the risk of typos in string literals.
99-99: Good refactoring to use class constants.Replacing hardcoded strings with class constants improves consistency and maintainability.
Also applies to: 120-121
examples/vllm_v0/utils/protocol.py (1)
55-55: LGTM! Clean addition of optional field.The new field follows the existing pattern with proper typing and sensible default value.
lib/llm/src/preprocessor.rs (1)
180-180: LGTM! Consistent with protocol definition.Setting the field to
Nonealigns with the Python protocol definition and maintains consistency across the codebase.examples/sglang/components/frontend.py (1)
63-63: LGTM! Clean configuration loading refactoring.The change from a two-step configuration loading process to using
ServiceConfig.get_parsed_config("Frontend")directly is a good improvement. This approach:
- Simplifies the code by eliminating the intermediate variable
- Leverages the new method that handles common configuration merging automatically
- Maintains consistency with similar changes across other frontend components
examples/llm/components/frontend.py (1)
67-67: LGTM! Consistent configuration refactoring.This change aligns with the configuration loading improvements seen across other frontend components. The direct use of
ServiceConfig.get_parsed_config("Frontend")streamlines the initialization while maintaining the same functional behavior.examples/tensorrt_llm/components/frontend.py (1)
64-64: LGTM! Consistent configuration loading improvement.This change follows the same beneficial refactoring pattern applied across all frontend components, using
ServiceConfig.get_parsed_config("Frontend")to streamline configuration loading while maintaining functionality.lib/llm/src/protocols/common/preprocessor.rs (1)
50-53: LGTM! Well-designed field addition for KV-aware routing.The new
estimated_prefix_hit_num_blocksfield is properly implemented:
- Uses
Option<u32>to represent optional estimated block count- Includes
#[builder(default)]annotation for optional builder pattern support- Has clear, descriptive documentation explaining its purpose in KV-aware routing
- Follows existing struct patterns and naming conventions
- The
u32type is appropriate for representing block countsThis addition supports the PR objective of exposing estimated KV cache hit metrics.
examples/vllm_v1/components/frontend.py (1)
65-65: LGTM! Configuration loading simplified effectively.The change to use
ServiceConfig.get_parsed_config("Frontend")simplifies the configuration loading process and automatically merges common configurations. This aligns well with the centralized configuration management pattern introduced in the ServiceConfig class.examples/vllm_v0/components/frontend.py (2)
66-68: LGTM! Configuration loading improved.The change to use
ServiceConfig.get_parsed_config("Frontend")is consistent with the centralized configuration management pattern and properly handles common configurations.
93-93: Good addition for debugging visibility.The command logging provides valuable debugging information and helps with traceability of the subprocess execution.
examples/vllm_v0/configs/agg_kv.yaml (2)
19-19: LGTM! Router parameter properly centralized.Adding the
router: kvparameter to the Common section enables key-value aware routing functionality and aligns with the PR objectives.
25-25: Good configuration inheritance setup.Adding
routerto the Frontend'scommon-configslist enables it to inherit the centralized router configuration from the Common section.examples/vllm_v0/configs/disagg_kv.yaml (3)
20-20: LGTM! Consistent router configuration.The router parameter is properly added to the Common section, maintaining consistency with the agg_kv.yaml configuration file.
26-26: Good configuration inheritance for Frontend.Adding
routerto the Frontend's common-configs enables proper inheritance of the router configuration from the Common section.
38-38: Proper configuration inheritance for VllmWorker.The addition of
routerto VllmWorker's common-configs list ensures it inherits the centralized router configuration, supporting the key-value aware routing functionality.lib/llm/src/kv_router.rs (2)
160-160: LGTM! Correct handling of the updated return type.The destructuring correctly extracts the worker_id while discarding the overlap amount, maintaining the expected behavior for this usage context.
194-200: Excellent implementation of KV cache hit estimation propagation.This implementation correctly:
- Extracts both instance_id and overlap_amount from the KV router
- Updates the backend input with the estimated prefix hit blocks
- Properly reconstructs the request with the updated information
This aligns perfectly with the PR objective to expose estimated KV cache hit metrics in dynamo-run.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Hongkuan Zhou <tedzhouhk@gmail.com>
…mo/dynamo into hzhou/dynamorun-kv-ratio
grahamking
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving Rust changes.
biswapanda
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sdk side changes looks good
Summary by CodeRabbit