-
Notifications
You must be signed in to change notification settings - Fork 679
feat: expose router configurations to dynamo-run #1259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis change introduces three new optional floating-point weights— Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI
participant Flags
participant Main
participant ModelWatcher
participant ModelManager
participant KvRouter
User->>CLI: Runs `dynamo run` with optional KV weight flags
CLI->>Flags: Parses CLI arguments (including weights)
Flags->>Main: Passes weights to main logic
Main->>ModelWatcher: Instantiates with weights
ModelWatcher->>ModelManager: Calls kv_chooser_for(..., weights)
ModelManager->>KvRouter: Creates KvRouter with weighted selector
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (4)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
components/http/src/main.rs(1 hunks)docs/guides/dynamo_run.md(2 hunks)launch/dynamo-run/src/flags.rs(2 hunks)launch/dynamo-run/src/input/common.rs(1 hunks)launch/dynamo-run/src/input/http.rs(2 hunks)launch/dynamo-run/src/main.rs(1 hunks)launch/llmctl/src/main.rs(2 hunks)lib/llm/src/discovery/model_manager.rs(3 hunks)lib/llm/src/discovery/watcher.rs(3 hunks)lib/llm/src/kv_router/scheduler.rs(3 hunks)
🧰 Additional context used
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/1259/merge) by tedzhouhk.
launch/dynamo-run/src/input/http.rs
[error] 110-117: Trailing whitespace detected and fixed by pre-commit hook 'trailing-whitespace'. Please remove trailing whitespace.
lib/llm/src/discovery/watcher.rs
[error] 218-225: Trailing whitespace detected and fixed by pre-commit hook 'trailing-whitespace'. Please remove trailing whitespace.
[error] 260-267: Trailing whitespace detected and fixed by pre-commit hook 'trailing-whitespace'. Please remove trailing whitespace.
lib/llm/src/kv_router/scheduler.rs
[error] 319-319: Trailing whitespace detected and fixed by pre-commit hook 'trailing-whitespace'. Please remove trailing whitespace.
lib/llm/src/discovery/model_manager.rs
[error] 201-208: Trailing whitespace detected and fixed by pre-commit hook 'trailing-whitespace'. Please remove trailing whitespace.
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: Build and Test - vllm
- GitHub Check: pre-merge-rust (.)
- GitHub Check: Build Documentation
🔇 Additional comments (21)
launch/dynamo-run/src/input/common.rs (1)
53-55: LGTM! Consistent API extension.The addition of three
Noneparameters for the new KV-aware routing weights is appropriate for existing code that doesn't need to specify these optional parameters.launch/llmctl/src/main.rs (2)
261-263: LGTM! Consistent with API extension.The addition of three
Noneparameters aligns with the updatedModelWatcher::newsignature for KV-aware routing weights.
319-321: LGTM! Consistent with API extension.The addition of three
Noneparameters matches the pattern used throughout the codebase for the new optional routing weights.components/http/src/main.rs (1)
59-59: LGTM! Consistent API update.The addition of three
Noneparameters correctly extends theModelWatcher::newcall to accommodate the new optional KV-aware routing weights.launch/dynamo-run/src/main.rs (2)
33-33: LGTM! Good documentation of new CLI flags.The USAGE string correctly documents the new KV-aware routing weight flags with their default values, making them discoverable to users.
33-33:✅ Verification successful
Verify all ModelWatcher::new calls have been updated.
Ensure that all instances of
ModelWatcher::newthroughout the codebase have been updated to include the three new optional parameters.
🏁 Script executed:
#!/bin/bash # Description: Search for all ModelWatcher::new calls to verify they include the new parameters # Expected: All calls should have 6 parameters (3 original + 3 new None values) echo "Searching for ModelWatcher::new calls..." ast-grep --pattern 'ModelWatcher::new($$$)' echo -e "\nSearching for any remaining 3-parameter ModelWatcher::new calls..." rg -A 5 "ModelWatcher::new\(" --type rustLength of output: 4392
All ModelWatcher::new invocations updated
Confirmed that every
ModelWatcher::newcall now includes all three new optional parameters (no remaining 3-parameter invocations).launch/dynamo-run/src/flags.rs (2)
108-122: LGTM: Well-documented KV router weight configuration flags.The three new optional weight parameters are properly implemented with clear documentation explaining their purpose and default values. The Option type allows for optional configuration while maintaining type safety.
193-204: LGTM: Proper handling of optional flags in command line generation.The as_vec method correctly includes the new weight flags only when they are set, maintaining the expected behavior for command line argument generation.
docs/guides/dynamo_run.md (2)
3-34: LGTM: Well-structured table of contents.The hierarchical restructuring of the table of contents improves the document's organization and readability.
44-44: LGTM: Usage example updated with new KV router weight flags.The CLI usage example correctly documents the three new optional weight parameters with their default values, providing clear guidance for users wanting to tune KV-aware routing.
launch/dynamo-run/src/input/http.rs (1)
49-52: LGTM: Correct parameter propagation to run_watcher.The new weight parameters are properly passed from the flags to the run_watcher function when using dynamic engine configuration.
lib/llm/src/discovery/watcher.rs (2)
39-42: LGTM: Proper addition of weight fields to ModelWatcher.The three new optional weight fields are correctly added to the ModelWatcher struct with consistent Option typing.
49-61: LGTM: Constructor properly updated for new weight parameters.The ModelWatcher::new constructor correctly accepts and initializes the three new weight parameters.
lib/llm/src/discovery/model_manager.rs (3)
186-188: LGTM: Clean addition of configurable weight parametersThe three new optional weight parameters are properly typed and consistently named across the function signature.
223-225: LGTM: Consistent parameter propagationThe method signature correctly mirrors the parameters from
kv_chooser_for, maintaining consistency in the API.
227-231: LGTM: Proper instantiation with configurable weightsThe change from default constructor to
DefaultWorkerSelector::new()with the weight parameters correctly enables the configurable routing behavior.lib/llm/src/kv_router/scheduler.rs (5)
234-239: LGTM: Well-structured configurable weight fieldsThe addition of the three weight fields with clear, descriptive names enables the desired configurable routing behavior. The
Clonederivation is appropriate for this data structure.
241-249: LGTM: Reasonable default valuesThe default values (2.0 for overlap_score_weight, 1.0 for both cache usage and waiting requests) appear well-balanced and maintain backward compatibility.
251-263: LGTM: Clean constructor with optional parametersThe constructor properly handles optional parameters with sensible fallbacks to the same defaults used in the
Defaultimplementation, ensuring consistency.
99-99: LGTM: Consistent use of default implementationUsing
Default::default()maintains backward compatibility while enabling the new configurable behavior when weights are explicitly provided.
320-323: LGTM: Updated trace logging with dynamic weightsThe trace logging correctly reflects the configurable weights in the formula, which will be helpful for debugging and performance tuning.
|
This is great. It should solve most of the times people want to write a custom router. Can you put those three values in a struct, and pass that around instead? That will allow us to add / modify settings in future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
lib/llm/src/kv_router.rs (1)
83-100: Consider optimizing the constructor implementation.The constructor works correctly but has a minor inefficiency and lacks input validation.
Apply this diff to optimize the constructor and add basic validation:
pub fn new( overlap_score_weight: Option<f64>, gpu_cache_usage_weight: Option<f64>, waiting_requests_weight: Option<f64>, ) -> Self { - let default = Self::default(); Self { - overlap_score_weight: overlap_score_weight.unwrap_or(default.overlap_score_weight), - gpu_cache_usage_weight: gpu_cache_usage_weight - .unwrap_or(default.gpu_cache_usage_weight), - waiting_requests_weight: waiting_requests_weight - .unwrap_or(default.waiting_requests_weight), + overlap_score_weight: overlap_score_weight.unwrap_or(2.0).max(0.0), + gpu_cache_usage_weight: gpu_cache_usage_weight.unwrap_or(1.0).max(0.0), + waiting_requests_weight: waiting_requests_weight.unwrap_or(1.0).max(0.0), } }This eliminates the unnecessary default instance creation and adds basic validation to ensure non-negative weights.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
components/http/src/main.rs(1 hunks)launch/dynamo-run/src/flags.rs(4 hunks)launch/dynamo-run/src/input/common.rs(1 hunks)launch/dynamo-run/src/input/http.rs(3 hunks)launch/llmctl/src/main.rs(2 hunks)lib/llm/src/discovery/model_manager.rs(4 hunks)lib/llm/src/discovery/watcher.rs(4 hunks)lib/llm/src/kv_router.rs(1 hunks)lib/llm/src/kv_router/scheduler.rs(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- launch/dynamo-run/src/input/common.rs
- launch/llmctl/src/main.rs
- components/http/src/main.rs
- launch/dynamo-run/src/input/http.rs
- launch/dynamo-run/src/flags.rs
- lib/llm/src/discovery/watcher.rs
- lib/llm/src/kv_router/scheduler.rs
- lib/llm/src/discovery/model_manager.rs
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (2)
lib/llm/src/kv_router.rs (2)
57-71: Well-structured configuration struct with clear documentation.The
KvRouterConfigstruct is well-designed with comprehensive documentation explaining the purpose of each weight parameter. The field visibility and documentation follow good Rust practices.
73-81: Appropriate default values for KV router weights.The default implementation provides sensible starting values that prioritize KV cache reuse (overlap_score_weight: 2.0) while balancing GPU cache usage and waiting requests (both 1.0).
Expose router configurations to dynamo-run (following the guide: https://github.com/ai-dynamo/dynamo/blob/main/docs/guides/kv_router_perf_tuning.md)
Rust reviewers: please double-check everything, I'm not sure if I break anything
Summary by CodeRabbit
New Features
--kv-overlap-score-weight,--kv-gpu-cache-usage-weight, and--kv-waiting-requests-weight, allowing users to fine-tune worker selection behavior.Documentation
dynamo runguide to include detailed steps and descriptions for the new KV-aware routing flags and usage examples.