-
Notifications
You must be signed in to change notification settings - Fork 680
feat: kv commit router #3024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: kv commit router #3024
Conversation
|
👋 Hi michaelfeil! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
WalkthroughKvRouter now processes three request types—New, MarkPrefill, and MarkFree—deriving context_id from ctx.context().id(). New routes via find_best_match and returns worker_id plus overlap_blocks. MarkPrefill and MarkFree update state via mark_prefill_completed and free, respectively, returning sentinel worker_id and overlap_blocks. Protocols switch RouterRequest to a tagged enum and add overlap_blocks to RouterResponse. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Client
participant KvRouter
participant Matcher as find_best_match
participant Prefill as mark_prefill_completed
participant Free as free
rect rgb(235, 245, 255)
note over Client,KvRouter: New request
Client->>KvRouter: RouterRequest::New { tokens }
KvRouter->>Matcher: find_best_match(context_id, tokens, None, true)
Matcher-->>KvRouter: (worker_id, overlap_blocks)
KvRouter-->>Client: RouterResponse { worker_id, overlap_blocks }
end
rect rgb(240, 255, 240)
note over Client,KvRouter: Mark prefill completed
Client->>KvRouter: RouterRequest::MarkPrefill
KvRouter->>Prefill: mark_prefill_completed(context_id)
Prefill-->>KvRouter: ok
KvRouter-->>Client: RouterResponse { worker_id: -1, overlap_blocks: 0 }
end
rect rgb(255, 245, 235)
note over Client,KvRouter: Mark free
Client->>KvRouter: RouterRequest::MarkFree
KvRouter->>Free: free(context_id)
Free-->>KvRouter: ok
KvRouter-->>Client: RouterResponse { worker_id: -1, overlap_blocks: 0 }
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
Pre-merge checks❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (6)
lib/llm/src/kv_router/protocols.rs (1)
8-18: Drop redundant serde rename and stray comment
#[serde(rename = "new")]is redundant withrename_all = "snake_case".// inilooks like a leftover.#[derive(Debug, Clone, Serialize, Deserialize)] #[serde(tag = "method", rename_all = "snake_case")] pub enum RouterRequest { - // ini - #[serde(rename = "new")] + // new routing decision New { tokens: Vec<Token>, }, MarkPrefill {}, MarkFree {}, }lib/llm/src/kv_router.rs (5)
379-381: Fix trailing whitespace and tighten comment (CI is failing)Pre-commit is failing on trailing whitespace. Also simplify wording.
-// NOTE: KVRouter works like a PushRouter, - -// but without the reverse proxy functionality, but based on contract of 3 request types +// NOTE: KVRouter behaves like a PushRouter without reverse proxy functionality, +// using a 3-method request contract.
388-416: Type alignment for tokens to avoid confusion
RouterRequest::NewcarriesVec<Token>, whilefind_best_matchexpects&[u32]. IfTokenis a type alias foru32, it compiles, but usingTokenimproves clarity.- async fn find_best_match( + async fn find_best_match( &self, context_id: &str, - tokens: &[u32], + tokens: &[Token], router_config_override: Option<&RouterConfigOverride>, update_states: bool, ) -> anyhow::Result<(i64, u32)> {Call sites (no functional change if
type Token = u32):- let (worker_id, overlap_blocks) = self - .find_best_match(&context_id, &tokens, None, true) + let (worker_id, overlap_blocks) = self + .find_best_match(&context_id, &tokens, None, true) .await?;
401-415: Add trace logs for lifecycle signalsHelps diagnose double-prefill/free or ordering issues across replicas.
RouterRequest::MarkPrefill {} => { + tracing::trace!(%context_id, "mark_prefill: prefill completed"); self.mark_prefill_completed(&context_id).await; RouterResponse { worker_id: -1, overlap_blocks: 0, } } RouterRequest::MarkFree {} => { + tracing::trace!(%context_id, "mark_free: request freed"); self.free(&context_id).await; RouterResponse { worker_id: -1, overlap_blocks: 0, } }
285-324: Avoid unnecessary clones in hot path (if scheduler API allows)
seq_hashes.clone()andoverlap_scores.clone()can be costly. If the scheduler accepts references or owned values you no longer need, pass by move/borrow.Please confirm
KvScheduler::schedulesignature; if it can accept&OverlapScoresand&[SequenceHash], we can drop clones. If not, ignore.
388-416: Consider returning richer statuses for MarkPrefill/MarkFreeSurface outcomes like already-marked, unknown-context, or double-free warnings as part of the response. This aligns with the PR’s “warnings for unsuccessful frees” idea.
Happy to wire a minimal
RouterResponse::PrefillMarked/AlreadyMarked/UnknownContextand plumb booleans fromKvScheduler.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/llm/src/kv_router.rs(1 hunks)lib/llm/src/kv_router/protocols.rs(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
lib/llm/src/kv_router.rs (2)
examples/deployments/router_standalone/router.py (2)
RouterRequest(34-36)RouterResponse(39-40)lib/llm/src/migration.rs (2)
generate(52-77)generate(274-458)
lib/llm/src/kv_router/protocols.rs (1)
examples/deployments/router_standalone/router.py (2)
RouterRequest(34-36)RouterResponse(39-40)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3024/merge) by michaelfeil.
lib/llm/src/kv_router.rs
[error] 381-381: Command 'pre-commit run --show-diff-on-failure --color=always --all-files' failed due to trailing-whitespace hook. Trailing whitespace detected; the hook modified lib/llm/src/kv_router.rs and exited with code 1.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
🔇 Additional comments (1)
lib/llm/src/kv_router/protocols.rs (1)
20-23: Remove Default impl for RouterRequest — avoid accidental empty requestsDefaulting to
New { tokens: vec![] }can create accidental empty/scheduled requests via..Default::default()or serde defaults; remove this impl or require callers to construct RouterRequest explicitly.Location: lib/llm/src/kv_router/protocols.rs
Suggested change:
-impl Default for RouterRequest { - fn default() -> Self { - RouterRequest::New { tokens: vec![] } - } -} +// Consider removing Default; construct explicitly at call sites.
Signed-off-by: michaelfeil <me@michaelfeil.eu>
55e02bb to
c7dbde3
Compare
Signed-off-by: michaelfeil <me@michaelfeil.eu>
Signed-off-by: michaelfeil <me@michaelfeil.eu>
Signed-off-by: michaelfeil <me@michaelfeil.eu>
|
/ok to test |
@PeaBrane, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/ |
|
/ok to test a80fdfc |
Signed-off-by: michaelfeil <me@michaelfeil.eu> Signed-off-by: Kristen Kelleher <kkelleher@nvidia.com>
Overview:
1 phase: New:
3, phase (optional, signaling):
So, this Pr adds the option to use the protocol, signaling a specfic router replica that the request is no longer.
This is the minimal idea we are using today
Considerations: Why not the pushrouter:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Refactor