
@michaelfeil
Contributor

@michaelfeil michaelfeil commented Sep 14, 2025

Overview:

  • This PR rewrites the currently non-functional KVRouter generate method to follow a 3-phase protocol:

  1. Phase 1 (New): works like the default does today and returns a new routing decision.
  2. Phase 2 (optional, signaling): mark prefill. Marks the request as prefill-complete. This signal comes, e.g., before any TTL, so the router learns that prefill is done.
  3. Phase 3 (optional, signaling): mark free. Marks the request as free: it has reached the end of its life and is no longer being decoded.

So this PR adds the option to use this protocol to signal a specific router replica that a request has completed prefill or is no longer active.

This is the minimal version we are using today. Possible follow-ups:

  • Optional: more specific return types for a successful free, which would give callers more information.
  • Optional: return a warning if, e.g., a free is not successful (double free, etc.).
  • rename of the
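The three-phase lifecycle from the overview can be sketched as per-request state that one router replica keeps, assuming requests are keyed by a string context id. `LifecycleTracker`, `Phase`, and `SignalOutcome` are hypothetical names, not the KvRouter API; the outcome enum illustrates the optional "warn on double free" idea rather than anything this PR implements.

```rust
use std::collections::HashMap;

/// Hypothetical per-request state kept by one router replica.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    /// Phase 1 done: routed to a worker, prefill still running.
    Routed { worker_id: i64 },
    /// Phase 2 done: prefill finished, request is decoding.
    Decoding { worker_id: i64 },
}

/// Hypothetical result of a lifecycle signal, so the caller can warn on
/// repeated or unknown signals (e.g. a double free) instead of ignoring them.
#[derive(Debug, PartialEq, Eq)]
enum SignalOutcome {
    Ok,
    AlreadyMarked,
    UnknownContext,
}

#[derive(Default)]
struct LifecycleTracker {
    active: HashMap<String, Phase>,
}

impl LifecycleTracker {
    /// Phase 1 (New): remember which worker the request was routed to.
    fn new_request(&mut self, context_id: &str, worker_id: i64) {
        self.active
            .insert(context_id.to_string(), Phase::Routed { worker_id });
    }

    /// Phase 2 (MarkPrefill, optional): prefill is complete.
    fn mark_prefill(&mut self, context_id: &str) -> SignalOutcome {
        match self.active.get(context_id).copied() {
            Some(Phase::Routed { worker_id }) => {
                self.active
                    .insert(context_id.to_string(), Phase::Decoding { worker_id });
                SignalOutcome::Ok
            }
            Some(Phase::Decoding { .. }) => SignalOutcome::AlreadyMarked,
            None => SignalOutcome::UnknownContext,
        }
    }

    /// Phase 3 (MarkFree, optional): drop all state for the request.
    /// Works whether or not MarkPrefill was sent first.
    fn mark_free(&mut self, context_id: &str) -> SignalOutcome {
        match self.active.remove(context_id) {
            Some(_) => SignalOutcome::Ok,
            None => SignalOutcome::UnknownContext, // e.g. a double free
        }
    }

    /// Worker a live request is pinned to, if any.
    fn worker_of(&self, context_id: &str) -> Option<i64> {
        self.active.get(context_id).map(|phase| match phase {
            Phase::Routed { worker_id } | Phase::Decoding { worker_id } => *worker_id,
        })
    }
}

fn main() {
    let mut router = LifecycleTracker::default();
    router.new_request("req-1", 3);
    assert_eq!(router.mark_prefill("req-1"), SignalOutcome::Ok);
    assert_eq!(router.mark_prefill("req-1"), SignalOutcome::AlreadyMarked);
    assert_eq!(router.worker_of("req-1"), Some(3));
    assert_eq!(router.mark_free("req-1"), SignalOutcome::Ok);
    assert_eq!(router.mark_free("req-1"), SignalOutcome::UnknownContext);
    assert_eq!(router.worker_of("req-1"), None);
}
```

Keeping the state per replica is what makes the signaling calls only meaningful when they reach the same replica that made the routing decision.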

Considerations: why not the PushRouter?

  • A few routers can't be trusted with reverse proxying (>100 input RPS per router expected, >10k streaming yields/s).
  • PushRouter<PreprocessedRequest, Annotated> is actually hard to fulfill for us (we don't want LLMEngineOutput).
  • If the reverse proxy goes down, the whole chain goes down (uptime).

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Added support for marking prefill completion and freeing a session, enabling smoother lifecycle management during generation.
    • Responses now include overlap metrics, providing better insight for clients to optimize streaming and caching behavior.
    • Enhanced routing logic to select the best worker for new requests based on token context.
  • Refactor

    • Updated internal router behavior and protocol to use explicit request types for clearer, more reliable interactions.

@michaelfeil michaelfeil requested a review from a team as a code owner September 14, 2025 03:26
@copy-pr-bot

copy-pr-bot bot commented Sep 14, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

👋 Hi michaelfeil! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added external-contribution Pull request is from an external contributor feat labels Sep 14, 2025
@coderabbitai
Contributor

coderabbitai bot commented Sep 14, 2025

Walkthrough

KvRouter now processes three request types—New, MarkPrefill, and MarkFree—deriving context_id from ctx.context().id(). New routes via find_best_match and returns worker_id plus overlap_blocks. MarkPrefill and MarkFree update state via mark_prefill_completed and free, respectively, returning sentinel worker_id and overlap_blocks. Protocols switch RouterRequest to a tagged enum and add overlap_blocks to RouterResponse.
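To make the control flow above concrete, here is a std-only sketch of the three-way dispatch. `ToyRouter`, its prefix-matching `find_best_match`, and the `block_size` field are simplifications assumed for illustration; the real router derives `context_id` from `ctx.context().id()` and scores workers through its KV indexer and scheduler, which this sketch does not model. Only the response shape (a worker id plus `overlap_blocks`, with `-1`/`0` sentinels for the two signaling requests) follows the walkthrough.

```rust
use std::collections::{HashMap, HashSet};

/// Same three variants as the walkthrough; tokens simplified to u32.
enum RouterRequest {
    New { tokens: Vec<u32> },
    MarkPrefill {},
    MarkFree {},
}

#[derive(Debug, PartialEq)]
struct RouterResponse {
    worker_id: i64,
    overlap_blocks: u32,
}

/// Hypothetical stand-in for KvRouter.
struct ToyRouter {
    worker_prefixes: HashMap<i64, Vec<u32>>, // tokens each worker has cached
    block_size: usize,
    active: HashMap<String, i64>, // context_id -> chosen worker
    prefilled: HashSet<String>,
}

impl ToyRouter {
    /// Pick the worker sharing the most full blocks with `tokens`;
    /// ties go to the lowest worker id.
    fn find_best_match(&self, tokens: &[u32]) -> (i64, u32) {
        let mut ids: Vec<i64> = self.worker_prefixes.keys().copied().collect();
        ids.sort();
        let mut best = (-1_i64, 0_u32);
        for id in ids {
            let cached = &self.worker_prefixes[&id];
            let shared = cached.iter().zip(tokens).take_while(|(a, b)| a == b).count();
            let blocks = (shared / self.block_size) as u32;
            if best.0 == -1 || blocks > best.1 {
                best = (id, blocks);
            }
        }
        best
    }

    fn generate(&mut self, context_id: &str, request: RouterRequest) -> RouterResponse {
        match request {
            RouterRequest::New { tokens } => {
                let (worker_id, overlap_blocks) = self.find_best_match(&tokens);
                self.active.insert(context_id.to_string(), worker_id);
                RouterResponse { worker_id, overlap_blocks }
            }
            RouterRequest::MarkPrefill {} => {
                self.prefilled.insert(context_id.to_string());
                RouterResponse { worker_id: -1, overlap_blocks: 0 }
            }
            RouterRequest::MarkFree {} => {
                self.active.remove(context_id);
                self.prefilled.remove(context_id);
                RouterResponse { worker_id: -1, overlap_blocks: 0 }
            }
        }
    }
}

fn main() {
    let mut router = ToyRouter {
        worker_prefixes: HashMap::from([
            (0, vec![1, 2, 3, 4]),
            (1, (1..=8).collect::<Vec<u32>>()),
        ]),
        block_size: 4,
        active: HashMap::new(),
        prefilled: HashSet::new(),
    };
    let tokens: Vec<u32> = (1..=9).collect();
    let routed = router.generate("req-1", RouterRequest::New { tokens });
    // Worker 1 shares 8 tokens = 2 full blocks; worker 0 only 1 block.
    assert_eq!(routed, RouterResponse { worker_id: 1, overlap_blocks: 2 });
    let sentinel = RouterResponse { worker_id: -1, overlap_blocks: 0 };
    assert_eq!(router.generate("req-1", RouterRequest::MarkPrefill {}), sentinel);
    assert_eq!(router.generate("req-1", RouterRequest::MarkFree {}), sentinel);
    assert!(router.active.is_empty() && router.prefilled.is_empty());
}
```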

Changes

| Cohort / File(s) | Change summary |
| --- | --- |
| Router control-flow updates (lib/llm/src/kv_router.rs) | Replace single-path routing with match over RouterRequest::{New, MarkPrefill, MarkFree}. Use ctx.context().id().to_string() as context_id. For New, call find_best_match(..., update_states=true) and return worker_id and overlap_blocks. For MarkPrefill and MarkFree, call mark_prefill_completed/free and return worker_id=-1, overlap_blocks=0. Update comments accordingly. |
| Protocol/schema changes (lib/llm/src/kv_router/protocols.rs) | Change RouterRequest from struct to #[serde(tag="method")] enum with variants: New { tokens }, MarkPrefill, MarkFree. Add Default impl returning New { tokens: [] }. Add RouterResponse.overlap_blocks: u32. Adjust serde names (snake_case; New renamed to "new"). |
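Because RouterRequest becomes an internally tagged enum, non-Rust clients (such as the Python router example listed under code graph analysis) presumably need to send a `method` field. The std-only sketch below hand-writes the JSON that `#[serde(tag = "method", rename_all = "snake_case")]` should produce for each variant. It is an approximation for illustration, not serde's actual output, and the `worker_id`-before-`overlap_blocks` field order in the response is an assumption.

```rust
/// Same variants as the protocol change above; tokens simplified to u32.
enum RouterRequest {
    New { tokens: Vec<u32> },
    MarkPrefill {},
    MarkFree {},
}

/// Hand-written approximation of the internally tagged, snake_case encoding.
fn request_json(request: &RouterRequest) -> String {
    match request {
        RouterRequest::New { tokens } => {
            let list: Vec<String> = tokens.iter().map(|t| t.to_string()).collect();
            format!(r#"{{"method":"new","tokens":[{}]}}"#, list.join(","))
        }
        // Empty struct variants carry only the tag.
        RouterRequest::MarkPrefill {} => r#"{"method":"mark_prefill"}"#.to_string(),
        RouterRequest::MarkFree {} => r#"{"method":"mark_free"}"#.to_string(),
    }
}

/// RouterResponse now carries overlap_blocks next to worker_id.
fn response_json(worker_id: i64, overlap_blocks: u32) -> String {
    format!(r#"{{"worker_id":{},"overlap_blocks":{}}}"#, worker_id, overlap_blocks)
}

fn main() {
    let new = RouterRequest::New { tokens: vec![1, 2, 3] };
    assert_eq!(request_json(&new), r#"{"method":"new","tokens":[1,2,3]}"#);
    assert_eq!(request_json(&RouterRequest::MarkPrefill {}), r#"{"method":"mark_prefill"}"#);
    assert_eq!(request_json(&RouterRequest::MarkFree {}), r#"{"method":"mark_free"}"#);
    // Signaling requests get the sentinel response.
    assert_eq!(response_json(-1, 0), r#"{"worker_id":-1,"overlap_blocks":0}"#);
}
```

Since the old RouterRequest was a plain struct, any existing client that omits `method` would likely fail to deserialize against the new enum, which is worth keeping in mind for compatibility.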

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor Client
  participant KvRouter
  participant Matcher as find_best_match
  participant Prefill as mark_prefill_completed
  participant Free as free

  rect rgb(235, 245, 255)
    note over Client,KvRouter: New request
    Client->>KvRouter: RouterRequest::New { tokens }
    KvRouter->>Matcher: find_best_match(context_id, tokens, None, true)
    Matcher-->>KvRouter: (worker_id, overlap_blocks)
    KvRouter-->>Client: RouterResponse { worker_id, overlap_blocks }
  end

  rect rgb(240, 255, 240)
    note over Client,KvRouter: Mark prefill completed
    Client->>KvRouter: RouterRequest::MarkPrefill
    KvRouter->>Prefill: mark_prefill_completed(context_id)
    Prefill-->>KvRouter: ok
    KvRouter-->>Client: RouterResponse { worker_id: -1, overlap_blocks: 0 }
  end

  rect rgb(255, 245, 235)
    note over Client,KvRouter: Mark free
    Client->>KvRouter: RouterRequest::MarkFree
    KvRouter->>Free: free(context_id)
    Free-->>KvRouter: ok
    KvRouter-->>Client: RouterResponse { worker_id: -1, overlap_blocks: 0 }
  end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

In burrows of bytes I hop and weave,
Three little signals—new, prefill, free.
I sniff out blocks, overlap I see,
Then thump a reply: who’ll carry thee?
When caches are calm, I’m light on my feet—
A router rabbit, routing neat. 🐇✨

Pre-merge checks

❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description Check | ⚠️ Warning | The PR description includes a helpful high-level Overview describing the 3-phase protocol and motivation, but it does not follow the repository template: the "Details" and "Where should the reviewer start?" sections are empty and the Related Issues field contains a placeholder ("#xxx"), so file-level changes, reviewer guidance, and issue linkage are missing. Because required template sections are incomplete, the description does not meet the repository's documented expectations. | Please populate the "Details" section with concrete file-level changes (e.g., RouterRequest enum serialization changes, RouterResponse.overlap_blocks addition, and kv_router generate logic updates), fill "Where should the reviewer start?" with specific files and functions to review, replace the placeholder related-issue with a real issue or remove it, and add notes about tests/compatibility or migration impact so reviewers can validate behavior and serialization changes. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The title "feat: kv commit router" is concise and directly related to the main change (introducing a KV commit/router feature and protocol) and therefore accurately summarizes the primary intent of the changeset from the author's perspective. It follows conventional commit-style prefixing and is clear enough for a teammate scanning history to understand the primary change. |

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (6)
lib/llm/src/kv_router/protocols.rs (1)

8-18: Drop redundant serde rename and stray comment

  • #[serde(rename = "new")] is redundant with rename_all = "snake_case".
  • // ini looks like a leftover.
```diff
 #[derive(Debug, Clone, Serialize, Deserialize)]
 #[serde(tag = "method", rename_all = "snake_case")]
 pub enum RouterRequest {
-    // ini
-    #[serde(rename = "new")]
+    // new routing decision
     New {
         tokens: Vec<Token>,
     },
     MarkPrefill {},
     MarkFree {},
 }
```
lib/llm/src/kv_router.rs (5)

379-381: Fix trailing whitespace and tighten comment (CI is failing)

Pre-commit is failing on trailing whitespace. Also simplify wording.

```diff
-// NOTE: KVRouter works like a PushRouter, 
-
-// but without the reverse proxy functionality, but based on contract of 3 request types
+// NOTE: KVRouter behaves like a PushRouter without reverse proxy functionality,
+// using a 3-method request contract.
```

388-416: Type alignment for tokens to avoid confusion

RouterRequest::New carries Vec<Token>, while find_best_match expects &[u32]. If Token is a type alias for u32, it compiles, but using Token improves clarity.

```diff
     async fn find_best_match(
         &self,
         context_id: &str,
-        tokens: &[u32],
+        tokens: &[Token],
         router_config_override: Option<&RouterConfigOverride>,
         update_states: bool,
     ) -> anyhow::Result<(i64, u32)> {
```

Call sites need no change if type Token = u32:

```rust
                let (worker_id, overlap_blocks) = self
                    .find_best_match(&context_id, &tokens, None, true)
                    .await?;
```
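The claim that the call compiles either way can be checked with a tiny std-only example, assuming Token is a plain alias for u32 (the comment above hedges this too). `count_tokens` and `count_raw` are hypothetical stand-ins for the `tokens` parameter of find_best_match.

```rust
/// Assumption: Token is a plain alias for u32, as the review suggests.
type Token = u32;

/// Parameter typed with the alias (the suggested signature).
fn count_tokens(tokens: &[Token]) -> usize {
    tokens.len()
}

/// Same parameter typed with the raw integer (the current signature).
fn count_raw(tokens: &[u32]) -> usize {
    tokens.len()
}

fn main() {
    let tokens: Vec<Token> = vec![10, 20, 30];
    // &Vec<Token> derefs to &[Token], which is the same type as &[u32],
    // so both signatures accept the same argument.
    assert_eq!(count_tokens(&tokens), 3);
    assert_eq!(count_raw(&tokens), 3);
}
```

Because an alias is not a new type, the change is purely documentary; a newtype such as `struct Token(u32)` would be needed to make the compiler enforce the distinction.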

401-415: Add trace logs for lifecycle signals

Helps diagnose double-prefill/free or ordering issues across replicas.

```diff
             RouterRequest::MarkPrefill {} => {
+                tracing::trace!(%context_id, "mark_prefill: prefill completed");
                 self.mark_prefill_completed(&context_id).await;

                 RouterResponse {
                     worker_id: -1,
                     overlap_blocks: 0,
                 }
             }
             RouterRequest::MarkFree {} => {
+                tracing::trace!(%context_id, "mark_free: request freed");
                 self.free(&context_id).await;
                 RouterResponse {
                     worker_id: -1,
                     overlap_blocks: 0,
                 }
             }
```

285-324: Avoid unnecessary clones in hot path (if scheduler API allows)

seq_hashes.clone() and overlap_scores.clone() can be costly. If the scheduler accepts references or owned values you no longer need, pass by move/borrow.

Please confirm KvScheduler::schedule signature; if it can accept &OverlapScores and &[SequenceHash], we can drop clones. If not, ignore.
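A sketch of the trade-off, assuming hypothetical owned and borrowed `schedule` signatures (the real KvScheduler::schedule signature is exactly what the comment asks to confirm). `SequenceHash` as `u64` and `OverlapScores` as a worker-id-to-blocks map are stand-in types, and the highest-overlap policy is a toy.

```rust
use std::collections::HashMap;

type SequenceHash = u64; // assumption: hashes are plain u64s
type OverlapScores = HashMap<i64, u32>; // assumption: worker_id -> overlapping blocks

/// Hypothetical owned-argument signature: callers that still need the
/// data afterwards are forced to clone it.
fn schedule_owned(seq_hashes: Vec<SequenceHash>, scores: OverlapScores) -> Option<i64> {
    schedule_borrowed(&seq_hashes, &scores)
}

/// Hypothetical borrowed signature: no clone needed at the call site.
fn schedule_borrowed(seq_hashes: &[SequenceHash], scores: &OverlapScores) -> Option<i64> {
    if seq_hashes.is_empty() {
        return None; // nothing to schedule
    }
    // Toy policy: highest overlap wins, lowest worker id breaks ties.
    scores
        .iter()
        .max_by_key(|(id, blocks)| (**blocks, std::cmp::Reverse(**id)))
        .map(|(id, _)| *id)
}

fn main() {
    let seq_hashes: Vec<SequenceHash> = vec![11, 22, 33];
    let scores: OverlapScores = HashMap::from([(0, 1), (1, 3), (2, 3)]);

    // Owned API: clones are needed because the data is reused below.
    let a = schedule_owned(seq_hashes.clone(), scores.clone());
    // Borrowed API: same result, no copies.
    let b = schedule_borrowed(&seq_hashes, &scores);
    assert_eq!(a, b);
    assert_eq!(b, Some(1)); // workers 1 and 2 tie at 3 blocks; lower id wins
    println!("caller still owns {} hashes", seq_hashes.len());
}
```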


388-416: Consider returning richer statuses for MarkPrefill/MarkFree

Surface outcomes like already-marked, unknown-context, or double-free warnings as part of the response. This aligns with the PR’s “warnings for unsuccessful frees” idea.

Happy to wire a minimal RouterResponse::PrefillMarked/AlreadyMarked/UnknownContext and plumb booleans from KvScheduler.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c8ecc40 and 55e02bb.

📒 Files selected for processing (2)
  • lib/llm/src/kv_router.rs (1 hunk)
  • lib/llm/src/kv_router/protocols.rs (1 hunk)
🧰 Additional context used
🧬 Code graph analysis (2)
lib/llm/src/kv_router.rs (2)
examples/deployments/router_standalone/router.py (2)
  • RouterRequest (34-36)
  • RouterResponse (39-40)
lib/llm/src/migration.rs (2)
  • generate (52-77)
  • generate (274-458)
lib/llm/src/kv_router/protocols.rs (1)
examples/deployments/router_standalone/router.py (2)
  • RouterRequest (34-36)
  • RouterResponse (39-40)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3024/merge) by michaelfeil.
lib/llm/src/kv_router.rs

[error] 381-381: Command 'pre-commit run --show-diff-on-failure --color=always --all-files' failed due to trailing-whitespace hook. Trailing whitespace detected; the hook modified lib/llm/src/kv_router.rs and exited with code 1.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
🔇 Additional comments (1)
lib/llm/src/kv_router/protocols.rs (1)

20-23: Remove Default impl for RouterRequest — avoid accidental empty requests

Defaulting to New { tokens: vec![] } can create accidental empty/scheduled requests via ..Default::default() or serde defaults; remove this impl or require callers to construct RouterRequest explicitly.

Location: lib/llm/src/kv_router/protocols.rs

Suggested change:

```diff
-impl Default for RouterRequest {
-    fn default() -> Self {
-        RouterRequest::New { tokens: vec![] }
-    }
-}
+// Consider removing Default; construct explicitly at call sites.
```
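A minimal illustration of the hazard, using a hypothetical `parse` helper that returns `None` for an unrecognized method: with the Default impl in place, `unwrap_or_default()` quietly turns a malformed signal into an empty routing request. The enum mirrors the one under review, with tokens simplified to `u32`.

```rust
/// Mirrors the enum under review; tokens simplified to u32.
#[derive(Debug, PartialEq)]
enum RouterRequest {
    New { tokens: Vec<u32> },
    MarkPrefill {},
    MarkFree {},
}

/// The impl the review suggests removing.
impl Default for RouterRequest {
    fn default() -> Self {
        RouterRequest::New { tokens: vec![] }
    }
}

/// Hypothetical parse step: None for an unrecognized method name.
fn parse(method: &str) -> Option<RouterRequest> {
    match method {
        "mark_prefill" => Some(RouterRequest::MarkPrefill {}),
        "mark_free" => Some(RouterRequest::MarkFree {}),
        _ => None,
    }
}

fn main() {
    // A typo'd free signal silently becomes an empty *routing* request.
    let req = parse("mark_fre").unwrap_or_default();
    assert_eq!(req, RouterRequest::New { tokens: vec![] });

    // Without relying on Default, the caller has to handle the failure.
    let explicit = parse("mark_fre").ok_or("unknown router method");
    assert!(explicit.is_err());
}
```

Once the impl is removed, the `unwrap_or_default()` line no longer compiles, which turns the silent fallback into a compile-time decision at each call site.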

Signed-off-by: michaelfeil <me@michaelfeil.eu>
michaelfeil and others added 2 commits September 15, 2025 23:42
Signed-off-by: michaelfeil <me@michaelfeil.eu>
Signed-off-by: michaelfeil <me@michaelfeil.eu>
Signed-off-by: michaelfeil <me@michaelfeil.eu>
@PeaBrane PeaBrane enabled auto-merge (squash) September 16, 2025 20:14
@PeaBrane
Contributor

/ok to test

@copy-pr-bot

copy-pr-bot bot commented Sep 16, 2025

/ok to test

@PeaBrane, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@PeaBrane
Contributor

/ok to test a80fdfc

@PeaBrane PeaBrane merged commit 0373b89 into ai-dynamo:main Sep 16, 2025
15 of 16 checks passed
kmkelle-nv pushed a commit that referenced this pull request Sep 17, 2025
Signed-off-by: michaelfeil <me@michaelfeil.eu>
Signed-off-by: Kristen Kelleher <kkelleher@nvidia.com>

Labels

external-contribution Pull request is from an external contributor feat size/M
