Conversation

@atchernych atchernych commented Aug 14, 2025

Overview:

This is a refresh of Biswa's original #2117 ("feat: skip router when worker id is pre-determined", EPP-aware Gateway Integration), intended to be used in combination with https://github.com/ai-dynamo/dynamo/pull/1787/files.

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features
    • Added an optional request parameter to target a specific backend instance. When provided, requests are routed directly to that instance.
    • Preserves existing behavior when not set: requests automatically route to the best-matching instance.
    • Fully backward compatible; no changes required for existing integrations.
    • Improves routing control for advanced use cases while maintaining current defaults.
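
To illustrate the new parameter: a request pinning a specific worker might look like the following. The surrounding request shape, model name, and id value are illustrative; only the `nvext.backend_instance_id` placement is taken from the fields described in this PR.

```json
{
  "model": "example-model",
  "messages": [{ "role": "user", "content": "Hello" }],
  "nvext": { "backend_instance_id": 42 }
}
```

When `backend_instance_id` is omitted, the request falls through to the existing token-based best-match routing.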

@atchernych atchernych requested a review from a team as a code owner August 14, 2025 22:47
copy-pr-bot bot commented Aug 14, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@github-actions github-actions bot added the feat label Aug 14, 2025
@atchernych atchernych changed the title feat: pr-2117 merged main feat: skip router when worker id is pre-determined Aug 14, 2025

coderabbitai bot commented Aug 14, 2025

Walkthrough

Introduces an optional backend_instance_id across request layers (NvExt, PreprocessedRequest, builder) and updates KvPushRouter::generate to route directly to the specified instance when provided; otherwise it uses existing token-based best-match logic. Overlap is set to zero for explicit routing. No exported function signatures changed.

Changes

Cohort / File(s) Summary
Routing override in KV router
lib/llm/src/kv_router.rs
Added branch in generate: if request.backend_instance_id is set, route to that instance with overlap_amount=0; else use chooser.find_best_match. Subsequent flow unchanged; annotations and streaming behavior preserved.
backend_instance_id plumbing
lib/llm/src/protocols/openai/nvext.rs, lib/llm/src/protocols/common/preprocessor.rs, lib/llm/src/preprocessor.rs
NvExt: new optional backend_instance_id field (serde/builder). PreprocessedRequest: new Option<i64> backend_instance_id with #[builder(default)]. Preprocessor: propagates nvext.backend_instance_id into PreprocessedRequest.
Builder API extension
lib/llm/src/protocols/common/llm_backend.rs
PreprocessedRequestBuilder: added backend_instance_id(...) setter to support wiring through the targeted instance ID.
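
As a rough illustration of the branch described for KvPushRouter::generate, the decision reduces to the following standalone sketch. The function name, signature, and tuple return are simplified stand-ins for the real router and chooser types, not the actual API.

```rust
/// Illustrative routing decision: either honor an explicit backend
/// instance id (with overlap forced to zero), or fall back to the
/// token-based best-match chooser.
fn select_instance<F>(backend_instance_id: Option<i64>, find_best_match: F) -> (i64, u32)
where
    F: Fn() -> (i64, u32),
{
    match backend_instance_id {
        // Explicit target: route directly; no prefix-cache overlap is assumed.
        Some(id) => (id, 0),
        // No target: delegate to the existing best-match selection.
        None => find_best_match(),
    }
}

fn main() {
    // Pretend the chooser would pick worker 7 with overlap 3.
    let chooser = || (7, 3);
    assert_eq!(select_instance(Some(42), chooser), (42, 0));
    assert_eq!(select_instance(None, chooser), (7, 3));
    println!("ok");
}
```

The rest of the flow (annotations, streaming) is unchanged in either branch, per the walkthrough above.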

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant OpenAI API Layer
    participant Preprocessor
    participant KvPushRouter
    participant Chooser
    participant Worker

    Client->>OpenAI API Layer: Request (+optional nvext.backend_instance_id)
    OpenAI API Layer->>Preprocessor: Build PreprocessedRequest
    Preprocessor-->>KvPushRouter: PreprocessedRequest (+optional backend_instance_id)

    alt backend_instance_id provided
        KvPushRouter->>Worker: Route to backend_instance_id (overlap=0)
    else no backend_instance_id
        KvPushRouter->>Chooser: find_best_match(token_ids)
        Chooser-->>KvPushRouter: instance_id, overlap_amount
        KvPushRouter->>Worker: Route to chosen instance_id
    end

    Worker-->>KvPushRouter: Stream + lifecycle events
    KvPushRouter-->>OpenAI API Layer: Stream response
    OpenAI API Layer-->>Client: Stream response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit routes with ears held high,
“Direct to burrow nine!” I cry.
If none is named, I sniff the trail,
Match the tokens, never fail.
New paths flagged, the streams align—
Hop, hop, packets right on time. 🐇✨

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (5)
lib/llm/src/protocols/openai/nvext.rs (3)

65-70: LGTM: Optional routing hook is well-shaped and consistent with existing patterns.

The new NvExt.backend_instance_id field is appropriately optional, builder-friendly, and serde-friendly. Nice placement and docs.

Two small follow-ups:

  • Consider clarifying docs to say “routed to the backend instance with the given ID.”
  • Consider whether negative IDs should be rejected. If instance IDs are always positive, a lightweight validation in validate_nv_ext could enforce id > 0 to catch user errors early.
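
The suggested guard could look roughly like this. validate_nv_ext's real signature isn't shown in this review, so this is a free-standing sketch of just the check; the error type and message are illustrative.

```rust
/// Sketch of the reviewer's suggestion: reject non-positive instance ids
/// early, on the assumption that instance ids are always positive.
fn validate_backend_instance_id(id: Option<i64>) -> Result<(), String> {
    match id {
        Some(v) if v <= 0 => Err(format!(
            "nvext.backend_instance_id must be positive, got {v}"
        )),
        // None (field unset) and positive ids both pass.
        _ => Ok(()),
    }
}

fn main() {
    assert!(validate_backend_instance_id(None).is_ok());
    assert!(validate_backend_instance_id(Some(42)).is_ok());
    assert!(validate_backend_instance_id(Some(-1)).is_err());
    println!("ok");
}
```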

141-154: Add a default assertion for backend_instance_id in the builder-default test.

To keep tests aligned with the new field, assert its default is None.

Apply:

     fn test_nv_ext_builder_default() {
         let nv_ext = NvExt::builder().build().unwrap();
         assert_eq!(nv_ext.ignore_eos, None);
         assert_eq!(nv_ext.top_k, None);
         assert_eq!(nv_ext.repetition_penalty, None);
         assert_eq!(nv_ext.greed_sampling, None);
+        assert_eq!(nv_ext.backend_instance_id, None);
         assert_eq!(nv_ext.guided_json, None);
         assert_eq!(nv_ext.guided_regex, None);
         assert_eq!(nv_ext.guided_grammar, None);
         assert_eq!(nv_ext.guided_choice, None);
     }

156-192: Optionally expand the custom-builder test to cover backend_instance_id.

Exercising the setter will catch any regressions in builder/serde wiring.

Apply:

     fn test_nv_ext_builder_custom() {
         let nv_ext = NvExt::builder()
             .ignore_eos(true)
             .top_k(10)
             .repetition_penalty(1.5)
             .greed_sampling(true)
+            .backend_instance_id(42)
             .guided_json(serde_json::json!({"type": "object"}))
             .guided_regex("^[0-9]+$".to_string())
             .guided_grammar("S -> 'a' S 'b' | 'c'".to_string())
             .guided_choice(vec!["choice1".to_string(), "choice2".to_string()])
             .guided_decoding_backend("xgrammar".to_string())
             .build()
             .unwrap();

         assert_eq!(nv_ext.ignore_eos, Some(true));
         assert_eq!(nv_ext.top_k, Some(10));
         assert_eq!(nv_ext.repetition_penalty, Some(1.5));
         assert_eq!(nv_ext.greed_sampling, Some(true));
+        assert_eq!(nv_ext.backend_instance_id, Some(42));
         assert_eq!(
             nv_ext.guided_json,
             Some(serde_json::json!({"type": "object"}))
         );
lib/llm/src/protocols/common/preprocessor.rs (1)

54-56: Consider serde defaults/skips for forward/backward compatibility.

Given this struct is serialized, adding serde defaults can reduce wire size and improve compatibility with older readers.

Apply:

     /// Targeted backend instance ID for the request
-    #[builder(default)]
+    #[builder(default)]
+    #[serde(default, skip_serializing_if = "Option::is_none")]
     pub backend_instance_id: Option<i64>,
lib/llm/src/kv_router.rs (1)

340-347: Optional: validate explicit instance existence before direct routing.

If a user-supplied backend_instance_id is stale or unknown, fail fast with a clear error instead of letting direct(...) bubble a less actionable error. Consider a quick guard using the scheduler/known instances map.
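
A minimal sketch of such a fail-fast check, assuming the router can obtain a set of live instance ids; the helper name and the HashSet representation are illustrative, not the actual scheduler API.

```rust
use std::collections::HashSet;

/// Sketch: resolve an explicit backend_instance_id against the set of
/// currently known workers, failing fast with an actionable error
/// instead of letting direct routing surface a less clear one.
fn resolve_explicit_instance(id: i64, known: &HashSet<i64>) -> Result<i64, String> {
    if known.contains(&id) {
        Ok(id)
    } else {
        Err(format!(
            "backend_instance_id {id} does not match any live worker instance"
        ))
    }
}

fn main() {
    let known: HashSet<i64> = [1, 7, 42].into_iter().collect();
    assert_eq!(resolve_explicit_instance(42, &known), Ok(42));
    assert!(resolve_explicit_instance(99, &known).is_err());
    println!("ok");
}
```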

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 9ddb3ef and 6230741.

📒 Files selected for processing (4)
  • lib/llm/src/kv_router.rs (1 hunks)
  • lib/llm/src/preprocessor.rs (1 hunks)
  • lib/llm/src/protocols/common/preprocessor.rs (1 hunks)
  • lib/llm/src/protocols/openai/nvext.rs (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
lib/llm/src/protocols/openai/nvext.rs (2)
lib/llm/src/protocols/common/preprocessor.rs (2)
  • builder (66-68)
  • builder (103-105)
lib/llm/src/protocols/common.rs (1)
  • builder (199-201)
lib/llm/src/protocols/common/preprocessor.rs (4)
lib/llm/src/tokenizers.rs (1)
  • builder (426-428)
lib/llm/src/protocols/common.rs (1)
  • builder (199-201)
lib/llm/src/protocols/openai/completions.rs (1)
  • builder (223-225)
lib/llm/src/protocols/openai/nvext.rs (1)
  • builder (105-107)
lib/llm/src/preprocessor.rs (3)
lib/llm/src/protocols/openai/chat_completions.rs (3)
  • nvext (84-86)
  • nvext (144-146)
  • nvext (226-228)
lib/llm/src/protocols/openai/completions.rs (4)
  • nvext (85-87)
  • nvext (134-136)
  • nvext (195-197)
  • builder (223-225)
lib/llm/src/protocols/openai/nvext.rs (2)
  • nvext (21-21)
  • builder (105-107)
lib/llm/src/kv_router.rs (2)
lib/runtime/src/component.rs (2)
  • id (104-106)
  • id (363-369)
lib/runtime/src/transports/etcd.rs (1)
  • id (64-66)
🪛 GitHub Actions: Rust pre-merge checks
lib/llm/src/kv_router.rs

[error] 373-373: Step: cargo clippy failed. E0425 cannot find value 'context_id' in this scope.


[error] 381-381: Step: cargo clippy failed. E0425 cannot find value 'context_id' in this scope.


[error] 345-345: Step: cargo clippy failed. E0061 this method takes 2 arguments but 1 argument was supplied; missing the required 'context_id' parameter for find_best_match.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
lib/llm/src/preprocessor.rs (1)

257-260: Propagation OK: builder signature matches Option<i64>

Verified nvext.backend_instance_id is Option<i64> and PreprocessedRequest.backend_instance_id is declared as Option<i64> with #[builder(default)] (derived builder). No manual impl/override of PreprocessedRequestBuilder was found.

Checked locations:

  • lib/llm/src/preprocessor.rs:259 — builder.backend_instance_id(nvext.backend_instance_id);
  • lib/llm/src/protocols/common/preprocessor.rs:56 — pub backend_instance_id: Option<i64>
  • lib/llm/src/protocols/openai/nvext.rs:70 — pub backend_instance_id: Option<i64>

@pull-request-size pull-request-size bot added size/M and removed size/S labels Aug 15, 2025
@atchernych atchernych enabled auto-merge (squash) August 15, 2025 19:58
@atchernych atchernych merged commit 6bc6d40 into main Aug 19, 2025
10 checks passed
@atchernych atchernych deleted the pr-2117 branch August 19, 2025 12:50
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

4 participants