Conversation

@atchernych atchernych commented Aug 14, 2025

Overview:

This is a refresh of Biswa's original #2117 ("feat: skip router when worker id is pre-determined", EPP-aware Gateway Integration), intended to be used in combination with https://github.com/ai-dynamo/dynamo/pull/1787/files.

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features
    • Added an optional request parameter to target a specific backend instance. When provided, requests are routed directly to that instance.
    • Preserves existing behavior when not set: requests automatically route to the best-matching instance.
    • Fully backward compatible; no changes required for existing integrations.
    • Improves routing control for advanced use cases while maintaining current defaults.
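
To illustrate the new parameter: a request pinning a specific worker might look like the following. The surrounding request shape, model name, and id value are illustrative; only the `nvext.backend_instance_id` placement is taken from the fields described in this PR.

```json
{
  "model": "example-model",
  "messages": [{ "role": "user", "content": "Hello" }],
  "nvext": { "backend_instance_id": 42 }
}
```

When `backend_instance_id` is omitted, the request falls through to the existing token-based best-match routing.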

@atchernych atchernych requested a review from a team as a code owner August 14, 2025 22:47
copy-pr-bot bot commented Aug 14, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@github-actions github-actions bot added the feat label Aug 14, 2025
@atchernych atchernych changed the title feat: pr-2117 merged main feat: skip router when worker id is pre-determined Aug 14, 2025

coderabbitai bot commented Aug 14, 2025

Walkthrough

Introduces an optional backend_instance_id across request layers (NvExt, PreprocessedRequest, builder) and updates KvPushRouter::generate to route directly to the specified instance when provided; otherwise it uses existing token-based best-match logic. Overlap is set to zero for explicit routing. No exported function signatures changed.

Changes

Cohort / File(s) Summary
Routing override in KV router
lib/llm/src/kv_router.rs
Added branch in generate: if request.backend_instance_id is set, route to that instance with overlap_amount=0; else use chooser.find_best_match. Subsequent flow unchanged; annotations and streaming behavior preserved.
backend_instance_id plumbing
lib/llm/src/protocols/openai/nvext.rs, lib/llm/src/protocols/common/preprocessor.rs, lib/llm/src/preprocessor.rs
NvExt: new optional backend_instance_id field (serde/builder). PreprocessedRequest: new Option<i64> backend_instance_id with #[builder(default)]. Preprocessor: propagates nvext.backend_instance_id into PreprocessedRequest.
Builder API extension
lib/llm/src/protocols/common/llm_backend.rs
PreprocessedRequestBuilder: added backend_instance_id(...) setter to support wiring through the targeted instance ID.
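
As a rough illustration of the branch described for KvPushRouter::generate, the decision reduces to the following standalone sketch. The function name, signature, and tuple return are simplified stand-ins for the real router and chooser types, not the actual API.

```rust
/// Illustrative routing decision: either honor an explicit backend
/// instance id (with overlap forced to zero), or fall back to the
/// token-based best-match chooser.
fn select_instance<F>(backend_instance_id: Option<i64>, find_best_match: F) -> (i64, u32)
where
    F: Fn() -> (i64, u32),
{
    match backend_instance_id {
        // Explicit target: route directly; no prefix-cache overlap is assumed.
        Some(id) => (id, 0),
        // No target: delegate to the existing best-match selection.
        None => find_best_match(),
    }
}

fn main() {
    // Pretend the chooser would pick worker 7 with overlap 3.
    let chooser = || (7, 3);
    assert_eq!(select_instance(Some(42), chooser), (42, 0));
    assert_eq!(select_instance(None, chooser), (7, 3));
    println!("ok");
}
```

The rest of the flow (annotations, streaming) is unchanged in either branch, per the walkthrough above.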

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant OpenAI API Layer
    participant Preprocessor
    participant KvPushRouter
    participant Chooser
    participant Worker

    Client->>OpenAI API Layer: Request (+optional nvext.backend_instance_id)
    OpenAI API Layer->>Preprocessor: Build PreprocessedRequest
    Preprocessor-->>KvPushRouter: PreprocessedRequest (+optional backend_instance_id)

    alt backend_instance_id provided
        KvPushRouter->>Worker: Route to backend_instance_id (overlap=0)
    else no backend_instance_id
        KvPushRouter->>Chooser: find_best_match(token_ids)
        Chooser-->>KvPushRouter: instance_id, overlap_amount
        KvPushRouter->>Worker: Route to chosen instance_id
    end

    Worker-->>KvPushRouter: Stream + lifecycle events
    KvPushRouter-->>OpenAI API Layer: Stream response
    OpenAI API Layer-->>Client: Stream response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit routes with ears held high,
“Direct to burrow nine!” I cry.
If none is named, I sniff the trail,
Match the tokens, never fail.
New paths flagged, the streams align—
Hop, hop, packets right on time. 🐇✨

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (5)
lib/llm/src/protocols/openai/nvext.rs (3)

65-70: LGTM: Optional routing hook is well-shaped and consistent with existing patterns.

The new NvExt.backend_instance_id field is appropriately optional, builder-friendly, and serde-friendly. Nice placement and docs.

Two small follow-ups:

  • Consider clarifying docs to say “routed to the backend instance with the given ID.”
  • Consider whether negative IDs should be rejected. If instance IDs are always positive, a lightweight validation in validate_nv_ext could enforce id > 0 to catch user errors early.
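
The suggested guard could look roughly like this. validate_nv_ext's real signature isn't shown in this review, so this is a free-standing sketch of just the check; the error type and message are illustrative.

```rust
/// Sketch of the reviewer's suggestion: reject non-positive instance ids
/// early, on the assumption that instance ids are always positive.
fn validate_backend_instance_id(id: Option<i64>) -> Result<(), String> {
    match id {
        Some(v) if v <= 0 => Err(format!(
            "nvext.backend_instance_id must be positive, got {v}"
        )),
        // None (field unset) and positive ids both pass.
        _ => Ok(()),
    }
}

fn main() {
    assert!(validate_backend_instance_id(None).is_ok());
    assert!(validate_backend_instance_id(Some(42)).is_ok());
    assert!(validate_backend_instance_id(Some(-1)).is_err());
    println!("ok");
}
```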

141-154: Add a default assertion for backend_instance_id in the builder-default test.

To keep tests aligned with the new field, assert its default is None.

Apply:

     fn test_nv_ext_builder_default() {
         let nv_ext = NvExt::builder().build().unwrap();
         assert_eq!(nv_ext.ignore_eos, None);
         assert_eq!(nv_ext.top_k, None);
         assert_eq!(nv_ext.repetition_penalty, None);
         assert_eq!(nv_ext.greed_sampling, None);
+        assert_eq!(nv_ext.backend_instance_id, None);
         assert_eq!(nv_ext.guided_json, None);
         assert_eq!(nv_ext.guided_regex, None);
         assert_eq!(nv_ext.guided_grammar, None);
         assert_eq!(nv_ext.guided_choice, None);
     }

156-192: Optionally expand the custom-builder test to cover backend_instance_id.

Exercising the setter will catch any regressions in builder/serde wiring.

Apply:

     fn test_nv_ext_builder_custom() {
         let nv_ext = NvExt::builder()
             .ignore_eos(true)
             .top_k(10)
             .repetition_penalty(1.5)
             .greed_sampling(true)
+            .backend_instance_id(42)
             .guided_json(serde_json::json!({"type": "object"}))
             .guided_regex("^[0-9]+$".to_string())
             .guided_grammar("S -> 'a' S 'b' | 'c'".to_string())
             .guided_choice(vec!["choice1".to_string(), "choice2".to_string()])
             .guided_decoding_backend("xgrammar".to_string())
             .build()
             .unwrap();

         assert_eq!(nv_ext.ignore_eos, Some(true));
         assert_eq!(nv_ext.top_k, Some(10));
         assert_eq!(nv_ext.repetition_penalty, Some(1.5));
         assert_eq!(nv_ext.greed_sampling, Some(true));
+        assert_eq!(nv_ext.backend_instance_id, Some(42));
         assert_eq!(
             nv_ext.guided_json,
             Some(serde_json::json!({"type": "object"}))
         );
lib/llm/src/protocols/common/preprocessor.rs (1)

54-56: Consider serde defaults/skips for forward/backward compatibility.

Given this struct is serialized, adding serde defaults can reduce wire size and improve compatibility with older readers.

Apply:

     /// Targeted backend instance ID for the request
-    #[builder(default)]
+    #[builder(default)]
+    #[serde(default, skip_serializing_if = "Option::is_none")]
     pub backend_instance_id: Option<i64>,
lib/llm/src/kv_router.rs (1)

340-347: Optional: validate explicit instance existence before direct routing.

If a user-supplied backend_instance_id is stale or unknown, fail fast with a clear error instead of letting direct(...) bubble a less actionable error. Consider a quick guard using the scheduler/known instances map.
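
A minimal sketch of such a fail-fast check, assuming the router can obtain a set of live instance ids; the helper name and the HashSet representation are illustrative, not the actual scheduler API.

```rust
use std::collections::HashSet;

/// Sketch: resolve an explicit backend_instance_id against the set of
/// currently known workers, failing fast with an actionable error
/// instead of letting direct routing surface a less clear one.
fn resolve_explicit_instance(id: i64, known: &HashSet<i64>) -> Result<i64, String> {
    if known.contains(&id) {
        Ok(id)
    } else {
        Err(format!(
            "backend_instance_id {id} does not match any live worker instance"
        ))
    }
}

fn main() {
    let known: HashSet<i64> = [1, 7, 42].into_iter().collect();
    assert_eq!(resolve_explicit_instance(42, &known), Ok(42));
    assert!(resolve_explicit_instance(99, &known).is_err());
    println!("ok");
}
```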

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 9ddb3ef and 6230741.

📒 Files selected for processing (4)
  • lib/llm/src/kv_router.rs (1 hunks)
  • lib/llm/src/preprocessor.rs (1 hunks)
  • lib/llm/src/protocols/common/preprocessor.rs (1 hunks)
  • lib/llm/src/protocols/openai/nvext.rs (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
lib/llm/src/protocols/openai/nvext.rs (2)
lib/llm/src/protocols/common/preprocessor.rs (2)
  • builder (66-68)
  • builder (103-105)
lib/llm/src/protocols/common.rs (1)
  • builder (199-201)
lib/llm/src/protocols/common/preprocessor.rs (4)
lib/llm/src/tokenizers.rs (1)
  • builder (426-428)
lib/llm/src/protocols/common.rs (1)
  • builder (199-201)
lib/llm/src/protocols/openai/completions.rs (1)
  • builder (223-225)
lib/llm/src/protocols/openai/nvext.rs (1)
  • builder (105-107)
lib/llm/src/preprocessor.rs (3)
lib/llm/src/protocols/openai/chat_completions.rs (3)
  • nvext (84-86)
  • nvext (144-146)
  • nvext (226-228)
lib/llm/src/protocols/openai/completions.rs (4)
  • nvext (85-87)
  • nvext (134-136)
  • nvext (195-197)
  • builder (223-225)
lib/llm/src/protocols/openai/nvext.rs (2)
  • nvext (21-21)
  • builder (105-107)
lib/llm/src/kv_router.rs (2)
lib/runtime/src/component.rs (2)
  • id (104-106)
  • id (363-369)
lib/runtime/src/transports/etcd.rs (1)
  • id (64-66)
🪛 GitHub Actions: Rust pre-merge checks
lib/llm/src/kv_router.rs

[error] 373-373: Step: cargo clippy failed. E0425 cannot find value 'context_id' in this scope.


[error] 381-381: Step: cargo clippy failed. E0425 cannot find value 'context_id' in this scope.


[error] 345-345: Step: cargo clippy failed. E0061 this method takes 2 arguments but 1 argument was supplied; missing the required 'context_id' parameter for find_best_match.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
lib/llm/src/preprocessor.rs (1)

257-260: Propagation OK: builder signature matches Option<i64>

Verified nvext.backend_instance_id is Option<i64> and PreprocessedRequest.backend_instance_id is declared as Option<i64> with #[builder(default)] (derived builder). No manual impl/override of PreprocessedRequestBuilder was found.

Checked locations:

  • lib/llm/src/preprocessor.rs:259 — builder.backend_instance_id(nvext.backend_instance_id);
  • lib/llm/src/protocols/common/preprocessor.rs:56 — pub backend_instance_id: Option<i64>
  • lib/llm/src/protocols/openai/nvext.rs:70 — pub backend_instance_id: Option<i64>

@pull-request-size pull-request-size bot added size/M and removed size/S labels Aug 15, 2025
@atchernych atchernych enabled auto-merge (squash) August 15, 2025 19:58
@atchernych atchernych merged commit 6bc6d40 into main Aug 19, 2025
10 checks passed
@atchernych atchernych deleted the pr-2117 branch August 19, 2025 12:50
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

4 participants