-
Notifications
You must be signed in to change notification settings - Fork 676
feat: skip router when worker id is pre-determined #2450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughIntroduces an optional backend_instance_id across request layers (NvExt, PreprocessedRequest, builder) and updates KvPushRouter::generate to route directly to the specified instance when provided; otherwise it uses existing token-based best-match logic. Overlap is set to zero for explicit routing. No exported function signatures changed. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant OpenAI API Layer
participant Preprocessor
participant KvPushRouter
participant Chooser
participant Worker
Client->>OpenAI API Layer: Request (+optional nvext.backend_instance_id)
OpenAI API Layer->>Preprocessor: Build PreprocessedRequest
Preprocessor-->>KvPushRouter: PreprocessedRequest (+optional backend_instance_id)
alt backend_instance_id provided
KvPushRouter->>Worker: Route to backend_instance_id (overlap=0)
else no backend_instance_id
KvPushRouter->>Chooser: find_best_match(token_ids)
Chooser-->>KvPushRouter: instance_id, overlap_amount
KvPushRouter->>Worker: Route to chosen instance_id
end
Worker-->>KvPushRouter: Stream + lifecycle events
KvPushRouter-->>OpenAI API Layer: Stream response
OpenAI API Layer-->>Client: Stream response
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
lib/llm/src/protocols/openai/nvext.rs (3)
65-70: LGTM: Optional routing hook is well-shaped and consistent with existing patterns.The new NvExt.backend_instance_id field is appropriately optional, builder-friendly, and serde-friendly. Nice placement and docs.
Two small follow-ups:
- Consider clarifying docs to say “routed to the backend instance with the given ID.”
- Consider whether negative IDs should be rejected. If instance IDs are always positive, a lightweight validation in validate_nv_ext could enforce id > 0 to catch user errors early.
141-154: Add a default assertion for backend_instance_id in the builder-default test.To keep tests aligned with the new field, assert its default is None.
Apply:
fn test_nv_ext_builder_default() { let nv_ext = NvExt::builder().build().unwrap(); assert_eq!(nv_ext.ignore_eos, None); assert_eq!(nv_ext.top_k, None); assert_eq!(nv_ext.repetition_penalty, None); assert_eq!(nv_ext.greed_sampling, None); + assert_eq!(nv_ext.backend_instance_id, None); assert_eq!(nv_ext.guided_json, None); assert_eq!(nv_ext.guided_regex, None); assert_eq!(nv_ext.guided_grammar, None); assert_eq!(nv_ext.guided_choice, None); }
156-192: Optionally expand the custom-builder test to cover backend_instance_id.Exercising the setter will catch any regressions in builder/serde wiring.
Apply:
fn test_nv_ext_builder_custom() { let nv_ext = NvExt::builder() .ignore_eos(true) .top_k(10) .repetition_penalty(1.5) .greed_sampling(true) + .backend_instance_id(42) .guided_json(serde_json::json!({"type": "object"})) .guided_regex("^[0-9]+$".to_string()) .guided_grammar("S -> 'a' S 'b' | 'c'".to_string()) .guided_choice(vec!["choice1".to_string(), "choice2".to_string()]) .guided_decoding_backend("xgrammar".to_string()) .build() .unwrap(); assert_eq!(nv_ext.ignore_eos, Some(true)); assert_eq!(nv_ext.top_k, Some(10)); assert_eq!(nv_ext.repetition_penalty, Some(1.5)); assert_eq!(nv_ext.greed_sampling, Some(true)); + assert_eq!(nv_ext.backend_instance_id, Some(42)); assert_eq!( nv_ext.guided_json, Some(serde_json::json!({"type": "object"})) );lib/llm/src/protocols/common/preprocessor.rs (1)
54-56: Consider serde defaults/skips for forward/backward compatibility.Given this struct is serialized, adding serde defaults can reduce wire size and improve compatibility with older readers.
Apply:
/// Targeted backend instance ID for the request - #[builder(default)] + #[builder(default)] + #[serde(default, skip_serializing_if = "Option::is_none")] pub backend_instance_id: Option<i64>,lib/llm/src/kv_router.rs (1)
340-347: Optional: validate explicit instance existence before direct routing.If a user-supplied backend_instance_id is stale or unknown, fail fast with a clear error instead of letting direct(...) bubble a less actionable error. Consider a quick guard using the scheduler/known instances map.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
lib/llm/src/kv_router.rs(1 hunks)lib/llm/src/preprocessor.rs(1 hunks)lib/llm/src/protocols/common/preprocessor.rs(1 hunks)lib/llm/src/protocols/openai/nvext.rs(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
lib/llm/src/protocols/openai/nvext.rs (2)
lib/llm/src/protocols/common/preprocessor.rs (2)
builder(66-68)builder(103-105)lib/llm/src/protocols/common.rs (1)
builder(199-201)
lib/llm/src/protocols/common/preprocessor.rs (4)
lib/llm/src/tokenizers.rs (1)
builder(426-428)lib/llm/src/protocols/common.rs (1)
builder(199-201)lib/llm/src/protocols/openai/completions.rs (1)
builder(223-225)lib/llm/src/protocols/openai/nvext.rs (1)
builder(105-107)
lib/llm/src/preprocessor.rs (3)
lib/llm/src/protocols/openai/chat_completions.rs (3)
nvext(84-86)nvext(144-146)nvext(226-228)lib/llm/src/protocols/openai/completions.rs (4)
nvext(85-87)nvext(134-136)nvext(195-197)builder(223-225)lib/llm/src/protocols/openai/nvext.rs (2)
nvext(21-21)builder(105-107)
lib/llm/src/kv_router.rs (2)
lib/runtime/src/component.rs (2)
id(104-106)id(363-369)lib/runtime/src/transports/etcd.rs (1)
id(64-66)
🪛 GitHub Actions: Rust pre-merge checks
lib/llm/src/kv_router.rs
[error] 373-373: Step: cargo clippy failed. E0425 cannot find value 'context_id' in this scope.
[error] 381-381: Step: cargo clippy failed. E0425 cannot find value 'context_id' in this scope.
[error] 345-345: Step: cargo clippy failed. E0061 this method takes 2 arguments but 1 argument was supplied; missing the required 'context_id' parameter for find_best_match.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
lib/llm/src/preprocessor.rs (1)
257-260: Propagation OK — builder signature matches OptionVerified nvext.backend_instance_id is Option and PreprocessedRequest.backend_instance_id is declared as Option with #[builder(default)] (derived builder). No manual impl/override of PreprocessedRequestBuilder was found.
Checked locations:
- lib/llm/src/preprocessor.rs:259 — builder.backend_instance_id(nvext.backend_instance_id);
- lib/llm/src/protocols/common/preprocessor.rs:56 — pub backend_instance_id: Option
- lib/llm/src/protocols/openai/nvext.rs:70 — pub backend_instance_id: Option
Co-authored-by: Biswa Panda <biswa.panda@gmail.com> Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Overview:
This is a refresh of Biswa's original #2117
feat: skip router when worker id is pre-determined (EPP-aware Gateway Integration)
In combination with: https://github.com/ai-dynamo/dynamo/pull/1787/files
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit