-
Notifications
You must be signed in to change notification settings - Fork 689
feat: query instance_id based on routing strategy #1787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe changes introduce a conditional early-return path in the Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant KvPushRouter
participant InnerEngine
Client->>KvPushRouter: generate(request)
alt request has "query_instance_id" annotation
KvPushRouter-->>Client: response with "worker_instance_id" annotation
else
KvPushRouter->>InnerEngine: direct(updated_request, instance_id)
InnerEngine-->>KvPushRouter: response stream
KvPushRouter-->>Client: response stream
end
Poem
Warning Review ran into problems🔥 ProblemsCheck-run timed out after 90 seconds. Some checks/pipelines were still in progress when the timeout was reached. Consider increasing the reviews.tools.github-checks.timeout_ms value in your CodeRabbit configuration to allow more time for checks to complete. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/llm/src/kv_router.rs (1)
254-266: LGTM: Well-structured conditional routing logic.The early-return pattern when
query_instance_idannotation is present is a clean approach that avoids unnecessary processing and provides direct routing information. The response structure with theworker_instance_idannotation is appropriate.Consider using
instance_id.to_string()directly in the annotation call to avoid the intermediate variable:- let instance_id_str = instance_id.to_string(); - let response = - Annotated::from_annotation("worker_instance_id", &instance_id_str)?; + let response = + Annotated::from_annotation("worker_instance_id", &instance_id.to_string())?;
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/llm/src/kv_router.rs(1 hunks)lib/llm/src/preprocessor.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
lib/llm/src/preprocessor.rs (1)
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
lib/llm/src/kv_router.rs (4)
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1285
File: lib/llm/src/kv_router/scheduler.rs:260-266
Timestamp: 2025-05-30T06:34:12.785Z
Learning: In the KV router scheduler code, PeaBrane prefers fail-fast behavior over silent failure handling. When accessing worker metrics data that could be out-of-bounds (like dp_rank indexing), explicit panics are preferred over graceful degradation with continue statements to ensure data integrity issues are caught early.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - vllm
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
🔇 Additional comments (2)
lib/llm/src/preprocessor.rs (1)
415-415: LGTM: Improved conditional comment assignment.This change maintains the existing logic to set the comment field only when no event is already present, preventing overwriting of existing response metadata. The conditional assignment aligns well with the surrounding logic that checks
response.event.is_none().lib/llm/src/kv_router.rs (1)
249-249: LGTM: Clear annotation check for routing query.The annotation check is straightforward and matches the PR objective of querying instance_id based on routing strategy.
3afdfaf to
21fd272
Compare
21fd272 to
5aba407
Compare
|
The comment tells me what you did but not why you did it. Could you add to the description? |
Thanks @grahamking , updated the intro section with why. |
|
How does inference gateway know the |
inference gateway forwards the chat completion request to Frontend with added nv_ext annotation This PR implements the same. |
Overview:
Why:
This change supports inference gateway integration integration
Gateway extension applies filters/scorers to identify target worker which will handle an incoming request.
worker_idand don't want to route the request yet.2.In service path , we want to enforce the
worker_idwhich will bypass the routerWhat:
closes: https://linear.app/nvidia-dynamo/issue/DEP-161
Details:
The changes introduce a conditional early-return path in the KvPushRouter's generate method to respond with routing information when a specific annotation (
query_instance_id) is present.Additionally, the logic for setting the comment field in OpenAIPreprocessor is refined to prevent overwriting it when an event is already set.
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Bug Fixes