feat: align OpenAI response IDs with distributed trace IDs #2695
Conversation
Signed-off-by: Graham King <grahamk@nvidia.com>
Walkthrough

Introduces per-request identifiers across HTTP handlers, engines, and protocol delta generators. Updates response_generator APIs to accept a context/request ID, propagates IDs into streaming flows, and switches delta IDs from UUIDs to deterministic chatcmpl-/cmpl- identifiers. Adjusts the embeddings handler signature to include headers and centralizes request ID derivation. Updates tests accordingly.
Sequence Diagram(s)

sequenceDiagram
autonumber
actor Client
participant HTTP as HTTP Service (openai.rs)
participant Ctx as Context
participant Engine as LLM Engine
participant Proto as Protocol Delta Gen
participant M as MistralRS
Client->>HTTP: POST /v1/chat/completions or /v1/completions
HTTP->>HTTP: Derive request_id (request.id or headers via get_or_create_request_id)
HTTP->>Ctx: Context::with_id(request_id)
HTTP->>Engine: invoke(stream, ctx)
Engine->>Proto: response_generator(ctx.id())
Proto->>Proto: Build DeltaGenerator with id chatcmpl-{id} / cmpl-{id}
alt Mistral-backed
Engine->>M: start stream with mistralrs_request_id
M-->>Engine: token deltas
else Other backend
Proto-->>Engine: token deltas
end
Engine-->>HTTP: streamed deltas (id includes {request_id})
HTTP-->>Client: SSE/stream response

sequenceDiagram
autonumber
actor Client
participant HTTP as HTTP Service (Embeddings)
participant Ctx as Context
participant Engine as LLM Engine
Client->>HTTP: POST /v1/embeddings (with headers)
HTTP->>HTTP: request_id = get_or_create_request_id(primary, headers)
HTTP->>Ctx: Context::with_id(request_id)
HTTP->>Engine: run embeddings with ctx
Engine-->>HTTP: result
HTTP-->>Client: JSON response
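
The ID-derivation step shown in both diagrams can be illustrated with a small, self-contained sketch. This is a hypothetical helper, not the repo's actual `get_or_create_request_id`: the `x-request-id` header name and the UUID fallback are assumptions, and it only needs the `http` and `uuid` (v4 feature) crates.

```rust
use http::HeaderMap;
use uuid::Uuid;

/// Hypothetical helper mirroring the derivation step in the diagrams above:
/// prefer an ID already attached to the request, then a trace header,
/// and only mint a fresh UUID as a last resort.
fn derive_request_id(primary: Option<&str>, headers: &HeaderMap) -> String {
    if let Some(id) = primary.filter(|s| !s.is_empty()) {
        return id.to_string();
    }
    // Assumes the conventional `x-request-id` header; the real helper may
    // consult different headers.
    if let Some(id) = headers
        .get("x-request-id")
        .and_then(|v| v.to_str().ok())
        .filter(|s| !s.is_empty())
    {
        return id.to_string();
    }
    Uuid::new_v4().to_string()
}

fn main() {
    let mut headers = HeaderMap::new();
    headers.insert("x-request-id", "trace-123".parse().unwrap());
    // No primary ID on the request, so the header wins.
    assert_eq!(derive_request_id(None, &headers), "trace-123");
}
```

Whatever this step yields is the value wrapped by `Context::with_id(request_id)` and later embedded in the `chatcmpl-`/`cmpl-` delta IDs.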
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
218-247: Role field emitted on every chunk due to never-incremented msg_counter
`role` is intended to appear only on the first delta; since `msg_counter` never changes, every chunk will include `role: Assistant`. Increment the counter after emitting the first delta. Apply this diff:

```diff
     pub fn create_choice(
         &mut self,
         index: u32,
         text: Option<String>,
         reasoning_content: Option<String>,
         finish_reason: Option<dynamo_async_openai::types::FinishReason>,
         logprobs: Option<dynamo_async_openai::types::ChatChoiceLogprobs>,
     ) -> NvCreateChatCompletionStreamResponse {
         let delta = dynamo_async_openai::types::ChatCompletionStreamResponseDelta {
             content: text,
             function_call: None,
             tool_calls: None,
             role: if self.msg_counter == 0 {
                 Some(dynamo_async_openai::types::Role::Assistant)
             } else {
                 None
             },
             refusal: None,
             reasoning_content,
         };
+        // Ensure subsequent chunks omit the role, matching OpenAI streaming semantics
+        self.msg_counter = self.msg_counter.saturating_add(1);
```

lib/engines/mistralrs/src/lib.rs (2)
543-559: Respect client logprobs request for completions
`return_logprobs` is hard-coded to `false`. This ignores `request.inner.logprobs`, so clients asking for logprobs won't get them. Apply this diff:

```diff
-            return_logprobs: false,
+            // return logprobs if the client requested a positive count
+            return_logprobs: request.inner.logprobs.map(|v| v > 0).unwrap_or(false),
```
579-589: Pass finish_reason into the completion choice

You compute `finish_reason` but don't pass it to `create_choice`, resulting in `None` in the response. Apply this diff:

```diff
-            let inner = response_generator.create_choice(0, Some(from_assistant), None, None);
+            let inner = response_generator.create_choice(0, Some(from_assistant), finish_reason, None);
```

Also applies to: 593-593
🧹 Nitpick comments (7)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
103-106: Deterministic ID: `chatcmpl-{request_id}`

The ID format matches OpenAI conventions. Consider guarding against whitespace or overly long IDs if upstream contexts are user-controlled, but not blocking.

Optional defensive tweak (sanitize upstream input):

```diff
-        let chatcmpl_id = format!("chatcmpl-{request_id}");
+        let sanitized = request_id.trim().replace(char::is_whitespace, "");
+        let chatcmpl_id = format!("chatcmpl-{}", sanitized);
```

lib/llm/src/protocols/openai/completions/delta.rs (1)
75-97: Minor perf/readability nit: avoid extra allocations in `create_logprobs`

If this becomes hot, consider accepting `token_ids: &[TokenIdType]` (and borrowing tokens when possible) to reduce copies. Not blocking.

Example shape:

```diff
-    pub fn create_logprobs(
-        &self,
-        tokens: Vec<common::llm_backend::TokenType>,
-        token_ids: Vec<TokenIdType>,
+    pub fn create_logprobs(
+        &self,
+        tokens: Vec<common::llm_backend::TokenType>,
+        token_ids: &[TokenIdType],
         logprobs: Option<common::llm_backend::LogProbs>,
         top_logprobs: Option<common::llm_backend::TopLogprobs>,
     ) -> Option<dynamo_async_openai::types::Logprobs> {
```

And at call site:

```diff
-        let logprobs = self.create_logprobs(
-            delta.tokens,
-            delta.token_ids,
+        let logprobs = self.create_logprobs(
+            delta.tokens,
+            &delta.token_ids,
             delta.log_probs,
             delta.top_logprobs,
         );
```

Also applies to: 192-205
lib/engines/mistralrs/src/lib.rs (3)
399-421: Set object to the OpenAI constant "chat.completion.chunk"

We currently pass through `c.object.clone()`. To ensure spec consistency across backends, prefer the canonical OpenAI value. Apply this diff:

```diff
-                object: c.object.clone(),
+                object: "chat.completion.chunk".to_string(),
```
391-393: Typo: "Unknow stop reason" → "Unknown stop reason"

Minor log message polish.
Apply this diff:
```diff
-            tracing::warn!(mistralrs_request_id, stop_reason = s, "Unknow stop reason");
+            tracing::warn!(mistralrs_request_id, stop_reason = s, "Unknown stop reason");
```
571-572: Typo: "Unknow stop reason" → "Unknown stop reason"

Same log polish for the completions path.
Apply this diff:
```diff
-            tracing::warn!(mistralrs_request_id, stop_reason = s, "Unknow stop reason");
+            tracing::warn!(mistralrs_request_id, stop_reason = s, "Unknown stop reason");
```

lib/llm/src/engines.rs (1)
187-187: Nit: drop `mut` unless `create_choice` requires `&mut self`.

If `DeltaGenerator::create_choice` for chat completions no longer needs `&mut self`, remove `mut` to avoid an `unused_mut` lint in stricter builds.

```diff
-    let mut deltas = request.response_generator(ctx.id().to_string());
+    let deltas = request.response_generator(ctx.id().to_string());
```

If it still needs `&mut self`, keep as-is.

lib/llm/src/http/service/openai.rs (1)
248-266: Optional: add structured tracing to completions() with request_id field.

For parity with `responses()` (which already uses `#[tracing::instrument(..., fields(request_id = %request.id()))]`), consider instrumenting `completions()` similarly. This centralizes `request_id` on all logs from this function.

```diff
-async fn completions(
+#[tracing::instrument(level = "debug", skip_all, fields(request_id = %request.id()))]
+async fn completions(
     state: Arc<service_v2::State>,
     request: Context<NvCreateCompletionRequest>,
     stream_handle: ConnectionHandle,
 ) -> Result<Response, ErrorResponse> {
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
- lib/engines/mistralrs/src/lib.rs (13 hunks)
- lib/llm/src/engines.rs (2 hunks)
- lib/llm/src/http/service/openai.rs (2 hunks)
- lib/llm/src/preprocessor.rs (2 hunks)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (3 hunks)
- lib/llm/src/protocols/openai/completions/delta.rs (3 hunks)
- lib/llm/tests/http-service.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-22T19:55:41.592Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.592Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.
Applied to files:
- lib/llm/tests/http-service.rs
- lib/llm/src/protocols/openai/chat_completions/delta.rs
- lib/llm/src/preprocessor.rs
- lib/engines/mistralrs/src/lib.rs
- lib/llm/src/engines.rs
- lib/llm/src/protocols/openai/completions/delta.rs
📚 Learning: 2025-08-22T19:55:41.592Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.592Z
Learning: The create_choice method exists on multiple different objects in the codebase. The DeltaGenerator::create_choice in lib/llm/src/protocols/openai/chat_completions/delta.rs has its own signature that was updated to include reasoning_content, but other objects in lib/llm/src/engines.rs have their own separate create_choice methods with different signatures that are not related to chat completions.
Applied to files:
lib/llm/src/engines.rs
🧬 Code graph analysis (5)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
- lib/llm/src/protocols/openai/completions/delta.rs (2): response_generator (10-17), new (38-69)

lib/llm/src/preprocessor.rs (2)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (1): response_generator (20-28)
- lib/llm/src/protocols/openai/completions/delta.rs (1): response_generator (10-17)

lib/llm/src/http/service/openai.rs (3)
- lib/llm/src/http/service/service_v2.rs (1): state (186-188)
- lib/llm/src/http/client.rs (1): with_id (81-88)
- lib/runtime/src/pipeline/context.rs (1): with_id (69-76)

lib/engines/mistralrs/src/lib.rs (3)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (2): new (76-116), response_generator (20-28)
- lib/llm/src/protocols/openai/completions/delta.rs (2): new (38-69), response_generator (10-17)
- lib/bindings/python/rust/lib.rs (3): new (268-293), new (907-911), id (929-931)

lib/llm/src/protocols/openai/completions/delta.rs (1)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (2): response_generator (20-28), new (76-116)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Mirror Repository to GitLab
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (13)
lib/llm/src/preprocessor.rs (2)
497-498: Good: propagating deterministic request_id into the generator

Passing `context.id().to_string()` into `request.response_generator(...)` aligns with the PR objective of making streamed IDs deterministic and trace-aligned. No issues spotted.
551-552: Good: completions path also threads request_id

Same as chat completions, this ensures `cmpl-<request_id>` IDs are stable across the pipeline.

lib/llm/src/protocols/openai/chat_completions/delta.rs (2)
15-21: API change is clear and documented

Nice docstring update explaining the new `request_id` parameter; the signature change is minimal and aligns with the PR goal.
27-28: Constructor and call chain now accept request_id
`DeltaGenerator::new(..., request_id)` and its usage ensure `chatcmpl-<request_id>` determinism. Looks good.

Also applies to: 72-77
lib/llm/src/protocols/openai/completions/delta.rs (2)
10-17: API change: request_id threaded into completions as well
`response_generator(&self, request_id: String)` is consistent with the chat path and meets the objectives.
38-39: Deterministic ID: cmpl-{request_id}
`cmpl-<request_id>` is correct and consistent. Constructor changes are coherent.

Also applies to: 58-62
lib/engines/mistralrs/src/lib.rs (3)
216-218: Warmup request: logging context updated

Using `mistralrs_request_id` in logs makes tracing clearer. LGTM.

Also applies to: 249-253
275-276: Good: request_id sourced from context

Threading `ctx.id().to_string()` will align streamed IDs with distributed traces.
489-490: Good: completions path uses response_generator(ctx.id())

This ensures the `cmpl-<request_id>` ID is trace-aligned.

lib/llm/src/engines.rs (2)
235-235: Completions: request_id propagation into delta generator looks correct.

Using the context ID for `cmpl-<id>` is consistent with the chat path and supports downstream trace alignment.
187-187: No remaining zero-arg `response_generator` calls — LGTM
- Ran `rg -nP --type=rust '\bresponse_generator\s*\(\s*\)'` and confirmed there are no zero-argument invocations of `response_generator`.
- Verified that both delta modules now use the incoming `request_id` for ID formatting:
  - lib/llm/src/protocols/openai/completions/delta.rs formats as `cmpl-{request_id}`
  - lib/llm/src/protocols/openai/chat_completions/delta.rs formats as `chatcmpl-{request_id}`

All looks good—approving the change.
lib/llm/src/http/service/openai.rs (2)
256-256: Expose request_id early in completions() for logging/annotations — LGTM.

Grabbing `let request_id = request.id().to_string();` up front is clean and correctly used later for error logs and optional annotations.
356-365: Embeddings handler integration verified
- The embeddings route in lib/llm/src/http/service/openai.rs:1061 correctly points to the `embeddings` handler via `.route(&path, post(embeddings))`.
- All three OpenAI handlers consistently accept a `HeaderMap` parameter, ensuring uniform request_id derivation and propagation:
  - `handler_completions` at line 213
  - `embeddings` at line 354
  - `handler_chat_completions` at line 408

LGTM—approving these changes.
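
As an illustration of that handler shape, the sketch below is a minimal, standalone axum service (assuming axum 0.7, tokio, serde_json, and uuid with the v4 feature). `AppState`, the `x-request-id` header, and the response body are placeholders, not the actual dynamo handler or its types.

```rust
use axum::{extract::State, http::HeaderMap, routing::post, Json, Router};
use serde_json::{json, Value};
use std::sync::Arc;

struct AppState; // placeholder for the real service state

// Hypothetical handler shape: the HeaderMap extractor comes before the body
// extractor, which axum requires to be last.
async fn embeddings(
    State(_state): State<Arc<AppState>>,
    headers: HeaderMap,
    Json(_body): Json<Value>,
) -> Json<Value> {
    // Same header-or-UUID derivation idea sketched earlier.
    let request_id = headers
        .get("x-request-id")
        .and_then(|v| v.to_str().ok())
        .map(str::to_owned)
        .unwrap_or_else(|| uuid::Uuid::new_v4().to_string());
    Json(json!({ "request_id": request_id }))
}

#[tokio::main]
async fn main() {
    let app = Router::new()
        .route("/v1/embeddings", post(embeddings))
        .with_state(Arc::new(AppState));
    let listener = tokio::net::TcpListener::bind("127.0.0.1:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```

Accepting `HeaderMap` in every handler is what lets the request-ID derivation stay uniform across completions, chat completions, and embeddings.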
The main PR landed.
Rebased #2496 by @qimcis after the Rust upgrade.
-- Origin description
Overview:
Aligns OpenAI response IDs with distributed trace IDs
Details:
Replaces random UUID generation with consistent trace IDs from request context so that OpenAI API responses (chatcmpl-, cmpl-) match distributed tracing identifiers.
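
A minimal before/after sketch of the idea (illustrative only, not the actual diff; in the real change the ID comes from `request.id()` / `ctx.id()` threaded into `response_generator()`), assuming the `uuid` crate with the v4 feature:

```rust
// Before (illustrative): each response minted its own random ID, so the
// chatcmpl-/cmpl- value never matched the trace/request ID.
fn chatcmpl_id_before() -> String {
    format!("chatcmpl-{}", uuid::Uuid::new_v4())
}

// After (illustrative): the ID is derived from the request context, so logs,
// traces, and the OpenAI response all carry the same identifier.
fn chatcmpl_id_after(request_id: &str) -> String {
    format!("chatcmpl-{request_id}")
}

fn main() {
    let request_id = "b6fab5f4-5d17-4a45-ae12-d9e4ee5e1c32"; // e.g. the trace ID
    println!("{}", chatcmpl_id_before()); // different on every call
    println!("{}", chatcmpl_id_after(request_id)); // stable: chatcmpl-b6fab5f4-...
}
```

With the second form, grepping logs or traces for a request ID also finds the exact `chatcmpl-...` value the client received.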
Where should the reviewer start?
- lib/llm/src/protocols/openai/chat_completions/delta.rs: New request_id parameter in response_generator()
- lib/llm/src/http/service/openai.rs: Removed UUID generation, using request.id()
- lib/llm/src/engines.rs: Updated response generator calls with context IDs

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

- chatcmpl-<ID> with other notions of request ID throughout #2248

Summary by CodeRabbit
New Features
Refactor