
Conversation

Contributor

@grahamking grahamking commented Aug 25, 2025

Rebased #2496 by @qimcis after the Rust upgrade.

-- Original PR description

Overview:

Aligns OpenAI response IDs with distributed trace IDs

Details:

Replaces random UUID generation with consistent trace IDs from the request context, so that OpenAI API response IDs (chatcmpl-, cmpl-) match distributed tracing identifiers.
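
In code terms, the core of the change is that the delta generators now receive the request's context ID instead of minting a random UUID. The following is a minimal sketch of that shape, using a simplified stand-in type rather than the actual signatures in delta.rs:

    // Sketch only: illustrates the ID change, not the real DeltaGenerator type.
    // Before, each generator minted a random UUID per response; after, the caller
    // threads the request/trace ID through, so the streamed ID matches the trace ID.
    struct DeltaGenerator {
        id: String,
    }

    impl DeltaGenerator {
        // Deterministic: the ID is derived from the request context.
        fn new(request_id: String) -> Self {
            Self {
                id: format!("chatcmpl-{request_id}"),
            }
        }
    }

    fn main() {
        let generator = DeltaGenerator::new("trace-1234".to_string());
        assert_eq!(generator.id, "chatcmpl-trace-1234");
    }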

Where should the reviewer start?

  • lib/llm/src/protocols/openai/chat_completions/delta.rs: New request_id parameter in response_generator()
  • lib/llm/src/http/service/openai.rs: Removed UUID generation, using request.id()
  • lib/llm/src/engines.rs: Updated response generator calls with context IDs

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • New Features

    • Deterministic, per-request IDs for streaming responses (chatcmpl- and cmpl-) for easier tracing across streams, logs, and metrics.
    • The embeddings endpoint now supports propagating request IDs via HTTP headers, aligning its traceability with other endpoints.
  • Refactor

    • Unified request ID handling across chat, completions, embeddings, and warmup flows to ensure consistent logging and monitoring.
    • Stream generation now uses the request’s context ID end-to-end, improving observability without changing response formats or error codes.

Signed-off-by: Graham King <grahamk@nvidia.com>
Contributor

coderabbitai bot commented Aug 25, 2025

Walkthrough

Introduces per-request identifiers across HTTP handlers, engines, and protocol delta generators. Updates response_generator APIs to accept a context/request ID, propagates IDs into streaming flows, and switches delta IDs from random UUIDs to deterministic chatcmpl-/cmpl- prefixed values. Adjusts the embeddings handler signature to include headers and centralizes request ID derivation. Updates tests accordingly.

Changes

  • MistralRS request ID plumbing (lib/engines/mistralrs/src/lib.rs): Adds a per-request mistralrs_request_id, routes logs to it, uses ctx.id().to_string() for the streaming request_id, and passes the context ID into non-chat response generation.
  • Engine callers updated to pass context ID (lib/llm/src/engines.rs): Both the chat and completion streaming paths now call response_generator(ctx.id().to_string()).
  • HTTP service ID derivation changes (lib/llm/src/http/service/openai.rs): Completions uses request.id() as the request_id. The embeddings handler signature adds HeaderMap, derives the ID via get_or_create_request_id, attaches it with Context::with_id, and reuses the derived ID.
  • Preprocessor forwards context ID (lib/llm/src/preprocessor.rs): response_generator calls now include context.id().to_string() for chat and completion requests.
  • OpenAI chat delta generator API (lib/llm/src/protocols/openai/chat_completions/delta.rs): NvCreateChatCompletionRequest::response_generator(request_id: String) and DeltaGenerator::new(..., request_id: String). Delta IDs are now chatcmpl-{request_id}.
  • OpenAI completion delta generator API and options (lib/llm/src/protocols/openai/completions/delta.rs): Adds DeltaGeneratorOptions. response_generator(request_id: String). DeltaGenerator::new(..., request_id: String). Delta IDs are now cmpl-{request_id} instead of a UUID.
  • Tests updated for generator argument (lib/llm/tests/http-service.rs): The test invokes response_generator with Some(ctx.id().to_string()) instead of no args; removes the license header.
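
For the embeddings handler, the list above mentions deriving the request ID from headers via get_or_create_request_id. The sketch below illustrates one plausible shape for that derivation; the exact signature, the x-request-id header name, and the UUID fallback are assumptions for illustration, not the repository's actual implementation.

    use axum::http::HeaderMap;

    // Hypothetical helper: prefer an explicit primary ID, then a request header,
    // then fall back to a freshly generated UUID.
    // (Assumes the `uuid` crate with the v4 feature.)
    fn get_or_create_request_id(primary: Option<&str>, headers: &HeaderMap) -> String {
        if let Some(id) = primary.filter(|s| !s.is_empty()) {
            return id.to_string();
        }
        if let Some(id) = headers
            .get("x-request-id")
            .and_then(|v| v.to_str().ok())
            .filter(|s| !s.is_empty())
        {
            return id.to_string();
        }
        uuid::Uuid::new_v4().to_string()
    }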

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant HTTP as HTTP Service (openai.rs)
  participant Ctx as Context
  participant Engine as LLM Engine
  participant Proto as Protocol Delta Gen
  participant M as MistralRS

  Client->>HTTP: POST /v1/chat/completions or /v1/completions
  HTTP->>HTTP: Derive request_id (request.id or headers via get_or_create_request_id)
  HTTP->>Ctx: Context::with_id(request_id)
  HTTP->>Engine: invoke(stream, ctx)

  Engine->>Proto: response_generator(ctx.id())
  Proto->>Proto: Build DeltaGenerator with id chatcmpl-{id} / cmpl-{id}

  alt Mistral-backed
    Engine->>M: start stream with mistralrs_request_id
    M-->>Engine: token deltas
  else Other backend
    Proto-->>Engine: token deltas
  end

  Engine-->>HTTP: streamed deltas (id includes {request_id})
  HTTP-->>Client: SSE/stream response
sequenceDiagram
  autonumber
  actor Client
  participant HTTP as HTTP Service (Embeddings)
  participant Ctx as Context
  participant Engine as LLM Engine

  Client->>HTTP: POST /v1/embeddings (with headers)
  HTTP->>HTTP: request_id = get_or_create_request_id(primary, headers)
  HTTP->>Ctx: Context::with_id(request_id)
  HTTP->>Engine: run embeddings with ctx
  Engine-->>HTTP: result
  HTTP-->>Client: JSON response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Poem

A whisk of logs, an ID in tow,
chatcmpl dances where UUIDs go.
cmpl echoes, tidy and bright,
headers whisper the name of the night.
I thump my foot—request in stream—
carrots aligned, IDs supreme. 🥕✨

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)

218-247: Role field emitted on every chunk due to never-incremented msg_counter

role is intended to appear only on the first delta; since msg_counter never changes, every chunk will include role: Assistant. Increment the counter after emitting the first delta.

Apply this diff:

     pub fn create_choice(
         &mut self,
         index: u32,
         text: Option<String>,
         reasoning_content: Option<String>,
         finish_reason: Option<dynamo_async_openai::types::FinishReason>,
         logprobs: Option<dynamo_async_openai::types::ChatChoiceLogprobs>,
     ) -> NvCreateChatCompletionStreamResponse {
         let delta = dynamo_async_openai::types::ChatCompletionStreamResponseDelta {
             content: text,
             function_call: None,
             tool_calls: None,
             role: if self.msg_counter == 0 {
                 Some(dynamo_async_openai::types::Role::Assistant)
             } else {
                 None
             },
             refusal: None,
             reasoning_content,
         };
+        // Ensure subsequent chunks omit the role, matching OpenAI streaming semantics
+        self.msg_counter = self.msg_counter.saturating_add(1);
lib/engines/mistralrs/src/lib.rs (2)

543-559: Respect client logprobs request for completions

return_logprobs is hard-coded to false. This ignores request.inner.logprobs, so clients asking for logprobs won’t get them.

Apply this diff:

-            return_logprobs: false,
+            // return logprobs if the client requested a positive count
+            return_logprobs: request.inner.logprobs.map(|v| v > 0).unwrap_or(false),

579-589: Pass finish_reason into the completion choice

You compute finish_reason but don't pass it to create_choice, resulting in None in the response.

Apply this diff:

-                        let inner = response_generator.create_choice(0, Some(from_assistant), None, None);
+                        let inner = response_generator.create_choice(0, Some(from_assistant), finish_reason, None);

Also applies to: 593-593

🧹 Nitpick comments (7)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)

103-106: Deterministic ID: chatcmpl-{request_id}

The ID format matches OpenAI conventions. Consider guarding against whitespace or overly long IDs if upstream contexts are user-controlled, but not blocking.

Optional defensive tweak (upstream input sanitize):

-        let chatcmpl_id = format!("chatcmpl-{request_id}");
+        let sanitized = request_id.trim().replace(char::is_whitespace, "");
+        let chatcmpl_id = format!("chatcmpl-{}", sanitized);
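
If overly long IDs are also a concern, a fuller guard could look like the sketch below. This is illustrative only; the 128-character cap is an arbitrary example value, not something taken from the codebase.

    // Sketch: drop whitespace and cap the length before formatting the ID.
    fn sanitize_request_id(raw: &str) -> String {
        raw.chars()
            .filter(|c| !c.is_whitespace())
            .take(128) // arbitrary example cap
            .collect()
    }
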
lib/llm/src/protocols/openai/completions/delta.rs (1)

75-97: Minor perf/readability nit: avoid extra allocations in create_logprobs

If this becomes hot, consider accepting token_ids: &[TokenIdType] (and borrowing tokens when possible) to reduce copies. Not blocking.

Example shape:

-    pub fn create_logprobs(
-        &self,
-        tokens: Vec<common::llm_backend::TokenType>,
-        token_ids: Vec<TokenIdType>,
+    pub fn create_logprobs(
+        &self,
+        tokens: Vec<common::llm_backend::TokenType>,
+        token_ids: &[TokenIdType],
         logprobs: Option<common::llm_backend::LogProbs>,
         top_logprobs: Option<common::llm_backend::TopLogprobs>,
     ) -> Option<dynamo_async_openai::types::Logprobs> {

And at call site:

-        let logprobs = self.create_logprobs(
-            delta.tokens,
-            delta.token_ids,
+        let logprobs = self.create_logprobs(
+            delta.tokens,
+            &delta.token_ids,
             delta.log_probs,
             delta.top_logprobs,
         );

Also applies to: 192-205

lib/engines/mistralrs/src/lib.rs (3)

399-421: Set object to the OpenAI constant "chat.completion.chunk"

We currently pass through c.object.clone(). To ensure spec consistency across backends, prefer the canonical OpenAI value.

Apply this diff:

-                            object: c.object.clone(),
+                            object: "chat.completion.chunk".to_string(),

391-393: Typo: "Unknow stop reason" → "Unknown stop reason"

Minor log message polish.

Apply this diff:

-                                tracing::warn!(mistralrs_request_id, stop_reason = s, "Unknow stop reason");
+                                tracing::warn!(mistralrs_request_id, stop_reason = s, "Unknown stop reason");

571-572: Typo: "Unknow stop reason" → "Unknown stop reason"

Same log polish for the completions path.

Apply this diff:

-                                tracing::warn!(mistralrs_request_id, stop_reason = s, "Unknow stop reason");
+                                tracing::warn!(mistralrs_request_id, stop_reason = s, "Unknown stop reason");
lib/llm/src/engines.rs (1)

187-187: Nit: drop mut unless create_choice requires &mut self.

If DeltaGenerator::create_choice for chat completions no longer needs &mut self, remove mut to avoid an unused_mut lint in stricter builds.

-        let mut deltas = request.response_generator(ctx.id().to_string());
+        let deltas = request.response_generator(ctx.id().to_string());

If it still needs &mut self, keep as-is.

lib/llm/src/http/service/openai.rs (1)

248-266: Optional: add structured tracing to completions() with request_id field.

For parity with responses() (which already uses #[tracing::instrument(..., fields(request_id = %request.id()))]), consider instrumenting completions() similarly. This centralizes request_id on all logs from this function.

-async fn completions(
+#[tracing::instrument(level = "debug", skip_all, fields(request_id = %request.id()))]
+async fn completions(
     state: Arc<service_v2::State>,
     request: Context<NvCreateCompletionRequest>,
     stream_handle: ConnectionHandle,
 ) -> Result<Response, ErrorResponse> {
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a24221d and dd84aca.

📒 Files selected for processing (7)
  • lib/engines/mistralrs/src/lib.rs (13 hunks)
  • lib/llm/src/engines.rs (2 hunks)
  • lib/llm/src/http/service/openai.rs (2 hunks)
  • lib/llm/src/preprocessor.rs (2 hunks)
  • lib/llm/src/protocols/openai/chat_completions/delta.rs (3 hunks)
  • lib/llm/src/protocols/openai/completions/delta.rs (3 hunks)
  • lib/llm/tests/http-service.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-22T19:55:41.592Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.592Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.

Applied to files:

  • lib/llm/tests/http-service.rs
  • lib/llm/src/protocols/openai/chat_completions/delta.rs
  • lib/llm/src/preprocessor.rs
  • lib/engines/mistralrs/src/lib.rs
  • lib/llm/src/engines.rs
  • lib/llm/src/protocols/openai/completions/delta.rs
📚 Learning: 2025-08-22T19:55:41.592Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.592Z
Learning: The create_choice method exists on multiple different objects in the codebase. The DeltaGenerator::create_choice in lib/llm/src/protocols/openai/chat_completions/delta.rs has its own signature that was updated to include reasoning_content, but other objects in lib/llm/src/engines.rs have their own separate create_choice methods with different signatures that are not related to chat completions.

Applied to files:

  • lib/llm/src/engines.rs
🧬 Code graph analysis (5)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
lib/llm/src/protocols/openai/completions/delta.rs (2)
  • response_generator (10-17)
  • new (38-69)
lib/llm/src/preprocessor.rs (2)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
  • response_generator (20-28)
lib/llm/src/protocols/openai/completions/delta.rs (1)
  • response_generator (10-17)
lib/llm/src/http/service/openai.rs (3)
lib/llm/src/http/service/service_v2.rs (1)
  • state (186-188)
lib/llm/src/http/client.rs (1)
  • with_id (81-88)
lib/runtime/src/pipeline/context.rs (1)
  • with_id (69-76)
lib/engines/mistralrs/src/lib.rs (3)
lib/llm/src/protocols/openai/chat_completions/delta.rs (2)
  • new (76-116)
  • response_generator (20-28)
lib/llm/src/protocols/openai/completions/delta.rs (2)
  • new (38-69)
  • response_generator (10-17)
lib/bindings/python/rust/lib.rs (3)
  • new (268-293)
  • new (907-911)
  • id (929-931)
lib/llm/src/protocols/openai/completions/delta.rs (1)
lib/llm/src/protocols/openai/chat_completions/delta.rs (2)
  • response_generator (20-28)
  • new (76-116)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Mirror Repository to GitLab
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
🔇 Additional comments (13)
lib/llm/src/preprocessor.rs (2)

497-498: Good: propagating deterministic request_id into the generator

Passing context.id().to_string() into request.response_generator(...) aligns with the PR objective of making streamed IDs deterministic and trace-aligned. No issues spotted.


551-552: Good: completions path also threads request_id

Same as chat completions, this ensures cmpl-<request_id> IDs are stable across the pipeline.

lib/llm/src/protocols/openai/chat_completions/delta.rs (2)

15-21: API change is clear and documented

Nice docstring update explaining the new request_id parameter; signature change is minimal and aligns with the PR goal.


27-28: Constructor and call chain now accept request_id

DeltaGenerator::new(..., request_id) and its usage ensure chatcmpl-<request_id> determinism. Looks good.

Also applies to: 72-77

lib/llm/src/protocols/openai/completions/delta.rs (2)

10-17: API change: request_id threaded into completions as well

response_generator(&self, request_id: String) is consistent with the chat path and meets the objectives.


38-39: Deterministic ID: cmpl-{request_id}

cmpl-<request_id> is correct and consistent. Constructor changes are coherent.

Also applies to: 58-62

lib/engines/mistralrs/src/lib.rs (3)

216-218: Warmup request: logging context updated

Using mistralrs_request_id in logs makes tracing clearer. LGTM.

Also applies to: 249-253


275-276: Good: request_id sourced from context

Threading ctx.id().to_string() will align streamed IDs with distributed traces.


489-490: Good: completions path uses response_generator(ctx.id())

This ensures the cmpl-<request_id> ID is trace-aligned.

lib/llm/src/engines.rs (2)

235-235: Completions: request_id propagation into delta generator looks correct.

Using the context ID for cmpl-<id> is consistent with the chat path and supports downstream trace alignment.


187-187: No remaining zero-arg response_generator calls — LGTM

  • Ran rg -nP --type=rust '\bresponse_generator\s*\(\s*\)' and confirmed there are no zero-argument invocations of response_generator.
  • Verified that both delta modules now use the incoming request_id for ID formatting:
    • lib/llm/src/protocols/openai/completions/delta.rs formats as cmpl-{request_id}
    • lib/llm/src/protocols/openai/chat_completions/delta.rs formats as chatcmpl-{request_id}

All looks good—approving the change.

lib/llm/src/http/service/openai.rs (2)

256-256: Expose request_id early in completions() for logging/annotations — LGTM.

Grabbing let request_id = request.id().to_string(); up-front is clean and is correctly used later for error logs and optional annotations.


356-365: Embeddings handler integration verified

  • The embeddings route in lib/llm/src/http/service/openai.rs:1061 correctly points to the embeddings handler via .route(&path, post(embeddings)).
  • All three OpenAI handlers consistently accept a HeaderMap parameter, ensuring uniform request_id derivation and propagation:
    • handler_completions at line 213
    • embeddings at line 354
    • handler_chat_completions at line 408

LGTM—approving these changes.

@grahamking grahamking marked this pull request as draft August 25, 2025 20:13
@grahamking
Contributor Author

The main PR landed.

@grahamking grahamking closed this Aug 26, 2025