
Conversation

nachiketb-nvidia (Contributor) commented Oct 1, 2025

Overview:

Create a skeleton for a separate postprocessor node that can take over responsibility for post-processing backend output.

Details:

This creates a modularized OpenAIPostprocessor that does not have to deal with the complexities of preprocessing, letting us add general parsing/jailing functionality in llm/postprocessor.rs.

We would have complete control over the shape of PostprocessedResponse, allowing for more extensible code.

Where should the reviewer start?

postprocessor.rs: both llm/postprocessor.rs and protocols/.../postprocessor.rs

Summary by CodeRabbit

  • New Features
    • Added a post-processing layer that converts backend outputs into client-facing responses for chat, completions, and embeddings.
    • Introduced streaming transformations for faster, incremental responses.
    • Added structured parsing of outputs (text, optional reasoning content, and tool calls).
    • Improved validation with clearer errors when model configuration is incomplete.
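As a minimal, self-contained sketch of the streaming idea (the structs below are stand-ins, not this repo's real BackendOutput and PostprocessedResponse definitions), each backend chunk is mapped into a client-facing shape as it arrives rather than after the full response is buffered:

use futures::stream::{self, StreamExt};

struct BackendOutput { text: Option<String> }
struct PostprocessedResponse { text: Option<String> }

#[tokio::main]
async fn main() {
    let backend = stream::iter(vec![
        BackendOutput { text: Some("Hel".into()) },
        BackendOutput { text: Some("lo".into()) },
    ]);
    // Map each chunk incrementally instead of buffering the whole response.
    let mut client = backend.map(|out| PostprocessedResponse { text: out.text });
    while let Some(resp) = client.next().await {
        println!("{:?}", resp.text);
    }
}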

nachiketb-nvidia requested a review from a team as a code owner October 1, 2025 19:11
copy-pr-bot bot commented Oct 1, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

nachiketb-nvidia changed the title from "enhance: create a skeleton for a separate postprocessor node which ca…" to "refactor: create a skeleton for a separate postprocessor node which ca…" on Oct 1, 2025
nachiketb-nvidia force-pushed the nachiketb/create-postprocessor-pipeline branch from 7ccf21b to 5fcc351 on October 1, 2025 19:14
…n take over the responsibility of post processing backend output

Signed-off-by: nachiketb <nachiketb@nvidia.com>
nachiketb-nvidia force-pushed the nachiketb/create-postprocessor-pipeline branch from 5fcc351 to 75a0103 on October 1, 2025 19:14
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR introduces a modular postprocessor architecture to handle the transformation of backend outputs into client-facing responses. The change separates post-processing logic from preprocessing, creating a dedicated OpenAIPostprocessor that can be extended with parsing and jailing functionality.

  • Added a new ParsedComponents struct to support structured parsing of reasoning content and tool calls
  • Created a skeleton OpenAIPostprocessor with transformation methods for different protocol types
  • Implemented operator interfaces for chat completions, completions, and embeddings workflows

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File / Description

  • lib/llm/src/protocols/common/postprocessor.rs: Extended PostprocessedResponse with parsed_components field and new ParsedComponents struct
  • lib/llm/src/postprocessor.rs: Added new OpenAIPostprocessor module with transformation logic and operator implementations
  • lib/llm/src/lib.rs: Added postprocessor module to the public API


Comment on lines +142 to +143
// TODO: Convert PostprocessedResponse to NvCreateChatCompletionStreamResponse
todo!("Implement PostprocessedResponse to NvCreateChatCompletionStreamResponse conversion")
Copilot AI commented Oct 1, 2025

This TODO represents incomplete functionality that would cause runtime panics if executed. Consider implementing a basic placeholder conversion or adding proper error handling instead of using todo!() macros in production code.

Suggested change

-// TODO: Convert PostprocessedResponse to NvCreateChatCompletionStreamResponse
-todo!("Implement PostprocessedResponse to NvCreateChatCompletionStreamResponse conversion")
+// Basic placeholder conversion from PostprocessedResponse to NvCreateChatCompletionStreamResponse
+NvCreateChatCompletionStreamResponse {
+    id: String::from("placeholder_id"),
+    object: String::from("chat.completion.chunk"),
+    created: 0,
+    model: mdcsum.clone().unwrap_or_else(|| "unknown_model".to_string()),
+    choices: vec![crate::protocols::openai::chat_completions::NvChatCompletionChunkChoice {
+        index: backend_output.index.unwrap_or(0) as usize,
+        delta: crate::protocols::openai::chat_completions::NvChatCompletionChunkDelta {
+            role: None,
+            content: backend_output.text.clone(),
+            function_call: None,
+            tool_calls: None,
+        },
+        finish_reason: backend_output.finish_reason.clone(),
+    }],
+    system_fingerprint: None,
+}

Comment on lines +194 to +195
// TODO: Convert PostprocessedResponse to NvCreateCompletionResponse
todo!("Implement PostprocessedResponse to NvCreateCompletionResponse conversion")
Copilot AI commented Oct 1, 2025

This TODO represents incomplete functionality that would cause runtime panics if executed. Consider implementing a basic placeholder conversion or adding proper error handling instead of using todo!() macros in production code.

Suggested change

-// TODO: Convert PostprocessedResponse to NvCreateCompletionResponse
-todo!("Implement PostprocessedResponse to NvCreateCompletionResponse conversion")
+// Basic placeholder conversion from PostprocessedResponse to NvCreateCompletionResponse
+NvCreateCompletionResponse {
+    id: None,
+    object: None,
+    created: None,
+    model: None,
+    choices: vec![openai::completions::Choice {
+        text: backend_output.text.clone().unwrap_or_default(),
+        index: backend_output.index.unwrap_or(0) as usize,
+        logprobs: None,
+        finish_reason: backend_output.finish_reason.clone().unwrap_or_else(|| "stop".to_string()),
+    }],
+    usage: None,
+}

annotated.map_data(|_engine_output| {
    // TODO: Convert EmbeddingsEngineOutput to NvCreateEmbeddingResponse
    // This should be similar to what's in preprocessor.rs transform_embedding_postprocessor_stream
    todo!("Implement EmbeddingsEngineOutput to NvCreateEmbeddingResponse conversion")
Copilot AI commented Oct 1, 2025

This TODO represents incomplete functionality that would cause runtime panics if executed. Consider implementing a basic placeholder conversion or adding proper error handling instead of using todo!() macros in production code.

Suggested change

-todo!("Implement EmbeddingsEngineOutput to NvCreateEmbeddingResponse conversion")
+Err(Error::msg("EmbeddingsEngineOutput to NvCreateEmbeddingResponse conversion not implemented"))

coderabbitai bot (Contributor) commented Oct 1, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a new public postprocessor module and an OpenAIPostprocessor that transforms backend outputs and streams into a PostprocessedResponse shape and scaffolds conversions to OpenAI-like chat/completion/embedding responses. Extends PostprocessedResponse with optional ParsedComponents including tool-call types.

Changes

  • Module exposure (lib/llm/src/lib.rs): Declares pub mod postprocessor; to export the new module.
  • OpenAI-like postprocessor implementation (lib/llm/src/postprocessor.rs): Adds OpenAIPostprocessor (constructor validates ModelDeploymentCard), transform_backend_output, transform_backend_stream, and async Operator impls for chat completions, completions, and embeddings. Stream-based mapping is implemented; placeholders/TODOs remain for the final conversions to client response types.
  • Postprocess protocol extensions (lib/llm/src/protocols/common/postprocessor.rs): Adds parsed_components: Option<ParsedComponents> to PostprocessedResponse, and introduces ParsedComponents { text, reasoning_content, tool_calls } with an import of dynamo_async_openai::types::ChatCompletionMessageToolCall (see the sketch below).
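Based only on the names listed above, the extended protocol type plausibly looks like the sketch below; the field types and derive attributes are assumptions, since the PR states only the field names and the ChatCompletionMessageToolCall import:

use dynamo_async_openai::types::ChatCompletionMessageToolCall;

// Sketch only: field types are guesses inferred from the names in this PR.
#[derive(Debug, Clone, Default)]
pub struct ParsedComponents {
    /// Plain text left over after reasoning/tool-call extraction.
    pub text: Option<String>,
    /// Reasoning ("thinking") content, if the model emitted any.
    pub reasoning_content: Option<String>,
    /// Structured tool calls parsed out of the raw model output.
    pub tool_calls: Option<Vec<ChatCompletionMessageToolCall>>,
}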

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Orchestrator
  participant Backend as BackendEngine
  participant Postproc as OpenAIPostprocessor
  note right of Postproc: New component handling post-processing

  Client->>Orchestrator: generate request
  Orchestrator->>Backend: run inference (stream)
  Backend-->>Orchestrator: Annotated<BackendOutput> stream
  Orchestrator->>Postproc: transform_backend_stream(stream)
  Postproc-->>Orchestrator: Annotated<PostprocessedResponse> stream
  note right of Postproc: convert/map -> OpenAI-like response (TODO)
  Orchestrator-->>Client: Annotated<OpenAI-like response> stream

  opt Embeddings path
    Orchestrator->>Postproc: generate(Embeddings stream)
    Postproc-->>Orchestrator: Annotated<NvCreateEmbeddingResponse> stream
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I nibble on diffs by the glow of a thread,
Streams become sentences, tokens lightly spread.
I parse little carrots—tools and thought—
Stitching outputs the backend brought.
A hop, a patch, then off to the next bed. 🥕🐇

Pre-merge checks

❌ Failed checks (2 warnings)
  • Description Check ⚠️ Warning: The pull request description follows the template’s Overview, Details, and Where to start sections but omits the required Related Issues section linking to a GitHub issue, leaving the template incomplete. Resolution: add a “Related Issues” section with an appropriate action keyword (e.g., “closes #xxx”) to fully comply with the repository’s pull request description template.
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 33.33%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Title Check ✅ Passed: The title clearly and concisely describes the main change (introducing a skeleton for a separate postprocessor node) and follows the conventional “refactor:” prefix, making it easily understandable to reviewers.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
lib/llm/src/postprocessor.rs (2)

100-149: Runtime panic risk in Chat Completions operator.

The todo!() macro at lines 142-143 will cause a panic if this code path is executed. Even for skeleton code, consider returning an error instead of panicking.

The past review suggested a placeholder implementation. As an alternative, consider returning a proper error:

-                // TODO: Convert PostprocessedResponse to NvCreateChatCompletionStreamResponse
-                todo!("Implement PostprocessedResponse to NvCreateChatCompletionStreamResponse conversion")
+                // TODO: Convert PostprocessedResponse to NvCreateChatCompletionStreamResponse
+                Err(Error::msg("PostprocessedResponse to NvCreateChatCompletionStreamResponse conversion not yet implemented"))

This allows callers to handle the unimplemented functionality gracefully rather than panicking.
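As a tiny self-contained illustration of the tradeoff (convert_stub is a hypothetical stand-in, with anyhow assumed as the error crate, matching the Error::msg call above):

use anyhow::Error;

// A todo!() here would abort the whole stream task at runtime;
// returning Err keeps the failure visible but recoverable.
fn convert_stub() -> Result<String, Error> {
    Err(Error::msg("conversion not yet implemented"))
}

fn main() {
    match convert_stub() {
        Ok(v) => println!("converted: {v}"),
        Err(e) => eprintln!("graceful failure: {e}"), // caller degrades gracefully
    }
}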


151-201: Runtime panic risk in Completions operator.

The todo!() macro at lines 194-195 will cause a panic if executed. Similar to the Chat Completions operator, consider returning an error instead.

Apply this diff:

-                // TODO: Convert PostprocessedResponse to NvCreateCompletionResponse
-                todo!("Implement PostprocessedResponse to NvCreateCompletionResponse conversion")
+                // TODO: Convert PostprocessedResponse to NvCreateCompletionResponse
+                Err(Error::msg("PostprocessedResponse to NvCreateCompletionResponse conversion not yet implemented"))
🧹 Nitpick comments (3)
lib/llm/src/postprocessor.rs (3)

57-72: Acknowledge scaffolding TODO for parsed_components.

The transform logic correctly maps fields from BackendOutput to PostprocessedResponse. The TODO for implementing reasoning/tool parsing is expected in this skeleton PR.

Would you like me to open an issue to track the implementation of reasoning and tool-call parsing for parsed_components?
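For illustration, here is a self-contained sketch of one way reasoning content could eventually be split out when populating parsed_components; the <think> tag convention is purely an assumption, not something this PR specifies:

// Split reasoning wrapped in <think>...</think> tags away from the plain text.
// Returns (remaining_text, optional_reasoning).
fn split_reasoning(raw: &str) -> (String, Option<String>) {
    if let (Some(start), Some(end)) = (raw.find("<think>"), raw.find("</think>")) {
        if start < end {
            let reasoning = raw[start + "<think>".len()..end].trim().to_string();
            let mut text = String::new();
            text.push_str(&raw[..start]);
            text.push_str(&raw[end + "</think>".len()..]);
            return (text.trim().to_string(), Some(reasoning));
        }
    }
    (raw.to_string(), None)
}

fn main() {
    let (text, reasoning) = split_reasoning("<think>plan the steps</think>The answer is 4.");
    assert_eq!(text, "The answer is 4.");
    assert_eq!(reasoning.as_deref(), Some("plan the steps"));
}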


74-98: Consider extracting common transformation logic.

The field mapping logic (lines 85-94) duplicates the logic in transform_backend_output (lines 62-71). While the duplication is minor, extracting a shared helper method would improve maintainability.

Consider this refactor:

+    fn map_backend_to_postprocessed(
+        &self,
+        backend_output: BackendOutput,
+    ) -> PostprocessedResponse {
+        PostprocessedResponse {
+            mdcsum: self.mdcsum.clone(),
+            index: backend_output.index.map(|i| i as usize),
+            finish_reason: backend_output.finish_reason,
+            token_ids: backend_output.token_ids,
+            tokens: Some(backend_output.tokens),
+            text: backend_output.text,
+            cum_log_probs: backend_output.cum_log_probs,
+            parsed_components: None, // TODO: Implement reasoning/tool parsing
+        }
+    }
+
     pub fn transform_backend_output(
         &self,
         backend_output: BackendOutput,
     ) -> Result<PostprocessedResponse> {
-        Ok(PostprocessedResponse {
-            mdcsum: self.mdcsum.clone(),
-            index: backend_output.index.map(|i| i as usize),
-            finish_reason: backend_output.finish_reason,
-            token_ids: backend_output.token_ids,
-            tokens: Some(backend_output.tokens),
-            text: backend_output.text,
-            cum_log_probs: backend_output.cum_log_probs,
-            parsed_components: None, // TODO: Implement reasoning/tool parsing
-        })
+        Ok(self.map_backend_to_postprocessed(backend_output))
     }

And similarly update transform_backend_stream to use the helper.


100-239: Consider reducing duplication across Operator implementations.

The three Operator implementations (Chat Completions, Completions, Embeddings) follow nearly identical patterns with the primary difference being the output types. Once the conversion logic is implemented, consider extracting shared scaffolding into helper methods.

For example, you could introduce a generic helper:

fn transform_stream<I, O, F>(
    &self,
    request: ManyOut<Annotated<I>>,
    converter: F,
) -> ManyOut<Annotated<O>>
where
    I: Send + 'static,
    O: Send + 'static,
    F: Fn(I) -> Result<O, Error> + Send + 'static,
{
    let context = request.context();
    let stream = request.map(move |annotated| {
        annotated.map_data(|input| converter(input))
    });
    ResponseStream::new(Box::pin(stream), context)
}

This would reduce the structural duplication while maintaining type safety for each endpoint.
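As a self-contained toy version of that idea (stand-in types and String errors rather than this crate's real Error), showing that the helper owns the stream plumbing while each endpoint supplies only its converter closure:

use futures::stream::{self, Stream, StreamExt};

// Generic plumbing: map every item through an endpoint-specific converter.
fn transform_stream<I, O, F>(
    input: impl Stream<Item = I> + Send + 'static,
    converter: F,
) -> impl Stream<Item = Result<O, String>> + Send + 'static
where
    I: Send + 'static,
    O: Send + 'static,
    F: Fn(I) -> Result<O, String> + Send + 'static,
{
    input.map(move |item| converter(item))
}

#[tokio::main]
async fn main() {
    let chunks = stream::iter(vec![1u32, 2, 3]);
    // Chat, completions, and embeddings would each pass their own converter.
    let mut out = transform_stream(chunks, |n| Ok(format!("chunk {n}")));
    while let Some(item) = out.next().await {
        println!("{item:?}");
    }
}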

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f63a05 and 75a0103.

📒 Files selected for processing (3)
  • lib/llm/src/lib.rs (1 hunks)
  • lib/llm/src/postprocessor.rs (1 hunks)
  • lib/llm/src/protocols/common/postprocessor.rs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
lib/llm/src/postprocessor.rs (1)
lib/llm/src/model_card.rs (1)
  • mdcsum (201-204)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: tests (.)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: tests (lib/bindings/python)
🔇 Additional comments (4)
lib/llm/src/protocols/common/postprocessor.rs (2)

4-4: LGTM! Clean extension of PostprocessedResponse.

The addition of parsed_components as an optional field maintains backward compatibility while enabling structured parsing of tool calls and reasoning content.

Also applies to: 32-34


36-41: LGTM! Well-structured component container.

The ParsedComponents struct provides a clean separation of concerns for text, reasoning, and tool calls with appropriate optional fields.

lib/llm/src/lib.rs (1)

34-34: LGTM! Proper module exposure.

The new postprocessor module is correctly added to the public API.

lib/llm/src/postprocessor.rs (1)

32-55: LGTM! Solid constructor with proper validation.

The constructor correctly validates that ModelDeploymentCard contains model_info before proceeding, with clear error messaging. The use of Arc for shared ownership is appropriate.
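A self-contained sketch of that fail-fast pattern, with ModelDeploymentCard reduced to a stand-in (the real card, including mdcsum, lives in lib/llm/src/model_card.rs):

use std::sync::Arc;
use anyhow::{bail, Result};

// Stand-in for the real card; only the presence check matters here.
pub struct ModelDeploymentCard {
    pub model_info: Option<String>,
}

pub struct OpenAIPostprocessor {
    mdc: Arc<ModelDeploymentCard>,
}

impl OpenAIPostprocessor {
    /// Validate the card up front so every later transform can assume
    /// model_info is present, failing with a clear error at construction.
    pub fn new(mdc: Arc<ModelDeploymentCard>) -> Result<Self> {
        if mdc.model_info.is_none() {
            bail!("ModelDeploymentCard is missing model_info; cannot build OpenAIPostprocessor");
        }
        Ok(Self { mdc })
    }
}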

Comment on lines +203 to +239

/// Operator implementation for Embeddings
/// This transforms EmbeddingsEngineOutput -> NvCreateEmbeddingResponse
#[async_trait]
impl
    Operator<
        ManyOut<Annotated<EmbeddingsEngineOutput>>, // Input from backend
        ManyOut<Annotated<NvCreateEmbeddingResponse>>, // Output to client
        ManyOut<Annotated<EmbeddingsEngineOutput>>, // Forward edge (pass-through)
        ManyOut<Annotated<NvCreateEmbeddingResponse>>, // Backward edge
    > for OpenAIPostprocessor
{
    async fn generate(
        &self,
        request: ManyOut<Annotated<EmbeddingsEngineOutput>>,
        _next: Arc<
            dyn AsyncEngine<
                ManyOut<Annotated<EmbeddingsEngineOutput>>,
                ManyOut<Annotated<NvCreateEmbeddingResponse>>,
                Error,
            >,
        >,
    ) -> Result<ManyOut<Annotated<NvCreateEmbeddingResponse>>, Error> {
        // Extract context from the input stream
        let context = request.context();

        // Transform EmbeddingsEngineOutput to NvCreateEmbeddingResponse directly
        let final_stream = request.map(|annotated| {
            annotated.map_data(|_engine_output| {
                // TODO: Convert EmbeddingsEngineOutput to NvCreateEmbeddingResponse
                // This should be similar to what's in preprocessor.rs transform_embedding_postprocessor_stream
                todo!("Implement EmbeddingsEngineOutput to NvCreateEmbeddingResponse conversion")
            })
        });

        Ok(ResponseStream::new(Box::pin(final_stream), context))
    }
}
coderabbitai bot (Contributor) left a comment

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Runtime panic risk and potential duplication in Embeddings operator.

The todo!() at line 233 will panic at runtime. Additionally, the comment references existing embedding transformation logic in preprocessor.rs.

  1. Replace the todo!() with a proper error:
-                // TODO: Convert EmbeddingsEngineOutput to NvCreateEmbeddingResponse
-                // This should be similar to what's in preprocessor.rs transform_embedding_postprocessor_stream
-                todo!("Implement EmbeddingsEngineOutput to NvCreateEmbeddingResponse conversion")
+                // TODO: Convert EmbeddingsEngineOutput to NvCreateEmbeddingResponse
+                // This should be similar to what's in preprocessor.rs transform_embedding_postprocessor_stream
+                Err(Error::msg("EmbeddingsEngineOutput to NvCreateEmbeddingResponse conversion not yet implemented"))
  2. Verify whether this postprocessor duplicates functionality in preprocessor.rs:

🏁 Script executed:

#!/bin/bash
# Search for existing embedding transformation in preprocessor.rs
rg -n -A 10 -B 3 "transform_embedding_postprocessor_stream|EmbeddingsEngineOutput.*NvCreateEmbeddingResponse" --type rust

Length of output: 4189


Avoid runtime panic and consolidate embedding conversion

  • At line 233, remove todo!() to prevent a panic; either invoke the existing transform_embedding_postprocessor_stream from preprocessor.rs (import it) to handle the data mapping, or return Err(Error::msg("…not yet implemented")) if conversion cannot be reused here.
  • Extract or call the shared conversion logic to eliminate duplication and ensure consistency with the preprocessor’s implementation.
🤖 Prompt for AI Agents
In lib/llm/src/postprocessor.rs around lines 203 to 239, remove the runtime
panic caused by todo!() at line ~233 and replace it with a real conversion:
import and call the existing transform_embedding_postprocessor_stream
implementation from preprocessor.rs to map EmbeddingsEngineOutput ->
NvCreateEmbeddingResponse (or, if reuse is impossible, return an
Err(Error::msg("Embeddings conversion not implemented")) instead); ensure the
request stream is mapped using that shared function and propagate the original
context in the returned ResponseStream so behavior matches the preprocessor's
conversion and avoids duplication or a panic.

