refactor: create a skeleton for a separate postprocessor node which can take over the responsibility of post processing backend output #3351

Signed-off-by: nachiketb <nachiketb@nvidia.com>
Pull Request Overview
This PR introduces a modular postprocessor architecture to handle the transformation of backend outputs into client-facing responses. The change separates post-processing logic from preprocessing, creating a dedicated OpenAIPostprocessor that can be extended with parsing and jailing functionality.
- Added a new `ParsedComponents` struct to support structured parsing of reasoning content and tool calls
- Created a skeleton `OpenAIPostprocessor` with transformation methods for different protocol types
- Implemented operator interfaces for chat completions, completions, and embeddings workflows
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| lib/llm/src/protocols/common/postprocessor.rs | Extended PostprocessedResponse with parsed_components field and new ParsedComponents struct |
| lib/llm/src/postprocessor.rs | Added new OpenAIPostprocessor module with transformation logic and operator implementations |
| lib/llm/src/lib.rs | Added postprocessor module to the public API |
```rust
// TODO: Convert PostprocessedResponse to NvCreateChatCompletionStreamResponse
todo!("Implement PostprocessedResponse to NvCreateChatCompletionStreamResponse conversion")
```
Copilot AI · Oct 1, 2025
This TODO represents incomplete functionality that would cause runtime panics if executed. Consider implementing a basic placeholder conversion or adding proper error handling instead of using todo!() macros in production code.
Suggested change:

```diff
-// TODO: Convert PostprocessedResponse to NvCreateChatCompletionStreamResponse
-todo!("Implement PostprocessedResponse to NvCreateChatCompletionStreamResponse conversion")
+// Basic placeholder conversion from PostprocessedResponse to NvCreateChatCompletionStreamResponse
+NvCreateChatCompletionStreamResponse {
+    id: String::from("placeholder_id"),
+    object: String::from("chat.completion.chunk"),
+    created: 0,
+    model: mdcsum.clone().unwrap_or_else(|| "unknown_model".to_string()),
+    choices: vec![crate::protocols::openai::chat_completions::NvChatCompletionChunkChoice {
+        index: backend_output.index.unwrap_or(0) as usize,
+        delta: crate::protocols::openai::chat_completions::NvChatCompletionChunkDelta {
+            role: None,
+            content: backend_output.text.clone(),
+            function_call: None,
+            tool_calls: None,
+        },
+        finish_reason: backend_output.finish_reason.clone(),
+    }],
+    system_fingerprint: None,
+}
```
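Both the placeholder suggested above and the error-returning alternative raised later in the review trade a guaranteed `todo!()` panic for degraded-but-safe behavior. The contrast can be sketched in isolation; `StreamResponse` and both converter functions below are hypothetical stand-ins for illustration, not the crate's actual types:

```rust
// Illustrative sketch only: `StreamResponse` and these converters are
// simplified stand-ins, not the crate's real API.
#[derive(Debug)]
struct StreamResponse {
    text: Option<String>,
}

// Error-returning stub: callers can surface "not implemented" gracefully
// instead of unwinding the whole stream the way `todo!()` would.
fn convert_stub(_backend_text: Option<String>) -> Result<StreamResponse, String> {
    Err("PostprocessedResponse conversion not yet implemented".to_string())
}

// The same call site keeps working unchanged once a real conversion lands.
fn convert_real(backend_text: Option<String>) -> Result<StreamResponse, String> {
    Ok(StreamResponse { text: backend_text })
}
```

Because both stub and real implementation share the `Result` signature, the stream layer can map the error into an annotation or event rather than panicking mid-stream.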
```rust
// TODO: Convert PostprocessedResponse to NvCreateCompletionResponse
todo!("Implement PostprocessedResponse to NvCreateCompletionResponse conversion")
```
Copilot AI · Oct 1, 2025
This TODO represents incomplete functionality that would cause runtime panics if executed. Consider implementing a basic placeholder conversion or adding proper error handling instead of using todo!() macros in production code.
Suggested change:

```diff
-// TODO: Convert PostprocessedResponse to NvCreateCompletionResponse
-todo!("Implement PostprocessedResponse to NvCreateCompletionResponse conversion")
+// Basic placeholder conversion from PostprocessedResponse to NvCreateCompletionResponse
+NvCreateCompletionResponse {
+    id: None,
+    object: None,
+    created: None,
+    model: None,
+    choices: vec![openai::completions::Choice {
+        text: backend_output.text.clone().unwrap_or_default(),
+        index: backend_output.index.unwrap_or(0) as usize,
+        logprobs: None,
+        finish_reason: backend_output.finish_reason.clone().unwrap_or_else(|| "stop".to_string()),
+    }],
+    usage: None,
+}
```
```rust
annotated.map_data(|_engine_output| {
    // TODO: Convert EmbeddingsEngineOutput to NvCreateEmbeddingResponse
    // This should be similar to what's in preprocessor.rs transform_embedding_postprocessor_stream
    todo!("Implement EmbeddingsEngineOutput to NvCreateEmbeddingResponse conversion")
```
Copilot AI · Oct 1, 2025
This TODO represents incomplete functionality that would cause runtime panics if executed. Consider implementing a basic placeholder conversion or adding proper error handling instead of using todo!() macros in production code.
Suggested change:

```diff
-todo!("Implement EmbeddingsEngineOutput to NvCreateEmbeddingResponse conversion")
+Err(Error::msg("EmbeddingsEngineOutput to NvCreateEmbeddingResponse conversion not implemented"))
```
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough
Adds a new public `postprocessor` module.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant Orchestrator
    participant Backend as BackendEngine
    participant Postproc as OpenAIPostprocessor
    note right of Postproc: New component handling post-processing
    Client->>Orchestrator: generate request
    Orchestrator->>Backend: run inference (stream)
    Backend-->>Orchestrator: Annotated<BackendOutput> stream
    Orchestrator->>Postproc: transform_backend_stream(stream)
    Postproc-->>Orchestrator: Annotated<PostprocessedResponse> stream
    note right of Postproc: convert/map -> OpenAI-like response (TODO)
    Orchestrator-->>Client: Annotated<OpenAI-like response> stream
    opt Embeddings path
        Orchestrator->>Postproc: generate(Embeddings stream)
        Postproc-->>Orchestrator: Annotated<NvCreateEmbeddingResponse> stream
    end
```
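The transform step in the diagram (backend frames mapped one-to-one into postprocessed frames) can be sketched with synchronous iterators standing in for the async `Annotated` streams; the struct names and fields below are simplified assumptions, not the crate's actual definitions:

```rust
// Simplified stand-ins for the crate's protocol types (assumptions for the sketch).
#[derive(Debug, PartialEq)]
struct BackendOutput {
    text: Option<String>,
    finish_reason: Option<String>,
}

#[derive(Debug, PartialEq)]
struct PostprocessedResponse {
    text: Option<String>,
    finish_reason: Option<String>,
}

// Step 4 in the diagram: transform each backend frame into a postprocessed frame,
// preserving per-frame fields; a real impl would also parse reasoning/tool calls here.
fn transform_backend_stream(
    stream: impl Iterator<Item = BackendOutput>,
) -> impl Iterator<Item = PostprocessedResponse> {
    stream.map(|out| PostprocessedResponse {
        text: out.text,
        finish_reason: out.finish_reason,
    })
}
```

The key property the diagram implies is that the mapping is lazy and streaming: each frame is converted as it arrives, and the final frame carries the `finish_reason` through unchanged.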
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 1
♻️ Duplicate comments (2)
lib/llm/src/postprocessor.rs (2)
100-149: Runtime panic risk in Chat Completions operator.

The `todo!()` macro at lines 142-143 will cause a panic if this code path is executed. Even for skeleton code, consider returning an error instead of panicking.

The past review suggested a placeholder implementation. As an alternative, consider returning a proper error:

```diff
-// TODO: Convert PostprocessedResponse to NvCreateChatCompletionStreamResponse
-todo!("Implement PostprocessedResponse to NvCreateChatCompletionStreamResponse conversion")
+// TODO: Convert PostprocessedResponse to NvCreateChatCompletionStreamResponse
+Err(Error::msg("PostprocessedResponse to NvCreateChatCompletionStreamResponse conversion not yet implemented"))
```

This allows callers to handle the unimplemented functionality gracefully rather than panicking.
151-201: Runtime panic risk in Completions operator.

The `todo!()` macro at lines 194-195 will cause a panic if executed. Similar to the Chat Completions operator, consider returning an error instead.

Apply this diff:

```diff
-// TODO: Convert PostprocessedResponse to NvCreateCompletionResponse
-todo!("Implement PostprocessedResponse to NvCreateCompletionResponse conversion")
+// TODO: Convert PostprocessedResponse to NvCreateCompletionResponse
+Err(Error::msg("PostprocessedResponse to NvCreateCompletionResponse conversion not yet implemented"))
```
🧹 Nitpick comments (3)
lib/llm/src/postprocessor.rs (3)
57-72: Acknowledge scaffolding TODO for parsed_components.

The transform logic correctly maps fields from `BackendOutput` to `PostprocessedResponse`. The TODO for implementing reasoning/tool parsing is expected in this skeleton PR.

Would you like me to open an issue to track the implementation of reasoning and tool-call parsing for `parsed_components`?
74-98: Consider extracting common transformation logic.

The field mapping logic (lines 85-94) duplicates the logic in `transform_backend_output` (lines 62-71). While the duplication is minor, extracting a shared helper method would improve maintainability.

Consider this refactor:

```diff
+fn map_backend_to_postprocessed(
+    &self,
+    backend_output: BackendOutput,
+) -> PostprocessedResponse {
+    PostprocessedResponse {
+        mdcsum: self.mdcsum.clone(),
+        index: backend_output.index.map(|i| i as usize),
+        finish_reason: backend_output.finish_reason,
+        token_ids: backend_output.token_ids,
+        tokens: Some(backend_output.tokens),
+        text: backend_output.text,
+        cum_log_probs: backend_output.cum_log_probs,
+        parsed_components: None, // TODO: Implement reasoning/tool parsing
+    }
+}
+
 pub fn transform_backend_output(
     &self,
     backend_output: BackendOutput,
 ) -> Result<PostprocessedResponse> {
-    Ok(PostprocessedResponse {
-        mdcsum: self.mdcsum.clone(),
-        index: backend_output.index.map(|i| i as usize),
-        finish_reason: backend_output.finish_reason,
-        token_ids: backend_output.token_ids,
-        tokens: Some(backend_output.tokens),
-        text: backend_output.text,
-        cum_log_probs: backend_output.cum_log_probs,
-        parsed_components: None, // TODO: Implement reasoning/tool parsing
-    })
+    Ok(self.map_backend_to_postprocessed(backend_output))
 }
```

And similarly update `transform_backend_stream` to use the helper.
100-239: Consider reducing duplication across Operator implementations.

The three `Operator` implementations (Chat Completions, Completions, Embeddings) follow nearly identical patterns, with the primary difference being the output types. Once the conversion logic is implemented, consider extracting shared scaffolding into helper methods.

For example, you could introduce a generic helper:

```rust
fn transform_stream<I, O, F>(
    &self,
    request: ManyOut<Annotated<I>>,
    converter: F,
) -> ManyOut<Annotated<O>>
where
    I: Send + 'static,
    O: Send + 'static,
    F: Fn(I) -> Result<O, Error> + Send + 'static,
{
    let context = request.context();
    let stream = request.map(move |annotated| {
        annotated.map_data(|input| converter(input))
    });
    ResponseStream::new(Box::pin(stream), context)
}
```

This would reduce the structural duplication while maintaining type safety for each endpoint.
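To see why a single generic helper removes the per-endpoint duplication, here is a runnable miniature of the same idea, with a plain `Vec` standing in for `ManyOut` and a simplified `Annotated` wrapper (both are illustrative assumptions, not the real dynamo types):

```rust
// Simplified stand-in for the crate's Annotated wrapper (assumption for the sketch).
#[derive(Debug, PartialEq)]
struct Annotated<T> {
    data: Option<T>,
}

impl<T> Annotated<T> {
    // Map the payload; on conversion failure the payload is dropped
    // (the real map_data would record an error event instead).
    fn map_data<O>(self, f: impl FnOnce(T) -> Result<O, String>) -> Annotated<O> {
        Annotated { data: self.data.and_then(|d| f(d).ok()) }
    }
}

// Generic helper: one converter closure drives the whole stream transformation,
// so each endpoint only has to supply its own conversion function.
fn transform_stream<I, O>(
    items: Vec<Annotated<I>>,
    converter: impl Fn(I) -> Result<O, String>,
) -> Vec<Annotated<O>> {
    items.into_iter().map(|a| a.map_data(&converter)).collect()
}
```

Each of the three operators would then reduce to a one-line call passing its endpoint-specific converter, which is exactly the structure the review comment suggests.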
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- lib/llm/src/lib.rs (1 hunks)
- lib/llm/src/postprocessor.rs (1 hunks)
- lib/llm/src/protocols/common/postprocessor.rs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
lib/llm/src/postprocessor.rs (1)
lib/llm/src/model_card.rs (1)
`mdcsum` (201-204)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - dynamo
- GitHub Check: tests (.)
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: tests (lib/bindings/python)
🔇 Additional comments (4)
lib/llm/src/protocols/common/postprocessor.rs (2)
4-4: LGTM! Clean extension of PostprocessedResponse.

The addition of `parsed_components` as an optional field maintains backward compatibility while enabling structured parsing of tool calls and reasoning content.

Also applies to: 32-34
36-41: LGTM! Well-structured component container.

The `ParsedComponents` struct provides a clean separation of concerns for text, reasoning, and tool calls with appropriate optional fields.

lib/llm/src/lib.rs (1)
34-34: LGTM! Proper module exposure.

The new `postprocessor` module is correctly added to the public API.

lib/llm/src/postprocessor.rs (1)
32-55: LGTM! Solid constructor with proper validation.

The constructor correctly validates that `ModelDeploymentCard` contains `model_info` before proceeding, with clear error messaging. The use of `Arc` for shared ownership is appropriate.
```rust
/// Operator implementation for Embeddings
/// This transforms EmbeddingsEngineOutput -> NvCreateEmbeddingResponse
#[async_trait]
impl
    Operator<
        ManyOut<Annotated<EmbeddingsEngineOutput>>, // Input from backend
        ManyOut<Annotated<NvCreateEmbeddingResponse>>, // Output to client
        ManyOut<Annotated<EmbeddingsEngineOutput>>, // Forward edge (pass-through)
        ManyOut<Annotated<NvCreateEmbeddingResponse>>, // Backward edge
    > for OpenAIPostprocessor
{
    async fn generate(
        &self,
        request: ManyOut<Annotated<EmbeddingsEngineOutput>>,
        _next: Arc<
            dyn AsyncEngine<
                ManyOut<Annotated<EmbeddingsEngineOutput>>,
                ManyOut<Annotated<NvCreateEmbeddingResponse>>,
                Error,
            >,
        >,
    ) -> Result<ManyOut<Annotated<NvCreateEmbeddingResponse>>, Error> {
        // Extract context from the input stream
        let context = request.context();

        // Transform EmbeddingsEngineOutput to NvCreateEmbeddingResponse directly
        let final_stream = request.map(|annotated| {
            annotated.map_data(|_engine_output| {
                // TODO: Convert EmbeddingsEngineOutput to NvCreateEmbeddingResponse
                // This should be similar to what's in preprocessor.rs transform_embedding_postprocessor_stream
                todo!("Implement EmbeddingsEngineOutput to NvCreateEmbeddingResponse conversion")
            })
        });

        Ok(ResponseStream::new(Box::pin(final_stream), context))
    }
}
```
🧩 Analysis chain
Runtime panic risk and potential duplication in Embeddings operator.
The todo!() at line 233 will panic at runtime. Additionally, the comment references existing embedding transformation logic in preprocessor.rs.
1. Replace the `todo!()` with a proper error:

```diff
-// TODO: Convert EmbeddingsEngineOutput to NvCreateEmbeddingResponse
-// This should be similar to what's in preprocessor.rs transform_embedding_postprocessor_stream
-todo!("Implement EmbeddingsEngineOutput to NvCreateEmbeddingResponse conversion")
+// TODO: Convert EmbeddingsEngineOutput to NvCreateEmbeddingResponse
+// This should be similar to what's in preprocessor.rs transform_embedding_postprocessor_stream
+Err(Error::msg("EmbeddingsEngineOutput to NvCreateEmbeddingResponse conversion not yet implemented"))
```

2. Verify whether this postprocessor duplicates functionality in preprocessor.rs:
🏁 Script executed:

```shell
#!/bin/bash
# Search for existing embedding transformation in preprocessor.rs
rg -n -A 10 -B 3 "transform_embedding_postprocessor_stream|EmbeddingsEngineOutput.*NvCreateEmbeddingResponse" --type rust
```

Length of output: 4189
Avoid runtime panic and consolidate embedding conversion
- At line 233, remove `todo!()` to prevent a panic; either invoke the existing `transform_embedding_postprocessor_stream` from `preprocessor.rs` (import it) to handle the data mapping, or return `Err(Error::msg("…not yet implemented"))` if conversion cannot be reused here.
- Extract or call the shared conversion logic to eliminate duplication and ensure consistency with the preprocessor's implementation.
🤖 Prompt for AI Agents
In lib/llm/src/postprocessor.rs around lines 203 to 239, remove the runtime
panic caused by todo!() at line ~233 and replace it with a real conversion:
import and call the existing transform_embedding_postprocessor_stream
implementation from preprocessor.rs to map EmbeddingsEngineOutput ->
NvCreateEmbeddingResponse (or, if reuse is impossible, return an
Err(Error::msg("Embeddings conversion not implemented")) instead); ensure the
request stream is mapped using that shared function and propagate the original
context in the returned ResponseStream so behavior matches the preprocessor's
conversion and avoids duplication or a panic.
Overview:
create a skeleton for a separate postprocessor node which can take over the responsibility of post processing backend output.
Details:
This will create a modularized OpenAIPostprocessor, which will not have to deal with the complexities of preprocessing, and we can add general parsing/jailing functionality into llm/postprocessor.rs.
We would have complete control over what the `PostprocessedResponse` response would look like, allowing for more extensible code.

Where should the reviewer start?
Both `llm/postprocessor.rs` and `protocols/.../postprocessor.rs`.
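As a rough illustration of the "general parsing" functionality the description mentions, reasoning content could eventually be split out of raw model text into a `ParsedComponents`-style struct. The `<think>` tag convention, the function, and the field names below are assumptions for the sketch, not the crate's actual behavior:

```rust
// Hypothetical sketch: field names and the <think> tag format are assumptions.
#[derive(Debug, Default, PartialEq)]
struct ParsedComponents {
    text: Option<String>,
    reasoning_content: Option<String>,
}

// Split a single <think>...</think> block out of the raw text; everything
// outside the block is treated as the client-facing answer.
fn parse_components(raw: &str) -> ParsedComponents {
    if let (Some(start), Some(end)) = (raw.find("<think>"), raw.find("</think>")) {
        if start < end {
            let reasoning = raw[start + "<think>".len()..end].trim().to_string();
            let mut text = String::new();
            text.push_str(&raw[..start]);
            text.push_str(&raw[end + "</think>".len()..]);
            return ParsedComponents {
                text: Some(text.trim().to_string()),
                reasoning_content: Some(reasoning),
            };
        }
    }
    ParsedComponents {
        text: Some(raw.trim().to_string()),
        reasoning_content: None,
    }
}
```

Keeping this logic inside the postprocessor (rather than the preprocessor) is the point of the PR: the `PostprocessedResponse` shape stays under the postprocessor's control, so parsing like this can be added without touching request-side code.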