
Conversation

Contributor

@ayushag-nv ayushag-nv commented Sep 22, 2025

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Unified end-position detection for tool calls across parsers, improving streaming accuracy.
    • Automatic detection of jail/start/end markers from configuration for more reliable streaming.
  • Bug Fixes

    • More robust start/end handling of tool-call sections, including single-chunk completions.
    • Consistent emission of pass-through and trailing content.
    • Reduced stray content by clearing buffers reliably after matches.
  • Refactor

    • Tool-call parsing APIs are now synchronous, simplifying integration and reducing overhead.
    • Centralized and standardized stream emission logic for clearer, more predictable behavior.

elyasmnvidian and others added 30 commits September 10, 2025 15:30
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
…tion

- Implement JailedStream with builder pattern for clean API
- Use async-stream's stream! macro for cleaner async implementation
- Support configurable jail start/end sequences
- Integrate tool call detection and parsing
- Add comprehensive tests with shared test utilities
- Create documentation for usage patterns

The JailedStream provides a standalone solution for accumulating tokens
when certain sequences are detected (jailing) and releasing them as a
single chunk when the jail ends, enabling proper tool call handling in
streaming responses.

Signed-off-by: Ryan Olson <rolson@nvidia.com>
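
A rough usage sketch of the API described above (builder and method names are assumptions inferred from the bullets, not the crate's actual signatures):

let jail = JailedStream::builder()
    .jail_start_sequence("<tool_call>")
    .jail_end_sequence("</tool_call>")
    .tool_call_parser("hermes")
    .build();

// Tokens between the start and end sequences are accumulated (jailed)
// and released as one aggregated chunk when the jail ends.
let unjailed_stream = jail.apply(token_stream);
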
- Remove unnecessary annotation code that was never used
- Fix efficiency issue by avoiding cloning when not jailing
- Only clone response data when actually entering jail state
- Simplify jailed behavior to not yield empty chunks
- Accumulate content silently and only yield final result
- Update tests to match new behavior

This makes the stream processing more efficient by:
1. Eliminating unnecessary clones during normal streaming
2. Removing pointless annotation objects
3. Not sending empty chunks while accumulating
4. Only yielding the final aggregated result when jail ends

Signed-off-by: Ryan Olson <rolson@nvidia.com>
…rhead

- Remove ManyOut type alias and AsyncEngineContextProvider dependency
- Change apply() to return impl Stream instead of Pin<Box<ResponseStream>>
- Remove unnecessary context extraction and re-wrapping
- Use tokio::pin!() for stack pinning instead of Box::pin() (more efficient)
- Simplify tests by removing mock context infrastructure
- Update documentation with correct usage examples

This makes JailedStream a pure stream combinator that:
1. Doesn't handle context (caller's responsibility)
2. Avoids unnecessary boxing in the library
3. Can be composed with other stream transformers
4. Lets callers decide when to add boxing/context

Performance improvements:
- No heap allocation for pinning (stack pinning instead)
- No context passing overhead
- Direct stream transformation without wrapper types

Signed-off-by: Ryan Olson <rolson@nvidia.com>
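
A minimal, self-contained sketch of the pattern (the identity transform stands in for the real jail logic; names are illustrative):

use futures::{Stream, StreamExt};

// Returning `impl Stream` keeps the combinator unboxed; the caller decides
// whether to stack-pin or box the result.
fn apply<S: Stream<Item = String>>(input: S) -> impl Stream<Item = String> {
    input.map(|chunk| chunk)
}

async fn consume<S: Stream<Item = String>>(input: S) {
    let transformed = apply(input);
    // Stack pinning: no heap allocation, unlike Box::pin(transformed).
    tokio::pin!(transformed);
    while let Some(chunk) = transformed.next().await {
        println!("{chunk}");
    }
}
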
- Remove old apply_tool_calling_jail_internal function
- Remove PossibleToolCallAnnotation struct and implementations
- Update apply_tool_calling_jail_with_parser to conditionally use JailedStream
- Only apply jail logic when tool_call_parser is configured
- Comment out old jail tests that are no longer applicable
- Clean up unused imports and run cargo fmt/clippy
- Change jail entry logic from if/else to evaluate both sequence and tool call conditions
- Add early exit detection via should_exit_jail_early() method
- Support two paths to enter jail: explicit sequences OR tool call detection
- Support two paths to exit jail: end sequences OR complete tool call parsing
- Add tests for dual entry paths and early exit behavior
- Improve debug logging to show which condition triggered jail start/end
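
A simplified sketch of the dual entry/exit decision described in the bullets above (helper names are illustrative, not the actual methods):

// Two ways in: an explicit start sequence, or parser-based detection.
fn should_enter_jail(chunk: &str, start_seq: Option<&str>, parser_detects: bool) -> bool {
    start_seq.is_some_and(|s| chunk.contains(s)) || parser_detects
}

// Two ways out: an explicit end sequence, or a complete tool call parse.
fn should_exit_jail(accumulated: &str, end_seq: Option<&str>, parse_complete: bool) -> bool {
    end_seq.is_some_and(|s| accumulated.contains(s)) || parse_complete
}
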
- Separate context passing from stream transformations
- Change transform_embedding_postprocessor_stream to return impl Stream
- Update transform_postprocessor_stream to accept context as parameter
- Defer boxing until the final ResponseStream::new() call
- Extract context once at the beginning of generate() methods
- Reduce boxing from 5-6 locations to 1 per request
- Improves performance by eliminating intermediate heap allocations
- Only apply jail when tools are present AND tool_choice is not None
- Error if tool_choice is 'required', 'auto', or named but no parser configured
- Skip jail entirely when tool_choice is explicitly None
- Add should_apply_tool_jail() to determine jail application logic
- Refactor to separate decision logic from stream processing
- Avoid lifetime issues by evaluating conditions before stream transformation
- Removed old jail implementation tests that are no longer compatible
- Kept test_detect_tool_call_start_different_parsers as it tests parser behavior
- Old tests were testing annotation behavior that no longer exists in new JailedStream
- New tests in jail.rs provide equivalent coverage
Resolved merge conflicts in .gitignore and preprocessor.rs:
- Fixed gitignore format for Ruler generated files
- Removed duplicate jail implementation from main
- Kept optimized JailedStream with conditional application
- Fixed transform_postprocessor_stream signature usage
- All tests passing
- Remove unused imports and dead code warnings in test module
- Fix test_streaming_usage.rs function signature for transform_postprocessor_stream
- Remove test_preprocessor.rs that tested old jail implementation
- Add missing context parameter to transform calls
- All tests now pass and clippy/fmt are clean
Add 10 new test cases for JailedStream covering all parser types and edge cases:
- Hermes parser with <tool_call> markers
- Mistral parser (both variants with and without [TOOL_CALLS] marker)
- Phi4 parser with <|tool_call|> markers
- llama3_json parser with <|python_tag|> markers
- False positive detection for non-tool JSON
- Malformed JSON handling
- Partial tool call scenarios
- Empty stream behavior
- Multiple consecutive tool calls
- Fragmented tool calls across many small chunks

All tests verify silent jailing behavior (no empty chunks) and proper
tool call accumulation between start/end markers.
Fixes issue where create_unjailed_response was discarding upstream correlation
metadata (id, event, comment) from Annotated wrapper, breaking observability.

Changes:
- Capture id/event/comment fields when jail is triggered
- Pass preserved metadata to create_unjailed_response
- Attach original metadata to synthesized responses
- Add comprehensive test coverage for metadata preservation

Tests added:
- test_jailed_stream_preserves_metadata: Normal jail processing
- test_jailed_stream_preserves_metadata_on_stream_end: Stream termination
- test_jailed_stream_metadata_edge_cases: Partial/empty metadata

Addresses GitHub issue discussion r2352198367
Fixes issue where trailing content in the same chunk as jail end markers
was being lost. When a chunk contains both an end marker and additional
content (e.g., "</tool_call>trailing text"), the trailing text was
discarded instead of being emitted as a separate chunk.

Changes:
- Find exact position of end markers for precise content splitting
- Split accumulated content at marker boundary into jailed/trailing parts
- Emit jailed content as tool call response, trailing as pass-through
- Handle both explicit end sequences and early exit scenarios
- Add parser-specific end position detection for all supported formats
- Preserve metadata in trailing content chunks

Tests added:
- test_jailed_stream_trailing_content_same_chunk: End marker + trailing
- test_jailed_stream_early_exit_with_trailing: Early exit + trailing

Fixes GitHub issue discussion r2352198360
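
The split described above amounts to locating the end of the marker and dividing the buffer there; a minimal sketch (single-marker case — the real code resolves the marker per parser):

// Everything up to and including the end marker is the jailed (tool call)
// part; anything after it is trailing pass-through content.
fn split_at_end_marker(accumulated: &str, end_marker: &str) -> (String, Option<String>) {
    match accumulated.find(end_marker) {
        Some(pos) => {
            let boundary = pos + end_marker.len();
            let trailing = &accumulated[boundary..];
            (
                accumulated[..boundary].to_string(),
                (!trailing.is_empty()).then(|| trailing.to_string()),
            )
        }
        None => (accumulated.to_string(), None),
    }
}

// split_at_end_marker("{...}</tool_call>trailing text", "</tool_call>")
//   => ("{...}</tool_call>", Some("trailing text"))
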
Replace HashMap-based jail state with deterministic Vec-based collection
to enable independent choice jailing and ensure consistent processing order.

Key improvements:
- Add ChoiceJailState and ChoiceJailStateCollection for deterministic ordering
- Implement independent per-choice jail/unjail logic
- Add EmissionMode configuration (Packed vs SingleChoicePerChunk)
- Support configurable emission patterns for OpenAI compatibility
- Fix double emission issue during choice unjailing
- Maintain backward compatibility for single-choice scenarios

This addresses GitHub issues r2352198339 (determinism) and enables
proper multi-choice (n>1) tool call handling with independent state
management per choice index.

Tests: 22/24 pass (2 expected multi-choice test failures for future work)
This major refactoring enables true independent choice processing where each
choice (when n>1) can jail and unjail independently with deterministic ordering.

Core changes:
- Add ChoiceEmission enum for clean emission abstraction
- Add ChoiceJailState with encapsulated jail logic per choice
- Replace HashMap with deterministic Vec-based ChoiceJailStateCollection
- Implement independent process_content() method per choice
- Add configurable EmissionMode (Packed vs SingleChoicePerChunk)
- Clean separation between state management and emission strategy

Key achievements:
- Multi-choice independent jailing tests now pass
- Deterministic choice ordering (always 0, 1, 2...)
- Metadata preservation through jail processing
- Maintains backward compatibility for single-choice scenarios

Tests: 22/24 pass (2 trailing content edge cases remaining)

Addresses GitHub issues:
- r2352198339: HashMap determinism and independent choice jailing
- Enables proper multi-choice (n>1) tool call handling
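
A condensed sketch of the state layout this describes (type and variant names follow the commit message; field details are assumptions):

// Emission strategy: pack all choices into one chunk, or emit one choice
// per chunk (closer to OpenAI's streaming shape).
enum EmissionMode {
    Packed,
    SingleChoicePerChunk,
}

// Per-choice jail state, kept in a Vec ordered by choice index rather than
// a HashMap, so iteration is always deterministic (0, 1, 2, ...).
struct ChoiceJailState {
    index: u32,
    jailed: bool,
    accumulated: String,
}

struct ChoiceJailStateCollection {
    states: Vec<ChoiceJailState>,
}

impl ChoiceJailStateCollection {
    fn get_or_create(&mut self, index: u32) -> &mut ChoiceJailState {
        if self.states.iter().all(|s| s.index != index) {
            self.states.push(ChoiceJailState {
                index,
                jailed: false,
                accumulated: String::new(),
            });
            self.states.sort_by_key(|s| s.index);
        }
        self.states.iter_mut().find(|s| s.index == index).unwrap()
    }
}
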
Modify emission grouping logic to separate trailing emissions from
tool/content emissions. This ensures trailing content after jail end
markers is always emitted as separate chunks, fixing the failing tests:
- test_jailed_stream_trailing_content_same_chunk
- test_jailed_stream_early_exit_with_trailing

Also remove unused methods to satisfy clippy warnings.

All 24 jail tests now pass with the independent multi-choice jailing
implementation.

Signed-off-by: Ryan Olson <rolson@nvidia.com>
Add efficient prefix matching utility to detect tool call markers split
across chunk boundaries. Solves the issue where markers like "<TOOLCALL>"
were missed when characters arrived in separate chunks.

Key improvements:
- Created utils/prefix_matcher.rs with MarkerMatcher using Aho-Corasick
- Detects partial suffixes (e.g., "n<TO" finds "<TO" as partial match)
- Prevents false positives (e.g., "n < 5" doesn't trigger jailing)
- Preserves chunk boundaries to avoid unnecessary yield consumption
- Handles complete tool calls within single combined chunks
- Maintains backward compatibility with all existing functionality

Tests added:
- test_partial_matching_suffix_detection: Verifies cross-chunk detection
- test_partial_matching_false_positive_prevention: Prevents math expressions

All 26 existing jail tests continue to pass.

Signed-off-by: Ryan Olson <rolson@nvidia.com>
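
The MarkerMatcher itself uses Aho-Corasick to check many markers at once; for a single marker, the partial-suffix idea at a chunk boundary can be sketched with plain std (function name is illustrative):

/// Returns the suffix of `text` that is a proper, non-empty prefix of
/// `marker`, if any. Such a suffix must be held back, since the next chunk
/// may complete the marker.
fn partial_marker_suffix<'a>(text: &'a str, marker: &str) -> Option<&'a str> {
    for (i, _) in text.char_indices() {
        let suffix = &text[i..];
        if suffix.len() < marker.len() && marker.starts_with(suffix) {
            return Some(suffix);
        }
    }
    None
}

// partial_marker_suffix("n<TO", "<TOOLCALL>")  => Some("<TO")  (hold it back)
// partial_marker_suffix("n < 5", "<TOOLCALL>") => None         (no false positive)
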
Signed-off-by: ayushag <ayushag@nvidia.com>
@ayushag-nv ayushag-nv requested a review from a team as a code owner September 22, 2025 22:38

copy-pr-bot bot commented Sep 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the chore label Sep 22, 2025
@ayushag-nv ayushag-nv marked this pull request as draft September 22, 2025 22:39
Signed-off-by: ayushag <ayushag@nvidia.com>
@pull-request-size pull-request-size bot added size/L and removed size/M labels Sep 23, 2025
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
@ayushag-nv ayushag-nv force-pushed the ayushag/jailstream-opt branch from 3ff1916 to 41c7ef8 September 23, 2025 08:00
@ayushag-nv ayushag-nv added the enhancement (New feature or request) label and removed the chore label Sep 23, 2025
Base automatically changed from ryan/streamtool to main September 23, 2025 22:24
@github-actions github-actions bot added the chore label Sep 23, 2025
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: Ayush Agarwal <ayushag@nvidia.com>
@ayushag-nv ayushag-nv marked this pull request as ready for review September 23, 2025 23:33
@ayushag-nv ayushag-nv marked this pull request as draft September 23, 2025 23:33
Contributor

coderabbitai bot commented Sep 23, 2025

Walkthrough

Refactors OpenAI chat_completions jail streaming to centralize choice emission and tool-call handling, introduces parser end-position utilities, and converts multiple tool-calling parser APIs from async to sync. Adds dispatch for end-position detection across parsers and updates builders to derive jail markers from parser config.

Changes

Cohort / File(s) Summary of changes
OpenAI jail streaming refactor
lib/llm/src/protocols/openai/chat_completions/jail.rs
Adds helper create_choice_stream and ChoiceJailState emission helpers; refactors content processing to use helpers; standardizes tool-call vs content emission; clears partial buffers consistently; removes ad-hoc end logic; imports ChatChoiceLogprobs and unified end-position helper; builder auto-populates jail start/end sequences from parser config and common markers.
Harmony parser sync conversion
lib/parsers/src/tool_calling/harmony/harmony_parser.rs
Converts async functions to sync; replaces tokio::sync::OnceCell with std::sync::OnceLock; updates encoding initialization path; adjusts tests to non-async.
End-position helpers (Harmony)
lib/parsers/src/tool_calling/harmony/mod.rs
Adds find_tool_call_end_position_harmony(chunk: &str) -> usize returning chunk.len().
End-position helpers (JSON)
lib/parsers/src/tool_calling/json/mod.rs
Adds find_tool_call_end_position_json(chunk, parser, config) with parser-specific logic for end detection (hermes/nemotron_deci via end tokens; mistral/phi4 via last ]; default to chunk.len()).
End-position helpers (Pythonic)
lib/parsers/src/tool_calling/pythonic/mod.rs
Adds find_tool_call_end_position_pythonic(chunk: &str) -> usize returning chunk.len().
Parser dispatch: sync + end-position
lib/parsers/src/tool_calling/parsers.rs
Adds find_tool_call_end_position(chunk, parser_str) dispatcher; converts try_tool_call_parse and detect_and_parse_tool_call to sync; wires Harmony/JSON/Pythonic end-position helpers; maintains unsupported branches for Typescript/XML.
Top-level parser re-exports
lib/parsers/src/tool_calling/mod.rs
Re-exports find_tool_call_end_position publicly.
Tool-call utilities sync conversion
lib/parsers/src/tool_calling/tools.rs
Converts try_tool_call_parse_aggregate and try_tool_call_parse_stream to sync and removes awaits on detection/parse.
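
Condensed, the end-position dispatch described in the table reduces to roughly the following (signatures simplified — the real JSON helper takes a ToolCallConfig, and the end tokens shown are assumptions from the review discussion):

// Per-parser helpers; each returns a byte offset into `chunk`.
fn find_tool_call_end_position_harmony(chunk: &str) -> usize {
    chunk.len()
}

fn find_tool_call_end_position_pythonic(chunk: &str) -> usize {
    chunk.len()
}

fn find_tool_call_end_position_json(chunk: &str, parser: &str) -> usize {
    match parser {
        // Hermes/Nemotron close with an explicit end token.
        "hermes" => chunk
            .find("</tool_call>")
            .map(|p| p + "</tool_call>".len())
            .unwrap_or(chunk.len()),
        "nemotron_deci" => chunk
            .find("</TOOLCALL>")
            .map(|p| p + "</TOOLCALL>".len())
            .unwrap_or(chunk.len()),
        // Mistral/Phi4 end at the last closing bracket of the JSON array.
        "mistral" | "phi4" => chunk.rfind(']').map(|p| p + 1).unwrap_or(chunk.len()),
        _ => chunk.len(),
    }
}

// Top-level dispatcher: unknown or unsupported parsers fall back to the
// whole chunk.
fn find_tool_call_end_position(chunk: &str, parser_str: Option<&str>) -> usize {
    match parser_str {
        Some("harmony") => find_tool_call_end_position_harmony(chunk),
        Some("pythonic") => find_tool_call_end_position_pythonic(chunk),
        Some(p) if matches!(p, "hermes" | "nemotron_deci" | "mistral" | "phi4") => {
            find_tool_call_end_position_json(chunk, p)
        }
        _ => chunk.len(),
    }
}
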

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant Stream as OpenAI Stream
  participant Jail as JailedStream
  participant Parser as ToolCall Parsers
  participant Emit as Choice Emission

  Client->>Stream: receive chunk
  Stream->>Jail: process_content(chunk)
  Jail->>Parser: find_tool_call_end_position(chunk, parser_str)
  Parser-->>Jail: end_index
  alt Tool-call markers detected
    Jail->>Jail: accumulate tool_calls / trailing
    Jail->>Emit: emit_tool_or_content(...)
  else No markers
    Jail->>Emit: emit_pass_through or content
  end
  Emit-->>Client: ChatChoiceStream chunk(s)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

In streams I nibble, byte by byte,
Markers blink like stars at night.
Tools or tales? I sniff the end—
Then hop along the flow I send.
Sync now swift, my whiskers preen—
A tidy warren, fast and clean. 🐇✨

Pre-merge checks

❌ Failed checks (1 warning)

  • Description Check — ⚠️ Warning: The pull request description contains only the repository template headings with placeholder text and lacks any substantive summary, list of changes, files to review, rationale, or testing information, so it does not meet the repository's required description content. Resolution: populate the Overview with a brief summary of intent and the high-level changes, fill Details with the specific file changes and their behavioral impact, list the key files and functions under "Where should the reviewer start", and add Related Issues plus testing notes and any migration or compatibility considerations.

✅ Passed checks (2 passed)

  • Title Check — ✅ Passed: The PR title "chore: JailStream Optimizations" is concise, directly describes the primary change (optimizations and refactors to the JailStream logic), and is specific enough for a teammate scanning history to understand the intent.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
lib/parsers/src/tool_calling/tools.rs (1)

10-16: Remove .await from sync API callsites; lower parser log level to debug.

try_tool_call_parse_aggregate and try_tool_call_parse_stream are synchronous; awaiting them will fail to compile.

  • Remove .await at: lib/llm/src/protocols/openai/chat_completions/jail.rs:641, 664-665, 723.
  • Change tracing::info! → tracing::debug! in lib/parsers/src/tool_calling/tools.rs (around lines 19–21) to avoid noisy logs on hot paths.
lib/llm/src/protocols/openai/chat_completions/jail.rs (2)

639-645: Call the free function find_tool_call_end_position correctly

You're calling a non‑existent method on self and passing &String where Option<&str> is expected.

Apply this diff:

-                if let Ok((_, _)) =
-                    try_tool_call_parse_aggregate(accumulated_content, Some(parser)).await
+                if let Ok((_, _)) = try_tool_call_parse_aggregate(
+                    accumulated_content,
+                    Some(parser.as_str()),
+                )
                 {
-                    let split_pos = self.find_tool_call_end_position(accumulated_content, parser);
+                    let split_pos = find_tool_call_end_position(
+                        accumulated_content,
+                        Some(parser.as_str()),
+                    );
                     (true, split_pos)
                 } else {

663-699: Avoid unstable let‑chains in if let; restructure for stable Rust

if let PATTERN = EXPR && guard is not stable. Use nested if.

Apply this diff:

-        if let Ok((tool_calls, normal_text)) =
-            try_tool_call_parse_aggregate(accumulated_content, self.tool_call_parser.as_deref())
-                .await
-            && !tool_calls.is_empty()
-        {
-            // Convert to streaming format
-            let tool_call_chunks: Vec<ChatCompletionMessageToolCallChunk> = tool_calls
-                .into_iter()
-                .enumerate()
-                .map(|(idx, tool_call)| ChatCompletionMessageToolCallChunk {
-                    index: idx as u32,
-                    id: Some(tool_call.id),
-                    r#type: Some(tool_call.r#type),
-                    function: Some(FunctionCallStream {
-                        name: Some(tool_call.function.name),
-                        arguments: Some(tool_call.function.arguments),
-                    }),
-                })
-                .collect();
-
-            // Create choice with tool calls
-            #[allow(deprecated)]
-            return ChatChoiceStream {
-                index: choice_index,
-                delta: ChatCompletionStreamResponseDelta {
-                    role: Some(Role::Assistant),
-                    content: normal_text.filter(|t| !t.is_empty()),
-                    tool_calls: Some(tool_call_chunks),
-                    function_call: None,
-                    refusal: None,
-                    reasoning_content: None,
-                },
-                finish_reason: Some(FinishReason::ToolCalls),
-                logprobs: None,
-            };
-        }
+        if let Ok((tool_calls, normal_text)) =
+            try_tool_call_parse_aggregate(accumulated_content, self.tool_call_parser.as_deref())
+        {
+            if !tool_calls.is_empty() {
+                // Convert to streaming format
+                let tool_call_chunks: Vec<ChatCompletionMessageToolCallChunk> = tool_calls
+                    .into_iter()
+                    .enumerate()
+                    .map(|(idx, tool_call)| ChatCompletionMessageToolCallChunk {
+                        index: idx as u32,
+                        id: Some(tool_call.id),
+                        r#type: Some(tool_call.r#type),
+                        function: Some(FunctionCallStream {
+                            name: Some(tool_call.function.name),
+                            arguments: Some(tool_call.function.arguments),
+                        }),
+                    })
+                    .collect();
+
+                // Create choice with tool calls
+                #[allow(deprecated)]
+                return ChatChoiceStream {
+                    index: choice_index,
+                    delta: ChatCompletionStreamResponseDelta {
+                        role: Some(Role::Assistant),
+                        content: normal_text.filter(|t| !t.is_empty()),
+                        tool_calls: Some(tool_call_chunks),
+                        function_call: None,
+                        refusal: None,
+                        reasoning_content: None,
+                    },
+                    finish_reason: Some(FinishReason::ToolCalls),
+                    logprobs: None,
+                };
+            }
+        }
🧹 Nitpick comments (16)
lib/parsers/src/tool_calling/json/mod.rs (1)

45-71: Guard against empty end tokens; consider scanning all configured end tokens.

If the first configured end token is an empty string, chunk.find(end_token) returns 0, causing a premature end position. Also, when multiple non-empty end tokens exist, picking the earliest match is safer.

Apply:

-        "hermes" | "nemotron_deci" => {
-            if let Some(end_token) = config.tool_call_end_tokens.first() {
-                if let Some(pos) = chunk.find(end_token) {
-                    pos + end_token.len()
-                } else {
-                    chunk.len()
-                }
-            } else {
-                chunk.len()
-            }
-        }
+        "hermes" | "nemotron_deci" => {
+            let end_pos = config
+                .tool_call_end_tokens
+                .iter()
+                .filter(|t| !t.is_empty())
+                .filter_map(|t| chunk.find(t).map(|pos| pos + t.len()))
+                .min()
+                .unwrap_or(chunk.len());
+            end_pos
+        }

Please confirm ToolCallConfig::hermes() and ToolCallConfig::nemotron_deci() set a non-empty end token first (e.g., "</tool_call>", "</TOOLCALL>"), or accept the above defensive change.

lib/parsers/src/tool_calling/pythonic/mod.rs (1)

9-11: Return end of closing bracket if present.

Mirrors Mistral/Phi4 handling and avoids over-reading beyond a completed call list.

 pub fn find_tool_call_end_position_pythonic(chunk: &str) -> usize {
-    chunk.len()
+    if let Some(pos) = chunk.rfind(']') {
+        pos + 1
+    } else {
+        chunk.len()
+    }
 }
lib/parsers/src/tool_calling/harmony/mod.rs (1)

11-13: LGTM; parity with dispatcher achieved.

Returning chunk.len() is acceptable for Harmony given complete-parse behavior. Consider future extension to accept config if Harmony ever exposes end tokens.

lib/parsers/src/tool_calling/parsers.rs (2)

127-155: End-position dispatcher: sensible defaults; minor hardening optional.

  • Unknown or unimplemented parsers return chunk.len() — sane.
  • JSON path forwards parser_key to JSON-specific logic — good.

Add targeted tests for find_tool_call_end_position across: hermes, nemotron_deci, mistral, phi4 (including edge with trailing normal text), to lock behavior.


189-206: Tests now call sync APIs — fine; consider dropping tokio where unnecessary.

Async test harness remains but calls are sync; optional cleanup to #[test] for purely sync cases.

Also applies to: 202-219, 230-234, 245-250, 283-293, 299-306, 315-319, 336-337, 367-368, 386-387, 401-413, 431-433, 484-485, 511-512, 525-526, 539-540, 558-559, 572-573, 590-591, 616-617, 634-635, 668-669, 682-683, 701-702, 729-730, 746-747, 759-760, 775-776, 788-789, 801-802, 817-818, 835-836, 853-854, 864-871, 899-904, 939-947, 970-982, 987-995, 1001-1009, 1022-1030, 1040-1048, 1053-1061, 1081-1087, 1092-1098, 1106-1117, 1125-1136, 1143-1151, 1158-1166, 1172-1179, 1185-1192, 1199-1206, 1213-1218, 1224-1228, 1236-1266, 1274-1286, 1293-1309, 1325-1338, 1353-1366, 1372-1383, 1389-1401, 1407-1444

lib/parsers/src/tool_calling/harmony/harmony_parser.rs (10)

11-19: OnceLock switch LGTM; confirm encoding choice/name.

Good move to std::sync::OnceLock. Please confirm HarmonyEncodingName::HarmonyGptOss is correct for this generic harmony tool-calling parser. If this parser is used beyond GPT‑OSS, consider parameterizing the encoding or renaming the static to a model‑agnostic identifier to avoid confusion.
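
For reference, the std-only pattern being adopted looks like this (HarmonyEncoding is stubbed here; the real type and loader come from the harmony crate):

use std::sync::OnceLock;

// Stand-in for the real harmony encoding type and its loader.
struct HarmonyEncoding;
fn load_harmony_encoding() -> HarmonyEncoding {
    HarmonyEncoding
}

static ENCODING: OnceLock<HarmonyEncoding> = OnceLock::new();

// First caller initializes; later callers get the cached value. No async
// runtime is needed, unlike tokio::sync::OnceCell.
fn get_harmony_encoding() -> &'static HarmonyEncoding {
    ENCODING.get_or_init(load_harmony_encoding)
}
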


137-144: Avoid needless clones and unwrap when building ToolCallResponse.

Use the owned fname and Value::to_string(); serde_json::to_string on Value shouldn’t fail, and this trims one clone and an unwrap.

Apply this diff:

-                res.push(ToolCallResponse {
-                    id: format!("call-{}", call_idx),
-                    tp: ToolCallType::Function,
-                    function: CalledFunction {
-                        name: fname.to_string(),
-                        // Safety: `Value::Object` is always valid JSON, so serialization cannot fail
-                        arguments: serde_json::to_string(&args).unwrap(),
-                    },
-                });
+                res.push(ToolCallResponse {
+                    id: format!("call-{}", call_idx),
+                    tp: ToolCallType::Function,
+                    function: CalledFunction {
+                        name: fname,
+                        arguments: args.to_string(),
+                    },
+                });

147-153: Potential panic: indexing message.content[0].

Consistency with prior parsers is noted, but this can panic if content is empty. Safer pattern keeps behavior identical when present.

Apply these diffs:

-        if message.author.role == Role::Assistant && message.channel.as_deref() == Some("analysis")
-        {
-            normal_text.push_str(match &message.content[0] {
-                Text(t) => &t.text,
-                _ => "",
-            });
-        }
+        if message.author.role == Role::Assistant && message.channel.as_deref() == Some("analysis") {
+            if let Some(Text(t)) = message.content.first() {
+                normal_text.push_str(&t.text);
+            }
+        }
-        } else if channel == Some("analysis") {
-            normal_text.push_str(match &message.content[0] {
-                Text(t) => &t.text,
-                _ => "",
-            });
-        }
+        } else if channel == Some("analysis") {
+            if let Some(Text(t)) = message.content.first() {
+                normal_text.push_str(&t.text);
+            }
+        }

Also applies to: 251-255


155-155: Avoid unnecessary String clone on return.

Return the owned String directly.

Apply these diffs:

-    Ok((res, Some(normal_text.to_string())))
+    Ok((res, Some(normal_text)))
-    Ok((res, Some(normal_text.to_string())))
+    Ok((res, Some(normal_text)))

Also applies to: 257-257


225-248: Duplicate clone/unwrap in complete parser.

Same cleanup as streaming variant.

Apply this diff:

-                res.push(ToolCallResponse {
-                    id: format!("call-{}", call_idx),
-                    tp: ToolCallType::Function,
-                    function: CalledFunction {
-                        name: fname.to_string(),
-                        // Safety: `Value::Object` is always valid JSON, so serialization cannot fail
-                        arguments: serde_json::to_string(&args).unwrap(),
-                    },
-                });
+                res.push(ToolCallResponse {
+                    id: format!("call-{}", call_idx),
+                    tp: ToolCallType::Function,
+                    function: CalledFunction {
+                        name: fname,
+                        arguments: args.to_string(),
+                    },
+                });

283-299: Tighten partial-token detection loop (remove redundant get/len slicing).

Current code allocates and then calls .get(..) only to slice the entire string again. Simplify without changing logic.

Apply these diffs:

-        config.tool_call_start_tokens.iter().any(|token| {
+        config.tool_call_start_tokens.iter().any(|token| {
             if token.is_empty() {
                 return false;
             }
             // Check if the chunk could be a prefix of this start token
-            // Handle Unicode character boundaries properly
-            for i in 1..=token.chars().count() {
-                if let Some(prefix) = token.chars().take(i).collect::<String>().get(..) {
-                    let prefix_str = &prefix[..prefix.len()];
-                    if trimmed == prefix_str || trimmed.ends_with(prefix_str) {
-                        return true;
-                    }
-                }
-            }
+            for i in 1..=token.chars().count() {
+                let prefix: String = token.chars().take(i).collect();
+                if trimmed == prefix || trimmed.ends_with(&prefix) {
+                    return true;
+                }
+            }
             false
         })
-        let has_partial_token = config.tool_call_start_tokens.iter().any(|token| {
+        let has_partial_token = config.tool_call_start_tokens.iter().any(|token| {
             if token.is_empty() {
                 return false;
             }
             // Check if the chunk could be a prefix of this start token
-            // Handle Unicode character boundaries properly
-            for i in 1..=token.chars().count() {
-                if let Some(prefix) = token.chars().take(i).collect::<String>().get(..) {
-                    let prefix_str = &prefix[..prefix.len()];
-                    if trimmed == prefix_str || trimmed.ends_with(prefix_str) {
-                        return true;
-                    }
-                }
-            }
+            for i in 1..=token.chars().count() {
+                let prefix: String = token.chars().take(i).collect();
+                if trimmed == prefix || trimmed.ends_with(&prefix) {
+                    return true;
+                }
+            }
             false
         });

Also applies to: 311-329


188-190: Nit: stray comment markers.

Remove the duplicate slashes.

Apply this diff:

-    // // Encode the text into tokens using harmony encoding
+    // Encode the text into tokens using harmony encoding

86-86: Nit: typo in comment.

“Iteratate” → “Iterate”.

Apply this diff:

-    // Iteratate through messages and extract tool calls if there
+    // Iterate through messages and extract tool calls if present

364-375: Tests: make sync to match new API (drop tokio/async).

These tests don’t await anything. Convert to plain #[test] for consistency.

Apply these diffs:

-    #[tokio::test]
-    async fn test_parse_tool_calls_harmony_complete_basic() {
+    #[test]
+    fn test_parse_tool_calls_harmony_complete_basic() {
         let text = r#"<|channel|>commentary to=functions.get_current_weather <|constrain|>json<|message|>{"format":"celsius","location":"San Francisco"}"#;
-        let (tool_calls, normal_content) =
-            parse_tool_calls_harmony_complete(text, &Default::default()).unwrap();
+        let (tool_calls, normal_content) =
+            parse_tool_calls_harmony_complete(text, &Default::default()).unwrap();
-    #[tokio::test]
-    async fn test_parse_tools_harmony_without_start_token() {
+    #[test]
+    fn test_parse_tools_harmony_without_start_token() {
         let text = r#"
 <|channel|>analysis<|message|>Need to use function get_current_weather.<|end|>
 <|message|>{"location":"San Francisco"}<|call|>
 "#;

Also applies to: 376-390


475-477: Nit: typo in test comment.

“warkaround” → “workaround”.

Apply this diff:

-        // This is a warkaround for now. Right now everything is treated as tool call start token.
+        // This is a workaround for now. Right now everything is treated as a tool call start token.
lib/llm/src/protocols/openai/chat_completions/jail.rs (1)

831-877: Dedup marker patterns before constructing the matcher

Prevents redundant work and ambiguous matches if the same marker is added from multiple sources.

Apply this diff:

         // Create the marker matcher (fallback to empty patterns if none configured)
-        let marker_matcher = if all_patterns.is_empty() {
+        // Dedup to avoid redundant comparisons
+        all_patterns.sort();
+        all_patterns.dedup();
+        let marker_matcher = if all_patterns.is_empty() {
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c63ccea and b804181.

📒 Files selected for processing (8)
  • lib/llm/src/protocols/openai/chat_completions/jail.rs (8 hunks)
  • lib/parsers/src/tool_calling/harmony/harmony_parser.rs (11 hunks)
  • lib/parsers/src/tool_calling/harmony/mod.rs (1 hunks)
  • lib/parsers/src/tool_calling/json/mod.rs (1 hunks)
  • lib/parsers/src/tool_calling/mod.rs (1 hunks)
  • lib/parsers/src/tool_calling/parsers.rs (77 hunks)
  • lib/parsers/src/tool_calling/pythonic/mod.rs (1 hunks)
  • lib/parsers/src/tool_calling/tools.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-10T22:32:12.978Z
Learnt from: zhongdaor-nv
PR: ai-dynamo/dynamo#2999
File: lib/parsers/src/tool_calling/harmony/harmony_parser.rs:250-256
Timestamp: 2025-09-10T22:32:12.978Z
Learning: In lib/parsers/src/tool_calling/harmony/harmony_parser.rs, the team prefers to maintain identical code patterns between parse_tool_calls_harmony and parse_tool_calls_harmony_complete functions, including message.content[0] indexing, to ensure consistency between streaming and complete parser implementations.

Applied to files:

  • lib/parsers/src/tool_calling/mod.rs
  • lib/parsers/src/tool_calling/harmony/mod.rs
  • lib/parsers/src/tool_calling/tools.rs
  • lib/llm/src/protocols/openai/chat_completions/jail.rs
  • lib/parsers/src/tool_calling/parsers.rs
  • lib/parsers/src/tool_calling/harmony/harmony_parser.rs
📚 Learning: 2025-09-10T05:04:58.417Z
Learnt from: ayushag-nv
PR: ai-dynamo/dynamo#2932
File: lib/llm/src/protocols/openai/chat_completions/aggregator.rs:66-86
Timestamp: 2025-09-10T05:04:58.417Z
Learning: In the dynamo codebase, tool call chunks from streaming responses always contain complete tool calls (one chunk = one tool call), unlike standard OpenAI streaming where tool calls can be fragmented across multiple chunks. The convert_tool_chunk_to_message_tool_call function correctly assumes complete tool call data in each chunk.

Applied to files:

  • lib/parsers/src/tool_calling/tools.rs
  • lib/llm/src/protocols/openai/chat_completions/jail.rs
📚 Learning: 2025-09-10T15:27:42.511Z
Learnt from: ayushag-nv
PR: ai-dynamo/dynamo#2932
File: lib/llm/src/preprocessor.rs:768-844
Timestamp: 2025-09-10T15:27:42.511Z
Learning: In the tool calling jail implementation in lib/llm/src/preprocessor.rs, the design intentionally emits only the first accumulated choice that contains tool calls during unjailing, dropping other accumulated choices. This is a deliberate design decision, not a bug.

Applied to files:

  • lib/llm/src/protocols/openai/chat_completions/jail.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: The create_choice method exists on multiple different objects in the codebase. The DeltaGenerator::create_choice in lib/llm/src/protocols/openai/chat_completions/delta.rs has its own signature that was updated to include reasoning_content, but other objects in lib/llm/src/engines.rs have their own separate create_choice methods with different signatures that are not related to chat completions.

Applied to files:

  • lib/llm/src/protocols/openai/chat_completions/jail.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.

Applied to files:

  • lib/llm/src/protocols/openai/chat_completions/jail.rs
🧬 Code graph analysis (5)
lib/parsers/src/tool_calling/mod.rs (1)
lib/parsers/src/tool_calling/parsers.rs (4)
  • detect_and_parse_tool_call (72-96)
  • detect_tool_call_start (98-125)
  • find_tool_call_end_position (127-155)
  • try_tool_call_parse (43-69)
lib/parsers/src/tool_calling/tools.rs (1)
lib/parsers/src/tool_calling/parsers.rs (1)
  • detect_and_parse_tool_call (72-96)
lib/llm/src/protocols/openai/chat_completions/jail.rs (2)
lib/parsers/src/tool_calling/parsers.rs (3)
  • get_tool_parser_map (23-37)
  • detect_tool_call_start (98-125)
  • find_tool_call_end_position (127-155)
lib/parsers/src/tool_calling/tools.rs (1)
  • try_tool_call_parse_aggregate (10-42)
lib/parsers/src/tool_calling/parsers.rs (6)
lib/parsers/src/tool_calling/config.rs (5)
  • harmony (145-154)
  • pythonic (138-143)
  • default (45-55)
  • default (68-73)
  • mistral (116-125)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2)
  • detect_tool_call_start_harmony (260-330)
  • parse_tool_calls_harmony_complete (176-258)
lib/parsers/src/tool_calling/harmony/mod.rs (1)
  • find_tool_call_end_position_harmony (11-13)
lib/parsers/src/tool_calling/json/mod.rs (4)
  • detect_tool_call_start_json (38-43)
  • find_tool_call_end_position_json (45-71)
  • try_tool_call_parse_json (28-36)
  • default (23-25)
lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs (2)
  • detect_tool_call_start_pythonic (190-198)
  • try_tool_call_parse_pythonic (162-188)
lib/parsers/src/tool_calling/pythonic/mod.rs (1)
  • find_tool_call_end_position_pythonic (9-11)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (1)
lib/parsers/src/reasoning/gpt_oss_parser.rs (2)
  • new (39-56)
  • get_harmony_encoding (20-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
🔇 Additional comments (5)
lib/parsers/src/tool_calling/mod.rs (1)

16-19: Re-export of end-position dispatcher looks good.

Public surface now includes find_tool_call_end_position.

lib/parsers/src/tool_calling/parsers.rs (3)

5-15: Consolidated per-parser imports (incl. end-position helpers) — good.

Keeps dispatch localized and explicit.


43-69: Synchronous try_tool_call_parse conversion is consistent.

Harmony path uses parse_tool_calls_harmony_complete, JSON/Pythonic paths unchanged. Looks correct.

Confirm external crates no longer expect an async API for this function.


72-96: Sync detect_and_parse_tool_call — OK.

Defaulting None/empty to "default" parser is clear; error message enumerates available parsers.

lib/llm/src/protocols/openai/chat_completions/jail.rs (1)

78-99: Helper factory looks good

Centralizing ChatChoiceStream construction reduces duplication and error‑prone field copying.

Comment on lines +134 to +140
let pass_through_choice = create_choice_stream(
content,
choice.delta.role,
choice.index,
choice.finish_reason,
choice.logprobs.clone(),
);

⚠️ Potential issue

Don't move out of a borrowed choice; clone role and finish_reason

This will otherwise try to move fields from an immutably borrowed choice.

Apply this diff:

-            let pass_through_choice = create_choice_stream(
-                content,
-                choice.delta.role,
-                choice.index,
-                choice.finish_reason,
-                choice.logprobs.clone(),
-            );
+            let pass_through_choice = create_choice_stream(
+                content,
+                choice.delta.role.clone(),
+                choice.index,
+                choice.finish_reason.clone(),
+                choice.logprobs.clone(),
+            );

Comment on lines +154 to +160
let trailing_choice = create_choice_stream(
trailing,
choice.delta.role,
choice.index,
choice.finish_reason,
choice.logprobs.clone(),
);

⚠️ Potential issue

Clone role and finish_reason here as well

Same issue as above in trailing emission.

Apply this diff:

-            let trailing_choice = create_choice_stream(
-                trailing,
-                choice.delta.role,
-                choice.index,
-                choice.finish_reason,
-                choice.logprobs.clone(),
-            );
+            let trailing_choice = create_choice_stream(
+                trailing,
+                choice.delta.role.clone(),
+                choice.index,
+                choice.finish_reason.clone(),
+                choice.logprobs.clone(),
+            );


let (should_end, split_pos) =
jail_stream.should_end_jail(&self.accumulated_content).await;
let (should_end, split_pos) = jail_stream.should_end_jail(&self.accumulated_content);

⚠️ Potential issue

Await the async end‑detection (jailed path)

Missing .await here too.

Apply this diff:

-            let (should_end, split_pos) = jail_stream.should_end_jail(&self.accumulated_content);
+            let (should_end, split_pos) = jail_stream
+                .should_end_jail(&self.accumulated_content)
+                .await;

Comment on lines +293 to 296
let unjailed_choice =
jail_stream.create_tool_call_choice(choice.index, jailed_part, choice);
Self::emit_tool_or_content(&mut emissions, unjailed_choice);


⚠️ Potential issue

Await the async tool‑choice creation (jailed path)

Add .await.

Apply this diff:

-                let unjailed_choice =
-                    jail_stream.create_tool_call_choice(choice.index, jailed_part, choice);
+                let unjailed_choice = jail_stream
+                    .create_tool_call_choice(choice.index, jailed_part, choice)
+                    .await;

Comment on lines +327 to +331
let final_choice = jail_stream.create_tool_call_choice(
self.index,
&self.accumulated_content,
&dummy_choice,
);

⚠️ Potential issue

Await during finalize

create_tool_call_choice is async; missing .await.

Apply this diff:

-            let final_choice = jail_stream.create_tool_call_choice(
-                self.index,
-                &self.accumulated_content,
-                &dummy_choice,
-            );
+            let final_choice = jail_stream
+                .create_tool_call_choice(self.index, &self.accumulated_content, &dummy_choice)
+                .await;

@ayushag-nv ayushag-nv closed this Sep 24, 2025

Labels

chore, enhancement (New feature or request), size/XL


5 participants