chore: JailStream Optimizations #3172
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: Biswa Panda <biswa.panda@gmail.com> Signed-off-by: ayushag <ayushag@nvidia.com>
…tion

- Implement JailedStream with builder pattern for clean API
- Use async-stream's stream! macro for cleaner async implementation
- Support configurable jail start/end sequences
- Integrate tool call detection and parsing
- Add comprehensive tests with shared test utilities
- Create documentation for usage patterns

The JailedStream provides a standalone solution for accumulating tokens when certain sequences are detected (jailing) and releasing them as a single chunk when the jail ends, enabling proper tool call handling in streaming responses.

Signed-off-by: Ryan Olson <rolson@nvidia.com>
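The accumulate-and-release behavior described in this commit can be sketched as a plain state machine. This is a minimal illustration with hypothetical simplified types; the actual JailedStream is an async stream combinator built with the stream! macro, not this synchronous shape.

```rust
/// Hypothetical sketch of the jail idea: pass chunks through until a start
/// sequence appears, then accumulate silently until the end sequence arrives.
#[derive(Default)]
struct Jail {
    jailed: bool,
    buffer: String,
}

impl Jail {
    /// Feed one chunk; returns text to emit downstream, if any.
    fn feed(&mut self, chunk: &str, start: &str, end: &str) -> Option<String> {
        if !self.jailed {
            if chunk.contains(start) {
                // Enter jail: start accumulating instead of emitting.
                self.jailed = true;
                self.buffer.push_str(chunk);
                None // jailed silently; no empty chunks are yielded
            } else {
                Some(chunk.to_string()) // normal content passes through untouched
            }
        } else {
            self.buffer.push_str(chunk);
            if self.buffer.contains(end) {
                // Exit jail: release everything as a single chunk.
                self.jailed = false;
                Some(std::mem::take(&mut self.buffer))
            } else {
                None
            }
        }
    }
}
```

In the real implementation the released buffer is parsed for tool calls before emission rather than forwarded verbatim.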
- Remove unnecessary annotation code that was never used
- Fix efficiency issue by avoiding cloning when not jailing
- Only clone response data when actually entering jail state
- Simplify jailed behavior to not yield empty chunks
- Accumulate content silently and only yield final result
- Update tests to match new behavior

This makes the stream processing more efficient by:
1. Eliminating unnecessary clones during normal streaming
2. Removing pointless annotation objects
3. Not sending empty chunks while accumulating
4. Only yielding the final aggregated result when jail ends

Signed-off-by: Ryan Olson <rolson@nvidia.com>
…rhead

- Remove ManyOut type alias and AsyncEngineContextProvider dependency
- Change apply() to return impl Stream instead of Pin<Box<ResponseStream>>
- Remove unnecessary context extraction and re-wrapping
- Use tokio::pin!() for stack pinning instead of Box::pin() (more efficient)
- Simplify tests by removing mock context infrastructure
- Update documentation with correct usage examples

This makes JailedStream a pure stream combinator that:
1. Doesn't handle context (caller's responsibility)
2. Avoids unnecessary boxing in the library
3. Can be composed with other stream transformers
4. Lets callers decide when to add boxing/context

Performance improvements:
- No heap allocation for pinning (stack pinning instead)
- No context passing overhead
- Direct stream transformation without wrapper types

Signed-off-by: Ryan Olson <rolson@nvidia.com>
- Remove old apply_tool_calling_jail_internal function
- Remove PossibleToolCallAnnotation struct and implementations
- Update apply_tool_calling_jail_with_parser to conditionally use JailedStream
- Only apply jail logic when tool_call_parser is configured
- Comment out old jail tests that are no longer applicable
- Clean up unused imports and run cargo fmt/clippy

- Change jail entry logic from if/else to evaluate both sequence and tool call conditions
- Add early exit detection via should_exit_jail_early() method
- Support two paths to enter jail: explicit sequences OR tool call detection
- Support two paths to exit jail: end sequences OR complete tool call parsing
- Add tests for dual entry paths and early exit behavior
- Improve debug logging to show which condition triggered jail start/end

- Separate context passing from stream transformations
- Change transform_embedding_postprocessor_stream to return impl Stream
- Update transform_postprocessor_stream to accept context as parameter
- Defer boxing until the final ResponseStream::new() call
- Extract context once at the beginning of generate() methods
- Reduce boxing from 5-6 locations to 1 per request
- Improves performance by eliminating intermediate heap allocations
- Only apply jail when tools are present AND tool_choice is not None
- Error if tool_choice is 'required', 'auto', or named but no parser configured
- Skip jail entirely when tool_choice is explicitly None
- Add should_apply_tool_jail() to determine jail application logic
- Refactor to separate decision logic from stream processing
- Avoid lifetime issues by evaluating conditions before stream transformation
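The decision logic in this commit can be sketched as a small pure function. The function name matches the commit's should_apply_tool_jail(), but the ToolChoice enum shape and error type here are illustrative assumptions, not the crate's actual types.

```rust
/// Illustrative ToolChoice shape (the real request type lives in the
/// OpenAI protocol module and differs from this).
enum ToolChoice {
    None,
    Auto,
    Required,
    Named(String),
}

/// Sketch of the decision: jail only when tools are present and
/// tool_choice is not explicitly None; error when a tool_choice is set
/// but no tool_call_parser is configured.
fn should_apply_tool_jail(
    has_tools: bool,
    tool_choice: Option<&ToolChoice>,
    parser_configured: bool,
) -> Result<bool, String> {
    if !has_tools {
        return Ok(false); // no tools in the request: nothing to jail
    }
    match tool_choice {
        Some(ToolChoice::None) => Ok(false), // explicitly disabled: skip jail entirely
        _ if !parser_configured => {
            Err("tool_choice requires a configured tool_call_parser".to_string())
        }
        _ => Ok(true),
    }
}
```

Evaluating this before building the stream avoids the lifetime issues mentioned above, since no request fields need to be borrowed inside the stream transformation.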
- Removed old jail implementation tests that are no longer compatible
- Kept test_detect_tool_call_start_different_parsers as it tests parser behavior
- Old tests were testing annotation behavior that no longer exists in new JailedStream
- New tests in jail.rs provide equivalent coverage

Resolved merge conflicts in .gitignore and preprocessor.rs:
- Fixed gitignore format for Ruler generated files
- Removed duplicate jail implementation from main
- Kept optimized JailedStream with conditional application
- Fixed transform_postprocessor_stream signature usage
- All tests passing

- Remove unused imports and dead code warnings in test module
- Fix test_streaming_usage.rs function signature for transform_postprocessor_stream
- Remove test_preprocessor.rs that tested old jail implementation
- Add missing context parameter to transform calls
- All tests now pass and clippy/fmt are clean
Add 10 new test cases for JailedStream covering all parser types and edge cases:
- Hermes parser with <tool_call> markers
- Mistral parser (both variants with and without [TOOL_CALLS] marker)
- Phi4 parser with <|tool_call|> markers
- llama3_json parser with <|python_tag|> markers
- False positive detection for non-tool JSON
- Malformed JSON handling
- Partial tool call scenarios
- Empty stream behavior
- Multiple consecutive tool calls
- Fragmented tool calls across many small chunks

All tests verify silent jailing behavior (no empty chunks) and proper tool call accumulation between start/end markers.

Fixes issue where create_unjailed_response was discarding upstream correlation metadata (id, event, comment) from the Annotated wrapper, breaking observability.

Changes:
- Capture id/event/comment fields when jail is triggered
- Pass preserved metadata to create_unjailed_response
- Attach original metadata to synthesized responses
- Add comprehensive test coverage for metadata preservation

Tests added:
- test_jailed_stream_preserves_metadata: Normal jail processing
- test_jailed_stream_preserves_metadata_on_stream_end: Stream termination
- test_jailed_stream_metadata_edge_cases: Partial/empty metadata

Addresses GitHub issue discussion r2352198367
Fixes issue where trailing content in the same chunk as jail end markers was being lost. When a chunk contains both an end marker and additional content (e.g., "</tool_call>trailing text"), the trailing text was discarded instead of being emitted as a separate chunk.

Changes:
- Find exact position of end markers for precise content splitting
- Split accumulated content at marker boundary into jailed/trailing parts
- Emit jailed content as tool call response, trailing as pass-through
- Handle both explicit end sequences and early exit scenarios
- Add parser-specific end position detection for all supported formats
- Preserve metadata in trailing content chunks

Tests added:
- test_jailed_stream_trailing_content_same_chunk: End marker + trailing
- test_jailed_stream_early_exit_with_trailing: Early exit + trailing

Fixes GitHub issue discussion r2352198360
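The marker-boundary split described in this commit reduces to one string operation. This is a sketch only; the hypothetical helper name below is not from the PR, which performs the split per parser inside JailedStream.

```rust
/// Split accumulated content at the end-marker boundary.
/// The jailed part keeps the marker (it belongs to the tool call);
/// everything after it is trailing pass-through text.
fn split_at_end_marker<'a>(accumulated: &'a str, end_marker: &str) -> (&'a str, &'a str) {
    match accumulated.find(end_marker) {
        Some(pos) => accumulated.split_at(pos + end_marker.len()),
        None => (accumulated, ""), // no marker: the whole buffer is jailed content
    }
}
```

The jailed half is then parsed into a tool-call response while the trailing half is emitted as a separate content chunk, preserving its metadata.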
Replace HashMap-based jail state with deterministic Vec-based collection to enable independent choice jailing and ensure consistent processing order.

Key improvements:
- Add ChoiceJailState and ChoiceJailStateCollection for deterministic ordering
- Implement independent per-choice jail/unjail logic
- Add EmissionMode configuration (Packed vs SingleChoicePerChunk)
- Support configurable emission patterns for OpenAI compatibility
- Fix double emission issue during choice unjailing
- Maintain backward compatibility for single-choice scenarios

This addresses GitHub issue r2352198339 (determinism) and enables proper multi-choice (n>1) tool call handling with independent state management per choice index.

Tests: 22/24 pass (2 expected multi-choice test failures for future work)

This major refactoring enables true independent choice processing where each choice (when n>1) can jail and unjail independently with deterministic ordering.

Core changes:
- Add ChoiceEmission enum for clean emission abstraction
- Add ChoiceJailState with encapsulated jail logic per choice
- Replace HashMap with deterministic Vec-based ChoiceJailStateCollection
- Implement independent process_content() method per choice
- Add configurable EmissionMode (Packed vs SingleChoicePerChunk)
- Clean separation between state management and emission strategy

Key achievements:
- Multi-choice independent jailing tests now pass
- Deterministic choice ordering (always 0, 1, 2...)
- Metadata preservation through jail processing
- Maintains backward compatibility for single-choice scenarios

Tests: 22/24 pass (2 trailing content edge cases remaining)

Addresses GitHub issues:
- r2352198339: HashMap determinism and independent choice jailing
- Enables proper multi-choice (n>1) tool call handling

Modify emission grouping logic to separate trailing emissions from tool/content emissions. This ensures trailing content after jail end markers is always emitted as separate chunks, fixing the failing tests:
- test_jailed_stream_trailing_content_same_chunk
- test_jailed_stream_early_exit_with_trailing

Also remove unused methods to satisfy clippy warnings. All 24 jail tests now pass with the independent multi-choice jailing implementation.

Signed-off-by: Ryan Olson <rolson@nvidia.com>
Add efficient prefix matching utility to detect tool call markers split across chunk boundaries. Solves the issue where markers like "<TOOLCALL>" were missed when characters arrived in separate chunks.

Key improvements:
- Created utils/prefix_matcher.rs with MarkerMatcher using Aho-Corasick
- Detects partial suffixes (e.g., "n<TO" finds "<TO" as partial match)
- Prevents false positives (e.g., "n < 5" doesn't trigger jailing)
- Preserves chunk boundaries to avoid unnecessary yield consumption
- Handles complete tool calls within single combined chunks
- Maintains backward compatibility with all existing functionality

Tests added:
- test_partial_matching_suffix_detection: Verifies cross-chunk detection
- test_partial_matching_false_positive_prevention: Prevents math expressions

All 26 existing jail tests continue to pass.

Signed-off-by: Ryan Olson <rolson@nvidia.com>
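The boundary case this commit handles can be shown with a stdlib-only sketch. The PR's MarkerMatcher uses the aho-corasick crate for efficient multi-pattern matching; this hypothetical function only illustrates the partial-suffix check itself.

```rust
/// Returns true when some suffix of `tail` is a proper prefix of `marker`,
/// i.e. the marker may be split across a chunk boundary and we should
/// hold the tail back rather than emit it.
fn ends_with_marker_prefix(tail: &str, marker: &str) -> bool {
    // Try every proper prefix of the marker (respecting UTF-8 boundaries)
    // and check whether the buffered tail ends with it.
    (1..marker.len())
        .filter(|i| marker.is_char_boundary(*i))
        .any(|i| tail.ends_with(&marker[..i]))
}
```

Because a bare "<" mid-chunk (as in "n < 5") is not a suffix of the buffered tail, it does not trigger jailing, while a chunk ending in "n<TO" does.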
Walkthrough

Refactors OpenAI chat_completions jail streaming to centralize choice emission and tool-call handling, introduces parser end-position utilities, and converts multiple tool-calling parser APIs from async to sync. Adds dispatch for end-position detection across parsers and updates builders to derive jail markers from parser config.

Changes
Sequence Diagram(s)

sequenceDiagram
autonumber
actor Client
participant Stream as OpenAI Stream
participant Jail as JailedStream
participant Parser as ToolCall Parsers
participant Emit as Choice Emission
Client->>Stream: receive chunk
Stream->>Jail: process_content(chunk)
Jail->>Parser: find_tool_call_end_position(chunk, parser_str)
Parser-->>Jail: end_index
alt Tool-call markers detected
Jail->>Jail: accumulate tool_calls / trailing
Jail->>Emit: emit_tool_or_content(...)
else No markers
Jail->>Emit: emit_pass_through or content
end
Emit-->>Client: ChatChoiceStream chunk(s)
Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Pre-merge checks

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
lib/parsers/src/tool_calling/tools.rs (1)
10-16: Remove `.await` from sync API call sites; lower parser log level to debug.

`try_tool_call_parse_aggregate` and `try_tool_call_parse_stream` are synchronous; awaiting them will fail to compile.

- Remove `.await` at: lib/llm/src/protocols/openai/chat_completions/jail.rs:641, 664-665, 723.
- Change tracing::info! → tracing::debug! in lib/parsers/src/tool_calling/tools.rs (around lines 19-21) to avoid noisy logs on hot paths.
lib/llm/src/protocols/openai/chat_completions/jail.rs (2)
639-645: Call the free function `find_tool_call_end_position` correctly

You're calling a non-existent method on `self` and passing `&String` where `Option<&str>` is expected.

Apply this diff:

-            if let Ok((_, _)) =
-                try_tool_call_parse_aggregate(accumulated_content, Some(parser)).await
+            if let Ok((_, _)) = try_tool_call_parse_aggregate(
+                accumulated_content,
+                Some(parser.as_str()),
+            )
+            .await
             {
-                let split_pos = self.find_tool_call_end_position(accumulated_content, parser);
+                let split_pos = find_tool_call_end_position(
+                    accumulated_content,
+                    Some(parser.as_str()),
+                );
                 (true, split_pos)
             } else {
663-699: Avoid unstable let-chains in `if let`; restructure for stable Rust

`if let PATTERN = EXPR && guard` is not stable. Use a nested `if`.

Apply this diff:

-        if let Ok((tool_calls, normal_text)) =
-            try_tool_call_parse_aggregate(accumulated_content, self.tool_call_parser.as_deref())
-                .await
-            && !tool_calls.is_empty()
-        {
-            // Convert to streaming format
-            let tool_call_chunks: Vec<ChatCompletionMessageToolCallChunk> = tool_calls
-                .into_iter()
-                .enumerate()
-                .map(|(idx, tool_call)| ChatCompletionMessageToolCallChunk {
-                    index: idx as u32,
-                    id: Some(tool_call.id),
-                    r#type: Some(tool_call.r#type),
-                    function: Some(FunctionCallStream {
-                        name: Some(tool_call.function.name),
-                        arguments: Some(tool_call.function.arguments),
-                    }),
-                })
-                .collect();
-
-            // Create choice with tool calls
-            #[allow(deprecated)]
-            return ChatChoiceStream {
-                index: choice_index,
-                delta: ChatCompletionStreamResponseDelta {
-                    role: Some(Role::Assistant),
-                    content: normal_text.filter(|t| !t.is_empty()),
-                    tool_calls: Some(tool_call_chunks),
-                    function_call: None,
-                    refusal: None,
-                    reasoning_content: None,
-                },
-                finish_reason: Some(FinishReason::ToolCalls),
-                logprobs: None,
-            };
-        }
+        if let Ok((tool_calls, normal_text)) =
+            try_tool_call_parse_aggregate(accumulated_content, self.tool_call_parser.as_deref())
+                .await
+        {
+            if !tool_calls.is_empty() {
+                // Convert to streaming format
+                let tool_call_chunks: Vec<ChatCompletionMessageToolCallChunk> = tool_calls
+                    .into_iter()
+                    .enumerate()
+                    .map(|(idx, tool_call)| ChatCompletionMessageToolCallChunk {
+                        index: idx as u32,
+                        id: Some(tool_call.id),
+                        r#type: Some(tool_call.r#type),
+                        function: Some(FunctionCallStream {
+                            name: Some(tool_call.function.name),
+                            arguments: Some(tool_call.function.arguments),
+                        }),
+                    })
+                    .collect();
+
+                // Create choice with tool calls
+                #[allow(deprecated)]
+                return ChatChoiceStream {
+                    index: choice_index,
+                    delta: ChatCompletionStreamResponseDelta {
+                        role: Some(Role::Assistant),
+                        content: normal_text.filter(|t| !t.is_empty()),
+                        tool_calls: Some(tool_call_chunks),
+                        function_call: None,
+                        refusal: None,
+                        reasoning_content: None,
+                    },
+                    finish_reason: Some(FinishReason::ToolCalls),
+                    logprobs: None,
+                };
+            }
+        }
🧹 Nitpick comments (16)
lib/parsers/src/tool_calling/json/mod.rs (1)
45-71: Guard against empty end tokens; consider scanning all configured end tokens.

If the first configured end token is an empty string, `chunk.find(end_token)` returns 0, causing a premature end position. Also, when multiple non-empty end tokens exist, picking the earliest match is safer.

Apply:

-        "hermes" | "nemotron_deci" => {
-            if let Some(end_token) = config.tool_call_end_tokens.first() {
-                if let Some(pos) = chunk.find(end_token) {
-                    pos + end_token.len()
-                } else {
-                    chunk.len()
-                }
-            } else {
-                chunk.len()
-            }
-        }
+        "hermes" | "nemotron_deci" => {
+            let end_pos = config
+                .tool_call_end_tokens
+                .iter()
+                .filter(|t| !t.is_empty())
+                .filter_map(|t| chunk.find(t).map(|pos| pos + t.len()))
+                .min()
+                .unwrap_or(chunk.len());
+            end_pos
+        }

Please confirm `ToolCallConfig::hermes()` and `ToolCallConfig::nemotron_deci()` set a non-empty end token first (e.g., "</tool_call>", "</TOOLCALL>"), or accept the above defensive change.

lib/parsers/src/tool_calling/pythonic/mod.rs (1)
9-11: Return end of closing bracket if present.

Mirrors Mistral/Phi4 handling and avoids over-reading beyond a completed call list.

 pub fn find_tool_call_end_position_pythonic(chunk: &str) -> usize {
-    chunk.len()
+    if let Some(pos) = chunk.rfind(']') {
+        pos + 1
+    } else {
+        chunk.len()
+    }
 }

lib/parsers/src/tool_calling/harmony/mod.rs (1)
11-13: LGTM; parity with dispatcher achieved.

Returning `chunk.len()` is acceptable for Harmony given complete-parse behavior. Consider future extension to accept config if Harmony ever exposes end tokens.

lib/parsers/src/tool_calling/parsers.rs (2)
127-155: End-position dispatcher: sensible defaults; minor hardening optional.
- Unknown or unimplemented parsers return `chunk.len()` — sane.
- JSON path forwards `parser_key` to JSON-specific logic — good.

Add targeted tests for `find_tool_call_end_position` across: hermes, nemotron_deci, mistral, phi4 (including edge with trailing normal text), to lock behavior.
189-206: Tests now call sync APIs — fine; consider dropping tokio where unnecessary.

Async test harness remains but calls are sync; optional cleanup to `#[test]` for purely sync cases.

Also applies to: 202-219, 230-234, 245-250, 283-293, 299-306, 315-319, 336-337, 367-368, 386-387, 401-413, 431-433, 484-485, 511-512, 525-526, 539-540, 558-559, 572-573, 590-591, 616-617, 634-635, 668-669, 682-683, 701-702, 729-730, 746-747, 759-760, 775-776, 788-789, 801-802, 817-818, 835-836, 853-854, 864-871, 899-904, 939-947, 970-982, 987-995, 1001-1009, 1022-1030, 1040-1048, 1053-1061, 1081-1087, 1092-1098, 1106-1117, 1125-1136, 1143-1151, 1158-1166, 1172-1179, 1185-1192, 1199-1206, 1213-1218, 1224-1228, 1236-1266, 1274-1286, 1293-1309, 1325-1338, 1353-1366, 1372-1383, 1389-1401, 1407-1444
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (10)
11-19: OnceLock switch LGTM; confirm encoding choice/name.

Good move to std::sync::OnceLock. Please confirm HarmonyEncodingName::HarmonyGptOss is correct for this generic harmony tool-calling parser. If this parser is used beyond GPT-OSS, consider parameterizing the encoding or renaming the static to a model-agnostic identifier to avoid confusion.
137-144: Avoid needless clones and unwrap when building ToolCallResponse.

Use the owned fname and Value::to_string(); serde_json::to_string on a Value shouldn't fail, and this trims one clone and an unwrap.
Apply this diff:
-            res.push(ToolCallResponse {
-                id: format!("call-{}", call_idx),
-                tp: ToolCallType::Function,
-                function: CalledFunction {
-                    name: fname.to_string(),
-                    // Safety: `Value::Object` is always valid JSON, so serialization cannot fail
-                    arguments: serde_json::to_string(&args).unwrap(),
-                },
-            });
+            res.push(ToolCallResponse {
+                id: format!("call-{}", call_idx),
+                tp: ToolCallType::Function,
+                function: CalledFunction {
+                    name: fname,
+                    arguments: args.to_string(),
+                },
+            });
147-153: Potential panic: indexing message.content[0].

Consistency with prior parsers is noted, but this can panic if content is empty. A safer pattern keeps behavior identical when present.
Apply these diffs:
-        if message.author.role == Role::Assistant && message.channel.as_deref() == Some("analysis")
-        {
-            normal_text.push_str(match &message.content[0] {
-                Text(t) => &t.text,
-                _ => "",
-            });
-        }
+        if message.author.role == Role::Assistant && message.channel.as_deref() == Some("analysis") {
+            if let Some(Text(t)) = message.content.first() {
+                normal_text.push_str(&t.text);
+            }
+        }

-            } else if channel == Some("analysis") {
-                normal_text.push_str(match &message.content[0] {
-                    Text(t) => &t.text,
-                    _ => "",
-                });
-            }
+            } else if channel == Some("analysis") {
+                if let Some(Text(t)) = message.content.first() {
+                    normal_text.push_str(&t.text);
+                }
+            }

Also applies to: 251-255
155-155: Avoid unnecessary String clone on return.

Return the owned String directly.
Apply these diffs:
-        Ok((res, Some(normal_text.to_string())))
+        Ok((res, Some(normal_text)))

-        Ok((res, Some(normal_text.to_string())))
+        Ok((res, Some(normal_text)))

Also applies to: 257-257
225-248: Duplicate clone/unwrap in complete parser.

Same cleanup as the streaming variant.
Apply this diff:
-            res.push(ToolCallResponse {
-                id: format!("call-{}", call_idx),
-                tp: ToolCallType::Function,
-                function: CalledFunction {
-                    name: fname.to_string(),
-                    // Safety: `Value::Object` is always valid JSON, so serialization cannot fail
-                    arguments: serde_json::to_string(&args).unwrap(),
-                },
-            });
+            res.push(ToolCallResponse {
+                id: format!("call-{}", call_idx),
+                tp: ToolCallType::Function,
+                function: CalledFunction {
+                    name: fname,
+                    arguments: args.to_string(),
+                },
+            });
283-299: Tighten partial-token detection loop (remove redundant get/len slicing).

Current code allocates and then calls .get(..) only to slice the entire string again. Simplify without changing logic.
Apply these diffs:
-    config.tool_call_start_tokens.iter().any(|token| {
+    config.tool_call_start_tokens.iter().any(|token| {
         if token.is_empty() {
             return false;
         }
         // Check if the chunk could be a prefix of this start token
-        // Handle Unicode character boundaries properly
-        for i in 1..=token.chars().count() {
-            if let Some(prefix) = token.chars().take(i).collect::<String>().get(..) {
-                let prefix_str = &prefix[..prefix.len()];
-                if trimmed == prefix_str || trimmed.ends_with(prefix_str) {
-                    return true;
-                }
-            }
-        }
+        for i in 1..=token.chars().count() {
+            let prefix: String = token.chars().take(i).collect();
+            if trimmed == prefix || trimmed.ends_with(&prefix) {
+                return true;
+            }
+        }
         false
     })

-    let has_partial_token = config.tool_call_start_tokens.iter().any(|token| {
+    let has_partial_token = config.tool_call_start_tokens.iter().any(|token| {
         if token.is_empty() {
             return false;
         }
         // Check if the chunk could be a prefix of this start token
-        // Handle Unicode character boundaries properly
-        for i in 1..=token.chars().count() {
-            if let Some(prefix) = token.chars().take(i).collect::<String>().get(..) {
-                let prefix_str = &prefix[..prefix.len()];
-                if trimmed == prefix_str || trimmed.ends_with(prefix_str) {
-                    return true;
-                }
-            }
-        }
+        for i in 1..=token.chars().count() {
+            let prefix: String = token.chars().take(i).collect();
+            if trimmed == prefix || trimmed.ends_with(&prefix) {
+                return true;
+            }
+        }
         false
     });

Also applies to: 311-329
188-190: Nit: stray comment markers.

Remove the duplicate slashes.
Apply this diff:
-    // // Encode the text into tokens using harmony encoding
+    // Encode the text into tokens using harmony encoding
86-86: Nit: typo in comment.

"Iteratate" → "Iterate".
Apply this diff:
-    // Iteratate through messages and extract tool calls if there
+    // Iterate through messages and extract tool calls if present
364-375: Tests: make sync to match new API (drop tokio/async).

These tests don't await anything. Convert to plain #[test] for consistency.
Apply these diffs:
-    #[tokio::test]
-    async fn test_parse_tool_calls_harmony_complete_basic() {
+    #[test]
+    fn test_parse_tool_calls_harmony_complete_basic() {
         let text = r#"<|channel|>commentary to=functions.get_current_weather <|constrain|>json<|message|>{"format":"celsius","location":"San Francisco"}"#;
-        let (tool_calls, normal_content) =
-            parse_tool_calls_harmony_complete(text, &Default::default()).unwrap();
+        let (tool_calls, normal_content) =
+            parse_tool_calls_harmony_complete(text, &Default::default()).unwrap();

-    #[tokio::test]
-    async fn test_parse_tools_harmony_without_start_token() {
+    #[test]
+    fn test_parse_tools_harmony_without_start_token() {
         let text = r#"
<|channel|>analysis<|message|>Need to use function get_current_weather.<|end|>
<|message|>{"location":"San Francisco"}<|call|>
"#;

Also applies to: 376-390
475-477: Nit: typo in test comment.

"warkaround" → "workaround".
Apply this diff:
-    // This is a warkaround for now. Right now everything is treated as tool call start token.
+    // This is a workaround for now. Right now everything is treated as a tool call start token.
831-877: Dedup marker patterns before constructing the matcher

Prevents redundant work and ambiguous matches if the same marker is added from multiple sources.
Apply this diff:
         // Create the marker matcher (fallback to empty patterns if none configured)
-        let marker_matcher = if all_patterns.is_empty() {
+        // Dedup to avoid redundant comparisons
+        all_patterns.sort();
+        all_patterns.dedup();
+        let marker_matcher = if all_patterns.is_empty() {
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- lib/llm/src/protocols/openai/chat_completions/jail.rs (8 hunks)
- lib/parsers/src/tool_calling/harmony/harmony_parser.rs (11 hunks)
- lib/parsers/src/tool_calling/harmony/mod.rs (1 hunks)
- lib/parsers/src/tool_calling/json/mod.rs (1 hunks)
- lib/parsers/src/tool_calling/mod.rs (1 hunks)
- lib/parsers/src/tool_calling/parsers.rs (77 hunks)
- lib/parsers/src/tool_calling/pythonic/mod.rs (1 hunks)
- lib/parsers/src/tool_calling/tools.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-10T22:32:12.978Z
Learnt from: zhongdaor-nv
PR: ai-dynamo/dynamo#2999
File: lib/parsers/src/tool_calling/harmony/harmony_parser.rs:250-256
Timestamp: 2025-09-10T22:32:12.978Z
Learning: In lib/parsers/src/tool_calling/harmony/harmony_parser.rs, the team prefers to maintain identical code patterns between parse_tool_calls_harmony and parse_tool_calls_harmony_complete functions, including message.content[0] indexing, to ensure consistency between streaming and complete parser implementations.
Applied to files:
- lib/parsers/src/tool_calling/mod.rs
- lib/parsers/src/tool_calling/harmony/mod.rs
- lib/parsers/src/tool_calling/tools.rs
- lib/llm/src/protocols/openai/chat_completions/jail.rs
- lib/parsers/src/tool_calling/parsers.rs
- lib/parsers/src/tool_calling/harmony/harmony_parser.rs
📚 Learning: 2025-09-10T05:04:58.417Z
Learnt from: ayushag-nv
PR: ai-dynamo/dynamo#2932
File: lib/llm/src/protocols/openai/chat_completions/aggregator.rs:66-86
Timestamp: 2025-09-10T05:04:58.417Z
Learning: In the dynamo codebase, tool call chunks from streaming responses always contain complete tool calls (one chunk = one tool call), unlike standard OpenAI streaming where tool calls can be fragmented across multiple chunks. The convert_tool_chunk_to_message_tool_call function correctly assumes complete tool call data in each chunk.
Applied to files:
- lib/parsers/src/tool_calling/tools.rs
- lib/llm/src/protocols/openai/chat_completions/jail.rs
📚 Learning: 2025-09-10T15:27:42.511Z
Learnt from: ayushag-nv
PR: ai-dynamo/dynamo#2932
File: lib/llm/src/preprocessor.rs:768-844
Timestamp: 2025-09-10T15:27:42.511Z
Learning: In the tool calling jail implementation in lib/llm/src/preprocessor.rs, the design intentionally emits only the first accumulated choice that contains tool calls during unjailing, dropping other accumulated choices. This is a deliberate design decision, not a bug.
Applied to files:
lib/llm/src/protocols/openai/chat_completions/jail.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: The create_choice method exists on multiple different objects in the codebase. The DeltaGenerator::create_choice in lib/llm/src/protocols/openai/chat_completions/delta.rs has its own signature that was updated to include reasoning_content, but other objects in lib/llm/src/engines.rs have their own separate create_choice methods with different signatures that are not related to chat completions.
Applied to files:
lib/llm/src/protocols/openai/chat_completions/jail.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.
Applied to files:
lib/llm/src/protocols/openai/chat_completions/jail.rs
🧬 Code graph analysis (5)
lib/parsers/src/tool_calling/mod.rs (1)
lib/parsers/src/tool_calling/parsers.rs (4)
- detect_and_parse_tool_call(72-96)
- detect_tool_call_start(98-125)
- find_tool_call_end_position(127-155)
- try_tool_call_parse(43-69)
lib/parsers/src/tool_calling/tools.rs (1)
lib/parsers/src/tool_calling/parsers.rs (1)
detect_and_parse_tool_call(72-96)
lib/llm/src/protocols/openai/chat_completions/jail.rs (2)
lib/parsers/src/tool_calling/parsers.rs (3)
- get_tool_parser_map(23-37)
- detect_tool_call_start(98-125)
- find_tool_call_end_position(127-155)

lib/parsers/src/tool_calling/tools.rs (1)
try_tool_call_parse_aggregate(10-42)
lib/parsers/src/tool_calling/parsers.rs (6)
lib/parsers/src/tool_calling/config.rs (5)
- harmony(145-154)
- pythonic(138-143)
- default(45-55)
- default(68-73)
- mistral(116-125)

lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2)
- detect_tool_call_start_harmony(260-330)
- parse_tool_calls_harmony_complete(176-258)

lib/parsers/src/tool_calling/harmony/mod.rs (1)
- find_tool_call_end_position_harmony(11-13)

lib/parsers/src/tool_calling/json/mod.rs (4)
- detect_tool_call_start_json(38-43)
- find_tool_call_end_position_json(45-71)
- try_tool_call_parse_json(28-36)
- default(23-25)

lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs (2)
- detect_tool_call_start_pythonic(190-198)
- try_tool_call_parse_pythonic(162-188)

lib/parsers/src/tool_calling/pythonic/mod.rs (1)
find_tool_call_end_position_pythonic(9-11)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (1)
lib/parsers/src/reasoning/gpt_oss_parser.rs (2)
- new(39-56)
- get_harmony_encoding(20-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (5)
lib/parsers/src/tool_calling/mod.rs (1)
16-19: Re-export of end-position dispatcher looks good.

Public surface now includes `find_tool_call_end_position`.

lib/parsers/src/tool_calling/parsers.rs (3)
5-15: Consolidated per-parser imports (incl. end-position helpers) — good.

Keeps dispatch localized and explicit.

43-69: Synchronous `try_tool_call_parse` conversion is consistent.

Harmony path uses `parse_tool_calls_harmony_complete`; JSON/Pythonic paths unchanged. Looks correct. Confirm external crates no longer expect an async API for this function.

72-96: Sync `detect_and_parse_tool_call` — OK.

Defaulting None/empty to the "default" parser is clear; the error message enumerates available parsers.

lib/llm/src/protocols/openai/chat_completions/jail.rs (1)

78-99: Helper factory looks good

Centralizing ChatChoiceStream construction reduces duplication and error-prone field copying.
let pass_through_choice = create_choice_stream(
    content,
    choice.delta.role,
    choice.index,
    choice.finish_reason,
    choice.logprobs.clone(),
);
Don't move out of a borrowed choice; clone role and finish_reason
This will otherwise try to move fields from an immutably borrowed choice.
Apply this diff:
- let pass_through_choice = create_choice_stream(
- content,
- choice.delta.role,
- choice.index,
- choice.finish_reason,
- choice.logprobs.clone(),
- );
+ let pass_through_choice = create_choice_stream(
+ content,
+ choice.delta.role.clone(),
+ choice.index,
+ choice.finish_reason.clone(),
+ choice.logprobs.clone(),
+ );
🤖 Prompt for AI Agents
In lib/llm/src/protocols/openai/chat_completions/jail.rs around lines 134 to
140, the code is attempting to move fields out of an immutably borrowed
`choice`; change the arguments passed into create_choice_stream so you pass
clones for non-Copy fields (specifically clone `choice.delta.role` and
`choice.finish_reason`) instead of moving them, e.g. call
create_choice_stream(content, choice.delta.role.clone(), choice.index,
choice.finish_reason.clone(), choice.logprobs.clone()) so nothing is moved out
of the borrowed `choice`.
let trailing_choice = create_choice_stream(
    trailing,
    choice.delta.role,
    choice.index,
    choice.finish_reason,
    choice.logprobs.clone(),
);
Clone role and finish_reason here as well
Same issue as above in trailing emission.
Apply this diff:
- let trailing_choice = create_choice_stream(
- trailing,
- choice.delta.role,
- choice.index,
- choice.finish_reason,
- choice.logprobs.clone(),
- );
+ let trailing_choice = create_choice_stream(
+ trailing,
+ choice.delta.role.clone(),
+ choice.index,
+ choice.finish_reason.clone(),
+ choice.logprobs.clone(),
+ );
🤖 Prompt for AI Agents
In lib/llm/src/protocols/openai/chat_completions/jail.rs around lines 154 to
160, the trailing_choice creation passes choice.delta.role and
choice.finish_reason by reference (or moves) and does not clone them like
earlier choices do; update the call to clone the role and finish_reason (e.g.,
use choice.delta.role.clone() and choice.finish_reason.clone()) so the trailing
emitted choice owns its data and matches the handling used for other choice
emissions.
- let (should_end, split_pos) =
-     jail_stream.should_end_jail(&self.accumulated_content).await;
+ let (should_end, split_pos) = jail_stream.should_end_jail(&self.accumulated_content);
Await the async end‑detection (jailed path)
Missing .await here too.
Apply this diff:
- let (should_end, split_pos) = jail_stream.should_end_jail(&self.accumulated_content);
+ let (should_end, split_pos) = jail_stream
+ .should_end_jail(&self.accumulated_content)
+ .await;
🤖 Prompt for AI Agents
In lib/llm/src/protocols/openai/chat_completions/jail.rs around line 282, the
call to jail_stream.should_end_jail(&self.accumulated_content) is missing an
await; change it to await the async function (i.e., call .await on the returned
future) and update surrounding code if necessary to be inside an async context
so the result (should_end, split_pos) is obtained correctly.
let unjailed_choice =
    jail_stream.create_tool_call_choice(choice.index, jailed_part, choice);
Self::emit_tool_or_content(&mut emissions, unjailed_choice);
|
|
Await the async tool‑choice creation (jailed path)
Add .await.
Apply this diff:
- let unjailed_choice =
- jail_stream.create_tool_call_choice(choice.index, jailed_part, choice);
+ let unjailed_choice = jail_stream
+ .create_tool_call_choice(choice.index, jailed_part, choice)
+ .await;
🤖 Prompt for AI Agents
In lib/llm/src/protocols/openai/chat_completions/jail.rs around lines 293 to
296, the jailed path calls jail_stream.create_tool_call_choice(...) which is
async but is not awaited; change the call to await the future (add .await) so
you assign the resolved value to unjailed_choice before passing it to
Self::emit_tool_or_content; ensure the enclosing function is async or otherwise
able to await (and adjust error/Result handling if needed).
let final_choice = jail_stream.create_tool_call_choice(
    self.index,
    &self.accumulated_content,
    &dummy_choice,
);
Await during finalize
create_tool_call_choice is async; missing .await.
Apply this diff:
- let final_choice = jail_stream.create_tool_call_choice(
- self.index,
- &self.accumulated_content,
- &dummy_choice,
- );
+ let final_choice = jail_stream
+ .create_tool_call_choice(self.index, &self.accumulated_content, &dummy_choice)
+ .await;
🤖 Prompt for AI Agents
In lib/llm/src/protocols/openai/chat_completions/jail.rs around lines 327 to
331, the call to create_tool_call_choice is asynchronous but is invoked without
awaiting; update the call to await it (add .await) and ensure the enclosing
function is async (or otherwise able to await) and that any returned
Result/Value is handled or propagated accordingly so compilation and semantics
remain correct.
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Bug Fixes
Refactor