Conversation


@ayushag-nv ayushag-nv commented Sep 24, 2025

Overview:

  • Optimizes the JailedStream code by moving repeated logic into generic helper methods.
  • Moves tool-parser-specific logic into the tool parser library.

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

@ayushag-nv ayushag-nv requested a review from a team as a code owner September 24, 2025 04:36

copy-pr-bot bot commented Sep 24, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ayushag-nv ayushag-nv marked this pull request as draft September 24, 2025 04:36
@github-actions github-actions bot added the chore label Sep 24, 2025
Signed-off-by: ayushag <ayushag@nvidia.com>
@ayushag-nv ayushag-nv force-pushed the ayushag/jailstream-opt-v2 branch from 9abacc2 to ee4ef1e Compare September 24, 2025 05:04
@pull-request-size pull-request-size bot added size/M and removed size/L labels Sep 24, 2025
@pull-request-size pull-request-size bot added size/L and removed size/M labels Sep 24, 2025
@ayushag-nv ayushag-nv added enhancement New feature or request and removed chore labels Sep 24, 2025
@ayushag-nv ayushag-nv force-pushed the ayushag/jailstream-opt-v2 branch from 7336d41 to 29e30f5 Compare September 24, 2025 05:52
@github-actions github-actions bot added the chore label Sep 24, 2025
@ayushag-nv ayushag-nv marked this pull request as ready for review September 24, 2025 05:52
@ayushag-nv ayushag-nv changed the title chore: jail stream optimizations chore: jail stream optimizations (v1) Sep 24, 2025

coderabbitai bot commented Sep 24, 2025

Walkthrough

Refactors chat jail streaming to use a new helper for constructing ChatChoiceStream and delegates tool-call end detection to a shared parser utility. Adds per-parser end-position functions (JSON, Harmony, Pythonic), a dispatcher, and re-exports the end-position API. Removes an internal jail method and updates imports accordingly.

Changes

Cohort / File(s) Summary
Chat jail refactor
lib/llm/src/protocols/openai/chat_completions/jail.rs
Introduces create_choice_stream(...) helper and replaces inlined ChatChoiceStream constructions across emissions (prefix, trailing, pass-through, tool-calls). Removes internal find_tool_call_end_position usage and switches to re-exported parser function. Updates imports for find_tool_call_end_position and ChatChoiceLogprobs.
Parser end-position helpers (per backend)
lib/parsers/src/tool_calling/json/mod.rs, lib/parsers/src/tool_calling/harmony/mod.rs, lib/parsers/src/tool_calling/pythonic/mod.rs
Adds public helpers: find_tool_call_end_position_json(...), find_tool_call_end_position_harmony(...), find_tool_call_end_position_pythonic(...). JSON variant branches on parser string and config; Harmony/Pythonic return chunk.len().
Parser dispatcher and re-export
lib/parsers/src/tool_calling/parsers.rs, lib/parsers/src/tool_calling/mod.rs
Adds public dispatcher find_tool_call_end_position(chunk, parser_str) selecting JSON/Harmony/Pythonic helpers; defaults to full length for others/unknown. Re-exports find_tool_call_end_position from tool_calling module.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant ChatJail as Chat Jail Stream
  participant Parsers as tool_calling::find_tool_call_end_position
  participant JSON as json::find_tool_call_end_position_json
  participant Harmony as harmony::find_tool_call_end_position_harmony
  participant Pythonic as pythonic::find_tool_call_end_position_pythonic

  Client->>ChatJail: Stream chunk
  ChatJail->>Parsers: find_tool_call_end_position(chunk, parser_str)
  alt parser == "json/*"
    Parsers->>JSON: compute end position (tokens/brackets/config)
    JSON-->>Parsers: end_pos
  else parser == "harmony"
    Parsers->>Harmony: end by length
    Harmony-->>Parsers: end_pos
  else parser == "pythonic"
    Parsers->>Pythonic: end by length
    Pythonic-->>Parsers: end_pos
  else
    Parsers-->>Parsers: default to chunk.len()
  end
  Parsers-->>ChatJail: end_pos
  ChatJail->>ChatJail: create_choice_stream(index, role, ...)
  ChatJail-->>Client: Emit ChatChoiceStream delta
  note over ChatJail: Centralized construction via create_choice_stream

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

In streams I nibble bits so fine,
New paths decide where tools align.
A helper builds each choice with cheer—
hop, hop—consistent deltas here!
Parsers point to where to end,
and I, a rabbit, press “send.” 🐇✨

Pre-merge checks

❌ Failed checks (2 warnings)
Description Check ⚠️ Warning
Explanation: The PR description contains only the repository template headings, with no substantive content under Overview, Details, or "Where should the reviewer start," and the Related Issues field uses a placeholder ("#xxx"), so reviewers lack a summary of intent, a changelist, testing notes, and a clear starting point.
Resolution: Complete the template: add an Overview summarizing goals and impact, fill Details with the specific code changes and rationale (e.g., the jail.rs refactor and the new parser end-position helpers), list files and key areas under "Where should the reviewer start," include test/validation steps and CI status, and replace the "#xxx" placeholder with the actual issue number or remove it.

Docstring Coverage ⚠️ Warning
Explanation: Docstring coverage is 53.85%, which is below the required threshold of 80.00%.
Resolution: Run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

Title Check ✅ Passed
Explanation: The title "chore: jail stream optimizations" is concise and accurately conveys the primary change in the changeset (refactoring and standardizing jail stream ChatChoiceStream construction and related optimizations), making it clear to a reviewer scanning history; it does not mention the parser helper additions but still matches the main intent.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/llm/src/protocols/openai/chat_completions/jail.rs (1)

485-538: Do not reorder emissions by type; preserve original streaming order

Grouping emissions into tool/content, trailing, and pass-through and then emitting by category reorders tokens within the same incoming chunk and breaks strict streaming semantics.

  • Emit all_emissions in their original order (choice-order within the incoming response).
  • For each emission pick the metadata tuple to attach (switch metadata per-emission: preserved_metadata for tool/content/trailing when required; current_metadata for PassThrough) while preserving order.
  • For Packed mode, aggregate consecutive emissions that share identical metadata into a single packed response — do not group by type.

Location: lib/llm/src/protocols/openai/chat_completions/jail.rs (around lines 485–538)

🧹 Nitpick comments (6)
lib/llm/src/protocols/openai/chat_completions/jail.rs (3)

651-668: Avoid double parsing in should_end_jail()

You parse accumulated_content twice (early_exit already parsed, then parse again before calling find_tool_call_end_position). This adds latency and risks inconsistent results under non-deterministic parsers. Use the dispatcher directly once early_exit is true.

Apply:

-        } else if early_exit {
-            // For early exit, find where the complete tool call ends
-            if let Some(parser) = &self.tool_call_parser {
-                if let Ok((_, _)) =
-                    try_tool_call_parse_aggregate(accumulated_content, Some(parser)).await
-                {
-                    let split_pos = find_tool_call_end_position(accumulated_content, Some(parser));
-                    (true, split_pos)
-                } else {
-                    (false, accumulated_content.len())
-                }
-            } else {
-                (false, accumulated_content.len())
-            }
+        } else if early_exit {
+            // For early exit, find where the complete tool call ends
+            if let Some(parser) = &self.tool_call_parser {
+                let split_pos = find_tool_call_end_position(accumulated_content, Some(parser));
+                (true, split_pos)
+            } else {
+                (false, accumulated_content.len())
+            }
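Stripped of the surrounding stream machinery, the parse-once shape the suggestion above argues for looks roughly like this. The names and the bracket heuristic are illustrative stand-ins, not the actual jail.rs API.

```rust
// Illustrative parse-once sketch: compute the split position with a single
// call instead of parsing accumulated content a second time. The bracket
// heuristic stands in for the real per-parser end-position logic.

fn find_tool_call_end_position(content: &str, _parser: Option<&str>) -> usize {
    content.rfind(']').map(|pos| pos + 1).unwrap_or(content.len())
}

/// Returns (should_end, split_position) for the early-exit path.
fn should_end_jail(content: &str, early_exit: bool, parser: Option<&str>) -> (bool, usize) {
    if early_exit {
        if let Some(p) = parser {
            // One call, one result: no second parse, no divergence risk.
            return (true, find_tool_call_end_position(content, Some(p)));
        }
    }
    (false, content.len())
}

fn main() {
    let accumulated = "[{\"name\": \"f\"}] tail";
    let (should_end, split) = should_end_jail(accumulated, true, Some("mistral"));
    assert!(should_end);
    assert_eq!(split, 15);
    println!("ok");
}
```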

505-528: Comment mismatch: trailing emissions “always as individual chunks” is not guaranteed

emit_choice_emissions still respects emission_mode. In Packed mode, trailing emissions will be bundled, contradicting the comment.

  • Either update the comment to reflect behavior.
  • Or force SingleChoicePerChunk for trailing emissions.

Apply one:

-                        // Emit trailing content separately (always as individual chunks)
+                        // Emit trailing content separately (mode-dependent)

Or:

-                            let responses = self.emit_choice_emissions(trailing_emissions, chat_response, preserved_metadata);
+                            // Force single-choice emission for trailing content
+                            let single_choice_self = JailedStream { emission_mode: EmissionMode::SingleChoicePerChunk, ..self.clone() };
+                            let responses = single_choice_self.emit_choice_emissions(trailing_emissions, chat_response, preserved_metadata);

456-472: Multi-choice tool-call policy may diverge from prior “emit first tool-call choice only” design

Retrieved learning indicates prior jail logic intentionally emits only the first choice that contains tool calls when unjailing, dropping other accumulated choices. Current logic collects and emits tool calls for all choices in a chunk.

  • If that policy should apply here, filter tool_content_emissions to only the lowest index (or first encountered) containing tool calls, and drop the rest for that unjail event.

I can wire a minimal filter preserving pass-through and trailing emissions untouched.

lib/parsers/src/tool_calling/harmony/mod.rs (1)

11-13: Naive end-position (chunk.len()) is acceptable as a placeholder

Given Harmony uses JSON-like patterns, returning chunk.len() is fine for now. Consider documenting why end detection is not needed or add TODO if future heuristics become necessary.

lib/parsers/src/tool_calling/pythonic/mod.rs (1)

9-11: End-position default to chunk.len()

Same note as Harmony: acceptable baseline; consider a comment/TODO for future Pythonic-specific end detection if needed.

lib/parsers/src/tool_calling/json/mod.rs (1)

45-71: Cover missing parser keys and edge cases in JSON end-position finder

  • Include "llama3_json" and "deepseek_v3_1" in JSON cases. Both are present in get_tool_parser_map and likely terminate with a closing ']' for tool call arrays.
  • Consider supporting multiple end tokens (not just .first()) for hermes/nemotron_deci.
  • rfind(']') can match inside strings; acceptable heuristic, but note limitation.

Apply:

-    match parser {
-        "hermes" | "nemotron_deci" => {
+    match parser {
+        "hermes" | "nemotron_deci" => {
             if let Some(end_token) = config.tool_call_end_tokens.first() {
                 if let Some(pos) = chunk.find(end_token) {
                     pos + end_token.len()
                 } else {
                     chunk.len()
                 }
             } else {
                 chunk.len()
             }
         }
-        "mistral" | "phi4" => {
+        "mistral" | "phi4" | "llama3_json" | "deepseek_v3_1" => {
             if let Some(pos) = chunk.rfind(']') {
                 pos + 1
             } else {
                 chunk.len()
             }
         }
         _ => chunk.len(),
     }

Optionally, support multiple end tokens:

-        "hermes" | "nemotron_deci" => {
-            if let Some(end_token) = config.tool_call_end_tokens.first() {
-                if let Some(pos) = chunk.find(end_token) {
-                    pos + end_token.len()
-                } else {
-                    chunk.len()
-                }
-            } else {
-                chunk.len()
-            }
-        }
+        "hermes" | "nemotron_deci" => {
+            for end_token in &config.tool_call_end_tokens {
+                if !end_token.is_empty() {
+                    if let Some(pos) = chunk.find(end_token) {
+                        return pos + end_token.len();
+                    }
+                }
+            }
+            chunk.len()
+        }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f9be2e9 and 29e30f5.

📒 Files selected for processing (6)
  • lib/llm/src/protocols/openai/chat_completions/jail.rs (10 hunks)
  • lib/parsers/src/tool_calling/harmony/mod.rs (1 hunks)
  • lib/parsers/src/tool_calling/json/mod.rs (1 hunks)
  • lib/parsers/src/tool_calling/mod.rs (1 hunks)
  • lib/parsers/src/tool_calling/parsers.rs (2 hunks)
  • lib/parsers/src/tool_calling/pythonic/mod.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-10T22:32:12.978Z
Learnt from: zhongdaor-nv
PR: ai-dynamo/dynamo#2999
File: lib/parsers/src/tool_calling/harmony/harmony_parser.rs:250-256
Timestamp: 2025-09-10T22:32:12.978Z
Learning: In lib/parsers/src/tool_calling/harmony/harmony_parser.rs, the team prefers to maintain identical code patterns between parse_tool_calls_harmony and parse_tool_calls_harmony_complete functions, including message.content[0] indexing, to ensure consistency between streaming and complete parser implementations.

Applied to files:

  • lib/parsers/src/tool_calling/harmony/mod.rs
  • lib/llm/src/protocols/openai/chat_completions/jail.rs
  • lib/parsers/src/tool_calling/mod.rs
  • lib/parsers/src/tool_calling/parsers.rs
📚 Learning: 2025-09-10T15:27:42.511Z
Learnt from: ayushag-nv
PR: ai-dynamo/dynamo#2932
File: lib/llm/src/preprocessor.rs:768-844
Timestamp: 2025-09-10T15:27:42.511Z
Learning: In the tool calling jail implementation in lib/llm/src/preprocessor.rs, the design intentionally emits only the first accumulated choice that contains tool calls during unjailing, dropping other accumulated choices. This is a deliberate design decision, not a bug.

Applied to files:

  • lib/llm/src/protocols/openai/chat_completions/jail.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: The create_choice method exists on multiple different objects in the codebase. The DeltaGenerator::create_choice in lib/llm/src/protocols/openai/chat_completions/delta.rs has its own signature that was updated to include reasoning_content, but other objects in lib/llm/src/engines.rs have their own separate create_choice methods with different signatures that are not related to chat completions.

Applied to files:

  • lib/llm/src/protocols/openai/chat_completions/jail.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.

Applied to files:

  • lib/llm/src/protocols/openai/chat_completions/jail.rs
📚 Learning: 2025-09-10T05:04:58.417Z
Learnt from: ayushag-nv
PR: ai-dynamo/dynamo#2932
File: lib/llm/src/protocols/openai/chat_completions/aggregator.rs:66-86
Timestamp: 2025-09-10T05:04:58.417Z
Learning: In the dynamo codebase, tool call chunks from streaming responses always contain complete tool calls (one chunk = one tool call), unlike standard OpenAI streaming where tool calls can be fragmented across multiple chunks. The convert_tool_chunk_to_message_tool_call function correctly assumes complete tool call data in each chunk.

Applied to files:

  • lib/llm/src/protocols/openai/chat_completions/jail.rs
🧬 Code graph analysis (3)
lib/llm/src/protocols/openai/chat_completions/jail.rs (2)
lib/parsers/src/tool_calling/parsers.rs (3)
  • get_tool_parser_map (23-37)
  • detect_tool_call_start (98-125)
  • find_tool_call_end_position (127-155)
lib/parsers/src/tool_calling/tools.rs (1)
  • try_tool_call_parse_aggregate (10-42)
lib/parsers/src/tool_calling/mod.rs (1)
lib/parsers/src/tool_calling/parsers.rs (4)
  • detect_and_parse_tool_call (72-96)
  • detect_tool_call_start (98-125)
  • find_tool_call_end_position (127-155)
  • try_tool_call_parse (43-69)
lib/parsers/src/tool_calling/parsers.rs (4)
lib/parsers/src/tool_calling/config.rs (2)
  • harmony (145-154)
  • pythonic (138-143)
lib/parsers/src/tool_calling/harmony/mod.rs (1)
  • find_tool_call_end_position_harmony (11-13)
lib/parsers/src/tool_calling/json/mod.rs (1)
  • find_tool_call_end_position_json (45-71)
lib/parsers/src/tool_calling/pythonic/mod.rs (1)
  • find_tool_call_end_position_pythonic (9-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
🔇 Additional comments (5)
lib/llm/src/protocols/openai/chat_completions/jail.rs (3)

77-99: Good abstraction: centralized ChatChoiceStream construction

Helper cleanly standardizes choice creation and removes duplication. No issues.


149-156: Consistent use of create_choice_stream across emission sites

Using the helper for prefix/trailing/pass-through/tool-call/finalization is consistent and improves maintainability.

Also applies to: 189-196, 221-228, 264-271, 309-316, 339-346, 703-710, 714-721
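The centralized construction pattern praised here can be illustrated with a simplified stand-in. The struct and field names below are hypothetical, not the actual async-openai ChatChoiceStream type used in jail.rs.

```rust
// Hedged sketch of the "centralize repeated construction" refactor: one
// helper replaces many inlined struct literals across emission sites.
// ChoiceStream and its fields are simplified stand-ins for the real type.

#[derive(Debug)]
struct ChoiceStream {
    index: u32,
    role: Option<String>,
    content: Option<String>,
    finish_reason: Option<String>,
}

/// Single construction point: every emission site (prefix, trailing,
/// pass-through, tool-call) builds its delta through this helper.
fn create_choice_stream(
    index: u32,
    role: Option<&str>,
    content: Option<&str>,
    finish_reason: Option<&str>,
) -> ChoiceStream {
    ChoiceStream {
        index,
        role: role.map(str::to_string),
        content: content.map(str::to_string),
        finish_reason: finish_reason.map(str::to_string),
    }
}

fn main() {
    let choice = create_choice_stream(0, Some("assistant"), Some("hello"), None);
    assert_eq!(choice.index, 0);
    assert_eq!(choice.content.as_deref(), Some("hello"));
    assert!(choice.finish_reason.is_none());
    println!("ok");
}
```

Beyond removing duplication, this shape means a future field change (e.g., logprobs) touches one function instead of every emission site.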


703-710: Role hard-coded to Assistant for tool-call emissions

Setting role to Some(Role::Assistant) is likely correct for OpenAI-compatible tool-call deltas; however, some providers stream tool calls with role None except the very first delta. Confirm downstream consumers expect role present here.

lib/parsers/src/tool_calling/mod.rs (1)

16-19: Publicly re-exporting find_tool_call_end_position is good API hygiene

Centralizes end-position helpers behind a single dispatcher. Looks good.

lib/parsers/src/tool_calling/parsers.rs (1)

5-15: LGTM: imports aligned with new end-position helpers

The import additions look correct and match the new dispatcher usage below.

@ayushag-nv ayushag-nv self-assigned this Sep 24, 2025

@GuanLuo GuanLuo left a comment


Looks like find_tool_call_end_position wasn't handling Harmony; thanks for fixing it along the way.

@ayushag-nv
Contributor Author

/ok to test abfec30

@ayushag-nv ayushag-nv enabled auto-merge (squash) September 24, 2025 06:49
@ayushag-nv
Contributor Author

/ok to test 0343a26

@ayushag-nv ayushag-nv merged commit 2ae2010 into main Sep 24, 2025
17 of 18 checks passed
@ayushag-nv ayushag-nv deleted the ayushag/jailstream-opt-v2 branch September 24, 2025 07:38
athreesh pushed a commit that referenced this pull request Sep 24, 2025
jasonqinzhou pushed a commit that referenced this pull request Sep 24, 2025
jasonqinzhou pushed a commit that referenced this pull request Sep 24, 2025
kylehh pushed a commit that referenced this pull request Sep 25, 2025
Labels

chore enhancement New feature or request size/L
