feat: enhance GPT OSS frontend with improved harmony tool calling parser and reasoning parser #2999
Conversation
…e output
- Add state JSON output to gpt_oss_parser for debugging purposes
- Refactor harmony parser to use StreamableParser for better token processing
- Add parse_tool_calls_harmony_chunk function for chunked parsing
- Update module exports to include new harmony chunk parser
- Improve error handling and token processing in harmony parser
Walkthrough
Adds per-model runtime configuration propagation into chat completions (from watcher to preprocessor to delta generator), updates reasoning parser handling (new “commentary” channel path), and introduces a complete-chunk Harmony tool-call parser, wiring it through exports and dispatch. Cargo.toml has a formatting-only change. One new public method and a new public function are added.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Client
    participant Watcher as ModelWatcher
    participant Preproc as OpenAIPreprocessor
    participant Delta as DeltaGenerator
    participant Reason as ReasoningParser
    Client->>Watcher: PUT model (card, entry)
    Watcher->>Watcher: card.runtime_config = entry.runtime_config (if present)
    Client->>Preproc: NvCreateChatCompletionRequest(card)
    Preproc->>Delta: set_runtime_config(card.runtime_config)
    Delta->>Delta: Update options.runtime_config<br/>Rebuild reasoning_parser
    Preproc->>Reason: Generate with updated parser
    Reason-->>Client: Streamed deltas / final
```
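The `set_runtime_config` step in the diagram above can be sketched roughly as follows. The types and field names here (`ModelRuntimeConfig`, `reasoning_parser_name`) are simplified stand-ins for the real `lib/llm` definitions, not the actual API:

```rust
// Simplified stand-in for the real lib/llm ModelRuntimeConfig.
#[derive(Clone, Debug, PartialEq, Default)]
struct ModelRuntimeConfig {
    reasoning_parser: Option<String>,
}

struct DeltaGenerator {
    runtime_config: ModelRuntimeConfig,
    reasoning_parser_name: String,
}

impl DeltaGenerator {
    fn new() -> Self {
        Self {
            runtime_config: ModelRuntimeConfig::default(),
            reasoning_parser_name: "basic".to_string(),
        }
    }

    // Mirrors the diagram: store the per-model config, and rebuild the
    // reasoning parser from its configured name when one is provided,
    // otherwise keep the existing parser.
    fn set_runtime_config(&mut self, rc: ModelRuntimeConfig) {
        if let Some(name) = rc.reasoning_parser.clone() {
            self.reasoning_parser_name = name;
        }
        self.runtime_config = rc;
    }
}

fn main() {
    let mut gen = DeltaGenerator::new();
    gen.set_runtime_config(ModelRuntimeConfig {
        reasoning_parser: Some("gpt_oss".to_string()),
    });
    assert_eq!(gen.reasoning_parser_name, "gpt_oss");
    assert!(gen.runtime_config.reasoning_parser.is_some());
}
```

The key design point the diagram illustrates is that the card's runtime config flows into the generator before any deltas are produced, so parser selection is per-model rather than global.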
```mermaid
sequenceDiagram
    autonumber
    actor Client
    participant Parser as try_tool_call_parse (Harmony)
    participant Harmony as parse_tool_calls_harmony_complete
    participant Enc as HarmonyEncoding
    participant Msg as parse_messages_from_completion_tokens
    Client->>Parser: Input text (Harmony format)
    Parser->>Harmony: parse_tool_calls_harmony_complete(text, config)
    Harmony->>Enc: load_harmony_encoding(...)
    Enc-->>Harmony: tokenizer/encoding or error
    alt encoding loaded
        Harmony->>Msg: tokens -> messages
        Msg-->>Harmony: messages
        Harmony->>Harmony: Extract tool calls (assistant/commentary, functions.*)<br/>Collect normal_text (assistant/analysis)
        Harmony-->>Parser: (Vec<ToolCallResponse>, Option<normal_text>)
    else encoding/parsing error
        Harmony-->>Parser: ([], Some(original_text))
    end
    Parser-->>Client: Parsed tool calls + normal_text
```
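The fallback contract in the `else` branch above (errors never propagate; the caller gets no tool calls plus the original text) can be sketched like this. The parsing internals are a toy stand-in; only the error-handling shape matches the diagram:

```rust
// Simplified stand-in for the real ToolCallResponse type.
#[derive(Debug, PartialEq)]
struct ToolCall {
    name: String,
    arguments: String,
}

// Stand-in for the real token/message pipeline: here, anything not starting
// with the Harmony start marker "<|start|>" is treated as a parse error.
fn try_parse(text: &str) -> Result<(Vec<ToolCall>, Option<String>), String> {
    if text.starts_with("<|start|>") {
        Ok((
            vec![ToolCall {
                name: "functions.get_weather".to_string(),
                arguments: "{}".to_string(),
            }],
            None,
        ))
    } else {
        Err("not harmony-formatted".to_string())
    }
}

// The contract from the diagram: on any encoding/parsing error, return no
// tool calls and surface the raw input unchanged instead of failing.
fn parse_harmony_complete(text: &str) -> (Vec<ToolCall>, Option<String>) {
    match try_parse(text) {
        Ok(result) => result,
        Err(_) => (Vec::new(), Some(text.to_string())),
    }
}

fn main() {
    let (calls, normal) = parse_harmony_complete("plain text");
    assert!(calls.is_empty());
    assert_eq!(normal.as_deref(), Some("plain text"));
}
```

This shape keeps malformed model output from ever turning into a user-visible error: worst case, the text is passed through as normal content.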
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Pre-merge checks (3 passed) ✅
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/parsers/src/reasoning/gpt_oss_parser.rs (1)
178-232: Handle decode errors; avoid unwrap() in the streaming commentary path.
decode_utf8(...).unwrap() can panic on invalid tokens. Fall back gracefully and reuse the fetched tokens slice.
```diff
-            let end_token_idx = self.parser.tokens().len();
-            // using the harmony decode_utf8 to translate the tokens to text
-            let generated_text = enc
-                .tokenizer()
-                .decode_utf8(&self.parser.tokens()[last_channel_toke_idx..end_token_idx])
-                .unwrap();
-
-            final_text = generated_text;
+            let tokens = &tokens[last_channel_toke_idx..];
+            match enc.tokenizer().decode_utf8(tokens) {
+                Ok(generated_text) => {
+                    final_text = generated_text;
+                }
+                Err(e) => {
+                    tracing::debug!("decode_utf8 failed in commentary path: {e}");
+                    // Fall back to current raw content or original text
+                    final_text = if raw_content.is_empty() {
+                        _text.to_string()
+                    } else {
+                        raw_content
+                    };
+                }
+            }
```

Also consider capturing and reusing `tokens = self.parser.tokens()` to avoid repeated calls.
I can add a unit test covering the commentary path reconstruction if helpful.
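The non-panicking decode pattern suggested above can be expressed generically as below, with the tokenizer call abstracted as a closure. The names are illustrative, not the openai_harmony API:

```rust
// Decode tokens to text without panicking: on failure, prefer previously
// accumulated raw content, else return the original input unchanged.
fn decode_or_fallback<F>(decode: F, tokens: &[u32], raw_content: &str, original: &str) -> String
where
    F: Fn(&[u32]) -> Result<String, String>,
{
    match decode(tokens) {
        Ok(text) => text,
        Err(_) => {
            // Mirrors the suggested fix's fallback ordering.
            if raw_content.is_empty() {
                original.to_string()
            } else {
                raw_content.to_string()
            }
        }
    }
}

fn main() {
    let ok = |_: &[u32]| -> Result<String, String> { Ok("decoded".to_string()) };
    let err = |_: &[u32]| -> Result<String, String> { Err("invalid utf8".to_string()) };
    assert_eq!(decode_or_fallback(ok, &[1, 2], "", "orig"), "decoded");
    assert_eq!(decode_or_fallback(err, &[1, 2], "partial", "orig"), "partial");
    assert_eq!(decode_or_fallback(err, &[1, 2], "", "orig"), "orig");
}
```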
🧹 Nitpick comments (9)
Cargo.toml (1)
84-84: No-op change; add a trailing newline.
`lto = true` is unchanged semantically. Nit: ensure the file ends with a newline to avoid tooling diffs.
lib/llm/src/discovery/watcher.rs (1)
291-295: Propagate runtime_config in the Embeddings (Tokens+Embeddings) path too for consistency.
You override card.runtime_config only in the Tokens+(Chat|Completions) branch; the Tokens+Embeddings branch uses OpenAIPreprocessor::new(card.clone()) and will miss entry-provided overrides.
Apply this in the embeddings branch after obtaining card:
```diff
@@
     let Some(mut card) = card else {
         anyhow::bail!("Missing model deployment card for embedding model");
     };
+    // Ensure runtime_config is populated: prefer entry value if present
+    if let Some(rc) = model_entry.runtime_config.clone() {
+        card.runtime_config = rc;
+    }
```

lib/parsers/src/tool_calling/parsers.rs (1)
46-47: Short-circuit before full Harmony decode to avoid unnecessary work.
Try the lightweight start-token check first; fall back to complete parsing only when a tool call is likely present.
```diff
-    let (results, normal_content) = parse_tool_calls_harmony_complete(message, &config.json)?;
+    if !detect_tool_call_start_harmony(message, &config.json) {
+        return Ok((vec![], Some(message.to_string())));
+    }
+    let (results, normal_content) =
+        parse_tool_calls_harmony_complete(message, &config.json)?;
     Ok((results, normal_content))
```

lib/llm/src/preprocessor.rs (1)
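The lightweight pre-check idea can be sketched like this. The marker strings below are assumptions about Harmony's special tokens, not the exact set `detect_tool_call_start_harmony` scans for:

```rust
// Cheap substring scan used to skip the expensive full decode when the
// message clearly contains no Harmony tool-call structure.
fn looks_like_harmony_tool_call(message: &str) -> bool {
    // Illustrative markers only; the real detector's token set may differ.
    const MARKERS: [&str; 3] = ["<|start|>", "<|channel|>commentary", "<|call|>"];
    MARKERS.iter().any(|m| message.contains(m))
}

fn main() {
    assert!(looks_like_harmony_tool_call(
        "<|channel|>commentary to=functions.get_weather"
    ));
    assert!(!looks_like_harmony_tool_call("plain assistant text"));
}
```

Since `contains` is linear in the message length, the pre-check costs far less than tokenizing and walking messages when no tool call is present.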
126-128: Comment vs. code mismatch.
The comment mentions “allow env override” but no env override is implemented here.
```diff
-    // Initialize runtime config; allow env override if not provided by backend/card
+    // Initialize runtime config from the ModelDeploymentCard
```

If env override is desired, I can wire it (e.g., via envy/figment); say the word.
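If the env override hinted at in the old comment is actually wanted, a minimal sketch could look like this. The variable name `DYN_REASONING_PARSER` is hypothetical, not an existing flag:

```rust
// Prefer an environment override when set; otherwise use the value from the
// model deployment card. DYN_REASONING_PARSER is an illustrative name.
fn reasoning_parser_name(card_value: &str) -> String {
    std::env::var("DYN_REASONING_PARSER").unwrap_or_else(|_| card_value.to_string())
}

fn main() {
    std::env::remove_var("DYN_REASONING_PARSER");
    assert_eq!(reasoning_parser_name("gpt_oss"), "gpt_oss");
    std::env::set_var("DYN_REASONING_PARSER", "basic");
    assert_eq!(reasoning_parser_name("gpt_oss"), "basic");
}
```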
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (5)
168-175: Docstring return contract mismatches implementation.
The function never returns Err; it falls back to Ok(([], Some(original_text))) on failures. Update docs accordingly.
Apply:
```diff
-/// # Returns
-/// * `Ok((tool_calls, normal_text))` - Tuple containing extracted tool calls and any normal text
-/// * `Err(e)` - If parsing fails due to encoding or tokenization errors
+/// # Returns
+/// * `Ok((tool_calls, normal_text))` - Extracted tool calls and any normal text.
+///   On encoding or parsing failures, falls back to `Ok((vec![], Some(original_text)))` with a debug log.
```
189-191: Nit: duplicate slashes in comment.
Minor cleanup.

```diff
-    // // Encode the text into tokens using harmony encoding
+    // Encode the text into tokens using harmony encoding
```
236-247: Clarify the safety comment on JSON serialization.
serde_json::to_string works for any Value, not just Object.

```diff
-    // Safety: `Value::Object` is always valid JSON, so serialization cannot fail
+    // Note: Any serde_json::Value serializes to valid JSON; unwrap is safe here
```
206-259: Deduplicate extraction logic across streaming and complete parsers.
Both functions share near-identical message-walking and extraction. Factor into a private helper to keep behavior in sync.
Example (outside this hunk):

```rust
fn extract_tool_calls_and_text(
    messages: &[openai_harmony::chat::Message],
) -> (Vec<ToolCallResponse>, String) {
    /* shared impl */
}
```

Then call it from both parse_tool_calls_harmony and parse_tool_calls_harmony_complete.
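A fuller sketch of what such a shared helper might look like, using simplified stand-in types (the real code walks `openai_harmony::chat::Message` values, whose fields differ):

```rust
// Stand-in message type; the real one is openai_harmony::chat::Message.
struct Msg {
    channel: String,
    recipient: Option<String>,
    content: String,
}

struct ToolCall {
    name: String,
    arguments: String,
}

// Walk messages once, collecting tool calls from commentary-channel
// messages addressed to functions.*, and normal/reasoning text from the
// analysis channel.
fn extract_tool_calls_and_text(messages: &[Msg]) -> (Vec<ToolCall>, String) {
    let mut calls = Vec::new();
    let mut normal_text = String::new();
    for m in messages {
        match (m.channel.as_str(), m.recipient.as_deref()) {
            ("commentary", Some(r)) if r.starts_with("functions.") => calls.push(ToolCall {
                name: r.trim_start_matches("functions.").to_string(),
                arguments: m.content.clone(),
            }),
            ("analysis", _) => normal_text.push_str(&m.content),
            _ => {}
        }
    }
    (calls, normal_text)
}

fn main() {
    let msgs = vec![
        Msg {
            channel: "analysis".to_string(),
            recipient: None,
            content: "thinking".to_string(),
        },
        Msg {
            channel: "commentary".to_string(),
            recipient: Some("functions.get_weather".to_string()),
            content: "{\"city\":\"Paris\"}".to_string(),
        },
    ];
    let (calls, text) = extract_tool_calls_and_text(&msgs);
    assert_eq!(calls.len(), 1);
    assert_eq!(calls[0].name, "get_weather");
    assert_eq!(text, "thinking");
}
```

Because both parsers would call the same helper, any future change to channel or recipient handling stays consistent between the streaming and complete paths.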
158-259: Missing unit tests for the complete-chunk parser.
Add tests mirroring existing ones for parse_tool_calls_harmony to prevent regressions.
Happy to draft tests like:
- basic extraction with commentary call
- fallback behavior on encoding/parse failure (returns original text)
- multi-arg JSON and multiple tool calls
- empty analysis content (ensures no panic)
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- Cargo.toml (1 hunks)
- lib/llm/src/discovery/watcher.rs (1 hunks)
- lib/llm/src/preprocessor.rs (3 hunks)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (1 hunks)
- lib/parsers/src/reasoning/gpt_oss_parser.rs (3 hunks)
- lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2 hunks)
- lib/parsers/src/tool_calling/harmony/mod.rs (1 hunks)
- lib/parsers/src/tool_calling/mod.rs (1 hunks)
- lib/parsers/src/tool_calling/parsers.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
PR: ai-dynamo/dynamo#2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.
Applied to files:
lib/llm/src/discovery/watcher.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.
Applied to files:
lib/llm/src/protocols/openai/chat_completions/delta.rs
📚 Learning: 2025-08-25T22:04:45.205Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2700
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:19-28
Timestamp: 2025-08-25T22:04:45.205Z
Learning: The response_generator() method exists on multiple request types in the codebase: NvCreateChatCompletionRequest (for chat completions) and NvCreateCompletionRequest (for text completions). When making signature changes, it's important to distinguish between these different object types as they have separate implementations and call sites.
Applied to files:
lib/llm/src/preprocessor.rs
📚 Learning: 2025-08-22T20:10:09.345Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/parsers/src/reasoning/gpt_oss_parser.rs:31-37
Timestamp: 2025-08-22T20:10:09.345Z
Learning: StreamableParser from openai-harmony does not implement the Debug trait, which prevents storing it as a field in structs that derive Debug in lib/parsers/src/reasoning/gpt_oss_parser.rs.
Applied to files:
lib/parsers/src/tool_calling/harmony/harmony_parser.rs
lib/parsers/src/reasoning/gpt_oss_parser.rs
🧬 Code graph analysis (8)
lib/llm/src/discovery/watcher.rs (1)
lib/llm/src/local_model.rs (1)
card(331-333)
lib/parsers/src/tool_calling/harmony/mod.rs (1)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (3)
detect_tool_call_start_harmony (261-270)
parse_tool_calls_harmony (23-156)
parse_tool_calls_harmony_complete (176-259)
lib/llm/src/protocols/openai/chat_completions/delta.rs (3)
lib/llm/src/local_model.rs (2)
runtime_config (179-182)
runtime_config (374-376)
lib/bindings/python/src/dynamo/_core.pyi (1)
ModelRuntimeConfig (465-469)
lib/parsers/src/reasoning/mod.rs (1)
get_reasoning_parser_from_name(171-187)
lib/parsers/src/tool_calling/mod.rs (1)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2)
parse_tool_calls_harmony (23-156)
parse_tool_calls_harmony_complete (176-259)
lib/parsers/src/tool_calling/parsers.rs (1)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2)
detect_tool_call_start_harmony (261-270)
parse_tool_calls_harmony_complete (176-259)
lib/llm/src/preprocessor.rs (2)
lib/llm/src/local_model.rs (2)
runtime_config (179-182)
runtime_config (374-376)
lib/llm/src/protocols/openai/chat_completions/delta.rs (2)
new (81-121)
response_generator (21-30)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (1)
lib/parsers/src/reasoning/gpt_oss_parser.rs (2)
get_harmony_encoding (20-23)
new (39-58)
lib/parsers/src/reasoning/gpt_oss_parser.rs (1)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (1)
get_harmony_encoding(16-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (9)
lib/parsers/src/tool_calling/parsers.rs (2)
5-5: Import update looks good.
The switch to parse_tool_calls_harmony_complete aligns with the new API.

1-60: No duplicate definition found for parse_tool_calls_harmony_complete. rg returned a single match at lib/parsers/src/tool_calling/harmony/harmony_parser.rs:176.

lib/parsers/src/tool_calling/mod.rs (1)
14-14: Public re-exports OK.
Keeping both parse_tool_calls_harmony and parse_tool_calls_harmony_complete exported preserves back-compat.
lib/parsers/src/tool_calling/harmony/mod.rs (1)
7-9: Export addition looks correct.
The new symbol is cleanly re-exported alongside existing ones.
lib/llm/src/preprocessor.rs (2)
98-100: Good: runtime_config carried on the preprocessor.
This enables per-model behavior without plumbing through every callsite.

590-592: Correct placement of runtime_config application.
Applying set_runtime_config before preprocessing ensures the generator is configured for reasoning/tool parsing.
Given chat-vs-text split, this change should affect only chat completions (as intended). If any completions path relies on reasoning, confirm separately.
lib/parsers/src/reasoning/gpt_oss_parser.rs (1)
55-58: Constructor looks fine.
Explicit construction with StreamableParser and structured Debug is sensible.
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2)
7-9: Import fix looks correct.
Re-introducing StreamableParser in the openai_harmony group aligns with its usage below.
176-186: No duplicate function: single definition and intentional re-export.
A single pub fn was found at lib/parsers/src/tool_calling/harmony/harmony_parser.rs:176 and a pub use re-export in lib/parsers/src/tool_calling/mod.rs:14; no duplicate or collision detected.
result for tool calling round 1:
result for tool calling round 2:
result for reasoning:
Signed-off-by: zhongdaor-nv <zhongdaor@nvidia.com>
Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
ayushag-nv
left a comment
lgtm !
/ok to test fdc9f0d
…ser and reasoning parser (#2999) Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
This PR adds the GPT OSS frontend implementation based on commit 63d639a
Overview
This change ensures multi-turn conversations correctly handle second-round (and later) tool calls by initializing and coordinating the reasoning parser with the tool parser.
What changed
Properly handle second-round tool-calling in multi-turn conversations — subsequent tool calls are now processed correctly.
The tool parser and the reasoning parser can operate together during a conversation.
Fix an initialization bug where the reasoning parser was not being initialized properly.
Why
Previously only the first-round tool call was reliably processed: the reasoning parser was not initialized/kept in a way that allowed it to participate in later rounds, so follow-up tool calls could fail or be handled incorrectly. This change initializes and synchronizes the parser state so reasoning steps and tool parsing can interoperate across rounds.
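The core idea of the fix described above (one parser value that persists across rounds, rather than being re-created per chunk) can be reduced to a toy sketch. This is a stand-in illustration, not the real GPT OSS parser:

```rust
// A parser whose state must survive between tool-call rounds.
struct RoundAwareParser {
    rounds_seen: u32,
}

impl RoundAwareParser {
    fn new() -> Self {
        Self { rounds_seen: 0 }
    }

    // Each call represents one round of model output. Because the same
    // instance is reused rather than re-created, the count carries over
    // from round 1 into round 2.
    fn parse_round(&mut self, _chunk: &str) -> u32 {
        self.rounds_seen += 1;
        self.rounds_seen
    }
}

fn main() {
    let mut parser = RoundAwareParser::new();
    assert_eq!(parser.parse_round("<round 1 output>"), 1);
    // A freshly constructed parser would report 1 again here; the shared
    // instance correctly reports round 2.
    assert_eq!(parser.parse_round("<round 2 output>"), 2);
}
```

The bug being fixed corresponds to constructing a fresh parser per round, which resets exactly the state a second-round tool call depends on.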
Where should the reviewer start?
delta.rs
Related Issues
None
Summary by CodeRabbit
New Features
Bug Fixes
Chores