feat: enhance GPT OSS frontend with improved harmony tool calling parser and reasoning parser #2999
Conversation
…e output
- Add state JSON output to gpt_oss_parser for debugging purposes
- Refactor harmony parser to use StreamableParser for better token processing
- Add parse_tool_calls_harmony_chunk function for chunked parsing
- Update module exports to include new harmony chunk parser
- Improve error handling and token processing in harmony parser
Walkthrough
Adds per-model runtime configuration propagation into chat completions (from watcher to preprocessor to delta generator), updates reasoning parser handling (new “commentary” channel path), and introduces a complete-chunk Harmony tool-call parser, wiring it through exports and dispatch. Cargo.toml has a formatting-only change. One new public method and a new public function are added.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Client
    participant Watcher as ModelWatcher
    participant Preproc as OpenAIPreprocessor
    participant Delta as DeltaGenerator
    participant Reason as ReasoningParser
    Client->>Watcher: PUT model (card, entry)
    Watcher->>Watcher: card.runtime_config = entry.runtime_config (if present)
    Client->>Preproc: NvCreateChatCompletionRequest(card)
    Preproc->>Delta: set_runtime_config(card.runtime_config)
    Delta->>Delta: Update options.runtime_config<br/>Rebuild reasoning_parser
    Preproc->>Reason: Generate with updated parser
    Reason-->>Client: Streamed deltas / final
```
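The `set_runtime_config` step in the diagram above can be sketched roughly as follows. The types and field names here (`ModelRuntimeConfig`, `reasoning_parser_name`) are simplified stand-ins for the real `lib/llm` definitions, not the actual API:

```rust
// Simplified stand-in for the real lib/llm ModelRuntimeConfig.
#[derive(Clone, Debug, PartialEq, Default)]
struct ModelRuntimeConfig {
    reasoning_parser: Option<String>,
}

struct DeltaGenerator {
    runtime_config: ModelRuntimeConfig,
    reasoning_parser_name: String,
}

impl DeltaGenerator {
    fn new() -> Self {
        Self {
            runtime_config: ModelRuntimeConfig::default(),
            reasoning_parser_name: "basic".to_string(),
        }
    }

    // Mirrors the diagram: store the per-model config, and rebuild the
    // reasoning parser from its configured name when one is provided,
    // otherwise keep the existing parser.
    fn set_runtime_config(&mut self, rc: ModelRuntimeConfig) {
        if let Some(name) = rc.reasoning_parser.clone() {
            self.reasoning_parser_name = name;
        }
        self.runtime_config = rc;
    }
}

fn main() {
    let mut gen = DeltaGenerator::new();
    gen.set_runtime_config(ModelRuntimeConfig {
        reasoning_parser: Some("gpt_oss".to_string()),
    });
    assert_eq!(gen.reasoning_parser_name, "gpt_oss");
    assert!(gen.runtime_config.reasoning_parser.is_some());
}
```

The key design point the diagram illustrates is that the card's runtime config flows into the generator before any deltas are produced, so parser selection is per-model rather than global.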
```mermaid
sequenceDiagram
    autonumber
    actor Client
    participant Parser as try_tool_call_parse (Harmony)
    participant Harmony as parse_tool_calls_harmony_complete
    participant Enc as HarmonyEncoding
    participant Msg as parse_messages_from_completion_tokens
    Client->>Parser: Input text (Harmony format)
    Parser->>Harmony: parse_tool_calls_harmony_complete(text, config)
    Harmony->>Enc: load_harmony_encoding(...)
    Enc-->>Harmony: tokenizer/encoding or error
    alt encoding loaded
        Harmony->>Msg: tokens -> messages
        Msg-->>Harmony: messages
        Harmony->>Harmony: Extract tool calls (assistant/commentary, functions.*)<br/>Collect normal_text (assistant/analysis)
        Harmony-->>Parser: (Vec<ToolCallResponse>, Option<normal_text>)
    else encoding/parsing error
        Harmony-->>Parser: ([], Some(original_text))
    end
    Parser-->>Client: Parsed tool calls + normal_text
```
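The fallback contract in the `else` branch above (errors never propagate; the caller gets no tool calls plus the original text) can be sketched like this. The parsing internals are a toy stand-in; only the error-handling shape matches the diagram:

```rust
// Simplified stand-in for the real ToolCallResponse type.
#[derive(Debug, PartialEq)]
struct ToolCall {
    name: String,
    arguments: String,
}

// Stand-in for the real token/message pipeline: here, anything not starting
// with the Harmony start marker "<|start|>" is treated as a parse error.
fn try_parse(text: &str) -> Result<(Vec<ToolCall>, Option<String>), String> {
    if text.starts_with("<|start|>") {
        Ok((
            vec![ToolCall {
                name: "functions.get_weather".to_string(),
                arguments: "{}".to_string(),
            }],
            None,
        ))
    } else {
        Err("not harmony-formatted".to_string())
    }
}

// The contract from the diagram: on any encoding/parsing error, return no
// tool calls and surface the raw input unchanged instead of failing.
fn parse_harmony_complete(text: &str) -> (Vec<ToolCall>, Option<String>) {
    match try_parse(text) {
        Ok(result) => result,
        Err(_) => (Vec::new(), Some(text.to_string())),
    }
}

fn main() {
    let (calls, normal) = parse_harmony_complete("plain text");
    assert!(calls.is_empty());
    assert_eq!(normal.as_deref(), Some("plain text"));
}
```

This shape keeps malformed model output from ever turning into a user-visible error: worst case, the text is passed through as normal content.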
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Pre-merge checks (3 passed) ✅
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/parsers/src/reasoning/gpt_oss_parser.rs (1)
178-232: Handle decode errors; avoid unwrap() in the streaming commentary path.
decode_utf8(...).unwrap() can panic on invalid tokens. Fall back gracefully and reuse the fetched tokens slice.
```diff
-            let end_token_idx = self.parser.tokens().len();
-            // using the harmony decode_utf8 to translate the tokens to text
-            let generated_text = enc
-                .tokenizer()
-                .decode_utf8(&self.parser.tokens()[last_channel_toke_idx..end_token_idx])
-                .unwrap();
-
-            final_text = generated_text;
+            let tokens = &tokens[last_channel_toke_idx..];
+            match enc.tokenizer().decode_utf8(tokens) {
+                Ok(generated_text) => {
+                    final_text = generated_text;
+                }
+                Err(e) => {
+                    tracing::debug!("decode_utf8 failed in commentary path: {e}");
+                    // Fall back to current raw content or original text
+                    final_text = if raw_content.is_empty() {
+                        _text.to_string()
+                    } else {
+                        raw_content
+                    };
+                }
+            }
```

Also consider capturing and reusing `tokens = self.parser.tokens()` to avoid repeated calls.
I can add a unit test covering the commentary path reconstruction if helpful.
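The non-panicking decode pattern suggested above can be expressed generically as below, with the tokenizer call abstracted as a closure. The names are illustrative, not the openai_harmony API:

```rust
// Decode tokens to text without panicking: on failure, prefer previously
// accumulated raw content, else return the original input unchanged.
fn decode_or_fallback<F>(decode: F, tokens: &[u32], raw_content: &str, original: &str) -> String
where
    F: Fn(&[u32]) -> Result<String, String>,
{
    match decode(tokens) {
        Ok(text) => text,
        Err(_) => {
            // Mirrors the suggested fix's fallback ordering.
            if raw_content.is_empty() {
                original.to_string()
            } else {
                raw_content.to_string()
            }
        }
    }
}

fn main() {
    let ok = |_: &[u32]| -> Result<String, String> { Ok("decoded".to_string()) };
    let err = |_: &[u32]| -> Result<String, String> { Err("invalid utf8".to_string()) };
    assert_eq!(decode_or_fallback(ok, &[1, 2], "", "orig"), "decoded");
    assert_eq!(decode_or_fallback(err, &[1, 2], "partial", "orig"), "partial");
    assert_eq!(decode_or_fallback(err, &[1, 2], "", "orig"), "orig");
}
```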
🧹 Nitpick comments (9)
Cargo.toml (1)
84-84: No-op change; add a trailing newline.
`lto = true` is unchanged semantically. Nit: ensure the file ends with a newline to avoid tooling diffs.
lib/llm/src/discovery/watcher.rs (1)
291-295: Propagate runtime_config in the Embeddings (Tokens+Embeddings) path too for consistency.
You override card.runtime_config only in the Tokens+(Chat|Completions) branch; the Tokens+Embeddings branch uses OpenAIPreprocessor::new(card.clone()) and will miss entry-provided overrides.
Apply this in the embeddings branch after obtaining card:
```diff
@@
     let Some(mut card) = card else {
         anyhow::bail!("Missing model deployment card for embedding model");
     };
+    // Ensure runtime_config is populated: prefer entry value if present
+    if let Some(rc) = model_entry.runtime_config.clone() {
+        card.runtime_config = rc;
+    }
```

lib/parsers/src/tool_calling/parsers.rs (1)
46-47: Short-circuit before full Harmony decode to avoid unnecessary work.
Try the lightweight start-token check first; fall back to complete parsing only when a tool call is likely present.
```diff
-    let (results, normal_content) = parse_tool_calls_harmony_complete(message, &config.json)?;
+    if !detect_tool_call_start_harmony(message, &config.json) {
+        return Ok((vec![], Some(message.to_string())));
+    }
+    let (results, normal_content) =
+        parse_tool_calls_harmony_complete(message, &config.json)?;
     Ok((results, normal_content))
```

lib/llm/src/preprocessor.rs (1)
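The lightweight pre-check idea can be sketched like this. The marker strings below are assumptions about Harmony's special tokens, not the exact set `detect_tool_call_start_harmony` scans for:

```rust
// Cheap substring scan used to skip the expensive full decode when the
// message clearly contains no Harmony tool-call structure.
fn looks_like_harmony_tool_call(message: &str) -> bool {
    // Illustrative markers only; the real detector's token set may differ.
    const MARKERS: [&str; 3] = ["<|start|>", "<|channel|>commentary", "<|call|>"];
    MARKERS.iter().any(|m| message.contains(m))
}

fn main() {
    assert!(looks_like_harmony_tool_call(
        "<|channel|>commentary to=functions.get_weather"
    ));
    assert!(!looks_like_harmony_tool_call("plain assistant text"));
}
```

Since `contains` is linear in the message length, the pre-check costs far less than tokenizing and walking messages when no tool call is present.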
126-128: Comment vs. code mismatch.
The comment mentions “allow env override” but no env override is implemented here.
```diff
-    // Initialize runtime config; allow env override if not provided by backend/card
+    // Initialize runtime config from the ModelDeploymentCard
```

If env override is desired, I can wire it (e.g., via envy/figment); say the word.
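If the env override hinted at in the old comment is actually wanted, a minimal sketch could look like this. The variable name `DYN_REASONING_PARSER` is hypothetical, not an existing flag:

```rust
// Prefer an environment override when set; otherwise use the value from the
// model deployment card. DYN_REASONING_PARSER is an illustrative name.
fn reasoning_parser_name(card_value: &str) -> String {
    std::env::var("DYN_REASONING_PARSER").unwrap_or_else(|_| card_value.to_string())
}

fn main() {
    std::env::remove_var("DYN_REASONING_PARSER");
    assert_eq!(reasoning_parser_name("gpt_oss"), "gpt_oss");
    std::env::set_var("DYN_REASONING_PARSER", "basic");
    assert_eq!(reasoning_parser_name("gpt_oss"), "basic");
}
```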
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (5)
168-175: Docstring return contract mismatches implementation.
The function never returns Err; it falls back to Ok(([], Some(original_text))) on failures. Update docs accordingly.
Apply:
```diff
-/// # Returns
-/// * `Ok((tool_calls, normal_text))` - Tuple containing extracted tool calls and any normal text
-/// * `Err(e)` - If parsing fails due to encoding or tokenization errors
+/// # Returns
+/// * `Ok((tool_calls, normal_text))` - Extracted tool calls and any normal text.
+///   On encoding or parsing failures, falls back to `Ok((vec![], Some(original_text)))` with a debug log.
```
189-191: Nit: duplicate slashes in comment.
Minor cleanup.

```diff
-    // // Encode the text into tokens using harmony encoding
+    // Encode the text into tokens using harmony encoding
```
236-247: Clarify the safety comment on JSON serialization.
serde_json::to_string works for any Value, not just Object.

```diff
-    // Safety: `Value::Object` is always valid JSON, so serialization cannot fail
+    // Note: Any serde_json::Value serializes to valid JSON; unwrap is safe here
```
206-259: Deduplicate extraction logic across streaming and complete parsers.
Both functions share near-identical message-walking and extraction. Factor into a private helper to keep behavior in sync.
Example (outside this hunk):

```rust
fn extract_tool_calls_and_text(
    messages: &[openai_harmony::chat::Message],
) -> (Vec<ToolCallResponse>, String) {
    /* shared impl */
}
```

Then call it from both parse_tool_calls_harmony and parse_tool_calls_harmony_complete.
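A fuller sketch of what such a shared helper might look like, using simplified stand-in types (the real code walks `openai_harmony::chat::Message` values, whose fields differ):

```rust
// Stand-in message type; the real one is openai_harmony::chat::Message.
struct Msg {
    channel: String,
    recipient: Option<String>,
    content: String,
}

struct ToolCall {
    name: String,
    arguments: String,
}

// Walk messages once, collecting tool calls from commentary-channel
// messages addressed to functions.*, and normal/reasoning text from the
// analysis channel.
fn extract_tool_calls_and_text(messages: &[Msg]) -> (Vec<ToolCall>, String) {
    let mut calls = Vec::new();
    let mut normal_text = String::new();
    for m in messages {
        match (m.channel.as_str(), m.recipient.as_deref()) {
            ("commentary", Some(r)) if r.starts_with("functions.") => calls.push(ToolCall {
                name: r.trim_start_matches("functions.").to_string(),
                arguments: m.content.clone(),
            }),
            ("analysis", _) => normal_text.push_str(&m.content),
            _ => {}
        }
    }
    (calls, normal_text)
}

fn main() {
    let msgs = vec![
        Msg {
            channel: "analysis".to_string(),
            recipient: None,
            content: "thinking".to_string(),
        },
        Msg {
            channel: "commentary".to_string(),
            recipient: Some("functions.get_weather".to_string()),
            content: "{\"city\":\"Paris\"}".to_string(),
        },
    ];
    let (calls, text) = extract_tool_calls_and_text(&msgs);
    assert_eq!(calls.len(), 1);
    assert_eq!(calls[0].name, "get_weather");
    assert_eq!(text, "thinking");
}
```

Because both parsers would call the same helper, any future change to channel or recipient handling stays consistent between the streaming and complete paths.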
158-259: Missing unit tests for the complete-chunk parser.
Add tests mirroring existing ones for parse_tool_calls_harmony to prevent regressions.
Happy to draft tests like:
- basic extraction with commentary call
- fallback behavior on encoding/parse failure (returns original text)
- multi-arg JSON and multiple tool calls
- empty analysis content (ensures no panic)
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- Cargo.toml (1 hunks)
- lib/llm/src/discovery/watcher.rs (1 hunks)
- lib/llm/src/preprocessor.rs (3 hunks)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (1 hunks)
- lib/parsers/src/reasoning/gpt_oss_parser.rs (3 hunks)
- lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2 hunks)
- lib/parsers/src/tool_calling/harmony/mod.rs (1 hunks)
- lib/parsers/src/tool_calling/mod.rs (1 hunks)
- lib/parsers/src/tool_calling/parsers.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
PR: ai-dynamo/dynamo#2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.
Applied to files:
lib/llm/src/discovery/watcher.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.
Applied to files:
lib/llm/src/protocols/openai/chat_completions/delta.rs
📚 Learning: 2025-08-25T22:04:45.205Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2700
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:19-28
Timestamp: 2025-08-25T22:04:45.205Z
Learning: The response_generator() method exists on multiple request types in the codebase: NvCreateChatCompletionRequest (for chat completions) and NvCreateCompletionRequest (for text completions). When making signature changes, it's important to distinguish between these different object types as they have separate implementations and call sites.
Applied to files:
lib/llm/src/preprocessor.rs
📚 Learning: 2025-08-22T20:10:09.345Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/parsers/src/reasoning/gpt_oss_parser.rs:31-37
Timestamp: 2025-08-22T20:10:09.345Z
Learning: StreamableParser from openai-harmony does not implement the Debug trait, which prevents storing it as a field in structs that derive Debug in lib/parsers/src/reasoning/gpt_oss_parser.rs.
Applied to files:
lib/parsers/src/tool_calling/harmony/harmony_parser.rs
lib/parsers/src/reasoning/gpt_oss_parser.rs
🧬 Code graph analysis (8)
lib/llm/src/discovery/watcher.rs (1)
lib/llm/src/local_model.rs (1)
card(331-333)
lib/parsers/src/tool_calling/harmony/mod.rs (1)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (3)
detect_tool_call_start_harmony (261-270)
parse_tool_calls_harmony (23-156)
parse_tool_calls_harmony_complete (176-259)
lib/llm/src/protocols/openai/chat_completions/delta.rs (3)
lib/llm/src/local_model.rs (2)
runtime_config (179-182)
runtime_config (374-376)
lib/bindings/python/src/dynamo/_core.pyi (1)
ModelRuntimeConfig (465-469)
lib/parsers/src/reasoning/mod.rs (1)
get_reasoning_parser_from_name(171-187)
lib/parsers/src/tool_calling/mod.rs (1)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2)
parse_tool_calls_harmony (23-156)
parse_tool_calls_harmony_complete (176-259)
lib/parsers/src/tool_calling/parsers.rs (1)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2)
detect_tool_call_start_harmony (261-270)
parse_tool_calls_harmony_complete (176-259)
lib/llm/src/preprocessor.rs (2)
lib/llm/src/local_model.rs (2)
runtime_config (179-182)
runtime_config (374-376)
lib/llm/src/protocols/openai/chat_completions/delta.rs (2)
new (81-121)
response_generator (21-30)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (1)
lib/parsers/src/reasoning/gpt_oss_parser.rs (2)
get_harmony_encoding (20-23)
new (39-58)
lib/parsers/src/reasoning/gpt_oss_parser.rs (1)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (1)
get_harmony_encoding(16-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (9)
lib/parsers/src/tool_calling/parsers.rs (2)
5-5: Import update looks good.
The switch to parse_tool_calls_harmony_complete aligns with the new API.

1-60: No duplicate definition found for parse_tool_calls_harmony_complete. rg returned a single match at lib/parsers/src/tool_calling/harmony/harmony_parser.rs:176.

lib/parsers/src/tool_calling/mod.rs (1)
14-14: Public re-exports OK.
Keeping both parse_tool_calls_harmony and parse_tool_calls_harmony_complete exported preserves back-compat.
lib/parsers/src/tool_calling/harmony/mod.rs (1)
7-9: Export addition looks correct.
The new symbol is cleanly re-exported alongside existing ones.
lib/llm/src/preprocessor.rs (2)
98-100: Good: runtime_config carried on the preprocessor.
This enables per-model behavior without plumbing through every callsite.

590-592: Correct placement of runtime_config application.
Applying set_runtime_config before preprocessing ensures the generator is configured for reasoning/tool parsing.
Given chat-vs-text split, this change should affect only chat completions (as intended). If any completions path relies on reasoning, confirm separately.
lib/parsers/src/reasoning/gpt_oss_parser.rs (1)
55-58: Constructor looks fine.
Explicit construction with StreamableParser and structured Debug is sensible.
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2)
7-9: Import fix looks correct.
Re-introducing StreamableParser in the openai_harmony group aligns with its usage below.
176-186: No duplicate function: single definition and intentional re-export.
A single pub fn was found at lib/parsers/src/tool_calling/harmony/harmony_parser.rs:176 and a pub use re-export in lib/parsers/src/tool_calling/mod.rs:14; no duplicate or collision detected.
result for tool calling round 1:
result for tool calling round 2:
result for reasoning:
Signed-off-by: zhongdaor-nv <zhongdaor@nvidia.com>
Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
ayushag-nv
left a comment
lgtm !
/ok to test fdc9f0d
…ser and reasoning parser (#2999) Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
This PR adds the GPT OSS frontend implementation based on commit 63d639a
Overview
This change ensures multi-turn conversations correctly handle second-round (and later) tool calls by initializing and coordinating the reasoning parser with the tool parser.
What changed
Properly handle second-round tool-calling in multi-turn conversations — subsequent tool calls are now processed correctly.
The tool parser and the reasoning parser can operate together during a conversation.
Fix an initialization bug where the reasoning parser was not being initialized properly.
Why
Previously only the first-round tool call was reliably processed: the reasoning parser was not initialized/kept in a way that allowed it to participate in later rounds, so follow-up tool calls could fail or be handled incorrectly. This change initializes and synchronizes the parser state so reasoning steps and tool parsing can interoperate across rounds.
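The core idea of the fix described above (one parser value that persists across rounds, rather than being re-created per chunk) can be reduced to a toy sketch. This is a stand-in illustration, not the real GPT OSS parser:

```rust
// A parser whose state must survive between tool-call rounds.
struct RoundAwareParser {
    rounds_seen: u32,
}

impl RoundAwareParser {
    fn new() -> Self {
        Self { rounds_seen: 0 }
    }

    // Each call represents one round of model output. Because the same
    // instance is reused rather than re-created, the count carries over
    // from round 1 into round 2.
    fn parse_round(&mut self, _chunk: &str) -> u32 {
        self.rounds_seen += 1;
        self.rounds_seen
    }
}

fn main() {
    let mut parser = RoundAwareParser::new();
    assert_eq!(parser.parse_round("<round 1 output>"), 1);
    // A freshly constructed parser would report 1 again here; the shared
    // instance correctly reports round 2.
    assert_eq!(parser.parse_round("<round 2 output>"), 2);
}
```

The bug being fixed corresponds to constructing a fresh parser per round, which resets exactly the state a second-round tool call depends on.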
Where should the reviewer start?
delta.rs
Related Issues
None
Summary by CodeRabbit
New Features
Bug Fixes
Chores