
Conversation

@zhongdaor-nv
Contributor

@zhongdaor-nv zhongdaor-nv commented Sep 10, 2025

This PR adds the GPT OSS frontend implementation, based on commit 63d639a.

Overview

This change ensures multi-turn conversations correctly handle second-round (and later) tool calls by initializing and coordinating the reasoning parser with the tool parser.

What changed

Properly handle second-round tool-calling in multi-turn conversations — subsequent tool calls are now processed correctly.

The tool parser and the reasoning parser can operate together during a conversation.

Fix a bug where the reasoning parser was not initialized properly.

Why

Previously, only the first-round tool call was reliably processed: the reasoning parser was not initialized or retained in a way that let it participate in later rounds, so follow-up tool calls could fail or be handled incorrectly. This change initializes and synchronizes the parser state so reasoning steps and tool parsing interoperate across rounds, as sketched below.
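A minimal sketch of the per-request wiring, in Rust. `response_generator` and `set_runtime_config` are named elsewhere in this PR, but the surrounding plumbing shown here is an assumption, not the exact implementation:

```rust
// Sketch only: re-apply the model's runtime config before preprocessing each
// request. Internally, set_runtime_config updates the generator's options and
// rebuilds the reasoning parser, so round 2+ tool calls are parsed with a
// properly initialized parser instead of a stale (or missing) one.
let mut delta_gen = request.response_generator();
delta_gen.set_runtime_config(card.runtime_config.clone());
```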

Where should the reviewer start?

delta.rs

Related Issues

None

Summary by CodeRabbit

  • New Features

    • Per-model runtime configuration now applies to chat responses, enabling dynamic, model-specific behavior during generation.
    • Added a non-streaming Harmony tool-call parser for complete chunks, improving accuracy and performance; adopted as the default parsing path.
  • Bug Fixes

    • Streaming reasoning now handles the “commentary” channel, recovering normal text more reliably.
  • Chores

    • Formatting-only update to build configuration with no functional impact.

…e output

- Add state JSON output to gpt_oss_parser for debugging purposes
- Refactor harmony parser to use StreamableParser for better token processing
- Add parse_tool_calls_harmony_chunk function for chunked parsing
- Update module exports to include new harmony chunk parser
- Improve error handling and token processing in harmony parser
@zhongdaor-nv zhongdaor-nv requested a review from a team as a code owner September 10, 2025 21:49
@copy-pr-bot

copy-pr-bot bot commented Sep 10, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@zhongdaor-nv zhongdaor-nv changed the title Add GPT OSS Frontend feat: enhance GPT OSS frontend with improved harmony tool calling parser and reasoning parser Sep 10, 2025
@github-actions github-actions bot added the feat label Sep 10, 2025
@coderabbitai
Contributor

coderabbitai bot commented Sep 10, 2025

Walkthrough

Adds per-model runtime configuration propagation into chat completions (from watcher to preprocessor to delta generator), updates reasoning parser handling (new “commentary” channel path), and introduces a complete-chunk Harmony tool-call parser, wiring it through exports and dispatch. Cargo.toml has a formatting-only change. One new public method and a new public function are added.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Build config formatting**<br>`Cargo.toml` | Formatting-only; no semantic change to `lto`; removed trailing newline. |
| **Runtime config propagation (LLM)**<br>`lib/llm/src/discovery/watcher.rs`, `lib/llm/src/preprocessor.rs`, `lib/llm/src/protocols/openai/chat_completions/delta.rs` | Watcher copies `runtime_config` from the model entry into the card; `OpenAIPreprocessor` now stores `runtime_config` and applies it; `DeltaGenerator` adds `pub fn set_runtime_config(...)` to update options and reconfigure the reasoning parser. |
| **Reasoning parser (commentary channel)**<br>`lib/parsers/src/reasoning/gpt_oss_parser.rs` | Adds handling for the "commentary" channel: decodes normal text from Harmony tokens; logs on missing encoding; retains existing "final"/default paths. |
| **Harmony tool-call parsing (complete-chunk) + exports**<br>`lib/parsers/src/tool_calling/harmony/harmony_parser.rs`, `lib/parsers/src/tool_calling/harmony/mod.rs`, `lib/parsers/src/tool_calling/mod.rs`, `lib/parsers/src/tool_calling/parsers.rs` | Adds `parse_tool_calls_harmony_complete(text, config)` to parse full chunks into tool calls and normal text; re-exports it at module roots; switches the dispatcher to call the new function. Note: the function appears duplicated in the file (see the verification below, which found a single definition). |
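For orientation, a rough sketch of what `set_runtime_config` could look like based on the summary above; the field names (`options`, `reasoning_parser`) and the config's `reasoning_parser` field are assumptions, not the actual code:

```rust
impl DeltaGenerator {
    /// Sketch only: update per-model options and rebuild the reasoning parser
    /// so it matches the (possibly changed) model configuration.
    pub fn set_runtime_config(&mut self, runtime_config: ModelRuntimeConfig) {
        self.options.runtime_config = runtime_config;
        // Hypothetical field access: the real config may expose the parser
        // name differently.
        if let Some(name) = self.options.runtime_config.reasoning_parser.as_deref() {
            self.reasoning_parser = get_reasoning_parser_from_name(name);
        }
    }
}
```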

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor Client
  participant Watcher as ModelWatcher
  participant Preproc as OpenAIPreprocessor
  participant Delta as DeltaGenerator
  participant Reason as ReasoningParser

  Client->>Watcher: PUT model (card, entry)
  Watcher->>Watcher: card.runtime_config = entry.runtime_config (if present)
  Client->>Preproc: NvCreateChatCompletionRequest(card)
  Preproc->>Delta: set_runtime_config(card.runtime_config)
  Delta->>Delta: Update options.runtime_config<br/>Rebuild reasoning_parser
  Preproc->>Reason: Generate with updated parser
  Reason-->>Client: Streamed deltas / final
```
```mermaid
sequenceDiagram
  autonumber
  actor Client
  participant Parser as try_tool_call_parse (Harmony)
  participant Harmony as parse_tool_calls_harmony_complete
  participant Enc as HarmonyEncoding
  participant Msg as parse_messages_from_completion_tokens

  Client->>Parser: Input text (Harmony format)
  Parser->>Harmony: parse_tool_calls_harmony_complete(text, config)
  Harmony->>Enc: load_harmony_encoding(...)
  Enc-->>Harmony: tokenizer/encoding or error
  alt encoding loaded
    Harmony->>Msg: tokens -> messages
    Msg-->>Harmony: messages
    Harmony->>Harmony: Extract tool calls (assistant/commentary, functions.*)<br/>Collect normal_text (assistant/analysis)
    Harmony-->>Parser: (Vec<ToolCallResponse>, Option<normal_text>)
  else encoding/parsing error
    Harmony-->>Parser: ([], Some(original_text))
  end
  Parser-->>Client: Parsed tool calls + normal_text
```
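In code, the dispatch above reduces to roughly the following sketch; the `config.json` argument and `?` error plumbing mirror the parsers.rs snippet further down in this review:

```rust
// Sketch of the complete-chunk path. On success: extracted tool calls plus any
// remaining normal text. On encoding/parsing failure the function falls back
// to ([], Some(original_text)) instead of propagating an error.
let (tool_calls, normal_text) = parse_tool_calls_harmony_complete(message, &config.json)?;
if tool_calls.is_empty() {
    // Either no Harmony tool calls were present or parsing fell back; in both
    // cases normal_text carries the text to surface to the client.
}
```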

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Poem

A rabbit taps keys with a whiskered grin,
Wiring configs from without to within.
Harmony hums, tools neatly appear—
Commentary whispers, now crisp and clear.
Parsers pirouette, deltas align—
Ship the carrots; the build is fine. 🥕✨


Pre-merge checks (3 passed)

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The title concisely highlights the primary frontend improvements (an enhanced Harmony tool-calling parser and reasoning parser) and aligns with the PR objective to add the GPT OSS frontend, making it clear and free of noise. It is specific and actionable for a reviewer scanning history. It does not call out other notable changes (for example, per-model runtime_config propagation and the duplicate parse_tool_calls_harmony_complete symbol), so consider surfacing those if they are important to reviewers. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | The PR description includes a clear Overview, a concise explanation of what changed and why, a reviewer start-point (delta.rs), and an autogenerated summary that together cover the template's intent. It does not use the exact "Details:" heading from the repository template and could more explicitly enumerate the key changed files and any related issue references. Because the required information is present and informative despite these minor heading differences, the description is mostly complete and suitable for review. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/parsers/src/reasoning/gpt_oss_parser.rs (1)

178-232: Handle decode errors; avoid unwrap() in streaming commentary path.

decode_utf8(...).unwrap() can panic on invalid tokens. Fall back gracefully and reuse the fetched tokens slice.

```diff
-                        let end_token_idx = self.parser.tokens().len();
-                        // using the harmony decode_utf8 to translate the tokens to text
-                        let generated_text = enc
-                            .tokenizer()
-                            .decode_utf8(&self.parser.tokens()[last_channel_toke_idx..end_token_idx])
-                            .unwrap();
-
-                        final_text = generated_text;
+                        let tokens = &tokens[last_channel_toke_idx..];
+                        match enc.tokenizer().decode_utf8(tokens) {
+                            Ok(generated_text) => {
+                                final_text = generated_text;
+                            }
+                            Err(e) => {
+                                tracing::debug!("decode_utf8 failed in commentary path: {e}");
+                                // Fall back to current raw content or original text
+                                final_text = if raw_content.is_empty() { _text.to_string() } else { raw_content };
+                            }
+                        }
```

Also consider capturing and reusing tokens = self.parser.tokens() to avoid repeated calls.

I can add a unit test covering the commentary path reconstruction if helpful.

🧹 Nitpick comments (9)
Cargo.toml (1)

84-84: No-op change; add trailing newline.

lto = true is unchanged semantically. Nit: ensure the file ends with a newline to avoid tooling diffs.

lib/llm/src/discovery/watcher.rs (1)

291-295: Propagate runtime_config in the Embeddings (Tokens+Embeddings) path too for consistency.

You override card.runtime_config only in the Tokens+(Chat|Completions) branch; the Tokens+Embeddings branch uses OpenAIPreprocessor::new(card.clone()) and will miss entry-provided overrides.

Apply this in the embeddings branch after obtaining card:

```diff
@@
         let Some(mut card) = card else {
             anyhow::bail!("Missing model deployment card for embedding model");
         };

+        // Ensure runtime_config is populated: prefer entry value if present
+        if let Some(rc) = model_entry.runtime_config.clone() {
+            card.runtime_config = rc;
+        }
```
lib/parsers/src/tool_calling/parsers.rs (1)

46-47: Short-circuit before full Harmony decode to avoid unnecessary work.

Try the lightweight start-token check first; fall back to complete parsing only when likely present.

```diff
-            let (results, normal_content) = parse_tool_calls_harmony_complete(message, &config.json)?;
+            if !detect_tool_call_start_harmony(message, &config.json) {
+                return Ok((vec![], Some(message.to_string())));
+            }
+            let (results, normal_content) =
+                parse_tool_calls_harmony_complete(message, &config.json)?;
             Ok((results, normal_content))
```
lib/llm/src/preprocessor.rs (1)

126-128: Comment vs. code mismatch.

The comment mentions “allow env override” but no env override is implemented here.

```diff
-        // Initialize runtime config; allow env override if not provided by backend/card
+        // Initialize runtime config from the ModelDeploymentCard
```

If env override is desired, I can wire it (e.g., via envy/figment) — say the word.
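For example, a minimal sketch with `std::env`; the variable name `DYN_RUNTIME_CONFIG_JSON` is hypothetical, and this assumes the runtime config type implements `serde::Deserialize`:

```rust
// Illustrative only: optionally override the card's runtime config from an
// environment variable before it is applied to the preprocessor.
if let Ok(raw) = std::env::var("DYN_RUNTIME_CONFIG_JSON") {
    match serde_json::from_str(&raw) {
        Ok(rc) => card.runtime_config = rc,
        Err(e) => tracing::warn!("ignoring invalid DYN_RUNTIME_CONFIG_JSON: {e}"),
    }
}
```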

lib/parsers/src/tool_calling/harmony/harmony_parser.rs (5)

168-175: Docstring return contract mismatches implementation.

The function never returns Err; it falls back to Ok(([], Some(original_text))) on failures. Update docs accordingly.

Apply:

```diff
-/// # Returns
-/// * `Ok((tool_calls, normal_text))` - Tuple containing extracted tool calls and any normal text
-/// * `Err(e)` - If parsing fails due to encoding or tokenization errors
+/// # Returns
+/// * `Ok((tool_calls, normal_text))` - Extracted tool calls and any normal text.
+///   On encoding or parsing failures, falls back to `Ok((vec![], Some(original_text)))` with a debug log.
```

189-191: Nit: duplicate slashes in comment.

Minor cleanup.

```diff
-    // // Encode the text into tokens using harmony encoding
+    // Encode the text into tokens using harmony encoding
```

236-247: Clarify the safety comment on JSON serialization.

serde_json::to_string works for any Value, not just Object.

```diff
-                        // Safety: `Value::Object` is always valid JSON, so serialization cannot fail
+                        // Note: Any serde_json::Value serializes to valid JSON; unwrap is safe here
```

206-259: Deduplicate extraction logic across streaming and complete parsers.

Both functions share near-identical message-walking and extraction. Factor into a private helper to keep behavior in sync.

Example (outside this hunk):

```rust
fn extract_tool_calls_and_text(
    messages: &[openai_harmony::chat::Message],
) -> (Vec<ToolCallResponse>, String) {
    /* shared impl */
}
```

Then call it from both parse_tool_calls_harmony and parse_tool_calls_harmony_complete.


158-259: Missing unit tests for the complete-chunk parser.

Add tests mirroring existing ones for parse_tool_calls_harmony to prevent regressions.

Happy to draft tests like these (a skeleton for the fallback case follows the list):

  • basic extraction with commentary call
  • fallback behavior on encoding/parse failure (returns original text)
  • multi-arg JSON and multiple tool calls
  • empty analysis content (ensures no panic)
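A skeleton for the fallback case; the `JsonParserConfig::default()` constructor is an assumption, so substitute whatever type `config.json` actually carries:

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn complete_parser_falls_back_to_original_text() {
        // Input with no Harmony structure should produce no tool calls and
        // return the original text unchanged as normal text.
        let text = "plain text with no harmony tokens";
        let (calls, normal) =
            parse_tool_calls_harmony_complete(text, &JsonParserConfig::default())
                .expect("fallback path should not error");
        assert!(calls.is_empty());
        assert_eq!(normal.as_deref(), Some(text));
    }
}
```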
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dcee4db and 63d639a.

📒 Files selected for processing (9)
  • Cargo.toml (1 hunks)
  • lib/llm/src/discovery/watcher.rs (1 hunks)
  • lib/llm/src/preprocessor.rs (3 hunks)
  • lib/llm/src/protocols/openai/chat_completions/delta.rs (1 hunks)
  • lib/parsers/src/reasoning/gpt_oss_parser.rs (3 hunks)
  • lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2 hunks)
  • lib/parsers/src/tool_calling/harmony/mod.rs (1 hunks)
  • lib/parsers/src/tool_calling/mod.rs (1 hunks)
  • lib/parsers/src/tool_calling/parsers.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
PR: ai-dynamo/dynamo#2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.

Applied to files:

  • lib/llm/src/discovery/watcher.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.

Applied to files:

  • lib/llm/src/protocols/openai/chat_completions/delta.rs
📚 Learning: 2025-08-25T22:04:45.205Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2700
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:19-28
Timestamp: 2025-08-25T22:04:45.205Z
Learning: The response_generator() method exists on multiple request types in the codebase: NvCreateChatCompletionRequest (for chat completions) and NvCreateCompletionRequest (for text completions). When making signature changes, it's important to distinguish between these different object types as they have separate implementations and call sites.

Applied to files:

  • lib/llm/src/preprocessor.rs
📚 Learning: 2025-08-22T20:10:09.345Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/parsers/src/reasoning/gpt_oss_parser.rs:31-37
Timestamp: 2025-08-22T20:10:09.345Z
Learning: StreamableParser from openai-harmony does not implement the Debug trait, which prevents storing it as a field in structs that derive Debug in lib/parsers/src/reasoning/gpt_oss_parser.rs.

Applied to files:

  • lib/parsers/src/tool_calling/harmony/harmony_parser.rs
  • lib/parsers/src/reasoning/gpt_oss_parser.rs
🧬 Code graph analysis (8)
lib/llm/src/discovery/watcher.rs (1)
lib/llm/src/local_model.rs (1)
  • card (331-333)
lib/parsers/src/tool_calling/harmony/mod.rs (1)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (3)
  • detect_tool_call_start_harmony (261-270)
  • parse_tool_calls_harmony (23-156)
  • parse_tool_calls_harmony_complete (176-259)
lib/llm/src/protocols/openai/chat_completions/delta.rs (3)
lib/llm/src/local_model.rs (2)
  • runtime_config (179-182)
  • runtime_config (374-376)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • ModelRuntimeConfig (465-469)
lib/parsers/src/reasoning/mod.rs (1)
  • get_reasoning_parser_from_name (171-187)
lib/parsers/src/tool_calling/mod.rs (1)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2)
  • parse_tool_calls_harmony (23-156)
  • parse_tool_calls_harmony_complete (176-259)
lib/parsers/src/tool_calling/parsers.rs (1)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2)
  • detect_tool_call_start_harmony (261-270)
  • parse_tool_calls_harmony_complete (176-259)
lib/llm/src/preprocessor.rs (2)
lib/llm/src/local_model.rs (2)
  • runtime_config (179-182)
  • runtime_config (374-376)
lib/llm/src/protocols/openai/chat_completions/delta.rs (2)
  • new (81-121)
  • response_generator (21-30)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (1)
lib/parsers/src/reasoning/gpt_oss_parser.rs (2)
  • get_harmony_encoding (20-23)
  • new (39-58)
lib/parsers/src/reasoning/gpt_oss_parser.rs (1)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (1)
  • get_harmony_encoding (16-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)
🔇 Additional comments (9)
lib/parsers/src/tool_calling/parsers.rs (2)

5-5: Import update looks good.

Switch to parse_tool_calls_harmony_complete aligns with the new API.


1-60: No duplicate definition found for parse_tool_calls_harmony_complete. rg returned a single match at lib/parsers/src/tool_calling/harmony/harmony_parser.rs:176.

lib/parsers/src/tool_calling/mod.rs (1)

14-14: Public re-exports OK.

Keeping both parse_tool_calls_harmony and parse_tool_calls_harmony_complete exported preserves back-compat.

lib/parsers/src/tool_calling/harmony/mod.rs (1)

7-9: Export addition looks correct.

New symbol is cleanly re-exported alongside existing ones.

lib/llm/src/preprocessor.rs (2)

98-100: Good: runtime_config carried on the preprocessor.

This enables per-model behavior without plumbing through every callsite.


590-592: Correct placement of runtime_config application.

Applying set_runtime_config before preprocessing ensures the generator is configured for reasoning/tool parsing.

Given the chat-vs-text split, this change should affect only chat completions (as intended). If any completions path relies on reasoning, confirm separately.

lib/parsers/src/reasoning/gpt_oss_parser.rs (1)

55-58: Constructor looks fine.

Explicit construction with StreamableParser and structured Debug is sensible.

lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2)

7-9: Import fix looks correct.

Re-introducing StreamableParser in the openai_harmony group aligns with its usage below.


176-186: No duplicate function — single definition and intentional re-export. Single pub fn found at lib/parsers/src/tool_calling/harmony/harmony_parser.rs:176 and a pub use re-export in lib/parsers/src/tool_calling/mod.rs:14; no duplicate or collision detected.

@zhongdaor-nv
Contributor Author

result for tool calling round 1:
{ "id": "chatcmpl-82b82790-b817-4e4f-b681-64911f51a8d6", "choices": [ { "index": 0, "message": { "tool_calls": [ { "id": "call-1", "type": "function", "function": { "name": "get_current_weather", "arguments": "{\"format\":\"celsius\",\"location\":\"San Francisco, CA\"}" } } ], "role": "assistant", "reasoning_content": "User asks for weather. We should use the get_current_weather function. Use format \"celsius\" or \"fahrenheit\"? Maybe prefer Celsius? Many prefer Celsius. Just pick. Use San Francisco, CA. Use function." }, "finish_reason": "tool_calls" } ], "created": 1757544729, "model": "openai/gpt-oss-20b", "object": "chat.completion", "usage": { "prompt_tokens": 187, "completion_tokens": 77, "total_tokens": 264 } }

result for tool calling round 2:
{ "id": "chatcmpl-a0438530-61ca-4687-9748-cc976ef0d556", "choices": [ { "index": 0, "message": { "content": "The current weather in San Francisco is about **20 °C** (approximately 68 °F).", "role": "assistant", "reasoning_content": "We need to provide answer. The weather: 20°C approximately." }, "finish_reason": "stop" } ], "created": 1757544755, "model": "openai/gpt-oss-20b", "object": "chat.completion", "usage": { "prompt_tokens": 233, "completion_tokens": 47, "total_tokens": 280 } }

result for reasoning:
{ "id": "chatcmpl-2e8b403e-7624-4384-8160-d141031ef874", "choices": [ { "index": 0, "message": { "content": "### Fuel needed for a 20‑minute race\n\n| Item | Value | Units |\n|------|-------|-------|\n| **Lap time (qualifying)** | 2 min 04.317 s | sec |\n| **Race duration** | 20 min | 1200 s |\n| **Fuel consumption** | 2.73 L per lap | L/lap |\n| **Number of laps possible** | ⌊1200 s ÷ 124.317 s⌋ = 9 | laps |\n| **Fuel for those 9 laps** | 9 × 2.73 L | 24.57 L |\n\n---\n\n#### Why 9 laps?\n\n20 min = 1200 s. \nOne lap takes 2 min 4.317 s = 124.317 s. \n1200 / 124.317 ≈ 9.65. \nOnly whole laps count, so you can safely complete 9 laps and will stop before the 10th.\n\n#### Fuel recommendation\n\n- **Strict minimum**: 24.57 L. \n- **Typical race‑practice**: round up to a comfortable whole‑number buffer (e.g., 25 L or 26 L). \n- **Safety margin**: Add ~10 % extra to cover unforeseen pit‑stop or delay: \n 24.57 L × 1.10 ≈ 27 L.\n\n---\n\n**Answer:** For a 20‑minute race with a 2.73 L/lap fuel consumption and a 2:04.317 lap time, you’ll need at least **~24.6 L of fuel** – practically round it to **25 – 26 L** to stay safe.", "role": "assistant", "reasoning_content": "We need fuel calculation: given lap time? Qualifying time 2:04.317 presumably lap length time. But race duration 20 minutes: want number of laps that fit in 20 min: 20 min = 1200 seconds. 2:04.317 = 124.317 sec per lap. number of laps = floor(1200/124.317) = approx 9.65 so 9 laps. Fuel consumption each lap 2.73 L so total fuel: 9*2.73=24.57 liters. But maybe want extra for pit stops? They just ask \"how many liters of fuel to take\". Usually you need to consider fuel for whole race plus perhaps extra buffer. Might mention at least 24.57 liters, round up 25 liters, or 26 for buffer. Provide explanation." }, "finish_reason": "stop" } ], "created": 1757544769, "model": "openai/gpt-oss-20b", "object": "chat.completion", "usage": { "prompt_tokens": 236, "completion_tokens": 561, "total_tokens": 797 } }

@zhongdaor-nv zhongdaor-nv enabled auto-merge (squash) September 16, 2025 00:34
Signed-off-by: zhongdaor-nv <zhongdaor@nvidia.com>
Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
Contributor

@ayushag-nv ayushag-nv left a comment


lgtm !

@zhongdaor-nv
Contributor Author

/ok to test fdc9f0d

@zhongdaor-nv zhongdaor-nv merged commit 6675bfc into main Sep 18, 2025
13 checks passed
@zhongdaor-nv zhongdaor-nv deleted the zhongdaor/gpt-oss-frontend branch September 18, 2025 23:44
hhzhang16 pushed a commit that referenced this pull request Sep 19, 2025
…ser and reasoning parser (#2999)

Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
