feat: added support for pythonic tool parser #2788

ayushag-nv · 2025-08-29T19:46:51Z

Overview:

Implements basic pythonic parser using AST parsing. Supports all three cases for function parsing

constants as arg values
list as arg values
dicts as arg values

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

Summary by CodeRabbit

New Features
- Added Pythonic tool-call parsing, including detection within mixed text.
- Introduced configurable tool-call parser with presets for common formats (Hermes, Nemotron Deci, Llama 3 JSON, Mistral, Phi-4).
Refactor
- Reorganized parsing modules and public re-exports; unified entry point (try_tool_call_parse) replaces JSON-specific variant.
Dependencies
- Added rustpython-parser and num-traits.
Tests
- Expanded tests to cover Pythonic parsing scenarios.
Chores
- Updated license allow list (LGPL-3.0, CC0-1.0, Unicode-DFS-2016).

Signed-off-by: ayushag <ayushag@nvidia.com>

copy-pr-bot · 2025-08-29T19:46:55Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

ayushag-nv · 2025-08-29T19:49:42Z

Created this one because the earlier one got messed up due to rebase issue: #2726

coderabbitai · 2025-08-29T19:55:28Z

Walkthrough

Adds a Pythonic tool-call parser and integrates it into the parsing flow. Splits configuration types into a new config module, updates exports, and adjusts imports. Introduces presets for parser configs. Updates dependencies to support Python parsing. Adjusts license allowlist.

Changes

Cohort / File(s)	Summary
Licensing configuration `deny.toml`	Expanded license allowlist (LGPL-3.0, CC0-1.0, Unicode-DFS-2016); normalized NCSA formatting.
Dependency updates `lib/parsers/Cargo.toml`	Added dependencies: `rustpython-parser = "0.4.0"`, `num-traits = "0.2"`.
Config module introduction `lib/parsers/src/tool_calling/config.rs`	New module defining `ToolCallParserType`, `JsonParserConfig` (with defaults), and `ToolCallConfig` with presets (`hermes`, `nemotron_deci`, `llama3_json`, `mistral`, `phi4`, `pythonic`).
Parser wiring and flow `lib/parsers/src/tool_calling/parsers.rs`	Moved config types to `config`; integrated Pythonic parsing via `try_tool_call_parse_pythonic`; extended detect-and-parse to include Pythonic; tests updated and added.
New Pythonic parser `lib/parsers/src/tool_calling/pythonic_parser.rs`	New parser that extracts Python-style function-call lists; exposes `parse_tool_calls` and `try_tool_call_parse_pythonic`; supports constants, lists, dicts; returns tool calls and optional normal text.
Module organization and exports `lib/parsers/src/tool_calling/mod.rs`, `lib/parsers/src/tool_calling/tools.rs`	Added `config` and `pythonic_parser` modules; adjusted public re-exports; removed some json-specific re-exports; unified entry points (`detect_and_parse_tool_call`, `try_tool_call_parse`).
Import path adjustment `lib/parsers/src/tool_calling/json_parser.rs`	Updated import to `super::config::JsonParserConfig`; no behavioral changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant ToolCall as ToolCall::detect_and_parse_tool_call
  participant Config as ToolCallConfig
  participant Parsers as parsers
  participant JSON as json_parser
  participant Py as pythonic_parser

  Caller->>ToolCall: detect_and_parse_tool_call(message, config)
  ToolCall->>Config: Read format (ToolCallParserType)
  alt ParserType == Json
    ToolCall->>JSON: try_tool_call_parse_json(message, JsonParserConfig)
    JSON-->>ToolCall: (Vec<ToolCallResponse>, Option<String>)
  else ParserType == Pythonic
    ToolCall->>Py: try_tool_call_parse_pythonic(message)
    Py-->>ToolCall: (Vec<ToolCallResponse>, Option<String>)
  else Other (Harmony/TS/XML)
    ToolCall-->>Caller: Error/Unimplemented
  end
  ToolCall-->>Caller: (tool_calls, normal_text)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat: parse normal text along with tool calls #2709 — Adjusts try_tool_call_parse/detect_and_parse_tool_call return types to include normal text; directly overlaps with flow integrated here.
feat: added parsers lib #2542 — Prior reorganization in lib/parsers (parsers/json_parser/tools/Cargo.toml); this PR builds on and extends that structure.
chore: Refactor tool calling for wider support in the future #2393 — Expands tool-calling parser surface (ToolCallConfig/ParserType and JSON/pythonic paths), closely aligned with this PR’s restructuring.

Poem

In brackets, calls align—hip hop hooray!
A bunny parses Python all the way. 🐇
Names and kwargs, tidy and neat,
Configs split out, a modular treat.
Licenses blessed, crates in tow—
Onward we hop, to parsing we go!

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (19)

lib/parsers/Cargo.toml (1)
38-39: Loosen version pin; consider feature-gating rustpython-parser.

Prefer a caret range to pick up compatible bugfixes, and make the Python parser dependency optional to avoid pulling it in for consumers who don’t need it.
-rustpython-parser = "0.4.0"
-num-traits = "0.2"
+rustpython-parser = "0.4"
+num-traits = "0.2"
If feasible, add a crate feature and gate the dep:
[features]
default = ["json"]
json = []
pythonic = ["dep:rustpython-parser"]

[dependencies]
rustpython-parser = { version = "0.4", optional = true }
Please confirm num-traits is used by the Pythonic path; remove if unused.
deny.toml (1)

36-39: Justify adding LGPL-3.0, CC0-1.0, Unicode-DFS-2016 to the allowlist

CC0-1.0 and Unicode-DFS-2016 are common for Unicode and data crates; LGPL-3.0 is a copyleft license that may impose downstream obligations.

If only a small set of dependencies require LGPL-3.0, prefer adding per-crate exceptions instead of a global allowlist.

Automated audit failed in this environment—please run cargo deny list locally to identify which crates need these licenses and confirm whether a global allowlist is appropriate or if targeted exceptions suffice.
lib/parsers/src/tool_calling/json_parser.rs (3)
195-199: Validate token list alignment.

zip silently truncates to the shorter list. Add a length check to avoid mispaired start/end tokens.

Example:
assert_eq!(
    tool_call_start_tokens.len(),
    tool_call_end_tokens.len(),
    "mismatched start/end token lists"
);
77-87: Single-token parsing ignores array starts.

In <|python_tag|> mode, only segments starting with { are considered; arrays [ will be skipped.

Consider accepting [ and validating via serde_json::from_str::<Value>().

307-308: Preserve extracted normal text on parse miss.

Fallback returns trimmed instead of the previously extracted normal_text, which can reintroduce tool payload into the content.

Prefer returning Some(normal_text) here.
lib/parsers/src/tool_calling/tools.rs (1)

17-21: Demote log level to avoid noise.

The info-level logs on every parse attempt can spam callers.

Switch to debug! unless this is operationally required.

lib/parsers/src/tool_calling/mod.rs (1)

4-4: Consider keeping pythonic_parser private.

If no public items are intended from pythonic_parser, use mod pythonic_parser; to avoid exposing internal APIs.
lib/parsers/src/tool_calling/pythonic_parser.rs (7)
12-17: Avoid hard-coded tags; source from config or constants.

Hard-coding <|python_start|>/<|python_end|> couples behavior to literals and bypasses ToolCallConfig. Prefer a shared constant or pass allowed tags via config to keep formats centralized.

31-33: Graceful parsing: prefer tolerant fallbacks over bubbling parse errors.

parse(src, Mode::Expression, ...) errors should degrade to Ok(([], Some(original))) in the outer API, not Err. This matches JSON parser’s behavior.

56-60: Log/handle positional args to avoid silent drops.

Positional args are currently ignored without a trace. Add a debug log (or support constants) to avoid surprising losses.

Example:
-        let (func, keywords) = match elt {
-            Expr::Call(call) => (&call.func, &call.keywords),
+        let (func, keywords, args) = match elt {
+            Expr::Call(call) => (&call.func, &call.keywords, &call.args),
             _ => continue,
         };
+        if !args.is_empty() {
+            tracing::debug!("Skipping positional args in pythonic call for {:?}", func);
+        }
61-64: Optional: accept attribute callees (module.func).

If desired, accept Expr::Attribute (e.g., tools.search.query) and extract terminal name or full path.

86-95: Use UUIDs for IDs to align with JSON path and avoid collisions.

JSON parser uses UUIDs; here IDs restart at 1. Switch to UUID for consistency.

Apply:
+use uuid::Uuid;
@@
-        res.push(ToolCallResponse {
-            id: format!("call-{}", idx + 1),
+        res.push(ToolCallResponse {
+            id: format!("call-{}", Uuid::new_v4()),
99-149: Strengthen const_expr edge cases (floats, tuples, dict keys).

Non-finite floats (NaN/Inf) are not valid JSON; json!(f) can fail at serialization time.

Consider supporting tuples as lists.

For dict keys, coercing non-strings via to_string() can surprise; you may want to error instead.

Apply:
-            Constant::Float(f) => json!(f),
+            Constant::Float(f) => {
+                if let Some(n) = Number::from_f64(*f) {
+                    Value::Number(n)
+                } else {
+                    return Err("non-finite float not supported in JSON".into());
+                }
+            }
@@
-        // Handle Python lists as expressions, not constants
+        // Handle Python lists/tuples as expressions, not constants
         Expr::List(expr_list) => {
             let list_values: Result<Vec<Value>, Box<dyn std::error::Error>> =
                 expr_list.elts.iter().map(|e| const_expr(e)).collect();
             Ok(json!(list_values?))
         }
+        Expr::Tuple(expr_tuple) => {
+            let list_values: Result<Vec<Value>, Box<dyn std::error::Error>> =
+                expr_tuple.elts.iter().map(|e| const_expr(e)).collect();
+            Ok(json!(list_values?))
+        }
@@
-                        other => other.to_string(),
+                        other => {
+                            return Err(format!("non-string dict key not supported: {}", other).into())
+                        },
188-205: Tests: add coverage for underscores and non-finite floats.

Please add:

function _do(a=1) and arg _x=2 to ensure regex accepts leading underscores.

float edge-case value=float('inf') -> expect skip/log.

I can draft these tests if helpful.
lib/parsers/src/tool_calling/config.rs (3)
18-36: Scope of JsonParserConfig looks good; consider deriving Eq/PartialEq.

Deriving Eq/PartialEq helps in tests/config comparisons; no behavior change.
-#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)]
+#[derive(Clone, Debug, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
 pub struct JsonParserConfig {
69-82: Hermes end token embeds a leading newline; confirm parser contract.

The end token is "\n</tool_call>". This is unusual and may be brittle if models omit the newline. Consider trimming around matches or using "</tool_call>" as the token.

131-136: Pythonic config currently unused by parser; align tags via config.

pythonic() doesn’t convey the pythonic start/end tags used in strip_text(). Consider adding pythonic tag settings to config and threading them into the pythonic parser to avoid drift.

Happy to wire this through and update strip_text to consume config.
lib/parsers/src/tool_calling/parsers.rs (2)
69-75: Test helper duplication across modules.

extract_name_and_args appears in multiple modules. Consider centralizing in a shared test util to reduce duplication.

1094-1110: Unignore the pythonic + normal text test.

The pythonic parser already returns prefix normal text; enable this test to prevent regressions.

Apply:
-    #[ignore]
     fn test_pythonic_parser_with_constants_and_normal_text() {

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 295964e and 9779831.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (8)

deny.toml (1 hunks)
lib/parsers/Cargo.toml (1 hunks)
lib/parsers/src/tool_calling/config.rs (1 hunks)
lib/parsers/src/tool_calling/json_parser.rs (1 hunks)
lib/parsers/src/tool_calling/mod.rs (1 hunks)
lib/parsers/src/tool_calling/parsers.rs (5 hunks)
lib/parsers/src/tool_calling/pythonic_parser.rs (1 hunks)
lib/parsers/src/tool_calling/tools.rs (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (4)

lib/parsers/src/tool_calling/tools.rs (1)

lib/parsers/src/tool_calling/parsers.rs (1)

detect_and_parse_tool_call (36-63)

lib/parsers/src/tool_calling/pythonic_parser.rs (1)

lib/parsers/src/tool_calling/parsers.rs (1)

extract_name_and_args (72-75)

lib/parsers/src/tool_calling/mod.rs (1)

lib/parsers/src/tool_calling/parsers.rs (2)

detect_and_parse_tool_call (36-63)

try_tool_call_parse (9-33)

lib/parsers/src/tool_calling/parsers.rs (3)

lib/parsers/src/tool_calling/json_parser.rs (1)

try_tool_call_parse_json (152-308)

lib/parsers/src/tool_calling/pythonic_parser.rs (1)

try_tool_call_parse_pythonic (151-177)

lib/parsers/src/tool_calling/config.rs (1)

pythonic (131-136)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: pre-merge-rust (lib/bindings/python)
GitHub Check: Build and Test - dynamo
GitHub Check: pre-merge-rust (lib/runtime/examples)
GitHub Check: pre-merge-rust (.)

🔇 Additional comments (8)

lib/parsers/src/tool_calling/json_parser.rs (2)

10-10: Import path change LGTM.

Switching to super::config::JsonParserConfig matches the new module layout.

134-139: Clarify list handling behavior in json_parser.rs
Docstring states only the last tool call is returned, but the code collects and returns all parsed calls (order preserved). Confirm the intended contract and either update the docstring to “returns all parsed tool calls (order preserved)” or change the list branches to return only the last item. Applies also to lines 280–305.

lib/parsers/src/tool_calling/tools.rs (1)

4-5: Re-export adjustments look good.

ToolCallConfig and detect_and_parse_tool_call remain available via tools, aligning with the new config/parsers split.

lib/parsers/src/tool_calling/mod.rs (1)

12-13: Drop the deprecated-alias suggestion – try_tool_call_parse_json is still exposed under json_parser, so there’s no breaking change, and aliasing the generic try_tool_call_parse would misalign its signature.

Likely an incorrect or invalid review comment.

lib/parsers/src/tool_calling/pythonic_parser.rs (1)

316-343: Nice coverage for lists/dicts.

Good recursive handling validation and nested dict assertions.

lib/parsers/src/tool_calling/config.rs (1)

38-49: Default start tokens include <|python_tag|>; verify false-positive risk.

Including "<|python_tag|>" in defaults means any message with that token will trigger JSON extraction even when not a tool call. Confirm intended across models; otherwise, move it to model-specific presets only (e.g., llama3_json()).

lib/parsers/src/tool_calling/parsers.rs (2)

22-25: Wiring Pythonic path: LGTM.

Delegating on ToolCallParserType::Pythonic to try_tool_call_parse_pythonic is correct.

47-48: Expose "pythonic" preset in detect_and_parse map: LGTM.

Config registration looks consistent with other presets.

lib/parsers/src/tool_calling/pythonic_parser.rs

Signed-off-by: ayushag <ayushag@nvidia.com>

Signed-off-by: ayushag <ayushag@nvidia.com> Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>

Signed-off-by: ayushag <ayushag@nvidia.com>

Signed-off-by: ayushag <ayushag@nvidia.com> Signed-off-by: nnshah1 <neelays@nvidia.com>

feat: added support for pythonic tool parser

9779831

Signed-off-by: ayushag <ayushag@nvidia.com>

ayushag-nv self-assigned this Aug 29, 2025

ayushag-nv requested a review from a team as a code owner August 29, 2025 19:46

pull-request-size bot added the size/XL label Aug 29, 2025

github-actions bot added the feat label Aug 29, 2025

ayushag-nv mentioned this pull request Aug 29, 2025

feat: add pythonic tool call parser support #2726

Closed

ayushag-nv requested review from grahamking and paulhendricks August 29, 2025 19:50

coderabbitai bot reviewed Aug 29, 2025

View reviewed changes

lib/parsers/src/tool_calling/pythonic_parser.rs Show resolved Hide resolved

lib/parsers/src/tool_calling/pythonic_parser.rs Show resolved Hide resolved

GuanLuo approved these changes Sep 2, 2025

View reviewed changes

lib/parsers/src/tool_calling/pythonic_parser.rs Show resolved Hide resolved

Merge branch 'main' into ayushag/pythonic-parser-redo

e33c55d

paulhendricks approved these changes Sep 2, 2025

View reviewed changes

grahamking reviewed Sep 2, 2025

View reviewed changes

lib/parsers/src/tool_calling/pythonic_parser.rs Show resolved Hide resolved

grahamking approved these changes Sep 2, 2025

View reviewed changes

chore: addressed comments

91c769d

Signed-off-by: ayushag <ayushag@nvidia.com>

ayushag-nv enabled auto-merge (squash) September 2, 2025 14:59

Merge branch 'main' into ayushag/pythonic-parser-redo

5967b5b

ayushag-nv merged commit d39d676 into main Sep 2, 2025
11 checks passed

ayushag-nv deleted the ayushag/pythonic-parser-redo branch September 2, 2025 16:22

KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025

feat: added support for pythonic tool parser (#2788)

9bcb157

Signed-off-by: ayushag <ayushag@nvidia.com> Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>

dillon-cullinan pushed a commit that referenced this pull request Sep 5, 2025

feat: added support for pythonic tool parser (#2788)

0cb22a1

Signed-off-by: ayushag <ayushag@nvidia.com>

nnshah1 pushed a commit that referenced this pull request Sep 8, 2025

feat: added support for pythonic tool parser (#2788)

6e32b42

Signed-off-by: ayushag <ayushag@nvidia.com> Signed-off-by: nnshah1 <neelays@nvidia.com>

coderabbitai bot mentioned this pull request Sep 10, 2025

feat: enhance GPT OSS frontend with improved harmony tool calling parser and reasoning parser #2999

Merged

coderabbitai bot mentioned this pull request Sep 23, 2025

chore: JailStream Optimizations #3172

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: added support for pythonic tool parser #2788

feat: added support for pythonic tool parser #2788

Uh oh!

ayushag-nv commented Aug 29, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

copy-pr-bot bot commented Aug 29, 2025

Uh oh!

ayushag-nv commented Aug 29, 2025

Uh oh!

coderabbitai bot commented Aug 29, 2025

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

feat: added support for pythonic tool parser #2788

feat: added support for pythonic tool parser #2788

Uh oh!

Conversation

ayushag-nv commented Aug 29, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Aug 29, 2025

Uh oh!

ayushag-nv commented Aug 29, 2025

Uh oh!

coderabbitai bot commented Aug 29, 2025

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

ayushag-nv commented Aug 29, 2025 •

edited by coderabbitai bot

Loading