

@ayushag-nv
Contributor

@ayushag-nv ayushag-nv commented Aug 29, 2025

Overview:

Implements a basic Pythonic tool-call parser using AST parsing. It supports all three kinds of function-argument values:

  • constants as arg values
  • list as arg values
  • dicts as arg values

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Added Pythonic tool-call parsing, including detection within mixed text.
    • Introduced configurable tool-call parser with presets for common formats (Hermes, Nemotron Deci, Llama 3 JSON, Mistral, Phi-4).
  • Refactor

    • Reorganized parsing modules and public re-exports; unified entry point (try_tool_call_parse) replaces JSON-specific variant.
  • Dependencies

    • Added rustpython-parser and num-traits.
  • Tests

    • Expanded tests to cover Pythonic parsing scenarios.
  • Chores

    • Updated license allow list (LGPL-3.0, CC0-1.0, Unicode-DFS-2016).

Signed-off-by: ayushag <ayushag@nvidia.com>
@ayushag-nv ayushag-nv self-assigned this Aug 29, 2025
@ayushag-nv ayushag-nv requested a review from a team as a code owner August 29, 2025 19:46
@copy-pr-bot

copy-pr-bot bot commented Aug 29, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@ayushag-nv
Contributor Author

Created this PR because the earlier one (#2726) got messed up by a rebase issue.

@coderabbitai
Contributor

coderabbitai bot commented Aug 29, 2025

Walkthrough

Adds a Pythonic tool-call parser and integrates it into the parsing flow. Splits configuration types into a new config module, updates exports, and adjusts imports. Introduces presets for parser configs. Updates dependencies to support Python parsing. Adjusts license allowlist.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Licensing configuration | deny.toml | Expanded license allowlist (LGPL-3.0, CC0-1.0, Unicode-DFS-2016); normalized NCSA formatting. |
| Dependency updates | lib/parsers/Cargo.toml | Added dependencies: rustpython-parser = "0.4.0", num-traits = "0.2". |
| Config module introduction | lib/parsers/src/tool_calling/config.rs | New module defining ToolCallParserType, JsonParserConfig (with defaults), and ToolCallConfig with presets (hermes, nemotron_deci, llama3_json, mistral, phi4, pythonic). |
| Parser wiring and flow | lib/parsers/src/tool_calling/parsers.rs | Moved config types to config; integrated Pythonic parsing via try_tool_call_parse_pythonic; extended detect-and-parse to include Pythonic; tests updated and added. |
| New Pythonic parser | lib/parsers/src/tool_calling/pythonic_parser.rs | New parser that extracts Python-style function-call lists; exposes parse_tool_calls and try_tool_call_parse_pythonic; supports constants, lists, dicts; returns tool calls and optional normal text. |
| Module organization and exports | lib/parsers/src/tool_calling/mod.rs, lib/parsers/src/tool_calling/tools.rs | Added config and pythonic_parser modules; adjusted public re-exports; removed some JSON-specific re-exports; unified entry points (detect_and_parse_tool_call, try_tool_call_parse). |
| Import path adjustment | lib/parsers/src/tool_calling/json_parser.rs | Updated import to super::config::JsonParserConfig; no behavioral changes. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant ToolCall as ToolCall::detect_and_parse_tool_call
  participant Config as ToolCallConfig
  participant Parsers as parsers
  participant JSON as json_parser
  participant Py as pythonic_parser

  Caller->>ToolCall: detect_and_parse_tool_call(message, config)
  ToolCall->>Config: Read format (ToolCallParserType)
  alt ParserType == Json
    ToolCall->>JSON: try_tool_call_parse_json(message, JsonParserConfig)
    JSON-->>ToolCall: (Vec<ToolCallResponse>, Option<String>)
  else ParserType == Pythonic
    ToolCall->>Py: try_tool_call_parse_pythonic(message)
    Py-->>ToolCall: (Vec<ToolCallResponse>, Option<String>)
  else Other (Harmony/TS/XML)
    ToolCall-->>Caller: Error/Unimplemented
  end
  ToolCall-->>Caller: (tool_calls, normal_text)
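The dispatch shown in the diagram boils down to a match on the configured parser type. This is a minimal sketch under assumptions, not the crate's code: ParserType, ToolCall, parse_json and parse_pythonic are hypothetical stand-ins for ToolCallParserType, ToolCallResponse, try_tool_call_parse_json and try_tool_call_parse_pythonic, and the two placeholder parsers do no real parsing (they report no tool calls and hand the text back).

```rust
/// Hypothetical stand-in for ToolCallParserType.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ParserType {
    Json,
    Pythonic,
    Harmony,
    Typescript,
    Xml,
}

/// Hypothetical stand-in for ToolCallResponse.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub name: String,
    pub arguments: String,
}

/// (parsed tool calls, leftover normal text)
pub type ParseResult = (Vec<ToolCall>, Option<String>);

// Placeholder parsers: a real implementation would extract calls here.
// Both report "no tool call found" and return the message as normal text.
fn parse_json(message: &str) -> ParseResult {
    (Vec::new(), Some(message.to_string()))
}

fn parse_pythonic(message: &str) -> ParseResult {
    (Vec::new(), Some(message.to_string()))
}

/// Route the message to the parser selected by the config's format.
pub fn try_tool_call_parse(message: &str, format: ParserType) -> Result<ParseResult, String> {
    match format {
        ParserType::Json => Ok(parse_json(message)),
        ParserType::Pythonic => Ok(parse_pythonic(message)),
        // Harmony / TypeScript / XML: not implemented yet.
        other => Err(format!("parser {other:?} is not implemented")),
    }
}
```

Unsupported formats surface as an Err rather than a silent empty result, mirroring the Error/Unimplemented branch of the diagram.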

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Poem

In brackets, calls align—hip hop hooray!
A bunny parses Python all the way. 🐇
Names and kwargs, tidy and neat,
Configs split out, a modular treat.
Licenses blessed, crates in tow—
Onward we hop, to parsing we go!


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (19)
lib/parsers/Cargo.toml (1)

38-39: Consider feature-gating rustpython-parser.

  • No version change is needed to pick up compatible bugfixes: Cargo already reads "0.4.0" as the caret requirement ^0.4.0 (>=0.4.0, <0.5.0), so writing "0.4" would be equivalent.
  • Make the Python parser dependency optional to avoid pulling it in for consumers who don’t need it.

If feasible, add a crate feature and gate the dep:

[features]
default = ["json"]
json = []
pythonic = ["dep:rustpython-parser"]

[dependencies]
rustpython-parser = { version = "0.4", optional = true }
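If the pythonic feature above is adopted, code that touches rustpython-parser has to be compiled conditionally. A sketch, assuming the feature is named pythonic as in the TOML; parse_pythonic_or_err and the placeholder module body are hypothetical. Defining the entry point twice under complementary cfg attributes keeps the public API the same whether or not the feature is enabled.

```rust
// Compiled only with `--features pythonic`; the body is a placeholder for
// the real rustpython-parser based implementation.
#[cfg(feature = "pythonic")]
mod pythonic_parser {
    pub fn parse(message: &str) -> Result<Vec<String>, String> {
        let _ = message;
        Ok(Vec::new())
    }
}

/// Hypothetical entry point: delegates to the parser when it is compiled in.
#[cfg(feature = "pythonic")]
pub fn parse_pythonic_or_err(message: &str) -> Result<Vec<String>, String> {
    pythonic_parser::parse(message)
}

/// Without the feature, report a clear error instead of failing to build.
#[cfg(not(feature = "pythonic"))]
pub fn parse_pythonic_or_err(_message: &str) -> Result<Vec<String>, String> {
    Err("built without the `pythonic` feature".to_string())
}
```

Callers that select the Pythonic format on a build without the feature then get an explicit error rather than a compile failure.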

Please confirm num-traits is used by the Pythonic path; remove if unused.

deny.toml (1)

36-39: Justify adding LGPL-3.0, CC0-1.0, Unicode-DFS-2016 to the allowlist

  • CC0-1.0 and Unicode-DFS-2016 are common for Unicode and data crates; LGPL-3.0 is a copyleft license that may impose downstream obligations.
  • If only a small set of dependencies require LGPL-3.0, prefer adding per-crate exceptions instead of a global allowlist.
  • The automated audit failed in this environment; please run cargo deny list locally to identify which crates need these licenses and confirm whether a global allowlist is appropriate or whether targeted exceptions suffice.
lib/parsers/src/tool_calling/json_parser.rs (3)

195-199: Validate token list alignment.

zip silently truncates to the shorter list. Add a length check to avoid mispaired start/end tokens.

Example:

assert_eq!(
    tool_call_start_tokens.len(),
    tool_call_end_tokens.len(),
    "mismatched start/end token lists"
);
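An alternative to panicking is to make the pairing fallible, so a misconfigured preset surfaces as an error the caller can handle. pair_tokens is a hypothetical helper, not part of the crate.

```rust
/// Pair start/end tokens positionally, rejecting lists of different lengths
/// instead of letting `zip` silently drop the unmatched tail.
pub fn pair_tokens<'a>(
    start_tokens: &'a [String],
    end_tokens: &'a [String],
) -> Result<Vec<(&'a str, &'a str)>, String> {
    if start_tokens.len() != end_tokens.len() {
        return Err(format!(
            "mismatched start/end token lists: {} start vs {} end",
            start_tokens.len(),
            end_tokens.len()
        ));
    }
    Ok(start_tokens
        .iter()
        .zip(end_tokens)
        .map(|(s, e)| (s.as_str(), e.as_str()))
        .collect())
}
```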

77-87: Single-token parsing ignores array starts.

In <|python_tag|> mode, only segments starting with { are considered; arrays [ will be skipped.

Consider accepting [ and validating via serde_json::from_str::<Value>().
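One way to widen the single-token check is to accept either opening bracket before handing the segment to a JSON parser. A stdlib-only sketch; json_payload_after is hypothetical, and real validation would still go through serde_json::from_str::<Value>() as suggested.

```rust
/// Return the payload that follows `token` if it looks like a JSON object
/// or array. Validation is left to a real JSON parser.
pub fn json_payload_after<'a>(message: &'a str, token: &str) -> Option<&'a str> {
    let idx = message.find(token)?;
    let rest = message[idx + token.len()..].trim_start();
    // Accept arrays (`[...]`) as well as objects (`{...}`).
    if rest.starts_with('{') || rest.starts_with('[') {
        Some(rest.trim_end())
    } else {
        None
    }
}
```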


307-308: Preserve extracted normal text on parse miss.

Fallback returns trimmed instead of the previously extracted normal_text, which can reintroduce tool payload into the content.

Prefer returning Some(normal_text) here.

lib/parsers/src/tool_calling/tools.rs (1)

17-21: Demote log level to avoid noise.

The info-level logs on every parse attempt can spam callers.

Switch to debug! unless this is operationally required.

lib/parsers/src/tool_calling/mod.rs (1)

4-4: Consider keeping pythonic_parser private.

If no public items are intended from pythonic_parser, use mod pythonic_parser; to avoid exposing internal APIs.

lib/parsers/src/tool_calling/pythonic_parser.rs (7)

12-17: Avoid hard-coded tags; source from config or constants.

Hard-coding <|python_start|>/<|python_end|> couples behavior to literals and bypasses ToolCallConfig. Prefer a shared constant or pass allowed tags via config to keep formats centralized.


31-33: Graceful parsing: prefer tolerant fallbacks over bubbling parse errors.

parse(src, Mode::Expression, ...) errors should degrade to Ok(([], Some(original))) in the outer API, not Err. This matches the JSON parser’s behavior.
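The tolerant behavior could be enforced once, at the API boundary, by a thin wrapper. A sketch; tolerant and its closure-based signature are assumptions rather than the crate's API, and the outer function would wrap its result in Ok(...).

```rust
/// (parsed tool calls, normal text)
pub type Parsed<T> = (Vec<T>, Option<String>);

/// Run a fallible parser; on error, report no tool calls and hand the
/// original message back as normal text instead of propagating the error.
pub fn tolerant<T, E, F>(message: &str, parse: F) -> Parsed<T>
where
    F: FnOnce(&str) -> Result<Parsed<T>, E>,
{
    match parse(message) {
        Ok(parsed) => parsed,
        // A real implementation would probably log `_err` at debug level.
        Err(_err) => (Vec::new(), Some(message.to_string())),
    }
}
```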


56-60: Log/handle positional args to avoid silent drops.

Positional args are currently ignored without a trace. Add a debug log (or support constants) to avoid surprising losses.

Example:

-        let (func, keywords) = match elt {
-            Expr::Call(call) => (&call.func, &call.keywords),
+        let (func, keywords, args) = match elt {
+            Expr::Call(call) => (&call.func, &call.keywords, &call.args),
             _ => continue,
         };
+        if !args.is_empty() {
+            tracing::debug!("Skipping positional args in pythonic call for {:?}", func);
+        }

61-64: Optional: accept attribute callees (module.func).

If desired, accept Expr::Attribute (e.g., tools.search.query) and extract terminal name or full path.
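Resolving a module.func callee amounts to walking the attribute chain down to a plain name. The sketch uses a hypothetical two-variant Callee enum as a stand-in for the Expr::Name / Expr::Attribute nodes, not the real rustpython-parser AST types.

```rust
/// Hypothetical mini-AST for a call's callee; a stand-in for the
/// Expr::Name / Expr::Attribute variants of a Python AST.
pub enum Callee {
    Name(String),
    Attribute { value: Box<Callee>, attr: String },
}

/// The last segment, e.g. `query` for `tools.search.query`.
pub fn terminal_name(callee: &Callee) -> &str {
    match callee {
        Callee::Name(id) => id,
        Callee::Attribute { attr, .. } => attr,
    }
}

/// The full dotted path, e.g. `tools.search.query`.
pub fn full_path(callee: &Callee) -> String {
    match callee {
        Callee::Name(id) => id.clone(),
        Callee::Attribute { value, attr } => format!("{}.{}", full_path(value), attr),
    }
}
```

Python parses tools.search.query as Attribute(Attribute(Name("tools"), "search"), "query"), so the recursion bottoms out at the leftmost name.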


86-95: Use UUIDs for IDs to align with JSON path and avoid collisions.

JSON parser uses UUIDs; here IDs restart at 1. Switch to UUID for consistency.

Apply:

+use uuid::Uuid;
@@
-        res.push(ToolCallResponse {
-            id: format!("call-{}", idx + 1),
+        res.push(ToolCallResponse {
+            id: format!("call-{}", Uuid::new_v4()),

99-149: Strengthen const_expr edge cases (floats, tuples, dict keys).

  • Non-finite floats (NaN/Inf) are not valid JSON; json!(f) does not fail on them but silently produces null, so the value is lost without any error.
  • Consider supporting tuples as lists.
  • For dict keys, coercing non-strings via to_string() can surprise; you may want to error instead.

Apply:

-            Constant::Float(f) => json!(f),
+            Constant::Float(f) => {
+                if let Some(n) = Number::from_f64(*f) {
+                    Value::Number(n)
+                } else {
+                    return Err("non-finite float not supported in JSON".into());
+                }
+            }
@@
-        // Handle Python lists as expressions, not constants
+        // Handle Python lists/tuples as expressions, not constants
         Expr::List(expr_list) => {
             let list_values: Result<Vec<Value>, Box<dyn std::error::Error>> =
                 expr_list.elts.iter().map(|e| const_expr(e)).collect();
             Ok(json!(list_values?))
         }
+        Expr::Tuple(expr_tuple) => {
+            let list_values: Result<Vec<Value>, Box<dyn std::error::Error>> =
+                expr_tuple.elts.iter().map(|e| const_expr(e)).collect();
+            Ok(json!(list_values?))
+        }
@@
-                        other => other.to_string(),
+                        other => {
+                            return Err(format!("non-string dict key not supported: {}", other).into())
+                        },
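The three edge cases can be illustrated on a toy value model. This sketch uses neither rustpython-parser nor serde_json: PyValue and JsonValue are hypothetical stand-ins for the parsed Python literal and serde_json::Value, so the conversion rules (reject non-finite floats, treat tuples as lists, reject non-string dict keys) can be shown in isolation.

```rust
/// Hypothetical Python literal model (stand-in for the parsed AST).
#[derive(Debug, Clone)]
pub enum PyValue {
    None,
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
    List(Vec<PyValue>),
    Tuple(Vec<PyValue>),
    Dict(Vec<(PyValue, PyValue)>),
}

/// Hypothetical JSON model (stand-in for serde_json::Value).
#[derive(Debug, Clone, PartialEq)]
pub enum JsonValue {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
    Array(Vec<JsonValue>),
    Object(Vec<(String, JsonValue)>),
}

pub fn to_json(v: &PyValue) -> Result<JsonValue, String> {
    Ok(match v {
        PyValue::None => JsonValue::Null,
        PyValue::Bool(b) => JsonValue::Bool(*b),
        PyValue::Int(i) => JsonValue::Int(*i),
        // NaN and +/-inf have no JSON representation: fail loudly.
        PyValue::Float(f) if !f.is_finite() => {
            return Err(format!("non-finite float not supported in JSON: {f}"))
        }
        PyValue::Float(f) => JsonValue::Float(*f),
        PyValue::Str(s) => JsonValue::Str(s.clone()),
        // Tuples are emitted as JSON arrays, same as lists.
        PyValue::List(items) | PyValue::Tuple(items) => {
            JsonValue::Array(items.iter().map(to_json).collect::<Result<Vec<_>, String>>()?)
        }
        PyValue::Dict(pairs) => {
            let mut obj = Vec::with_capacity(pairs.len());
            for (k, val) in pairs {
                // JSON object keys must be strings; reject anything else.
                let PyValue::Str(key) = k else {
                    return Err(format!("non-string dict key not supported: {k:?}"));
                };
                obj.push((key.clone(), to_json(val)?));
            }
            JsonValue::Object(obj)
        }
    })
}
```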

188-205: Tests: add coverage for underscores and non-finite floats.

Please add:

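  • A function named _do(a=1) and an argument _x=2, to ensure the regex accepts leading underscores.
  • A non-finite float such as value=1e999 (a literal that overflows to an infinite float in Python) -> expect skip/log. Note that value=float('inf') would not exercise this path, because float('inf') is a call expression, not a constant.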

I can draft these tests if helpful.
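For reference, the rule such a test pins down is Python's identifier syntax restricted to ASCII: a letter or underscore, followed by letters, digits, or underscores. is_ascii_identifier is a hypothetical stdlib check, not the crate's regex; it just makes the expected accept/reject cases explicit.

```rust
/// ASCII subset of Python identifier rules: `[A-Za-z_][A-Za-z0-9_]*`.
/// (Python 3 also allows many non-ASCII letters, which this ignores.)
pub fn is_ascii_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c == '_' || c.is_ascii_alphabetic() => {}
        _ => return false,
    }
    chars.all(|c| c == '_' || c.is_ascii_alphanumeric())
}
```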

lib/parsers/src/tool_calling/config.rs (3)

18-36: Scope of JsonParserConfig looks good; consider deriving Eq/PartialEq.

Deriving Eq/PartialEq helps in tests/config comparisons; no behavior change.

-#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)]
+#[derive(Clone, Debug, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
 pub struct JsonParserConfig {

69-82: Hermes end token embeds a leading newline; confirm parser contract.

The end token is "\n</tool_call>". This is unusual and may be brittle if models omit the newline. Consider trimming around matches or using "</tool_call>" as the token.
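One way to make the Hermes preset robust is to match on </tool_call> alone and trim whitespace from the captured payload, so the preceding newline becomes optional. A stdlib sketch; extract_between is a hypothetical helper, not the crate's extraction code.

```rust
/// Return the trimmed payloads between each `start`/`end` pair, so that
/// "<tool_call>{..}</tool_call>" and "<tool_call>\n{..}\n</tool_call>"
/// yield the same payload.
pub fn extract_between<'a>(text: &'a str, start: &str, end: &str) -> Vec<&'a str> {
    let mut out = Vec::new();
    if start.is_empty() || end.is_empty() {
        return out; // empty delimiters would never advance
    }
    let mut rest = text;
    while let Some(s) = rest.find(start) {
        let after_start = &rest[s + start.len()..];
        // Stop if a start token has no matching end token.
        let Some(e) = after_start.find(end) else { break };
        out.push(after_start[..e].trim());
        rest = &after_start[e + end.len()..];
    }
    out
}
```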


131-136: Pythonic config currently unused by parser; align tags via config.

pythonic() doesn’t convey the pythonic start/end tags used in strip_text(). Consider adding pythonic tag settings to config and threading them into the pythonic parser to avoid drift.

Happy to wire this through and update strip_text to consume config.
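Threading the tags through config could look like the sketch below. PythonicParserConfig, its fields, and strip_tags are hypothetical; the default values are the <|python_start|>/<|python_end|> literals mentioned earlier in this review, not values confirmed from the crate.

```rust
/// Hypothetical config carrying the Pythonic wrapper tags, so the parser
/// no longer hard-codes them.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct PythonicParserConfig {
    pub start_tag: String,
    pub end_tag: String,
}

impl Default for PythonicParserConfig {
    fn default() -> Self {
        Self {
            start_tag: "<|python_start|>".to_string(),
            end_tag: "<|python_end|>".to_string(),
        }
    }
}

/// Remove the configured tags (if present) and surrounding whitespace.
pub fn strip_tags<'a>(text: &'a str, cfg: &PythonicParserConfig) -> &'a str {
    let text = text.trim();
    let text = text.strip_prefix(cfg.start_tag.as_str()).unwrap_or(text);
    let text = text.strip_suffix(cfg.end_tag.as_str()).unwrap_or(text);
    text.trim()
}
```

Because the tags live in one struct, a preset such as pythonic() could populate them and the parser would read them instead of its own literals.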

lib/parsers/src/tool_calling/parsers.rs (2)

69-75: Test helper duplication across modules.

extract_name_and_args appears in multiple modules. Consider centralizing in a shared test util to reduce duplication.


1094-1110: Unignore the pythonic + normal text test.

The pythonic parser already returns prefix normal text; enable this test to prevent regressions.

Apply:

-    #[ignore]
     fn test_pythonic_parser_with_constants_and_normal_text() {
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 295964e and 9779831.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • deny.toml (1 hunks)
  • lib/parsers/Cargo.toml (1 hunks)
  • lib/parsers/src/tool_calling/config.rs (1 hunks)
  • lib/parsers/src/tool_calling/json_parser.rs (1 hunks)
  • lib/parsers/src/tool_calling/mod.rs (1 hunks)
  • lib/parsers/src/tool_calling/parsers.rs (5 hunks)
  • lib/parsers/src/tool_calling/pythonic_parser.rs (1 hunks)
  • lib/parsers/src/tool_calling/tools.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
  • lib/parsers/src/tool_calling/tools.rs (1)
    • lib/parsers/src/tool_calling/parsers.rs (1)
      • detect_and_parse_tool_call (36-63)
  • lib/parsers/src/tool_calling/pythonic_parser.rs (1)
    • lib/parsers/src/tool_calling/parsers.rs (1)
      • extract_name_and_args (72-75)
  • lib/parsers/src/tool_calling/mod.rs (1)
    • lib/parsers/src/tool_calling/parsers.rs (2)
      • detect_and_parse_tool_call (36-63)
      • try_tool_call_parse (9-33)
  • lib/parsers/src/tool_calling/parsers.rs (3)
    • lib/parsers/src/tool_calling/json_parser.rs (1)
      • try_tool_call_parse_json (152-308)
    • lib/parsers/src/tool_calling/pythonic_parser.rs (1)
      • try_tool_call_parse_pythonic (151-177)
    • lib/parsers/src/tool_calling/config.rs (1)
      • pythonic (131-136)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
🔇 Additional comments (8)
lib/parsers/src/tool_calling/json_parser.rs (2)

10-10: Import path change LGTM.

Switching to super::config::JsonParserConfig matches the new module layout.


134-139: Clarify list handling behavior in json_parser.rs
Docstring states only the last tool call is returned, but the code collects and returns all parsed calls (order preserved). Confirm the intended contract and either update the docstring to “returns all parsed tool calls (order preserved)” or change the list branches to return only the last item. Applies also to lines 280–305.

lib/parsers/src/tool_calling/tools.rs (1)

4-5: Re-export adjustments look good.

ToolCallConfig and detect_and_parse_tool_call remain available via tools, aligning with the new config/parsers split.

lib/parsers/src/tool_calling/mod.rs (1)

12-13: Drop the deprecated-alias suggestion: try_tool_call_parse_json is still exposed under json_parser, so there’s no breaking change, and aliasing the generic try_tool_call_parse would misalign its signature.

Likely an incorrect or invalid review comment.

lib/parsers/src/tool_calling/pythonic_parser.rs (1)

316-343: Nice coverage for lists/dicts.

Good recursive handling validation and nested dict assertions.

lib/parsers/src/tool_calling/config.rs (1)

38-49: Default start tokens include <|python_tag|>; verify false-positive risk.

Including "<|python_tag|>" in defaults means any message with that token will trigger JSON extraction even when not a tool call. Confirm intended across models; otherwise, move it to model-specific presets only (e.g., llama3_json()).

lib/parsers/src/tool_calling/parsers.rs (2)

22-25: Wiring Pythonic path: LGTM.

Delegating on ToolCallParserType::Pythonic to try_tool_call_parse_pythonic is correct.


47-48: Expose "pythonic" preset in detect_and_parse map: LGTM.

Config registration looks consistent with other presets.

Signed-off-by: ayushag <ayushag@nvidia.com>
@ayushag-nv ayushag-nv enabled auto-merge (squash) September 2, 2025 14:59
@ayushag-nv ayushag-nv merged commit d39d676 into main Sep 2, 2025
11 checks passed
@ayushag-nv ayushag-nv deleted the ayushag/pythonic-parser-redo branch September 2, 2025 16:22
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
dillon-cullinan pushed a commit that referenced this pull request Sep 5, 2025
Signed-off-by: ayushag <ayushag@nvidia.com>
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>