@ayushag-nv
Contributor

@ayushag-nv ayushag-nv commented Aug 26, 2025

Overview:

Implements a basic Pythonic tool-call parser using AST parsing. It supports three kinds of argument values in function calls:

  • constants as arg values
  • list as arg values
  • dicts as arg values

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Added Pythonic tool-call parsing, enabling extraction from Python-like lists of function calls.
    • Introduced presets for common formats (Hermes, Nemotron-Deci, Llama 3 JSON, Mistral, Phi-4) for quicker setup.
    • Enhanced auto-detection and configurability of start/end tokens and argument/name keys.
  • Refactor

    • Streamlined tool-call configuration and parser selection to simplify integration and customization.
  • Chores

    • Added a new parsing dependency to support Pythonic syntax.

@ayushag-nv ayushag-nv requested a review from a team as a code owner August 26, 2025 21:54
@copy-pr-bot

copy-pr-bot bot commented Aug 26, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ayushag-nv ayushag-nv self-assigned this Aug 26, 2025
@ayushag-nv ayushag-nv marked this pull request as draft August 26, 2025 21:54
@github-actions github-actions bot added the feat label Aug 26, 2025
@coderabbitai
Contributor

coderabbitai bot commented Aug 26, 2025

Walkthrough

Adds a Pythonic tool-call parser using rustpython-parser, introduces a dedicated config module for parser types and presets, updates routing to enable Pythonic parsing, and reorganizes public re-exports. Also updates imports and adds a Cargo dependency.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Dependency Update | lib/parsers/Cargo.toml | Adds dependency: rustpython-parser = "0.4.0". |
| Config Module Introduction | lib/parsers/src/tool_calling/config.rs | Adds ToolCallParserType, JsonParserConfig (with Default), and ToolCallConfig (with Default and presets: hermes, nemotron_deci, llama3_json, mistral, phi4, pythonic). |
| Parser Routing & Detection | lib/parsers/src/tool_calling/parsers.rs | Moves config types to config; wires Pythonic branch via try_tool_call_parse_pythonic; adds "pythonic" to detect_and_parse_tool_call; updates tests to import moved types. |
| Pythonic Parser (New Feature) | lib/parsers/src/tool_calling/pythonic_parser.rs | New module implementing Python-like tool-call extraction using rustpython AST; exposes parse_tool_calls and try_tool_call_parse_pythonic. |
| Public API Re-exports | lib/parsers/src/tool_calling/mod.rs, lib/parsers/src/tool_calling/tools.rs | Adds modules config, pythonic_parser; re-exports config types from config and parsing fns from parsers; removes direct JSON parser exports and some prior re-exports. |
| JSON Parser Import Cleanup | lib/parsers/src/tool_calling/json_parser.rs | Updates import path to super::config::JsonParserConfig; removes public re-exports of config items. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Caller
  participant Parsers as parsers::detect_and_parse_tool_call
  participant Router as parsers::try_tool_call_parse
  participant Py as pythonic_parser
  participant Json as json_parser

  Caller->>Parsers: detect_and_parse_tool_call(message)
  note right of Parsers: Build parser_map incl. "pythonic", presets
  Parsers->>Router: try_tool_call_parse(message, ToolCallConfig)

  alt format == Pythonic
    Router->>Py: try_tool_call_parse_pythonic(message)
    note over Py: Strip python tags<br/>Regex bracket extraction<br/>AST parse calls<br/>Serialize kwargs
    Py-->>Router: Vec<ToolCallResponse>, remaining
  else format == Json (etc.)
    Router->>Json: try_tool_call_parse_json(message)
    Json-->>Router: Vec<ToolCallResponse>, remaining
  end

  Router-->>Parsers: Results
  Parsers-->>Caller: Results
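
The routing step in the diagram can be sketched in a few lines of Rust. This is a minimal, self-contained illustration rather than the code in parsers.rs: the `Format` enum, the `ParseResult` alias, and both stub parsers are hypothetical stand-ins for `ToolCallParserType`, `try_tool_call_parse_json`, and `try_tool_call_parse_pythonic`, whose real signatures are not shown in this thread.

```rust
/// Hypothetical stand-in for `ToolCallParserType`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Format {
    Json,
    Pythonic,
}

/// (names of parsed calls, remaining normal content). The real parsers
/// return `Vec<ToolCallResponse>`; plain strings keep the sketch small.
type ParseResult = Result<(Vec<String>, Option<String>), String>;

/// Stub: treat any message that starts with '{' as one JSON tool call.
fn parse_json_stub(message: &str) -> ParseResult {
    if message.trim_start().starts_with('{') {
        Ok((vec!["json_call".to_string()], None))
    } else {
        Ok((Vec::new(), Some(message.to_string())))
    }
}

/// Stub: treat any message that starts with '[' as one Pythonic tool call.
fn parse_pythonic_stub(message: &str) -> ParseResult {
    if message.trim_start().starts_with('[') {
        Ok((vec!["pythonic_call".to_string()], None))
    } else {
        Ok((Vec::new(), Some(message.to_string())))
    }
}

/// Dispatch to the parser selected by the configured format.
fn route(message: &str, format: Format) -> ParseResult {
    match format {
        Format::Json => parse_json_stub(message),
        Format::Pythonic => parse_pythonic_stub(message),
    }
}

fn main() -> Result<(), String> {
    let (calls, rest) = route("[get_weather(city=\"Paris\")]", Format::Pythonic)?;
    println!("calls = {calls:?}, remaining = {rest:?}");
    Ok(())
}
```

Errors are propagated with `?` instead of `unwrap()`, so a malformed message would surface as an `Err` to the caller rather than a panic.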

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

In brackets I found little calls to make,
A whisker-twitch, a parse for parsing’s sake.
Pythonic whispers hop through AST trees,
JSON burrows mapped with token keys.
New routes carved neat, dependencies tight—
A happy hare commits tonight. 🐇✨

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/parsers/src/tool_calling/json_parser.rs (1)

139-146: Docstring contradicts behavior: code returns all items, not the last one.

The “Note on List Handling” states “only the last item” is returned. Below, both list branches collect and return all items in order. This discrepancy is user-facing and can mislead integrators.

Either update docs or change behavior. If the new multi-tool path is intentional (it matches tools.rs aggregate/stream helpers), update the docs:

-/// When the input contains a list of tool calls (either with `parameters` or `arguments`),
-/// only the **last item** in the list is returned. This design choice assumes that the
-/// most recent tool call in a list is the one to execute.
+/// When the input contains a list of tool calls (either with `parameters` or `arguments`),
+/// all items are returned in order. Upstream callers can choose which tool(s) to execute.
🧹 Nitpick comments (13)
lib/parsers/Cargo.toml (1)

38-38: Confirm rustpython-parser version and feature flags; clarify the intended version requirement and document MSRV.

0.4.0 appears to be the latest released version of rustpython-parser (published Aug 6, 2024). Double-check whether a newer 0.4.x exists before merging and whether you need any crate features (e.g., lalrpop, location) for AST ranges. Note that Cargo already treats a bare version string such as "0.4.0" as a caret requirement (equivalent to ^0.4.0), so this entry accepts any semver-compatible 0.4.x release; a strict pin would be written "=0.4.0".

As of today (August 26, 2025), lib.rs still lists 0.4.0 as the latest. Please verify on crates.io or lib.rs.

If you decide to use a strict pin, add a short comment explaining why (e.g., API stability, MSRV).

lib/parsers/src/tool_calling/json_parser.rs (3)

13-15: Remove “Remove this line” comments.

These TODO-like comments are easy to forget and add noise. Since you’ve already removed the re-exports, delete these lines entirely.

Apply this diff:

-// Remove this line:
-// pub use super::config::{JsonParserConfig, ToolCallConfig, ToolCallParserType};
-

282-289: Inline comments also claim “pop the last item” but code returns all items.

The inline comments in both list branches say “pop the last item,” but the code pushes all items to results.

Apply this diff to align comments with behavior:

-    // We pop the last item in the list to use.
+    // We return all items collected in order.

And similarly in the Arguments list branch:

-    // Again, we take the last item for processing.
+    // We return all items collected in order.

Also applies to: 302-307


197-229: Token handling: consider validating config pairs and short‑circuiting once a match is found.

You zip start/end tokens; if lengths diverge, some tokens are silently ignored. Also, once a match succeeds, you break, which is good—ensure you cover all configured start-only tokens before giving up.

  • Validate tool_call_start_tokens.len() == tool_call_end_tokens.len() up front and warn if not.
  • Consider checking all start-only tokens first, then paired tokens, to avoid missing single-token matches if the first pair fails.
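
To illustrate the concern: `Iterator::zip` stops at the shorter of its two inputs, so an unpaired token is dropped without any error. The sketch below uses made-up token strings, and `paired_tokens` and `unpaired_count` are hypothetical helpers, not functions from json_parser.rs.

```rust
/// Hypothetical helper: pair start and end tokens the way a plain `zip` would.
fn paired_tokens<'a>(starts: &[&'a str], ends: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    starts.iter().copied().zip(ends.iter().copied()).collect()
}

/// Hypothetical helper: how many tokens `zip` would silently drop.
fn unpaired_count(starts: &[&str], ends: &[&str]) -> usize {
    starts.len().abs_diff(ends.len())
}

fn main() {
    // Illustrative tokens only; the second start token has no matching end token.
    let starts = ["<TOOLCALL>", "<|python_tag|>"];
    let ends = ["</TOOLCALL>"];

    let pairs = paired_tokens(&starts, &ends);
    println!("pairs kept by zip: {pairs:?}");

    let dropped = unpaired_count(&starts, &ends);
    if dropped > 0 {
        eprintln!("warning: {dropped} token(s) have no partner and are ignored");
    }
}
```

With these inputs only the first pair survives, which is why a length check before zipping is worth a warning.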
lib/parsers/src/tool_calling/tools.rs (1)

47-77: Stream conversion looks correct; ensure id semantics are consistent across parsers.

You set id in chunks to Some(parsed.id). The JSON parser uses UUIDs; the Pythonic parser currently uses call-<index>. Consider unifying id shape across parsers for downstream consumers.

lib/parsers/src/tool_calling/mod.rs (1)

12-15: Re-exports align with the new structure.

Nice consolidation. Follow-up: consider adding a module-level doc comment outlining when to choose JSON vs Pythonic formats and how to override presets.

lib/parsers/src/tool_calling/pythonic_parser.rs (3)

12-17: Strip all known Python tags; consider making this configurable.

You remove <|python_start|>/<|python_end|>, but elsewhere we reference <|python_tag|>. Add that tag here to avoid surprises, and consider passing tokens via config later.

Apply this minimal diff:

 fn strip_text(message: &str) -> String {
     // Remove unexpected python tags if any
     message
         .replace("<|python_start|>", "")
         .replace("<|python_end|>", "")
+        .replace("<|python_tag|>", "")
 }

Longer-term: accept a config of start/end tokens similar to the JSON parser.
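
A sketch of the longer-term idea, assuming the tags would arrive as a slice of strings. `strip_tags` and its `tags` parameter are hypothetical names; the actual config shape is up to the author.

```rust
/// Remove every occurrence of each tag in `tags` from `message`.
/// Hypothetical generalization of `strip_text`; the tag list would come from config.
fn strip_tags(message: &str, tags: &[&str]) -> String {
    let mut out = message.to_string();
    for tag in tags {
        // Skip empty tags defensively; they carry no information.
        if !tag.is_empty() {
            out = out.replace(*tag, "");
        }
    }
    out
}

fn main() {
    let tags = ["<|python_start|>", "<|python_end|>", "<|python_tag|>"];
    let cleaned = strip_tags("<|python_start|>[foo(a=1)]<|python_end|>", &tags);
    println!("{cleaned}"); // prints "[foo(a=1)]"
}
```

Keeping the tags in data rather than in chained `replace` calls means a new model-specific tag becomes a config change instead of a code change.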


77-84: Unify call id format with JSON parser; avoid predictable ids.

JSON parser uses UUIDs; here you use call-<index>. Consider UUIDs for consistency.

Apply this diff and import uuid::Uuid:

 use super::response::{CalledFunction, ToolCallResponse, ToolCallType};
+use uuid::Uuid;
@@
         res.push(ToolCallResponse {
-            id: format!("call-{}", idx + 1),
+            id: format!("call-{}", Uuid::new_v4()),

127-150: Tests pass for current behavior; update if you adopt numeric ints.

If you switch to numeric ints in const_expr, adjust assertions:

-        assert_eq!(args["a"], "1");
-        assert_eq!(args["b"], "2");
+        assert_eq!(args["a"], 1);
+        assert_eq!(args["b"], 2);
@@
-        assert_eq!(args["x"], "3");
+        assert_eq!(args["x"], 3);
lib/parsers/src/tool_calling/config.rs (3)

5-16: Make enum ergonomic and future-proof (derive Eq/Copy; consider non_exhaustive).

Deriving Eq/PartialEq and Copy on a C-like enum improves matching in tests and config code. Marking it non_exhaustive avoids semver breakage when adding new formats.

Apply:

-#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)]
+#[derive(Clone, Copy, Debug, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
+#[non_exhaustive]
 pub enum ToolCallParserType {
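
A small illustration of what the extra derives buy. `ParserType` and `describe` are placeholder names, only the `Json` and `Pythonic` variants appear in this thread, and the serde derives are omitted so the snippet compiles without dependencies.

```rust
/// Placeholder stand-in for `ToolCallParserType`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[non_exhaustive]
pub enum ParserType {
    Json,
    Pythonic,
}

/// Takes the enum by value; `Copy` means the caller keeps its own copy.
fn describe(kind: ParserType) -> &'static str {
    match kind {
        ParserType::Json => "json",
        ParserType::Pythonic => "pythonic",
        // Inside the defining crate this match is exhaustive. Downstream
        // crates need a wildcard arm because of #[non_exhaustive].
    }
}

fn main() {
    let kind = ParserType::Pythonic;
    let label = describe(kind); // `kind` is copied, not moved
    assert_eq!(kind, ParserType::Pythonic); // comparison requires PartialEq
    println!("{label}");
}
```

Without `Copy`, the call to `describe` would move `kind`; without `PartialEq`, the `assert_eq!` would not compile.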

38-49: Document and validate start/end token pairing to prevent silent misparsing.

The JSON parser zips start and end token vectors; mismatched lengths silently drop extras. Add a lightweight validator to catch this early in debug builds.

Add a validator inside the impl:

 impl ToolCallConfig {
+    /// Debug-only validation to ensure token arrays line up.
+    #[inline]
+    pub fn debug_validate(&self) {
+        #[cfg(debug_assertions)]
+        {
+            debug_assert_eq!(
+                self.json.tool_call_start_tokens.len(),
+                self.json.tool_call_end_tokens.len(),
+                "tool_call_start_tokens and tool_call_end_tokens must have equal length"
+            );
+        }
+    }

Then call config.debug_validate() at the start of try_tool_call_parse (parsers.rs) before matching on the format.
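
One caveat with the debug-only check: `debug_assert_eq!` is compiled out of release builds by default and panics in debug builds. If a non-panicking check that also runs in release is preferred, a variant returning a `Result` could look like the sketch below. `TokenConfig` and `validate` are hypothetical; only the two field names come from the suggestion above.

```rust
/// Hypothetical, trimmed-down stand-in for `JsonParserConfig`.
struct TokenConfig {
    tool_call_start_tokens: Vec<String>,
    tool_call_end_tokens: Vec<String>,
}

impl TokenConfig {
    /// Returns an error instead of panicking when the token lists differ in length.
    fn validate(&self) -> Result<(), String> {
        let starts = self.tool_call_start_tokens.len();
        let ends = self.tool_call_end_tokens.len();
        if starts == ends {
            Ok(())
        } else {
            Err(format!(
                "tool_call_start_tokens has {starts} entries but tool_call_end_tokens has {ends}"
            ))
        }
    }
}

fn main() {
    // Illustrative token only; the end-token list is deliberately left empty.
    let config = TokenConfig {
        tool_call_start_tokens: vec!["<TOOLCALL>".to_string()],
        tool_call_end_tokens: Vec::new(),
    };
    match config.validate() {
        Ok(()) => println!("config ok"),
        Err(msg) => eprintln!("invalid config: {msg}"),
    }
}
```

The caller can then decide whether a mismatch should be logged, rejected at startup, or returned as a parse error.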


18-36: Remove or Wire Up Unused parallel_tool_calls_*_tokens Fields

I ran a grep search and confirmed that neither parallel_tool_calls_start_tokens nor parallel_tool_calls_end_tokens is ever read or referenced outside of their declaration and default initialization in config.rs.

• Since these fields aren’t consumed by any parser or JSON‐path logic, they serve no purpose in the current code.
• To minimize surface area and avoid dead code, it’s best to remove both fields (and their entries in the Default impl).
• If you intend to implement parallel tool‐calls support soon, you can leave a // TODO in its place or document the planned usage—but avoid shipping unused public API.

Let me know if you’d rather wire up the JSON‐path logic now instead of pruning these fields.

lib/parsers/src/tool_calling/parsers.rs (1)

15-18: Optional: Validate config invariants early.

Call the proposed ToolCallConfig::debug_validate() here to catch mismatched token arrays during development.

     match config.format {
         ToolCallParserType::Json => {
+            config.debug_validate();
             let (results, normal_content) = try_tool_call_parse_json(message, &config.json)?;
             Ok((results, normal_content))
         }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 8064849 and b4bd0a7.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • lib/parsers/Cargo.toml (1 hunks)
  • lib/parsers/src/tool_calling/config.rs (1 hunks)
  • lib/parsers/src/tool_calling/json_parser.rs (1 hunks)
  • lib/parsers/src/tool_calling/mod.rs (1 hunks)
  • lib/parsers/src/tool_calling/parsers.rs (4 hunks)
  • lib/parsers/src/tool_calling/pythonic_parser.rs (1 hunks)
  • lib/parsers/src/tool_calling/tools.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
lib/parsers/src/tool_calling/pythonic_parser.rs (2)
lib/parsers/src/tool_calling/json_parser.rs (5)
  • serde_json (85-85)
  • serde_json (254-254)
  • serde_json (270-270)
  • serde_json (283-283)
  • serde_json (302-302)
lib/parsers/src/tool_calling/parsers.rs (1)
  • extract_name_and_args (72-75)
lib/parsers/src/tool_calling/tools.rs (1)
lib/parsers/src/tool_calling/parsers.rs (1)
  • detect_and_parse_tool_call (36-63)
lib/parsers/src/tool_calling/mod.rs (1)
lib/parsers/src/tool_calling/parsers.rs (2)
  • detect_and_parse_tool_call (36-63)
  • try_tool_call_parse (9-33)
lib/parsers/src/tool_calling/parsers.rs (3)
lib/parsers/src/tool_calling/json_parser.rs (1)
  • try_tool_call_parse_json (155-311)
lib/parsers/src/tool_calling/pythonic_parser.rs (1)
  • try_tool_call_parse_pythonic (105-125)
lib/parsers/src/tool_calling/config.rs (1)
  • pythonic (131-136)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/2726/merge) by ayushag-nv.
lib/parsers/src/tool_calling/pythonic_parser.rs

[error] 1-1: Command: pre-commit run --show-diff-on-failure --color=always --all-files. Trailing whitespace detected by the pre-commit 'trailing-whitespace' hook in lib/parsers/src/tool_calling/pythonic_parser.rs; pre-commit exited with code 1.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)
lib/parsers/src/tool_calling/json_parser.rs (1)

10-10: Import path change looks good.

Switching to super::config::JsonParserConfig matches the new module layout.

lib/parsers/src/tool_calling/tools.rs (2)

10-42: Aggregate conversion preserves order and types. LGTM.

The mapping to ChatCompletionMessageToolCall looks correct and stable. Good logging around parser selection.


4-5: No internal stale imports detected

I’ve run project-wide ripgrep searches and confirmed there are no remaining tool_calling::json_parser::* or tool_calling::parsers::* imports in the codebase.

  • All old re-exports have been removed internally.
  • Before publishing this change, please manually verify that any external crates or downstream projects aren’t relying on the removed re-exported paths.
lib/parsers/src/tool_calling/mod.rs (1)

4-4: Module layout additions are good.

Adding config and pythonic_parser modules here makes the surface coherent.

Also applies to: 7-7

lib/parsers/src/tool_calling/parsers.rs (1)

69-70: Import path change looks good.

Bringing JsonParserConfig from super::super::config keeps tests consistent with the new module layout.

ayushag-nv and others added 12 commits August 27, 2025 21:04
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: mohammedabdulwahhab <furkhan324@berkeley.edu>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
…glang (#2713)

Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
@ayushag-nv ayushag-nv force-pushed the ayushag/add-pythonic-parser branch from 80cea68 to 3af260c Compare August 27, 2025 21:05
Signed-off-by: Ayush Agarwal <ayushag@nvidia.com>
@ayushag-nv ayushag-nv marked this pull request as ready for review August 27, 2025 21:12
@elyasmnvidian
Contributor

Please document any existing limitations in the Pythonic parser with code comments if it doesn't handle every case.

@grahamking
Contributor

Code Rabbit makes some excellent points. Particularly we can't use unwrap() or expect(..), because that will panic and take down the process.

@ayushag-nv
Contributor Author

Code Rabbit makes some excellent points. Particularly we can't use unwrap() or expect(..), because that will panic and take down the process.

@grahamking sounds good. I will address those.

Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
@ayushag-nv
Contributor Author

Closing this in favor of #2788.

@ayushag-nv ayushag-nv closed this Aug 29, 2025
@devactivity-app

Pull Request Summary by devActivity

Metrics

Cycle Time: 2d 21h 54m Coding Time: 1m Pickup Time: 9m Review Time: 2d 21h 43m
