-
Notifications
You must be signed in to change notification settings - Fork 690
feat: added support for pythonic tool parser #2788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: ayushag <ayushag@nvidia.com>
|
Created this one because the earlier one got messed up due to rebase issue: #2726 |
WalkthroughAdds a Pythonic tool-call parser and integrates it into the parsing flow. Splits configuration types into a new config module, updates exports, and adjusts imports. Introduces presets for parser configs. Updates dependencies to support Python parsing. Adjusts license allowlist. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller
participant ToolCall as ToolCall::detect_and_parse_tool_call
participant Config as ToolCallConfig
participant Parsers as parsers
participant JSON as json_parser
participant Py as pythonic_parser
Caller->>ToolCall: detect_and_parse_tool_call(message, config)
ToolCall->>Config: Read format (ToolCallParserType)
alt ParserType == Json
ToolCall->>JSON: try_tool_call_parse_json(message, JsonParserConfig)
JSON-->>ToolCall: (Vec<ToolCallResponse>, Option<String>)
else ParserType == Pythonic
ToolCall->>Py: try_tool_call_parse_pythonic(message)
Py-->>ToolCall: (Vec<ToolCallResponse>, Option<String>)
else Other (Harmony/TS/XML)
ToolCall-->>Caller: Error/Unimplemented
end
ToolCall-->>Caller: (tool_calls, normal_text)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (19)
lib/parsers/Cargo.toml (1)
38-39: Loosen version pin; consider feature-gatingrustpython-parser.
- Prefer a caret range to pick up compatible bugfixes, and make the Python parser dependency optional to avoid pulling it in for consumers who don’t need it.
-rustpython-parser = "0.4.0" -num-traits = "0.2" +rustpython-parser = "0.4" +num-traits = "0.2"If feasible, add a crate feature and gate the dep:
[features] default = ["json"] json = [] pythonic = ["dep:rustpython-parser"] [dependencies] rustpython-parser = { version = "0.4", optional = true }Please confirm
num-traitsis used by the Pythonic path; remove if unused.deny.toml (1)
36-39: Justify adding LGPL-3.0, CC0-1.0, Unicode-DFS-2016 to the allowlist
- CC0-1.0 and Unicode-DFS-2016 are common for Unicode and data crates; LGPL-3.0 is a copyleft license that may impose downstream obligations.
- If only a small set of dependencies require LGPL-3.0, prefer adding per-crate exceptions instead of a global allowlist.
- Automated audit failed in this environment—please run
cargo deny listlocally to identify which crates need these licenses and confirm whether a global allowlist is appropriate or if targeted exceptions suffice.lib/parsers/src/tool_calling/json_parser.rs (3)
195-199: Validate token list alignment.
zipsilently truncates to the shorter list. Add a length check to avoid mispaired start/end tokens.Example:
assert_eq!( tool_call_start_tokens.len(), tool_call_end_tokens.len(), "mismatched start/end token lists" );
77-87: Single-token parsing ignores array starts.In
<|python_tag|>mode, only segments starting with{are considered; arrays[will be skipped.Consider accepting
[and validating viaserde_json::from_str::<Value>().
307-308: Preserve extracted normal text on parse miss.Fallback returns
trimmedinstead of the previously extractednormal_text, which can reintroduce tool payload into the content.Prefer returning
Some(normal_text)here.lib/parsers/src/tool_calling/tools.rs (1)
17-21: Demote log level to avoid noise.The info-level logs on every parse attempt can spam callers.
Switch to
debug!unless this is operationally required.lib/parsers/src/tool_calling/mod.rs (1)
4-4: Consider keepingpythonic_parserprivate.If no public items are intended from
pythonic_parser, usemod pythonic_parser;to avoid exposing internal APIs.lib/parsers/src/tool_calling/pythonic_parser.rs (7)
12-17: Avoid hard-coded tags; source from config or constants.Hard-coding <|python_start|>/<|python_end|> couples behavior to literals and bypasses ToolCallConfig. Prefer a shared constant or pass allowed tags via config to keep formats centralized.
31-33: Graceful parsing: prefer tolerant fallbacks over bubbling parse errors.
parse(src, Mode::Expression, ...)errors should degrade to Ok(([], Some(original))) in the outer API, not Err. This matches JSON parser’s behavior.
56-60: Log/handle positional args to avoid silent drops.Positional args are currently ignored without a trace. Add a debug log (or support constants) to avoid surprising losses.
Example:
- let (func, keywords) = match elt { - Expr::Call(call) => (&call.func, &call.keywords), + let (func, keywords, args) = match elt { + Expr::Call(call) => (&call.func, &call.keywords, &call.args), _ => continue, }; + if !args.is_empty() { + tracing::debug!("Skipping positional args in pythonic call for {:?}", func); + }
61-64: Optional: accept attribute callees (module.func).If desired, accept
Expr::Attribute(e.g., tools.search.query) and extract terminal name or full path.
86-95: Use UUIDs for IDs to align with JSON path and avoid collisions.JSON parser uses UUIDs; here IDs restart at 1. Switch to UUID for consistency.
Apply:
+use uuid::Uuid; @@ - res.push(ToolCallResponse { - id: format!("call-{}", idx + 1), + res.push(ToolCallResponse { + id: format!("call-{}", Uuid::new_v4()),
99-149: Strengthen const_expr edge cases (floats, tuples, dict keys).
- Non-finite floats (NaN/Inf) are not valid JSON; json!(f) can fail at serialization time.
- Consider supporting tuples as lists.
- For dict keys, coercing non-strings via to_string() can surprise; you may want to error instead.
Apply:
- Constant::Float(f) => json!(f), + Constant::Float(f) => { + if let Some(n) = Number::from_f64(*f) { + Value::Number(n) + } else { + return Err("non-finite float not supported in JSON".into()); + } + } @@ - // Handle Python lists as expressions, not constants + // Handle Python lists/tuples as expressions, not constants Expr::List(expr_list) => { let list_values: Result<Vec<Value>, Box<dyn std::error::Error>> = expr_list.elts.iter().map(|e| const_expr(e)).collect(); Ok(json!(list_values?)) } + Expr::Tuple(expr_tuple) => { + let list_values: Result<Vec<Value>, Box<dyn std::error::Error>> = + expr_tuple.elts.iter().map(|e| const_expr(e)).collect(); + Ok(json!(list_values?)) + } @@ - other => other.to_string(), + other => { + return Err(format!("non-string dict key not supported: {}", other).into()) + },
188-205: Tests: add coverage for underscores and non-finite floats.Please add:
- function
_do(a=1)and arg_x=2to ensure regex accepts leading underscores.- float edge-case
value=float('inf')-> expect skip/log.I can draft these tests if helpful.
lib/parsers/src/tool_calling/config.rs (3)
18-36: Scope of JsonParserConfig looks good; consider deriving Eq/PartialEq.Deriving Eq/PartialEq helps in tests/config comparisons; no behavior change.
-#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)] +#[derive(Clone, Debug, PartialEq, Eq, serde::Serialize, serde::Deserialize)] pub struct JsonParserConfig {
69-82: Hermes end token embeds a leading newline; confirm parser contract.The end token is "\n</tool_call>". This is unusual and may be brittle if models omit the newline. Consider trimming around matches or using "</tool_call>" as the token.
131-136: Pythonic config currently unused by parser; align tags via config.
pythonic()doesn’t convey the pythonic start/end tags used in strip_text(). Consider adding pythonic tag settings to config and threading them into the pythonic parser to avoid drift.Happy to wire this through and update strip_text to consume config.
lib/parsers/src/tool_calling/parsers.rs (2)
69-75: Test helper duplication across modules.
extract_name_and_argsappears in multiple modules. Consider centralizing in a shared test util to reduce duplication.
1094-1110: Unignore the pythonic + normal text test.The pythonic parser already returns prefix normal text; enable this test to prevent regressions.
Apply:
- #[ignore] fn test_pythonic_parser_with_constants_and_normal_text() {
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (8)
deny.toml(1 hunks)lib/parsers/Cargo.toml(1 hunks)lib/parsers/src/tool_calling/config.rs(1 hunks)lib/parsers/src/tool_calling/json_parser.rs(1 hunks)lib/parsers/src/tool_calling/mod.rs(1 hunks)lib/parsers/src/tool_calling/parsers.rs(5 hunks)lib/parsers/src/tool_calling/pythonic_parser.rs(1 hunks)lib/parsers/src/tool_calling/tools.rs(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
lib/parsers/src/tool_calling/tools.rs (1)
lib/parsers/src/tool_calling/parsers.rs (1)
detect_and_parse_tool_call(36-63)
lib/parsers/src/tool_calling/pythonic_parser.rs (1)
lib/parsers/src/tool_calling/parsers.rs (1)
extract_name_and_args(72-75)
lib/parsers/src/tool_calling/mod.rs (1)
lib/parsers/src/tool_calling/parsers.rs (2)
detect_and_parse_tool_call(36-63)try_tool_call_parse(9-33)
lib/parsers/src/tool_calling/parsers.rs (3)
lib/parsers/src/tool_calling/json_parser.rs (1)
try_tool_call_parse_json(152-308)lib/parsers/src/tool_calling/pythonic_parser.rs (1)
try_tool_call_parse_pythonic(151-177)lib/parsers/src/tool_calling/config.rs (1)
pythonic(131-136)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (8)
lib/parsers/src/tool_calling/json_parser.rs (2)
10-10: Import path change LGTM.Switching to
super::config::JsonParserConfigmatches the new module layout.
134-139: Clarify list handling behavior in json_parser.rs
Docstring states only the last tool call is returned, but the code collects and returns all parsed calls (order preserved). Confirm the intended contract and either update the docstring to “returns all parsed tool calls (order preserved)” or change the list branches to return only the last item. Applies also to lines 280–305.lib/parsers/src/tool_calling/tools.rs (1)
4-5: Re-export adjustments look good.
ToolCallConfiganddetect_and_parse_tool_callremain available viatools, aligning with the new config/parsers split.lib/parsers/src/tool_calling/mod.rs (1)
12-13: Drop the deprecated-alias suggestion –try_tool_call_parse_jsonis still exposed underjson_parser, so there’s no breaking change, and aliasing the generictry_tool_call_parsewould misalign its signature.Likely an incorrect or invalid review comment.
lib/parsers/src/tool_calling/pythonic_parser.rs (1)
316-343: Nice coverage for lists/dicts.Good recursive handling validation and nested dict assertions.
lib/parsers/src/tool_calling/config.rs (1)
38-49: Default start tokens include <|python_tag|>; verify false-positive risk.Including "<|python_tag|>" in defaults means any message with that token will trigger JSON extraction even when not a tool call. Confirm intended across models; otherwise, move it to model-specific presets only (e.g., llama3_json()).
lib/parsers/src/tool_calling/parsers.rs (2)
22-25: Wiring Pythonic path: LGTM.Delegating on ToolCallParserType::Pythonic to try_tool_call_parse_pythonic is correct.
47-48: Expose "pythonic" preset in detect_and_parse map: LGTM.Config registration looks consistent with other presets.
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com> Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
Overview:
Implements basic pythonic parser using AST parsing. Supports all three cases for function parsing
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Refactor
Dependencies
Tests
Chores