-
Notifications
You must be signed in to change notification settings - Fork 692
feat: add deepseek_v3_1 tool parser with lib refactoring #2832
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: ayushag <ayushag@nvidia.com>
WalkthroughAdds a DeepSeek v3.1 JSON tool-call parser and a JsonParserType dispatcher. Extends JsonParserConfig with parser_type and ToolCallConfig with deepseek_v3_1(). Renames the basic JSON parser function, wires new harmony/pythonic module facades, updates parser registry, and adds tests. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller
participant Parsers as parsers::detect_and_parse_tool_call
participant ToolCfg as ToolCallConfig
participant JSON as json::try_tool_call_parse_json
participant Basic as base_json_parser
participant Deepseek as deepseek_parser
Caller->>Parsers: detect_and_parse_tool_call(message, parser_key)
Parsers->>ToolCfg: lookup(parser_key)
ToolCfg-->>Parsers: ToolCallConfig{json.parser_type,...}
alt parser == "json"
Parsers->>JSON: try_tool_call_parse_json(message, JsonParserConfig)
alt JsonParserType::Basic
JSON->>Basic: try_tool_call_parse_basic_json(...)
Basic-->>JSON: (tool_calls, normal_text)
else JsonParserType::DeepseekV31
JSON->>Deepseek: parse_tool_calls_deepseek_v3_1(...)
Deepseek-->>JSON: (tool_calls, normal_text)
end
JSON-->>Parsers: (tool_calls, normal_text)
else other parser
Note over Parsers: Delegates to harmony/pythonic as configured
end
Parsers-->>Caller: (tool_calls, normal_text)
sequenceDiagram
autonumber
participant Deepseek as deepseek_parser
participant Regex as Cached Regexes
participant Serde as serde_json
Deepseek->>Deepseek: trim input
alt missing or no start token
Deepseek-->>Deepseek: return ([], Some(trimmed))
else tokens present
Deepseek->>Regex: get outer and inner patterns
loop each matched block
Deepseek->>Regex: extract name, args JSON
alt valid JSON
Deepseek->>Serde: parse args
Serde-->>Deepseek: Value
Deepseek-->>Deepseek: push ToolCallResponse{id:n, type:Function, name, arguments}
else invalid/missing
Note over Deepseek: skip block
end
end
Deepseek-->>Deepseek: return (tool_calls, Some(pre_text))
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
lib/parsers/src/tool_calling/json/base_json_parser.rs (2)
307-308: Bug: wrong normal_text returned on parse miss.On failure to parse, the function drops the previously extracted normal_text and returns the entire trimmed input, which is inconsistent with earlier logic and with other parsers.
Apply this diff:
- Ok((vec![], Some(trimmed.to_string()))) + Ok((vec![], Some(normal_text)))
69-99: Single-token parsing can hijack control flow; also misses array payloads.
- Returning Some(rest-of-string) when no valid JSON is found causes the caller to break early and attempt to parse arbitrary text as JSON, skipping other tokens and losing fallback behavior.
- Segments starting with '[' (arrays) are ignored.
Apply this diff:
- // Split on the start token and keep only JSON-looking segments + // Split on the start token and keep only JSON-looking segments let mut items: Vec<String> = Vec::new(); for seg in input.split(start_token) { let s = seg.trim(); if s.is_empty() { continue; } - // Only consider segments that start like JSON - if s.starts_with('{') { - // Trim trailing non-JSON by cutting at the last closing brace/bracket - if let Some(pos) = s.rfind('}') { - let candidate = &s[..=pos].trim(); - // Keep only valid JSON candidates - if serde_json::from_str::<serde_json::Value>(candidate).is_ok() { - items.push(candidate.to_string()); - } - } - } + // Only consider segments that start like JSON objects or arrays + if s.starts_with('{') || s.starts_with('[') { + let close = if s.starts_with('[') { ']' } else { '}' }; + // Trim trailing non-JSON by cutting at the last closing brace/bracket + if let Some(pos) = s.rfind(close) { + let candidate = s[..=pos].trim(); + // Keep only valid JSON candidates + if serde_json::from_str::<serde_json::Value>(candidate).is_ok() { + items.push(candidate.to_string()); + } + } + } } - if items.is_empty() { - // Remove everything up to and including the first occurrence of the start token - if let Some(idx) = input.find(start_token) { - let rest = &input[idx + start_token.len()..]; - return Some(rest.trim_start().to_string()); - } else { - // Shouldn't happen because we checked contains() above, but be defensive - return None; - } - } + if items.is_empty() { + // No valid JSON segments found after the start token + return None; + } Some(format!("[{}]", items.join(",")))
🧹 Nitpick comments (10)
lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs (1)
33-40: Tiny allocation nit: collect matches more idiomatically.You can collect directly without the mutable push loop.
Apply this diff:
fn get_regex_matches(message: &str) -> Vec<String> { - let re = get_pythonic_regex(); - let mut matches = Vec::new(); - for cap in re.find_iter(message) { - matches.push(cap.as_str().to_string()); - } - matches + let re = get_pythonic_regex(); + re.find_iter(message).map(|m| m.as_str().to_string()).collect() }lib/parsers/src/tool_calling/json/base_json_parser.rs (4)
195-221: Safer pairing of start/end tokens.zip silently drops extra tokens if vectors differ. Index with min length to avoid surprises and consider logging a warning when lengths mismatch.
Apply this diff:
- for (start_token, end_token) in tool_call_start_tokens - .iter() - .zip(tool_call_end_tokens.iter()) - { + let pairs_len = tool_call_start_tokens.len().min(tool_call_end_tokens.len()); + if tool_call_start_tokens.len() != tool_call_end_tokens.len() { + tracing::warn!( + "Mismatched tool-call token vectors: starts={}, ends={}. Using min length={}", + tool_call_start_tokens.len(), + tool_call_end_tokens.len(), + pairs_len + ); + } + for i in 0..pairs_len { + let start_token = &tool_call_start_tokens[i]; + let end_token = &tool_call_end_tokens[i];
111-151: Docs out of sync with signature and behavior.Return section and list-handling note describe an older API. Please update to reflect (Vec, Option) and that all items are returned in order.
Proposed doc tweaks (partial):
-/// # Return -/// -/// - `Ok(Some(ToolCallResponse))` if parsing succeeds -/// - `Ok(None)` if input format is unrecognized or invalid JSON -/// - `Err(...)` if JSON is valid but deserialization or argument re-serialization fails +/// # Return +/// +/// - `Ok((Vec<ToolCallResponse>, Option<String>))` +/// - Vec<ToolCallResponse>: zero or more parsed tool calls (in order) +/// - Option<String>: "normal" text (prefix before the first tool-call token), Some("") if none +/// - `Err(...)` only if argument re-serialization fails @@ -/// When the input contains a list of tool calls (either with `parameters` or `arguments`), -/// only the **last item** in the list is returned. This design choice assumes that the -/// most recent tool call in a list is the one to execute. +/// When the input contains a list of tool calls (either with `parameters` or `arguments`), +/// all items are returned in order.
231-240: Inconsistent call-id format vs. other parsers.Other parsers use monotonically increasing "call-1", "call-2". Here it's "call-". Consider aligning for consistency or documenting the difference.
If you prefer counters:
- Ok(ToolCallResponse { - id: format!("call-{}", Uuid::new_v4()), + Ok(ToolCallResponse { + id: format!("call-{}", Uuid::new_v4()), // consider a per-message counter like in Harmony/Pythonic
13-26: Optional: unify parameters/arguments parsing to reduce duplication.A single enum with serde aliases can collapse the two structs.
Sketch:
#[derive(Deserialize)] #[serde(untagged)] enum FnShape { Params { name: String, parameters: HashMap<String, Value> }, Args { name: String, arguments: HashMap<String, Value> }, }Then parse Vec and normalize.
lib/parsers/src/tool_calling/parsers.rs (1)
1176-1188: Basic DeepSeek v3.1 path test is solid; please add a multiline-JSON case.DeepSeek outputs often include newlines/pretty JSON. Add a test where arguments span multiple lines to guard regressions.
Would you like me to push a test snippet for multiline args?
lib/parsers/src/tool_calling/json/mod.rs (1)
22-30: Dispatcher is correct; consider a non_exhaustive enum as a minor hardening.Optional: mark JsonParserType as #[non_exhaustive] to keep match sites future-proof in downstream crates.
lib/parsers/src/tool_calling/json/deepseek_parser.rs (3)
45-53: Restrict parsing to the wrapper region for robustness.You check for the start token but then scan the entire message; stray tool-call blocks before the wrapper could be captured. Slice to the region between the first start and configured end token when present.
Example adjustment (outside this hunk):
- Find indices of the first start token and the first end token after it.
- Run regex over that substring only; fall back to full trim if end token missing.
99-106: Normal text handling matches other parsers; optional enhancement.Currently returns only prefix text. If you need trailing text too, consider returning both or concatenating them; otherwise this is consistent with Basic JSON.
118-135: Tests cover basics; please add a multiline-arguments case.Add a case like:
- function: get_current_weather
- arguments with pretty JSON and newlines
This will validate the DOTALL fix.I can supply the test body if helpful.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (9)
lib/parsers/src/tool_calling/config.rs(4 hunks)lib/parsers/src/tool_calling/harmony/mod.rs(1 hunks)lib/parsers/src/tool_calling/json/base_json_parser.rs(1 hunks)lib/parsers/src/tool_calling/json/deepseek_parser.rs(1 hunks)lib/parsers/src/tool_calling/json/mod.rs(1 hunks)lib/parsers/src/tool_calling/mod.rs(1 hunks)lib/parsers/src/tool_calling/parsers.rs(4 hunks)lib/parsers/src/tool_calling/pythonic/mod.rs(1 hunks)lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
lib/parsers/src/tool_calling/mod.rs (5)
lib/parsers/src/tool_calling/config.rs (2)
harmony(143-152)pythonic(136-141)lib/parsers/src/tool_calling/harmony/harmony_parser.rs (1)
parse_tool_calls_harmony(22-159)lib/parsers/src/tool_calling/json/mod.rs (1)
try_tool_call_parse_json(22-30)lib/parsers/src/tool_calling/parsers.rs (2)
detect_and_parse_tool_call(63-87)try_tool_call_parse(35-60)lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs (1)
try_tool_call_parse_pythonic(162-188)
lib/parsers/src/tool_calling/harmony/mod.rs (1)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (1)
parse_tool_calls_harmony(22-159)
lib/parsers/src/tool_calling/json/deepseek_parser.rs (1)
lib/parsers/src/tool_calling/config.rs (2)
default(43-53)default(66-71)
lib/parsers/src/tool_calling/pythonic/mod.rs (1)
lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs (1)
try_tool_call_parse_pythonic(162-188)
lib/parsers/src/tool_calling/json/mod.rs (2)
lib/parsers/src/tool_calling/json/base_json_parser.rs (1)
try_tool_call_parse_basic_json(152-308)lib/parsers/src/tool_calling/json/deepseek_parser.rs (1)
parse_tool_calls_deepseek_v3_1(30-107)
lib/parsers/src/tool_calling/parsers.rs (5)
lib/parsers/src/tool_calling/config.rs (3)
harmony(143-152)pythonic(136-141)deepseek_v3_1(154-164)lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2)
parse_tool_calls_harmony(22-159)extract_name_and_args(165-168)lib/parsers/src/tool_calling/json/mod.rs (1)
try_tool_call_parse_json(22-30)lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs (2)
try_tool_call_parse_pythonic(162-188)extract_name_and_args(194-197)lib/parsers/src/tool_calling/json/deepseek_parser.rs (1)
extract_name_and_args(113-116)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
🔇 Additional comments (11)
lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs (1)
25-25: No-op whitespace change — OK to merge.No functional impact.
lib/parsers/src/tool_calling/pythonic/mod.rs (1)
4-7: Re-exports look clean and maintain public API stability.No issues.
lib/parsers/src/tool_calling/harmony/mod.rs (1)
4-7: Module facade is straightforward and consistent with pythonic/json.No concerns.
lib/parsers/src/tool_calling/mod.rs (1)
14-18: Public re-exports look good and keep callers stable.Dispatcher try_tool_call_parse_json now fronts Basic vs. DeepseekV31.
lib/parsers/src/tool_calling/parsers.rs (3)
5-7: Module import reorg looks good.Clear separation across harmony/json/pythonic modules improves readability and encapsulation.
25-25: Registering deepseek_v3_1 in the global parser map is correct.No collisions; consistent with other keys.
115-116: Availability test updated correctly.List includes deepseek_v3_1; assertions are order-agnostic.
lib/parsers/src/tool_calling/config.rs (3)
4-5: New dependency on JsonParserType is fine.Keeps config-driven dispatch centralized.
51-51: Sane default.Defaulting to Basic preserves existing behavior.
154-164: DeepSeek v3.1 config constructor LGTM.Start/end tokens and parser_type are correctly wired.
lib/parsers/src/tool_calling/json/mod.rs (1)
14-20: Enum-based JSON parser dispatch is a good extensibility point.Simple to add future model-specific parsers without touching callers.
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com>
Signed-off-by: ayushag <ayushag@nvidia.com> Signed-off-by: nnshah1 <neelays@nvidia.com>
Overview:
This PR
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Refactor