Skip to content

fix: escape control characters in LLM tool call arguments JSON#2893

Merged
DOsinga merged 2 commits intoblock:mainfrom
bjulian5:jbrown/fix-tool-calls-with-malformed-json
Jul 29, 2025
Merged

fix: escape control characters in LLM tool call arguments JSON#2893
DOsinga merged 2 commits intoblock:mainfrom
bjulian5:jbrown/fix-tool-calls-with-malformed-json

Conversation

@bjulian5
Copy link
Contributor

@bjulian5 bjulian5 commented Jun 12, 2025

  • Add json_escape_control_chars_in_string() utility function
  • Apply fix to OpenAI and Databricks response parsers
  • Enhance error messages with raw and processed arguments
  • Add comprehensive test coverage for control character escaping

Testing

  • ✅ Added unit tests for the new utility function covering various control character scenarios
  • ✅ Updated existing tests to verify empty argument handling
  • ✅ Manual testing with both OpenAI providers
  • ✅ Verified that valid JSON arguments are unchanged
  • ✅ Confirmed that structural JSON elements (quotes, backslashes) are preserved

Backwards Compatibility

This change is fully backwards compatible. It only affects the processing of malformed JSON arguments that would have previously failed to parse. Valid JSON arguments continue to work exactly as before.

Related Issues

Fixes #2892

@michaelneale michaelneale requested a review from salman1993 June 13, 2025 06:26
@michaelneale
Copy link
Collaborator

looks nice - @salman1993 is active in there with goose-llm crate so probably good to get a blessing there.

@bjulian5 bjulian5 force-pushed the jbrown/fix-tool-calls-with-malformed-json branch from 2dd1d0e to 97ca274 Compare June 13, 2025 17:22
Copy link
Contributor

@salman1993 salman1993 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

goose-llm crate is currently not used by goose. can you please describe where / how you ran into this error? #2892

i have a feeling the bug is in goose crate and not goose-llm

@bjulian5
Copy link
Contributor Author

@salman1993 I ran into this error using the goose cli. I fixed in the goose crate and found that the exact same code was in goose-llm so added a fix there as well.

@bjulian5 bjulian5 requested a review from salman1993 June 27, 2025 14:54
@bjulian5
Copy link
Contributor Author

bjulian5 commented Jul 1, 2025

@salman1993 bump 🙏

@taniacryptid
Copy link
Contributor

Let me bump this to the goose dev team for you!

@taniacryptid taniacryptid requested review from DOsinga, alexhancock, blackgirlbytes and spencrmartin and removed request for salman1993 July 7, 2025 21:16
@angiejones angiejones force-pushed the jbrown/fix-tool-calls-with-malformed-json branch from 97ca274 to 8f36f4e Compare July 7, 2025 21:30
@angiejones
Copy link
Collaborator

angiejones commented Jul 7, 2025

I removed the changes to crates/goose-llm. @salman1993 can you take another look? other users are indicating this PR will fix their issues as well

Copy link
Collaborator

@DOsinga DOsinga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move the escaping level one higher - see my remark and then it should be good to go

// Escape literal control characters in the arguments string to make it valid JSON.
// This handles cases where the LLM might output raw newlines or other control
// characters within string values in the JSON arguments.
let escaped_arguments = json_escape_control_chars_in_string(&arguments_str);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so what's happening here is that we receive a json document from the provider that almost certainly is correct json, but that the arguments element in it is doubly encoded json and that second tag is not always entirely correct?

if so I think we should slightly change the approach here - move the entire json parsing into utils and call it safely_parse_json and first try if it parses without your replacements and only if it doesn't do the escape replacements.

arguments could be something like {"key1": "value1",\n"key2": "value"} which contains a \n but is perfectly fine, while you are trying to fix {"key1": "value1\n","key2": "value"} which is not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DOsinga updated the PR with your suggestions. PTAL @DOsinga @salman1993 🙏

@angiejones angiejones assigned DOsinga and unassigned salman1993 and angiejones Jul 7, 2025
@bjulian5 bjulian5 force-pushed the jbrown/fix-tool-calls-with-malformed-json branch from e450aab to 5fb38da Compare July 8, 2025 17:50
@bjulian5 bjulian5 requested review from DOsinga and salman1993 July 11, 2025 14:45
@DOsinga DOsinga removed the request for review from salman1993 July 18, 2025 10:09
@DOsinga
Copy link
Collaborator

DOsinga commented Jul 19, 2025

can you do the DCO thing and we can get this in 🙏 https://github.com/block/goose/pull/2893/checks?check_run_id=46249519944

@michaelneale
Copy link
Collaborator

oh this is a nice one, kudos

- Add json_escape_control_chars_in_string() utility function
- Apply fix to OpenAI and Databricks response parsers
- Enhance error messages with raw and processed arguments
- Add comprehensive test coverage for control character escaping

Signed-off-by: Julian Brown <contact@julianbrown.dev>
Signed-off-by: Julian Brown <contact@julianbrown.dev>
@bjulian5 bjulian5 force-pushed the jbrown/fix-tool-calls-with-malformed-json branch from 52b5e31 to a93c619 Compare July 22, 2025 02:13
@bjulian5
Copy link
Contributor Author

@DOsinga done!

@DOsinga DOsinga merged commit bc25308 into block:main Jul 29, 2025
7 checks passed
@DOsinga
Copy link
Collaborator

DOsinga commented Jul 29, 2025

thanks!

@veriditin
Copy link

veriditin commented Jul 29, 2025

Hi! I've been tracking this issue and it's nice that most control characters are now properly escaped, but I think a few things were left on the table.

The quotation mark " can in MOST cases also be unambiguously escaped without breaking the JSON document. I also find my LLM of choice (qwen-3) makes the most mistakes in this category, so that would be nice to get fixed as well.

JSON is overly strictly defined in this sense. Escaping the quotation mark is only strictly necessary if the first non-white space character is a "meaningful" one, i.e. one of ,, :, ], }. If it is not, we know for sure we are still inside a String type and can replace the " with \". If it is followed by a meaning character (ignoring whitespace), we cannot know for sure so we should leave it alone.

{"contents": "yeah, this is an awesome string one might say "super" cool"} can be unambiguously escaped to: {"contents": "yeah, this is an awesome string one might say \"super\" cool"} as the first quotation mark within the contents value is followed by an s and the second one is followed by a c.

I additionally find that Qwen-3 will sometimes fail to properly close the JSON, i.e. to only use one } instead of }} if the final object is nested inside another object. I find it useful to keep track of the amount of opened objects and lists (in a stack / in order) and if the string ends with objects/list still open, to simply close them. So:

{"name": "John", "age": 30, "city": "New York" would be fixed to {"name": "John", "age": 30, "city": "New York"}

and combining the two:

{"items": [{"name": "item1", "desc": "A "great" item"}, {"name": "item2"} would be fixed to {"items": [{"name": "item1", "desc": "A \"great\" item"}, {"name": "item2"}]}

@DOsinga
Copy link
Collaborator

DOsinga commented Jul 29, 2025

good points, @veriditin - but at that point I think that search & replace with regexps is not going to do it and we're going to need a tolerant json parser - https://crates.io/crates/json_partial looks like an option? not sure it does your " in the middle of a string though

@veriditin
Copy link

veriditin commented Jul 29, 2025

Looking at their examples as to what kind of errors can be fixed:

image

I think the comma fixer in this crate might clash with the quotation mark fix I'm suggesting (which in my experience is a more common problem than missing commas). So.. it seems this would need some careful consideration as to which types of problems happen significantly often and can be fixed independently of each other.. It also doesn't feel like this crate is making the best trade-offs for the goose/agentic use-case.

michaelneale added a commit that referenced this pull request Jul 30, 2025
* main:
  chore: small refactor on agent.rs (#3703)
  docs: Add GitMCP Tutorial to Extensions Library (#3716)
  chore: Speed up CI (#3711)
  Fix tool vector tests (#3709)
  docs: GitMCP Tutorial  (#3708)
  Remove unused dependencies (#3626)
  feat: update Groq models for better tool calling support (#3676)
  chore: remove ffi libraries and related code (#3699)
  only run google analytics in prod (#3395)
  Fix typo in quickstart document (#3447)
  fix: pricing estimation for OpenRouter in goose-cli (#3675)
  fix: escape control characters in LLM tool call arguments JSON (#2893)
  feat(githubcopilot): add ability to fetch supported models (#2717)
  Create a message ID for tool response messages (#3591)
  fix: Fixed 404 broken link to extensions page in index.md (#3623)
atarantino pushed a commit to atarantino/goose that referenced this pull request Aug 5, 2025
…#2893)

Signed-off-by: Julian Brown <contact@julianbrown.dev>
Co-authored-by: Julian Brown <jbrown@stripe.com>
Signed-off-by: Adam Tarantino <tarantino.adam@hey.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LLM tool calls fail with JSON parsing errors due to unescaped control characters in arguments

8 participants