fix: escape control characters in LLM tool call arguments JSON by bjulian5 · Pull Request #2893 · block/goose

bjulian5 · 2025-06-12T21:30:40Z

Add json_escape_control_chars_in_string() utility function
Apply fix to OpenAI and Databricks response parsers
Enhance error messages with raw and processed arguments
Add comprehensive test coverage for control character escaping
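
As a rough illustration of the kind of escaping listed above, here is a minimal sketch using only the standard library. It is not the PR's implementation: the name `escape_control_chars_in_json_strings` is hypothetical, and the choice to escape only inside string literals (so that whitespace between tokens is left alone) is an assumption made for this example.

```rust
/// Hypothetical sketch: escape raw control characters (U+0000..=U+001F) that
/// appear inside JSON string literals and copy everything else unchanged.
fn escape_control_chars_in_json_strings(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut in_string = false; // between an opening and a closing quote
    let mut after_backslash = false; // previous char started an escape sequence

    for c in input.chars() {
        if !in_string {
            // Outside strings, newlines and tabs are ordinary JSON whitespace.
            if c == '"' {
                in_string = true;
            }
            out.push(c);
            continue;
        }
        if after_backslash {
            // Second half of an existing escape such as \" or \\: copy as-is.
            after_backslash = false;
            out.push(c);
            continue;
        }
        match c {
            '\\' => {
                after_backslash = true;
                out.push(c);
            }
            '"' => {
                in_string = false;
                out.push(c);
            }
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Any other control character becomes a \u00XX escape.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            _ => out.push(c),
        }
    }
    out
}
```

Under these assumptions, a newline between two members is left untouched, while a raw newline inside a value is rewritten to the two characters `\` and `n`.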

Testing

✅ Added unit tests for the new utility function covering various control character scenarios
✅ Updated existing tests to verify empty argument handling
✅ Manual testing with both OpenAI and Databricks providers
✅ Verified that valid JSON arguments are unchanged
✅ Confirmed that structural JSON elements (quotes, backslashes) are preserved

Backwards Compatibility

This change is fully backwards compatible. It only affects the processing of malformed JSON arguments that would have previously failed to parse. Valid JSON arguments continue to work exactly as before.

Related Issues

Fixes #2892

michaelneale · 2025-06-13T06:27:32Z

looks nice - @salman1993 is active in there with goose-llm crate so probably good to get a blessing there.

salman1993

goose-llm crate is currently not used by goose. can you please describe where / how you ran into this error? #2892

i have a feeling the bug is in goose crate and not goose-llm

bjulian5 · 2025-06-26T22:24:39Z

@salman1993 I ran into this error using the goose cli. I fixed it in the goose crate and found that the exact same code was in goose-llm, so I added a fix there as well.

bjulian5 · 2025-07-01T13:41:27Z

@salman1993 bump 🙏

taniacryptid · 2025-07-07T21:15:26Z

Let me bump this to the goose dev team for you!

angiejones · 2025-07-07T21:33:04Z

I removed the changes to crates/goose-llm. @salman1993 can you take another look? other users are indicating this PR will fix their issues as well

DOsinga

I think we should move the escaping one level higher - see my remark, and then it should be good to go

DOsinga · 2025-07-07T21:37:17Z

crates/goose/src/providers/formats/databricks.rs

+                // Escape literal control characters in the arguments string to make it valid JSON.
+                // This handles cases where the LLM might output raw newlines or other control
+                // characters within string values in the JSON arguments.
+                let escaped_arguments = json_escape_control_chars_in_string(&arguments_str);


so what's happening here is that we receive a json document from the provider that is almost certainly correct json, but the arguments element in it is doubly encoded json, and that second layer is not always entirely correct?

if so I think we should slightly change the approach here - move the entire json parsing into utils and call it safely_parse_json; first try whether it parses without your replacements, and only do the escape replacements if it doesn't.

arguments could be something like {"key1": "value1",\n"key2": "value"} which contains a \n but is perfectly fine, while you are trying to fix {"key1": "value1\n","key2": "value"} which is not
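
The parse-first, repair-on-failure flow suggested here can be sketched as follows. This is an illustration under stated assumptions, not the code that was merged: `safely_parse_with`, `toy_parse`, and `naive_repair` are hypothetical names, and `toy_parse` is only a stand-in for a real JSON parser (a function such as `serde_json::from_str` could be passed instead).

```rust
/// Hypothetical helper: parse the raw text first and fall back to parsing a
/// repaired copy only when the first attempt fails.
fn safely_parse_with<T, E>(
    raw: &str,
    parse: impl Fn(&str) -> Result<T, E>,
    repair: impl Fn(&str) -> String,
) -> Result<T, E> {
    match parse(raw) {
        Ok(value) => Ok(value),
        // The input is only rewritten once the unmodified text has failed.
        Err(_) => parse(&repair(raw)),
    }
}

/// Stand-in for a real JSON parser. It only rejects raw control characters
/// inside string literals and backslashes outside them.
fn toy_parse(text: &str) -> Result<String, String> {
    let (mut in_string, mut escaped) = (false, false);
    for c in text.chars() {
        if in_string {
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            } else if (c as u32) < 0x20 {
                return Err(format!("control character {:?} inside a string", c));
            }
        } else if c == '"' {
            in_string = true;
        } else if c == '\\' {
            return Err("backslash outside a string".to_string());
        }
    }
    Ok(text.to_string())
}

/// Deliberately naive repair: escapes every newline, carriage return, and tab,
/// including those that are legal whitespace between tokens.
fn naive_repair(text: &str) -> String {
    text.replace('\n', "\\n")
        .replace('\r', "\\r")
        .replace('\t', "\\t")
}
```

With this ordering, the first example above is returned untouched because the repair step never runs, while the second is parsed from its repaired copy. Applying `naive_repair` eagerly to the first example would instead place a backslash between tokens and break a document that was already valid.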

bjulian5 · 2025-07-10

@DOsinga I updated the PR with your suggestions. PTAL @DOsinga @salman1993 🙏

DOsinga · 2025-07-19T14:51:17Z

can you do the DCO thing and we can get this in 🙏 https://github.com/block/goose/pull/2893/checks?check_run_id=46249519944

michaelneale · 2025-07-21T00:54:28Z

oh this is a nice one, kudos

- Add json_escape_control_chars_in_string() utility function
- Apply fix to OpenAI and Databricks response parsers
- Enhance error messages with raw and processed arguments
- Add comprehensive test coverage for control character escaping

Signed-off-by: Julian Brown <contact@julianbrown.dev>

bjulian5 · 2025-07-22T02:13:43Z

@DOsinga done!

DOsinga · 2025-07-29T09:05:13Z

thanks!

veriditin · 2025-07-29T14:37:38Z

Hi! I've been tracking this issue and it's nice that most control characters are now properly escaped, but I think a few things were left on the table.

The quotation mark " can in MOST cases also be unambiguously escaped without breaking the JSON document. I also find my LLM of choice (qwen-3) makes the most mistakes in this category, so that would be nice to get fixed as well.

JSON is defined more strictly than necessary in this respect. Escaping the quotation mark is only strictly necessary if the first non-whitespace character after it is a "meaningful" one, i.e. one of , : ] or }. If it is not, we know for sure we are still inside a string and can replace the " with \". If it is followed by a meaningful character (ignoring whitespace), we cannot know for sure, so we should leave it alone.

{"contents": "yeah, this is an awesome string one might say "super" cool"} can be unambiguously escaped to: {"contents": "yeah, this is an awesome string one might say \"super\" cool"} as the first quotation mark within the contents value is followed by an s and the second one is followed by a c.
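
A minimal sketch of this look-ahead rule, assuming the structural characters are exactly `,`, `:`, `]`, and `}` and that end of input also counts as a closing position. The function name `escape_inner_quotes` is hypothetical and is not part of goose.

```rust
/// Hypothetical sketch: inside a string, a quote that is NOT followed (after
/// optional whitespace) by a structural character is assumed to be content
/// and is escaped; otherwise it is treated as the closing quote.
fn escape_inner_quotes(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::with_capacity(input.len() + 8);
    let mut in_string = false;
    let mut i = 0;

    while i < chars.len() {
        let c = chars[i];
        if !in_string {
            // A quote outside a string always opens a new string.
            in_string = c == '"';
            out.push(c);
        } else if c == '\\' {
            // Copy an existing escape pair (for example \" or \\) untouched.
            out.push(c);
            if let Some(&next) = chars.get(i + 1) {
                out.push(next);
                i += 1;
            }
        } else if c == '"' {
            // Look ahead to the first non-whitespace character.
            let next = chars[i + 1..].iter().copied().find(|ch| !ch.is_whitespace());
            match next {
                None | Some(',') | Some(':') | Some(']') | Some('}') => {
                    // Ambiguous or genuinely closing: leave the quote alone.
                    in_string = false;
                    out.push('"');
                }
                Some(_) => out.push_str("\\\""),
            }
        } else {
            out.push(c);
        }
        i += 1;
    }
    out
}
```

As the comment notes, the rule is only a heuristic: an inner quote that happens to be followed by a comma or colon is taken as the end of the string and left as-is, so such documents still fail to parse.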

I additionally find that Qwen-3 will sometimes fail to properly close the JSON, i.e. it uses only one } instead of }} when the final object is nested inside another object. I find it useful to keep track of the number of open objects and lists (in order, on a stack) and, if the string ends with objects or lists still open, to simply close them. So:

{"name": "John", "age": 30, "city": "New York" would be fixed to {"name": "John", "age": 30, "city": "New York"}

and combining the two:

{"items": [{"name": "item1", "desc": "A "great" item"}, {"name": "item2"} would be fixed to {"items": [{"name": "item1", "desc": "A \"great\" item"}, {"name": "item2"}]}
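
The stack-based closing step can be sketched on its own as below. The name `close_open_containers` is hypothetical, and the sketch assumes quotes inside strings are already escaped (so the combined example above would need its inner quotes repaired first). Returning the input unchanged when it ends inside an unterminated string is a choice made for this example.

```rust
/// Hypothetical sketch: append the closing brackets and braces that are still
/// owed at the end of the input, innermost first.
fn close_open_containers(input: &str) -> String {
    let mut stack: Vec<char> = Vec::new(); // closers still owed, innermost last
    let (mut in_string, mut escaped) = (false, false);

    for c in input.chars() {
        if in_string {
            // Brackets inside string literals are content, not structure.
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' => stack.push('}'),
            '[' => stack.push(']'),
            '}' | ']' => {
                // Pop only when the closer matches the innermost open container.
                if stack.last() == Some(&c) {
                    stack.pop();
                }
            }
            _ => {}
        }
    }

    if in_string {
        // An unterminated string is out of scope for this sketch.
        return input.to_string();
    }
    let mut out = String::from(input);
    while let Some(closer) = stack.pop() {
        out.push(closer);
    }
    out
}
```

Tracking the closers in order matters for mixed nesting: for an input that ends inside an object within a list within an object, the missing suffix is built by popping the stack, so the list is closed before the outer object.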

DOsinga · 2025-07-29T15:36:12Z

good points, @veriditin - but at that point I think that search & replace with regexps is not going to do it and we're going to need a tolerant json parser - https://crates.io/crates/json_partial looks like an option? not sure it does your " in the middle of a string though

veriditin · 2025-07-29T21:12:48Z

Looking at their examples as to what kind of errors can be fixed:

I think the comma fixer in this crate might clash with the quotation mark fix I'm suggesting (which in my experience is a more common problem than missing commas). So.. it seems this would need some careful consideration as to which types of problems happen significantly often and can be fixed independently of each other.. It also doesn't feel like this crate is making the best trade-offs for the goose/agentic use-case.

* main:
  chore: small refactor on agent.rs (#3703)
  docs: Add GitMCP Tutorial to Extensions Library (#3716)
  chore: Speed up CI (#3711)
  Fix tool vector tests (#3709)
  docs: GitMCP Tutorial (#3708)
  Remove unused dependencies (#3626)
  feat: update Groq models for better tool calling support (#3676)
  chore: remove ffi libraries and related code (#3699)
  only run google analytics in prod (#3395)
  Fix typo in quickstart document (#3447)
  fix: pricing estimation for OpenRouter in goose-cli (#3675)
  fix: escape control characters in LLM tool call arguments JSON (#2893)
  feat(githubcopilot): add ability to fetch supported models (#2717)
  Create a message ID for tool response messages (#3591)
  fix: Fixed 404 broken link to extensions page in index.md (#3623)

…#2893) Signed-off-by: Julian Brown <contact@julianbrown.dev> Co-authored-by: Julian Brown <jbrown@stripe.com> Signed-off-by: Adam Tarantino <tarantino.adam@hey.com>

michaelneale requested a review from salman1993 June 13, 2025 06:26

bjulian5 force-pushed the jbrown/fix-tool-calls-with-malformed-json branch from 2dd1d0e to 97ca274 Compare June 13, 2025 17:22

salman1993 requested changes Jun 25, 2025

bjulian5 requested a review from salman1993 June 27, 2025 14:54

taniacryptid requested review from DOsinga, alexhancock, blackgirlbytes and spencrmartin and removed request for salman1993 July 7, 2025 21:16

angiejones force-pushed the jbrown/fix-tool-calls-with-malformed-json branch from 97ca274 to 8f36f4e Compare July 7, 2025 21:30

angiejones assigned salman1993 and angiejones Jul 7, 2025

DOsinga approved these changes Jul 7, 2025

angiejones assigned DOsinga and unassigned salman1993 and angiejones Jul 7, 2025

bjulian5 force-pushed the jbrown/fix-tool-calls-with-malformed-json branch from e450aab to 5fb38da Compare July 8, 2025 17:50

bjulian5 requested review from DOsinga and salman1993 July 11, 2025 14:45

DOsinga approved these changes Jul 16, 2025

DOsinga removed the request for review from salman1993 July 18, 2025 10:09

salman1993 approved these changes Jul 18, 2025

PR suggestions

a93c619

Signed-off-by: Julian Brown <contact@julianbrown.dev>

bjulian5 force-pushed the jbrown/fix-tool-calls-with-malformed-json branch from 52b5e31 to a93c619 Compare July 22, 2025 02:13

DOsinga merged commit bc25308 into block:main Jul 29, 2025
7 checks passed
