fix: handle truncated tool calls that break conversation alternation#7424

Merged
tlongwell-block merged 1 commit into main from fix/trailing-assistant-root-cause
Feb 23, 2026
Conversation

@tlongwell-block (Collaborator) commented Feb 22, 2026

Summary

When a model response hits max_tokens mid-tool-call, the streaming handler creates a tool_request with a parse error (truncated JSON). The agent loop counts it as a tool request but silently skips it — no tool request message, no tool response added to the conversation. The text preamble from the truncated response becomes a dangling trailing assistant message. The loop continues, and on the next iteration the conversation is sent to the API ending with an assistant message, which crashes on providers that reject prefill (Claude 4.6+, Snowflake, others).

This was latent — older models silently tolerated trailing assistant messages via prefill support. Claude Opus 4.6 (Feb 5) removed prefill as a documented breaking change, exposing it. Multiple users have reported the resulting "does not support assistant message prefill" errors since early February.

The fix

When a tool call can't be parsed, append a plain user text message telling the model the call was likely truncated and to try breaking the task into smaller steps. This maintains conversation alternation without using tool protocol types that serialize inconsistently across providers.
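The approach can be sketched with simplified stand-in types (the `Message`, `Role`, and `handle_tool_call` below are illustrative only; goose's real agent-loop and conversation types differ):

```rust
// Minimal sketch of the fix, using simplified stand-in types.
#[derive(Debug, PartialEq)]
enum Role { User, Assistant }

#[derive(Debug)]
struct Message { role: Role, text: String }

// When a tool call fails to parse (e.g. truncated JSON from hitting
// max_tokens), push a plain user text message so the conversation keeps
// alternating assistant -> user instead of ending on a dangling
// assistant message.
fn handle_tool_call(messages: &mut Vec<Message>, tool_call: Result<String, String>) {
    match tool_call {
        Ok(args) => {
            // Normal path: record the tool request and its response.
            messages.push(Message { role: Role::Assistant, text: format!("tool request: {args}") });
            messages.push(Message { role: Role::User, text: "tool response".into() });
        }
        Err(e) => {
            messages.push(Message {
                role: Role::User,
                text: format!(
                    "[system: Tool call could not be parsed: {e}. The response may have \
                     been truncated. Try breaking the task into smaller steps.]"
                ),
            });
        }
    }
}

fn main() {
    let mut messages = vec![Message { role: Role::Assistant, text: "preamble".into() }];
    handle_tool_call(&mut messages, Err("EOF while parsing an object".into()));
    // The conversation now ends with a user message, so the next API call
    // does not rely on assistant-prefill support.
    assert_eq!(messages.last().unwrap().role, Role::User);
}
```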

Type of Change

  • Bug fix

Testing

All existing tests pass. The bug requires the full agent streaming loop to reproduce (provider stream → truncated tool call → silent skip → trailing assistant), so a unit test is not practical without significant test infrastructure investment.

@tlongwell-block force-pushed the fix/trailing-assistant-root-cause branch from 1d8ab66 to b452098 on February 23, 2026 00:09
@tlongwell-block tlongwell-block changed the title fix: clear stale messages after retry resets conversation fix: handle truncated tool calls that break conversation alternation Feb 23, 2026
@tlongwell-block tlongwell-block marked this pull request as ready for review February 23, 2026 13:49
@tlongwell-block (Collaborator, Author)

Testing with my local goose

  ━━╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌ 9% 18k/200k
🪿 Now I want you to help me test my fix. YOU are built with this change included in YOUR code. Do you understand that? I want you to acknowledge that, then I want you to attempt to directly write a giant file to ~/scratch. Ideally, you should see that error from the PR and it should tell you to try doing it in multiple tool calls. Don't do that, just let me know what error you get and if the PR fix is working as intended.



Yes, I understand — I'm running on a build of goose that includes PR #7424. So if my response gets truncated mid-tool-call due to hitting `max_tokens`, I should see that new error message telling me the tool call couldn't be parsed and to break the task into smaller steps.

Let me try to write a massive file in a single tool call. I'll generate something absurdly large to try to trigger the truncation.

> `-32602: Could not interpret tool use parameters for id toolu_018gedtVovzfTUzZo4KThmnm: EOF while parsing an object at line 1 column 75`
> `[system: Tool call could not be parsed: -32602: Could not interpret tool use parameters for id toolu_018gedtVovzfTUzZo4KThmnm: EOF while parsing an object at line 1 column 75. The response may have been truncated. Try breaking the task into smaller steps.]`

There it is! The PR fix is working exactly as intended. Here's what happened:

1. **I attempted** to write a massive file in a single tool call
2. **My response hit `max_tokens`** mid-tool-call, truncating the JSON payload
3. **Before the fix**: This would have been silently skipped, leaving a dangling assistant message, and the next API call would crash with `"does not support assistant message prefill"`
4. **With the fix**: I got the new error message:

> `[system: Tool call could not be parsed: -32602: Could not interpret tool use parameters for id toolu_018gedtVovzfTUzZo4KThmnm: EOF while parsing an object at line 1 column 75. The response may have been truncated. Try breaking the task into smaller steps.]`

Key observations:
- **Conversation alternation is maintained** — the error was injected as a user message, so the conversation stays valid (assistant → user)
- **I didn't crash** — the agent loop continued normally and I'm able to respond to you right now
- **The error is actionable** — it tells me the parse failed, why (truncated JSON / EOF), and what to do about it (break into smaller steps)
- **The error code `-32602`** (invalid params) confirms it was indeed truncated JSON that couldn't be deserialized

@tlongwell-block (Collaborator, Author)

/goose

@github-actions (Contributor)

PR #7424 Review: fix: handle truncated tool calls that break conversation alternation

Summary: This PR correctly fixes a conversation alternation issue where truncated tool calls (from max_tokens limits) create trailing assistant messages that crash newer providers like Claude 4.6. The fix adds a user message when tool parsing fails, maintaining proper alternation. The approach is sound and minimal.

✅ Highlights

  1. Correct Root Cause Fix: The fix addresses the actual problem - when tool_call.is_err(), the original filtered_response is still yielded (line 1202), creating an assistant message. Previously nothing was added for error cases, leaving the conversation with a trailing assistant message. Now a user message is added to maintain alternation.

  2. Works with Existing Infrastructure: The merge_consecutive_messages function (conversation/mod.rs:401) handles the case where multiple error tool calls could create consecutive user messages - they get merged automatically.

  3. Informative Error Message: The error includes both the original error detail and actionable guidance ("Try breaking the task into smaller steps").

🟢 Suggestions

  1. Consider Using with_system_notification (line 1447-1455):

    The nearby ContextLengthExceeded handling (line 1468-1472) uses with_system_notification(SystemNotificationType::InlineMessage, ...) for similar error messaging. However, that sets agent_visible: false, which would defeat the purpose here since the LLM needs to see this error to adjust. The current approach using Message::user().with_text(...) is actually correct because the LLM needs visibility into the error.

    That said, the "[system: ...]" prefix is a new pattern. Consider if there's a more structured way to indicate this is a system-generated message, though the current approach works fine.

  2. Minor: Error Message Clarity:

    The error includes e (the ErrorData) which may contain technical details. The user-facing suggestion is good. Could consider:

    format!(
        "[system: The model's response was truncated while generating a tool call. Try breaking the task into smaller steps.]\n\nTechnical details: {}",
        e
    )

    This separates the actionable advice from the technical details. This is very minor and the current format is acceptable.

Verification Notes

Traced the data flow:

  • Line 1202: filtered_response (with error tool request) is YIELDED → goes to UI/server's all_messages
  • Line 1206-1213: Response is NOT added to messages_to_add when num_tool_requests > 0
  • Line 1422-1441: For OK requests, creates assistant+user message pairs
  • Line 1447-1455 (NEW): For ERR requests, creates user error message

Mixed OK/ERR scenario verified: If there are 3 tool requests (OK, ERR, OK):

  1. OK → assistant msg + user tool response
  2. ERR → user error message
  3. OK → assistant msg + user tool response

The consecutive user messages (from #1 response + #2 error) are handled by merge_consecutive_messages.
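The merging step in this scenario can be illustrated with a toy version (the simplified `Msg` type and this implementation are stand-ins; the real `merge_consecutive_messages` in conversation/mod.rs operates on goose's richer Message type):

```rust
// Toy sketch of consecutive-message merging.
#[derive(Debug, Clone, PartialEq)]
struct Msg { role: &'static str, text: String }

fn merge_consecutive_messages(messages: Vec<Msg>) -> Vec<Msg> {
    let mut merged: Vec<Msg> = Vec::new();
    for msg in messages {
        // Same role as the previous message: fold the text in so the
        // conversation keeps strict user/assistant alternation.
        let same_role = merged.last().map_or(false, |prev| prev.role == msg.role);
        if same_role {
            let prev = merged.last_mut().unwrap();
            prev.text.push('\n');
            prev.text.push_str(&msg.text);
        } else {
            merged.push(msg);
        }
    }
    merged
}

fn main() {
    // A tool response followed by a parse-error notice yields two user
    // messages in a row; they merge into one.
    let msgs = vec![
        Msg { role: "assistant", text: "tool request".into() },
        Msg { role: "user", text: "tool response".into() },
        Msg { role: "user", text: "[system: Tool call could not be parsed ...]".into() },
    ];
    let merged = merge_consecutive_messages(msgs);
    assert_eq!(merged.len(), 2);
    assert_eq!(merged[1].role, "user");
}
```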


Review generated by goose

@tlongwell-block tlongwell-block added this pull request to the merge queue Feb 23, 2026
@DOsinga (Collaborator) commented Feb 23, 2026

this is good. we could recover from this though if we would do the pre-fill thing, yeah? or do we actually have that, it is just that using anthropic through databricks kills this?

@tlongwell-block (Collaborator, Author)

> this is good. we could recover from this though if we would do the pre-fill thing, yeah? or do we actually have that, it is just that using anthropic through databricks kills this?

The agent should recover automatically after getting the error message and continue working with smaller tool calls without user intervention

Merged via the queue into main with commit 33af644 Feb 23, 2026
35 of 41 checks passed
@tlongwell-block tlongwell-block deleted the fix/trailing-assistant-root-cause branch February 23, 2026 14:46
@DOsinga (Collaborator) commented Feb 23, 2026

> The agent should recover automatically after getting the error message and continue working with smaller tool calls without user intervention

no, I understand. it is just that if I remember correctly, anthropic has this thing where you can start a request with the assistant pre-filled, and so if you run out of output tokens, you can repost what you had so far as prefill and create arbitrarily large replies. I wonder if that is already supported, just not through databricks
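The prefill-continuation idea described above can be sketched roughly as follows (a toy loop with a stub standing in for a real provider call; the actual Anthropic API, its prefill semantics, and `stop_reason` handling differ):

```rust
// Hedged sketch: resume a truncated reply by resending what we have so
// far as assistant prefill, looping until the reply completes.
struct Reply { text: String, hit_max_tokens: bool }

// Stub standing in for a real provider call; returns at most `limit`
// characters of the remaining content per call (ASCII-only for the demo).
fn call_model(prefill: &str, full_answer: &str, limit: usize) -> Reply {
    let remaining = &full_answer[prefill.len()..];
    let take = remaining.len().min(limit);
    Reply {
        text: remaining[..take].to_string(),
        hit_max_tokens: take < remaining.len(),
    }
}

fn generate_with_prefill(full_answer: &str, limit: usize) -> String {
    let mut accumulated = String::new();
    loop {
        let reply = call_model(&accumulated, full_answer, limit);
        accumulated.push_str(&reply.text);
        if !reply.hit_max_tokens {
            return accumulated; // finished within the token budget
        }
        // Otherwise loop: resend everything so far as assistant prefill.
    }
}

fn main() {
    let answer = "a".repeat(10);
    // A per-call limit of 3 forces several continuation rounds.
    assert_eq!(generate_with_prefill(&answer, 3), answer);
}
```

A provider that rejects assistant prefill (as described for Claude 4.6+ above) cannot use this recovery path, which is why the PR falls back to a user-visible error message instead.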

zanesq added a commit that referenced this pull request Feb 23, 2026
…oviders

* 'main' of github.com:block/goose:
  New navigation settings layout options and styling (#6645)
  refactor: MCP-compliant theme tokens and CSS class rename (#7275)
  Redirect llama.cpp logs through tracing to avoid polluting CLI stdout/stderr (#7434)
  refactor: change open recipe in new window to pass recipe id (#7392)
  fix: handle truncated tool calls that break conversation alternation (#7424)
  streamline some github actions (#7430)
  Enable bedrock prompt cache (#6710)
  fix: use BEGIN IMMEDIATE to prevent SQLite deadlocks (#7429)
  Display working dir (#7419)
  dev: add cmake to hermitized env (#7399)
  refactor: remove allows_unlisted_models flag, always allow custom model entry (#7255)
  feat: expose context window utilization to agent via MOIM (#7418)
  Small model naming (#7394)
  chore(deps): bump ajv in /documentation (#7416)
  doc: groq models (#7404)
  Client settings (#7381)
  Fix settings tabs getting cut off in narrow windows (#7379)

# Conflicts:
#	ui/desktop/src/components/settings/dictation/DictationSettings.tsx
michaelneale added a commit that referenced this pull request Feb 23, 2026
lifeizhou-ap added a commit that referenced this pull request Feb 24, 2026
lifeizhou-ap added a commit that referenced this pull request Feb 24, 2026
aharvard added a commit that referenced this pull request Feb 24, 2026