Skip to content

fix: improve code execution mode tool handling#6497

Closed
rabi wants to merge 1 commit intoblock:mainfrom
rabi:improve_code_execution
Closed

fix: improve code execution mode tool handling#6497
rabi wants to merge 1 commit intoblock:mainfrom
rabi:improve_code_execution

Conversation

@rabi
Copy link
Contributor

@rabi rabi commented Jan 14, 2026

Summary

  • Surface extension server instructions in system prompt by removing conditional code_execution_mode block from system.md
  • Use prefixed tool names (code_execution__*) consistently in MOIM, server instructions, and tool descriptions to prevent model confusion
  • Add explicit "NEVER use JSON.parse()" warnings since tool results are already parsed JavaScript objects
  • Set RepetitionInspector limit to 5 to prevent infinite loops when model repeatedly calls same tools

Type of Change

  • Feature
  • Bug fix
  • Refactor / Code quality
  • Performance improvement
  • Documentation
  • Tests
  • Security fix
  • Build / Release
  • Other (specify below)

AI Assistance

  • This PR was created or reviewed with AI assistance

Testing

Tested locally with Gemini and hybrid models like Nemotron.

@rabi
Copy link
Contributor Author

rabi commented Jan 16, 2026

Hey @alexhancock, PTAL. This makes code_execution work better with smaller hybrid models like Nemotron.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves code execution mode tool handling to better support models that omit tool prefixes.

Changes:

  • Added Code Execution Mode section to system prompt explaining that execute_code must be used to call other tools
  • Added fallback in dispatch to auto-prefix code_execution tools when models omit the prefix
  • Normalized extension names consistently in add_client and is_extension_enabled methods
  • Clarified in tool documentation that calls return parsed objects (never use JSON.parse)

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
crates/goose/src/prompts/system.md Added Code Execution Mode section with examples and instructions for models
crates/goose/src/agents/extension_manager.rs Normalized extension names in add_client/is_extension_enabled; refactored dispatch fallback logic
crates/goose/src/agents/code_execution_extension.rs Added reminders about using execute_code; clarified that tool calls return parsed objects
crates/goose-cli/src/session/output.rs Updated rendering to handle both prefixed and non-prefixed code_execution tool names

@rabi rabi force-pushed the improve_code_execution branch from a95175e to 216dfc1 Compare January 31, 2026 08:59
claude-sonnet-4, o1, llama-3.2, deepseek-r1, etc).
These models have varying knowledge cut-off dates depending on when they were trained, but typically it's between 5-10
months prior to the current date.
{% if code_execution_mode %}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry that this prompt info is duplicative with:

Server instructions: https://github.com/block/goose/pull/6497/changes#diff-40aeaabae1c639df10ef13ae9edf99b906c8bcbf761b0270597d74284b5fb80aR436

Tool description: https://github.com/block/goose/pull/6497/changes#diff-40aeaabae1c639df10ef13ae9edf99b906c8bcbf761b0270597d74284b5fb80aR735

It looks like right now maybe those never make it into the system prompt because of the {% if not code_execution_mode %} block?

But I am confused because I thought in my original change I did have a way to surface this info to the model.

What do you think about refactoring this such that when code mode is active the server instructions + tool descriptions end up included in this, and we use that as the way of surfacing this info to the model vs hardcoding a new copy here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Alex. Yeah, I think the tool descriptions are sent (as part of the tool spec), but the server instructions don't surface to the system prompt when code_execution_mode is true.

The naming inconsistency i.e tools are prefixed (code_execution__execute_code) in tool list but server instructions and MOIM using unprefixed names (execute_code) confuses the model and it ends up many times making calls without prefix and going in a loop. Also, the model sees the tool signatures and wants to call them immediately, forgetting the code_execution wrapper is required.

I can change to surface the server instructions and test. However, I've seen #6765 and it surely looks promising and may work better with smaller models. if we decide to go with that, I can test that with Nemotron and abandon this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the PR and with my local testing it seems to work well with Nemotron-3-nano.

@rabi rabi force-pushed the improve_code_execution branch from 216dfc1 to 8e83a8c Compare February 2, 2026 05:35
Signed-off-by: rabi <ramishra@redhat.com>
@rabi rabi force-pushed the improve_code_execution branch from 8e83a8c to c6ab387 Compare February 2, 2026 06:03
@rabi rabi mentioned this pull request Feb 3, 2026
10 tasks
@rabi
Copy link
Contributor Author

rabi commented Feb 6, 2026

I've proposed a separate improvement PR after testing with pctx and smaller models.

@rabi rabi closed this Feb 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants