fix: improve code execution mode tool handling by rabi · Pull Request #6497 · block/goose

rabi · 2026-01-14T10:27:37Z

Summary

Surface extension server instructions in the system prompt by removing the conditional code_execution_mode block from system.md
Use prefixed tool names (code_execution__*) consistently in MOIM, server instructions, and tool descriptions to prevent model confusion
Add explicit "NEVER use JSON.parse()" warnings since tool results are already parsed JavaScript objects
Set the RepetitionInspector limit to 5 to prevent infinite loops when the model repeatedly calls the same tools
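The effect of the RepetitionInspector limit can be pictured with a small sketch. This is not goose's actual RepetitionInspector. The `RepetitionLimiter` type and its `allow` method are hypothetical, and the sketch assumes the limit applies to consecutive identical calls (same tool name and same arguments).

```rust
/// Hypothetical stand-in for goose's RepetitionInspector (not its real API).
/// It remembers the most recent (tool, arguments) pair and how many times in a
/// row it has been requested, and denies the call once that streak exceeds the limit.
struct RepetitionLimiter {
    max_repetitions: u32,
    last_call: Option<(String, String)>,
    repeat_count: u32,
}

impl RepetitionLimiter {
    fn new(max_repetitions: u32) -> Self {
        Self { max_repetitions, last_call: None, repeat_count: 0 }
    }

    /// Returns true if this call may run, false if it repeats the previous call
    /// more than `max_repetitions` times in a row.
    fn allow(&mut self, tool: &str, args: &str) -> bool {
        let call = (tool.to_string(), args.to_string());
        if self.last_call.as_ref() == Some(&call) {
            self.repeat_count += 1;
        } else {
            // A different call starts a new streak.
            self.last_call = Some(call);
            self.repeat_count = 1;
        }
        self.repeat_count <= self.max_repetitions
    }
}

fn main() {
    // The PR sets the limit to 5; the arguments string is a placeholder.
    let mut limiter = RepetitionLimiter::new(5);
    let results: Vec<bool> = (0..7)
        .map(|_| limiter.allow("code_execution__execute_code", "{\"code\":\"...\"}"))
        .collect();
    println!("{:?}", results); // [true, true, true, true, true, false, false]
}
```

With a limit of 5, the sixth identical call in a row is refused, which breaks a loop in which the model keeps re-issuing the same tool call; any different call resets the streak.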

Type of Change

AI Assistance

This PR was created or reviewed with AI assistance

Testing

Tested locally with Gemini and hybrid models like Nemotron.

rabi · 2026-01-16T05:06:09Z

Hey @alexhancock, PTAL. This makes code_execution work better with smaller hybrid models like Nemotron.

Copilot

Pull request overview

This PR improves code execution mode tool handling to better support models that omit tool prefixes.

Changes:

Added Code Execution Mode section to system prompt explaining that execute_code must be used to call other tools
Added fallback in dispatch to auto-prefix code_execution tools when models omit the prefix
Normalized extension names consistently in add_client and is_extension_enabled methods
Clarified in tool documentation that calls return parsed objects (never use JSON.parse)
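The dispatch fallback described above can be sketched as follows, assuming the registered tool names are available as a list of strings. `resolve_tool_name` and `CODE_EXECUTION_PREFIX` are hypothetical names, not goose's extension_manager API, and `developer__shell` is only an illustrative entry.

```rust
/// Prefix used for tools exposed by the code_execution extension.
const CODE_EXECUTION_PREFIX: &str = "code_execution__";

/// Hypothetical helper: map a model-supplied tool name onto a registered one.
/// An exact match is used as-is; otherwise, if adding the code_execution prefix
/// yields a registered tool, the prefixed name is returned.
fn resolve_tool_name(requested: &str, registered: &[&str]) -> Option<String> {
    if registered.contains(&requested) {
        return Some(requested.to_string());
    }
    let prefixed = format!("{}{}", CODE_EXECUTION_PREFIX, requested);
    if registered.contains(&prefixed.as_str()) {
        Some(prefixed)
    } else {
        None
    }
}

fn main() {
    let registered = ["code_execution__execute_code", "developer__shell"];
    for name in ["execute_code", "code_execution__execute_code", "shell"] {
        println!("{} -> {:?}", name, resolve_tool_name(name, &registered));
    }
}
```

Only the `code_execution__` prefix is retried, so an unprefixed name that belongs to another extension (here `shell`) still fails to resolve instead of being guessed.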

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File	Description
crates/goose/src/prompts/system.md	Added Code Execution Mode section with examples and instructions for models
crates/goose/src/agents/extension_manager.rs	Normalized extension names in add_client/is_extension_enabled; refactored dispatch fallback logic
crates/goose/src/agents/code_execution_extension.rs	Added reminders about using execute_code; clarified that tool calls return parsed objects
crates/goose-cli/src/session/output.rs	Updated rendering to handle both prefixed and non-prefixed code_execution tool names

alexhancock · 2026-01-28T16:27:22Z

crates/goose/src/prompts/system.md

 claude-sonnet-4, o1, llama-3.2, deepseek-r1, etc).
 These models have varying knowledge cut-off dates depending on when they were trained, but typically it's between 5-10
 months prior to the current date.
+{% if code_execution_mode %}


I worry that this prompt info is duplicative with:

Server instructions: https://github.com/block/goose/pull/6497/changes#diff-40aeaabae1c639df10ef13ae9edf99b906c8bcbf761b0270597d74284b5fb80aR436

Tool description: https://github.com/block/goose/pull/6497/changes#diff-40aeaabae1c639df10ef13ae9edf99b906c8bcbf761b0270597d74284b5fb80aR735

It looks like right now maybe those never make it into the system prompt because of the {% if not code_execution_mode %} block?

But I am confused because I thought in my original change I did have a way to surface this info to the model.

What do you think about refactoring this so that, when code mode is active, the server instructions and tool descriptions end up included in the prompt, and we use that to surface this info to the model instead of hardcoding a new copy here?

Thanks Alex. Yeah, I think the tool descriptions are sent (as part of the tool spec), but the server instructions don't surface to the system prompt when code_execution_mode is true.
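To illustrate the gating problem, here is a sketch that uses plain string building instead of the system.md template. The `Extension` struct and all three functions are hypothetical: `build_prompt_gated` mimics the effect of the conditional block, and `build_prompt` shows the approach of always including server instructions.

```rust
/// Hypothetical extension record: a name plus the server instructions it advertises.
struct Extension {
    name: String,
    instructions: Option<String>,
}

/// Appends each extension's server instructions under its own heading.
fn append_instructions(prompt: &mut String, extensions: &[Extension]) {
    for ext in extensions {
        if let Some(text) = &ext.instructions {
            prompt.push_str(&format!("\n\n### {}\n{}", ext.name, text));
        }
    }
}

/// Gated variant: instructions are dropped whenever code_execution_mode is on.
fn build_prompt_gated(base: &str, extensions: &[Extension], code_execution_mode: bool) -> String {
    let mut prompt = base.to_string();
    if !code_execution_mode {
        append_instructions(&mut prompt, extensions);
    }
    prompt
}

/// Ungated variant: instructions are always included, so the code_execution
/// extension's own guidance reaches the model in code mode too.
fn build_prompt(base: &str, extensions: &[Extension]) -> String {
    let mut prompt = base.to_string();
    append_instructions(&mut prompt, extensions);
    prompt
}

fn main() {
    let extensions = vec![Extension {
        name: "code_execution".to_string(),
        instructions: Some("Call other tools from inside code_execution__execute_code.".to_string()),
    }];
    println!("{}", build_prompt_gated("You are a helpful agent.", &extensions, true));
    println!("---");
    println!("{}", build_prompt("You are a helpful agent.", &extensions));
}
```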

The naming inconsistency (tools are prefixed in the tool list, e.g. code_execution__execute_code, while the server instructions and MOIM use unprefixed names such as execute_code) confuses the model: it often ends up making calls without the prefix and going into a loop. Also, the model sees the tool signatures and wants to call them immediately, forgetting that the code_execution wrapper is required.
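One way to catch this kind of naming inconsistency before it reaches the model is a simple check over instruction or MOIM text. This is a hypothetical helper, not part of goose; it does a naive substring search, so it would also flag a tool name embedded inside a longer identifier.

```rust
/// Hypothetical check: return the tool names that appear in `text` at least once
/// without the `code_execution__` prefix directly in front of them.
fn unprefixed_mentions<'a>(text: &str, tools: &[&'a str]) -> Vec<&'a str> {
    const PREFIX: &str = "code_execution__";
    tools
        .iter()
        .copied()
        .filter(|tool| {
            // match_indices yields the byte offset of each occurrence, which is
            // always a char boundary, so slicing `text[..idx]` is safe.
            text.match_indices(*tool)
                .any(|(idx, _)| !text[..idx].ends_with(PREFIX))
        })
        .collect()
}

fn main() {
    let tools = ["execute_code"];
    let before = "Remember: call other tools from inside execute_code.";
    let after = "Remember: call other tools from inside code_execution__execute_code.";
    println!("{:?}", unprefixed_mentions(before, &tools)); // ["execute_code"]
    println!("{:?}", unprefixed_mentions(after, &tools)); // []
}
```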

I can change it to surface the server instructions and test. However, I've seen #6765, and it looks promising and may work better with smaller models. If we decide to go with that, I can test it with Nemotron and abandon this PR.

I've updated the PR, and in my local testing it seems to work well with Nemotron-3-nano.

Signed-off-by: rabi <ramishra@redhat.com>

rabi · 2026-02-06T10:58:53Z

I've proposed a separate improvement PR after testing with pctx and smaller models.

DOsinga assigned alexhancock Jan 28, 2026

DOsinga requested a review from alexhancock January 28, 2026 15:40

michaelneale requested a review from Copilot January 29, 2026 04:18

Copilot started reviewing on behalf of michaelneale January 29, 2026 04:18

Copilot AI reviewed Jan 29, 2026

rabi force-pushed the improve_code_execution branch from a95175e to 216dfc1 January 31, 2026 08:59

alexhancock reviewed Jan 31, 2026

rabi force-pushed the improve_code_execution branch from 216dfc1 to 8e83a8c February 2, 2026 05:35

fix: improve code execution mode tool handling

c6ab387

Signed-off-by: rabi <ramishra@redhat.com>

rabi force-pushed the improve_code_execution branch from 8e83a8c to c6ab387 February 2, 2026 06:03

rabi mentioned this pull request Feb 3, 2026

Use Port of Context (pctx) for code mode #6765

Merged

10 tasks

rabi closed this Feb 6, 2026

rabi wants to merge 1 commit into block:main from rabi:improve_code_execution
