fix: improve code execution mode tool handling#6497
Conversation
|
Hey @alexhancock, PTAL. This makes code_execution work better with smaller hybrid models like Nemotron. |
There was a problem hiding this comment.
Pull request overview
This PR improves code execution mode tool handling to better support models that omit tool prefixes.
Changes:
- Added Code Execution Mode section to system prompt explaining that execute_code must be used to call other tools
- Added fallback in dispatch to auto-prefix code_execution tools when models omit the prefix
- Normalized extension names consistently in add_client and is_extension_enabled methods
- Clarified in tool documentation that calls return parsed objects (never use JSON.parse)
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| crates/goose/src/prompts/system.md | Added Code Execution Mode section with examples and instructions for models |
| crates/goose/src/agents/extension_manager.rs | Normalized extension names in add_client/is_extension_enabled; refactored dispatch fallback logic |
| crates/goose/src/agents/code_execution_extension.rs | Added reminders about using execute_code; clarified that tool calls return parsed objects |
| crates/goose-cli/src/session/output.rs | Updated rendering to handle both prefixed and non-prefixed code_execution tool names |
a95175e to
216dfc1
Compare
crates/goose/src/prompts/system.md
Outdated
| claude-sonnet-4, o1, llama-3.2, deepseek-r1, etc). | ||
| These models have varying knowledge cut-off dates depending on when they were trained, but typically it's between 5-10 | ||
| months prior to the current date. | ||
| {% if code_execution_mode %} |
There was a problem hiding this comment.
I worry that this prompt info is duplicative with:
Server instructions: https://github.com/block/goose/pull/6497/changes#diff-40aeaabae1c639df10ef13ae9edf99b906c8bcbf761b0270597d74284b5fb80aR436
Tool description: https://github.com/block/goose/pull/6497/changes#diff-40aeaabae1c639df10ef13ae9edf99b906c8bcbf761b0270597d74284b5fb80aR735
It looks like right now maybe those never make it into the system prompt because of the {% if not code_execution_mode %} block?
But I am confused because I thought in my original change I did have a way to surface this info to the model.
What do you think about refactoring this such that when code mode is active the server instructions + tool descriptions end up included in this, and we use that as the way of surfacing this info to the model vs hardcoding a new copy here.
There was a problem hiding this comment.
Thanks Alex. Yeah, I think the tool descriptions are sent (as part of the tool spec), but the server instructions don't surface to the system prompt when code_execution_mode is true.
The naming inconsistency i.e tools are prefixed (code_execution__execute_code) in tool list but server instructions and MOIM using unprefixed names (execute_code) confuses the model and it ends up many times making calls without prefix and going in a loop. Also, the model sees the tool signatures and wants to call them immediately, forgetting the code_execution wrapper is required.
I can change to surface the server instructions and test. However, I've seen #6765 and it surely looks promising and may work better with smaller models. if we decide to go with that, I can test that with Nemotron and abandon this PR.
There was a problem hiding this comment.
I've updated the PR and with my local testing it seems to work well with Nemotron-3-nano.
216dfc1 to
8e83a8c
Compare
Signed-off-by: rabi <ramishra@redhat.com>
8e83a8c to
c6ab387
Compare
|
I've proposed a separate improvement PR after testing with pctx and smaller models. |
Summary
Type of Change
AI Assistance
Testing
Tested locally with Gemini and hybrid models like Nemotron.