feat(mcp): platform extension for "code mode" MCP tool calling by alexhancock · Pull Request #6030 · block/goose

alexhancock · 2025-12-09T21:51:28Z

Implements the idea of "code mode" or "sandbox mode" for MCP

Refs
https://blog.cloudflare.com/code-mode/
https://www.anthropic.com/engineering/code-execution-with-mcp
#5899

Architecture

- New `code_execution` platform extension
- When enabled, it hides all other tools from the model in the traditional sense: the model no longer sees them as directly callable tools
- Generates a programmatic API for the tools of all enabled MCP servers
- Gives the model two tools:
  - `read_module`, to read the source code implementing one tool call so the model knows how to call it
  - `execute_code`, to send code that calls one or more tools; its description instructs the model on how to write that code
- Publishes the tree of available modules, in the format `servers/:server_name/:tool_name.js`, to the model via `get_moim`
- Dispatches tool calls on a separate async thread, because the Boa `NativeFunction`s running on the main thread are `!Send`

Diagram for tool call dispatching in present state
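A rough, std-only sketch of the dispatching pattern described above: the JS side, which cannot be moved across threads, sends each tool call together with a reply channel to a dispatcher and blocks until the answer arrives. Plain threads and `std::sync::mpsc` stand in for the tokio runtime and the Boa `NativeFunction`s, and `ToolCall`, `dispatch_loop`, `call_tool_blocking`, and `run_script` are hypothetical names, not goose's actual types.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

/// A tool call request sent from the JS side to the dispatcher (hypothetical type).
pub struct ToolCall {
    /// Full tool name, e.g. "server__tool".
    pub full_name: String,
    /// JSON-encoded arguments, kept as a plain string for simplicity.
    pub args_json: String,
    /// Channel on which the dispatcher sends back the result.
    pub reply: Sender<Result<String, String>>,
}

/// Dispatcher loop. In goose this role is played by an async task; here it is
/// a plain loop that answers each request with a stub result.
pub fn dispatch_loop(rx: Receiver<ToolCall>) {
    // The loop ends once every Sender for `rx` has been dropped.
    for call in rx {
        let result = if call.full_name.contains("__") {
            Ok(format!("{} called with {}", call.full_name, call.args_json))
        } else {
            Err(format!("unknown tool: {}", call.full_name))
        };
        // Ignore the send error if the caller stopped waiting.
        let _ = call.reply.send(result);
    }
}

/// What a synchronous JS-visible tool function would do: send the request,
/// then block until the dispatcher replies.
pub fn call_tool_blocking(
    tx: &Sender<ToolCall>,
    full_name: &str,
    args_json: &str,
) -> Result<String, String> {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(ToolCall {
        full_name: full_name.to_string(),
        args_json: args_json.to_string(),
        reply: reply_tx,
    })
    .map_err(|_| "dispatcher has shut down".to_string())?;
    reply_rx
        .recv()
        .map_err(|_| "dispatcher dropped the request".to_string())?
}

/// Runs a "script" (a list of tool calls) on its own thread, standing in for
/// the JS thread, while the current thread acts as the dispatcher.
pub fn run_script(calls: Vec<(&'static str, &'static str)>) -> Vec<Result<String, String>> {
    let (tx, rx) = mpsc::channel::<ToolCall>();
    let js_thread = thread::spawn(move || {
        calls
            .into_iter()
            .map(|(name, args)| call_tool_blocking(&tx, name, args))
            .collect::<Vec<_>>()
        // `tx` is dropped when this closure returns, which ends `dispatch_loop`.
    });
    dispatch_loop(rx);
    js_thread.join().expect("JS thread panicked")
}
```

Because each call blocks on its own reply channel, the tool calls from one script run strictly in sequence, and the dispatcher exits on its own once the script side drops its sender.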

Copilot

Pull request overview

This PR implements a JavaScript code execution platform extension that enables the model to execute JS code with synchronous access to all MCP tools. The implementation uses the Boa JavaScript engine and provides a "sandbox mode" where tools are auto-generated as JS functions that the model can call within a single code block.

Key Changes:

New code_execution platform extension with execute_code tool for running JS code
Tool handler architecture using channels to bridge Boa's !Send context with async runtime
Extension manager refactored to collect platform clients without holding locks
Preamble generation system that converts MCP tools into JavaScript function stubs
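A hedged sketch of what preamble generation could look like, assuming tools arrive as `server__tool` full names (the naming referred to later in this review) and that the host registers a native `__call_tool(name, argsJson)` binding. Both `generate_preamble` and `__call_tool` are hypothetical; the stubs goose actually generates may look different.

```rust
use std::collections::BTreeMap;

/// Builds a JS preamble exposing each tool as `server.tool(args)`.
/// Assumes server and tool names are already valid JS identifiers.
/// `__call_tool` is a hypothetical native binding registered by the host.
pub fn generate_preamble(full_tool_names: &[&str]) -> String {
    // Group tool names by server so each server becomes one JS object.
    // BTreeMap keeps the output order deterministic.
    let mut by_server: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for full in full_tool_names {
        // Names without a "server__" prefix are skipped.
        if let Some((server, tool)) = full.split_once("__") {
            by_server.entry(server).or_default().push(tool);
        }
    }

    let mut out = String::new();
    for (server, tools) in by_server {
        out.push_str(&format!("const {server} = {{\n"));
        for tool in tools {
            // Each stub forwards its arguments as JSON to the native binding.
            out.push_str(&format!(
                "  {tool}: (args) => __call_tool(\"{server}__{tool}\", JSON.stringify(args ?? {{}})),\n"
            ));
        }
        out.push_str("};\n");
    }
    out
}
```

Grouping by server keeps the generated API shaped like the `servers/:server_name/:tool_name.js` tree the model is shown.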

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.


| File | Description |
| --- | --- |
| `crates/goose/src/agents/code_execution_extension.rs` | New 566-line extension implementing JS code execution with Boa engine, including tool binding generation and async tool dispatch |
| `crates/goose/src/agents/extension_manager.rs` | Added `get_prefixed_tools_excluding()` method and refactored `collect_moim()` to avoid holding lock while calling `get_moim()` |
| `crates/goose/src/agents/extension.rs` | Registered new code_execution extension in platform extensions registry |
| `crates/goose/src/agents/mod.rs` | Added module declaration for code_execution_extension |
| `crates/goose/Cargo.toml` | Added boa_engine 0.21.0 and boa_gc 0.21 dependencies |
| `Cargo.lock` | Dependency lock file updates for Boa engine and transitive dependencies |

Copilot · 2025-12-09T21:55:10Z

crates/goose/src/agents/code_execution_extension.rs

+        let execute_schema = serde_json::to_value(schema_for!(ExecuteCodeParams))
+            .expect("schema")
+            .as_object()
+            .expect("object")
+            .clone();


Using expect on schema generation will panic. Since this is called during tool listing, consider handling the error gracefully by returning an error result instead.
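A std-only sketch of the graceful alternative: propagate the failure with `?` and `ok_or_else` instead of `expect`. `Value` here is a hypothetical minimal stand-in for `serde_json::Value` so the example is self-contained; with serde_json itself the same chain would be `serde_json::to_value(...)` with its error mapped, followed by `.as_object().cloned().ok_or_else(...)`.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value (hypothetical; goose uses serde_json::Value).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Object(BTreeMap<String, Value>),
    String(String),
}

impl Value {
    /// Mirrors serde_json's `as_object`: Some only for objects.
    pub fn as_object(&self) -> Option<&BTreeMap<String, Value>> {
        match self {
            Value::Object(map) => Some(map),
            _ => None,
        }
    }
}

/// Instead of `.expect("schema").as_object().expect("object")`, return an
/// error so tool listing can report the failure rather than panic.
pub fn schema_object(
    generated: Result<Value, String>,
) -> Result<BTreeMap<String, Value>, String> {
    let value = generated.map_err(|e| format!("failed to generate schema: {e}"))?;
    value
        .as_object()
        .cloned()
        .ok_or_else(|| "generated schema is not a JSON object".to_string())
}
```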

alexhancock · 2025-12-10T01:44:49Z

This implementation saves on intermediate tool results not flowing to the model, but doesn't yet address progressive discovery of the interfaces themselves (via a tree of files, resources, etc). I will look into this tomorrow and push an update.

michaelneale · 2025-12-10T02:08:25Z

nice, also for compatibility: in this mode I think `enabled: true` in the config just means that the extension is available to the code mode environment, not to the LLM (functionally the same, but implemented differently). we can even start trying what it is like with all possible extensions "turned on" to see how well it works and how efficient it can be!

domdomegg · 2025-12-10T11:43:05Z

This implementation saves on intermediate tool results not flowing to the model, but doesn't yet address progressive discovery of the interfaces themselves (via a tree of files, resources, etc). I will look into this tomorrow and push an update.

I think for this we've found one of the following works well:

- a real or virtual filesystem, with servers as folders and tools as files
- telling the model in the system prompt which servers are available (and maybe the tool names), then providing tools like `get_tools(server_name)` and `get_tool(server_name, tool_name)`
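Both options can sit on the same data structure. A minimal sketch, assuming tool sources are held in memory as a server → tool → source map; `ToolTree`, `paths`, `get_tools`, and `get_tool` are hypothetical names modeled on the two suggestions above, not goose's API.

```rust
use std::collections::BTreeMap;

/// Hypothetical virtual filesystem: servers are folders, tools are files.
#[derive(Default)]
pub struct ToolTree {
    // server name -> tool name -> generated source for that tool
    servers: BTreeMap<String, BTreeMap<String, String>>,
}

impl ToolTree {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn add(&mut self, server: &str, tool: &str, source: &str) {
        self.servers
            .entry(server.to_string())
            .or_default()
            .insert(tool.to_string(), source.to_string());
    }

    /// Every file path in the `servers/:server_name/:tool_name.js` layout.
    pub fn paths(&self) -> Vec<String> {
        self.servers
            .iter()
            .flat_map(|(server, tools)| {
                tools.keys().map(move |tool| format!("servers/{server}/{tool}.js"))
            })
            .collect()
    }

    /// `get_tools(server_name)`: the tool names of one server, if it exists.
    pub fn get_tools(&self, server: &str) -> Option<Vec<&str>> {
        self.servers
            .get(server)
            .map(|tools| tools.keys().map(String::as_str).collect())
    }

    /// `get_tool(server_name, tool_name)`: the source of one tool, if it exists.
    pub fn get_tool(&self, server: &str, tool: &str) -> Option<&str> {
        self.servers.get(server)?.get(tool).map(String::as_str)
    }
}
```

Only the server names (or the path list) need to go into the prompt up front; the tool sources are fetched on demand.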

Copilot

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 12 comments.

crates/goose/src/agents/code_execution_extension.rs

Copilot · 2025-12-11T20:17:53Z

crates/goose/src/agents/code_execution_extension.rs

+                            Ok(content) => Ok(content
+                                .iter()
+                                .filter_map(|c| match &c.raw {
+                                    RawContent::Text(t) => Some(t.text.clone()),
+                                    _ => None,
+                                })
+                                .collect::<Vec<_>>()
+                                .join("\n")),


This error response intentionally discards all non-text content from tool results. If a tool returns images, resources, or other content types, that information will be silently lost.

Consider either including non-text content in a structured way, or documenting that only text content is supported in code execution mode.

Suggested change

-                            Ok(content) => Ok(content
-                                .iter()
-                                .filter_map(|c| match &c.raw {
-                                    RawContent::Text(t) => Some(t.text.clone()),
-                                    _ => None,
-                                })
-                                .collect::<Vec<_>>()
-                                .join("\n")),
+                            Ok(content) => {
+                                let mut non_text_found = false;
+                                let texts = content
+                                    .iter()
+                                    .filter_map(|c| match &c.raw {
+                                        RawContent::Text(t) => Some(t.text.clone()),
+                                        _ => {
+                                            non_text_found = true;
+                                            None
+                                        }
+                                    })
+                                    .collect::<Vec<_>>();
+                                if non_text_found {
+                                    Err("Tool returned non-text content (e.g., image or resource), which is not supported in code execution mode. Only text content is supported.".to_string())
+                                } else {
+                                    Ok(texts.join("\n"))
+                                }
+                            },
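A middle ground between the two options in the comment above is to keep the text and replace each non-text item with a short placeholder, so the script (and the model) can tell something was omitted instead of it vanishing silently. A minimal sketch; `Content` is a hypothetical, simplified stand-in for the MCP content types (goose matches on `RawContent`), and the placeholder format is an assumption.

```rust
/// Simplified stand-in for MCP content items (hypothetical; goose uses `RawContent`).
pub enum Content {
    Text(String),
    Image { mime_type: String },
    Resource { uri: String },
}

/// Joins text content with newlines and, rather than dropping other items
/// silently, inserts a placeholder describing what was omitted.
pub fn render_for_js(content: &[Content]) -> String {
    content
        .iter()
        .map(|c| match c {
            Content::Text(t) => t.clone(),
            Content::Image { mime_type } => format!("[image omitted: {mime_type}]"),
            Content::Resource { uri } => format!("[resource omitted: {uri}]"),
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Unlike the suggested change, this does not fail the whole call when a tool returns an image alongside useful text.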

crates/goose/src/agents/reply_parts.rs

Copilot

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 7 comments.

Copilot

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

Copilot

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

DOsinga

some comments about the plumbing. I did look a bit at the meat of the thing. looks clever! will give it another go after a walk, but maybe we can talk about these minor things

DOsinga · 2025-12-15T16:35:30Z

crates/goose/src/agents/extension_manager.rs

    }

-    /// Get extensions info
+    /// Get extensions info for building the system prompt.


we also call this function for some reason when generating a recipe. either way, do we need to modify this at all? to me it looks like if we have code execution on, we don't even insert the extensions in the prompt (or at least we shouldn't).

DOsinga · 2025-12-15T16:41:25Z

crates/goose/src/agents/extension_manager.rs

-                    content.push('\n');
-                    content.push_str(&moim_content);
-                }
+        let platform_clients: Vec<(String, McpClientBox)> = {


presumably this is to avoid hanging on to the lock? nice.

does MOIM work with code execution?

presumably this is to avoid hanging on to the lock? nice.

yes, exactly

can you clarify the question about MOIM? I use it in a certain way for this change, but want to make sure I answer the right question.

let me have a closer look at the rest of the code. I mostly wondered if we should block the MOIM for the rest of the extensions or not.

DOsinga · 2025-12-15T16:42:05Z

crates/goose/src/agents/extension_manager.rs

-            .lock()
-            .await
+        let extensions = self.extensions.lock().await;
+        let code_exec_enabled = extensions.contains_key(CODE_EXECUTION_NAME);


we try to determine this in three different ways; we should probably only use the utility function below.

DOsinga · 2025-12-15T16:43:28Z

crates/goose/src/agents/prompt_manager.rs

            })
            .collect();

+        // Detect code_execution mode: when enabled, only code_execution extension is passed


why do we need to detect that here? aren't we injecting this in the builder?

DOsinga

this is awesome. I have many thoughts, left some as comments, but we should try and ship this quickly

at this point it is up to the user to use code mode by enabling that extension and then the other extensions are only accessible through this. have we tried allowing both paths?

DOsinga · 2025-12-15T18:23:26Z

crates/goose/src/agents/code_execution_extension.rs

+
+#[derive(Debug, Serialize, Deserialize, JsonSchema)]
+struct ReadModuleParams {
+    /// Module path: "server" for all tools, "server/tool" for one tool.


I can't believe I am saying this, but assuming this comment becomes a doc string in the tool, can we expand on it a bit? I had to read it twice before I understood it.

DOsinga · 2025-12-15T18:23:41Z

crates/goose/src/agents/code_execution_extension.rs

+#[derive(Debug, Serialize, Deserialize, JsonSchema)]
+struct ReadModuleParams {
+    /// Module path: "server" for all tools, "server/tool" for one tool.
+    path: String,


and maybe just call this module_path then
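One way to act on both suggestions is to spell the two accepted shapes out in the doc string and parse them into an explicit type. A hedged sketch; `ModulePath` and `parse_module_path` are hypothetical names, and the doc wording is only a proposal.

```rust
/// Which module `read_module` should return (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ModulePath {
    /// `"server"`: the signatures of every tool exposed by that MCP server.
    Server(String),
    /// `"server/tool"`: the signature of that single tool.
    Tool { server: String, tool: String },
}

/// Proposed doc string for the `module_path` field:
/// "Path of the module to read. Pass a server name (e.g. "github") to see all
/// of that server's tools, or "server/tool" (e.g. "github/list_issues") to see
/// a single tool."
pub fn parse_module_path(module_path: &str) -> Result<ModulePath, String> {
    // Tolerate surrounding whitespace and stray leading/trailing slashes.
    let trimmed = module_path.trim().trim_matches('/');
    match trimmed.split_once('/') {
        None if !trimmed.is_empty() => Ok(ModulePath::Server(trimmed.to_string())),
        Some((server, tool)) if !server.is_empty() && !tool.is_empty() && !tool.contains('/') => {
            Ok(ModulePath::Tool {
                server: server.to_string(),
                tool: tool.to_string(),
            })
        }
        _ => Err(format!(
            "invalid module path {module_path:?}: expected \"server\" or \"server/tool\""
        )),
    }
}
```

Returning a descriptive error for malformed paths also gives the model a hint on how to retry.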

DOsinga · 2025-12-15T18:25:37Z

crates/goose/src/agents/code_execution_extension.rs

+                let ty = prop.and_then(|p| p.get("type")?.as_str()).unwrap_or("any");
+                (name.clone(), ty.to_string(), required)
+            })
+            .collect();


is this the best way to parse the json? do we not have a rust object that matches this that we can use to read into?

DOsinga · 2025-12-15T18:34:16Z

crates/goose/src/agents/code_execution_extension.rs

+    let tool_data: Vec<(String, String)> = server_tools
+        .iter()
+        .map(|t| (t.tool_name.clone(), t.full_name.clone()))
+        .collect();


nit: you could avoid looping over this twice and unzip in one go
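The diff only shows the pairs being collected into `tool_data`; assuming they are later split into separate name lists, `Iterator::unzip` produces both vectors in a single pass. `ServerTool` is a hypothetical stand-in for the element type of `server_tools`.

```rust
/// Hypothetical stand-in for the entries of `server_tools` in the diff.
pub struct ServerTool {
    pub tool_name: String,
    pub full_name: String,
}

/// One pass over `server_tools`, producing both name lists at once.
pub fn split_names(server_tools: &[ServerTool]) -> (Vec<String>, Vec<String>) {
    server_tools
        .iter()
        .map(|t| (t.tool_name.clone(), t.full_name.clone()))
        .unzip()
}
```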

DOsinga · 2025-12-15T18:46:22Z

crates/goose/src/agents/code_execution_extension.rs

+    )
+}
+
+fn create_tool_function(full_name: String) -> NativeFunction {


maybe call this full_tool_name to indicate that this is server__tool?

DOsinga · 2025-12-15T19:08:16Z

crates/goose/src/agents/code_execution_extension.rs

+                format!("{before}\n__result__ = {};", last.trim_end_matches(';'))
+            })
+        }
+    };


this code looks fragile. it makes a good effort, but there are also a lot of corner cases to cover here. have we considered just injecting a record_result(...) function and telling the LLM to call that to, eh, record a result?
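The `record_result(...)` idea avoids parsing the script at all: the host hands the script a callback and reads back whatever was recorded. A std-only sketch of the host side; `make_recorder` is hypothetical, and wiring the closure into the engine as a native JS function is left out. `Rc`/`RefCell` fit here because the JS context lives on a single thread.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Returns a `record_result` callback plus a handle the host reads afterwards.
/// Every call appends, so the host can decide whether the last value wins.
pub fn make_recorder() -> (impl FnMut(String), Rc<RefCell<Vec<String>>>) {
    let recorded = Rc::new(RefCell::new(Vec::new()));
    let sink = Rc::clone(&recorded);
    let record_result = move |value: String| sink.borrow_mut().push(value);
    (record_result, recorded)
}
```

Usage: the script calls `record_result(...)` explicitly, so comments, trailing blocks, or a final statement that is not an expression no longer change what gets returned.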

crates/goose/src/agents/code_execution_extension.rs

DOsinga · 2025-12-15T19:26:11Z

crates/goose/src/agents/code_execution_extension.rs

+
+                Use read_module("name") to see tool signatures before calling unfamiliar tools.
+            "#},
+            server_list.join(", ")


I wonder whether we should rely on read_module and search at all if the number of tools is small. LLMs can easily consume large interfaces in one go without getting confused. We could even supply it with the interface that we inject directly, so it can call that instead of having to:

search for the function

import the function

call the function

I did this at the beginning, but @michaelneale's instinct was to move more towards a search/read/execute approach to pull as much as we could out of the context window initially

given we've done more manual testing with this approach, I am going to stick with it for now, but we can iterate on this if we find a clear change that is beneficial.

yeah, could have it add in a small number, but even some popular MCPs (like github) overload it right from the start, so unless we special-case some built-in ones... it seems consistent to let it discover? (but maybe that is worth it if we notice it being an issue?)

the main ones would be, say, shell/editor, so it intrinsically knows them, which would (sometimes) save one search/lookup up front (but that is all)

The main thing is that we don't load up an unbounded set of tools (extension names are usually reasonable and modest, but tools in the MCP world do not seem to be)
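The two positions can be combined with a cap: inline every tool signature when the total is small, and fall back to listing server names plus discovery otherwise, which keeps what is loaded bounded. A hypothetical sketch; `build_tool_context`, the `server.tool(args)` rendering, and the `max_inline_tools` threshold are all assumptions, not something the PR implements.

```rust
/// Hypothetical policy: inline all tool signatures when the total is at most
/// `max_inline_tools`, otherwise list only server names and rely on discovery.
pub fn build_tool_context(servers: &[(&str, Vec<&str>)], max_inline_tools: usize) -> String {
    let total: usize = servers.iter().map(|(_, tools)| tools.len()).sum();
    if total <= max_inline_tools {
        // Small tool set: give the model the whole interface up front.
        servers
            .iter()
            .flat_map(|(server, tools)| tools.iter().map(move |t| format!("{server}.{t}(args)")))
            .collect::<Vec<_>>()
            .join("\n")
    } else {
        // Large tool set: bounded context, tools discovered on demand.
        let names: Vec<&str> = servers.iter().map(|(s, _)| *s).collect();
        format!(
            "Available servers: {}. Use read_module to see tool signatures.",
            names.join(", ")
        )
    }
}
```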

DOsinga · 2025-12-15T19:29:55Z

crates/goose/src/agents/code_execution_extension.rs

+            self.context.extension_manager.clone(),
+        ));
+
+        let js_result = tokio::task::spawn_blocking(move || run_js_module(&code, &tools, call_tx))


if the agent writes an infinite loop, are we toast here? for external MCP servers I think we time out, but this is all in process, so I think it hangs our agent

yes, but I also think that as the model writes more and more complex code, where it could potentially wait on async things etc. in one run, a timeout means something a little different than it does for a single tool call.

any instinct on what a good timeout would be?

this really wouldn't be that different from, say, using the shell to do writes with sed and awk, or any bash-type thing, would it? did that have a timeout?
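A std-only sketch of a wall-clock timeout around a blocking job; `run_with_timeout` is a hypothetical helper. The caveat matters for the infinite-loop case: the timeout only lets the agent stop waiting. The worker thread keeps running, and the same holds for `tokio::task::spawn_blocking`, whose tasks cannot be cancelled once started, so actually stopping a runaway script would need a limit enforced inside the JS engine.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `job` on a worker thread and waits at most `timeout` for its result.
/// On timeout the worker is NOT stopped; it keeps running in the background.
pub fn run_with_timeout<T, F>(job: F, timeout: Duration) -> Result<T, String>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller already gave up, the send fails and the result is dropped.
        let _ = tx.send(job());
    });
    match rx.recv_timeout(timeout) {
        Ok(value) => Ok(value),
        Err(mpsc::RecvTimeoutError::Timeout) => {
            Err(format!("execution timed out after {timeout:?}"))
        }
        // The worker dropped its sender without sending, e.g. because it panicked.
        Err(mpsc::RecvTimeoutError::Disconnected) => {
            Err("execution failed before producing a result".to_string())
        }
    }
}
```

In the async code the analogue would be wrapping the `spawn_blocking` handle in `tokio::time::timeout`, with the same caveat about the task continuing to run.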

Copilot

Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

Copilot · 2025-12-15T19:48:08Z

crates/goose/src/agents/extension_manager.rs

+        let platform_clients: Vec<(String, McpClientBox)> = {
+            let extensions = self.extensions.lock().await;
+            extensions
+                .iter()
+                .filter_map(|(name, extension)| {
+                    if let ExtensionConfig::Platform { .. } = &extension.config {
+                        Some((name.clone(), extension.get_client()))
+                    } else {
+                        None
+                    }
+                })
+                .collect()
+        };


The lock on extensions is held during the entire collection process, including calls to extension.get_client(). Consider collecting the data in two steps: first gather references under the lock, then call get_client() after releasing it to reduce lock contention.
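Whether a further split helps depends on what `get_client()` costs. If it only clones a shared handle such as an `Arc` (an assumption; the diff does not show its body), the work under the lock is already just reference-count bumps, and the slow `get_moim()` calls already happen after the lock is released. A std-only sketch of that snapshot-then-call pattern, with a std `Mutex` in place of the async lock and hypothetical `Client` and `collect_descriptions` names:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical client type standing in for an extension's MCP client.
pub struct Client {
    pub name: String,
}

impl Client {
    /// Stand-in for a potentially slow call such as `get_moim()`.
    pub fn describe(&self) -> String {
        format!("moim from {}", self.name)
    }
}

/// Clone the `Arc` handles while holding the lock, then release the lock
/// before making the slow calls.
pub fn collect_descriptions(extensions: &Mutex<HashMap<String, Arc<Client>>>) -> Vec<String> {
    let clients: Vec<Arc<Client>> = {
        let guard = extensions.lock().expect("lock poisoned");
        guard.values().cloned().collect()
    }; // `guard` is dropped here, releasing the lock.

    let mut out: Vec<String> = clients.iter().map(|c| c.describe()).collect();
    out.sort(); // HashMap order is unspecified; sort for stable output.
    out
}
```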

Copilot · 2025-12-15T19:48:09Z

crates/goose/src/agents/code_execution_extension.rs

+            .await
+            .map_err(|e| format!("JS execution task failed: {e}"))?;
+
+        tool_handler.abort();


Aborting the tool handler task immediately after JS execution completes may drop in-flight tool calls. Consider waiting for pending operations with a timeout or tracking outstanding requests.
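If, as the architecture notes suggest, each JS-side tool call blocks until it gets a reply, nothing should still be in flight when the script returns. Closing the request channel and waiting for the handler to drain makes that explicit instead of relying on `abort()`. A std-only sketch with a hypothetical `spawn_handler`; with tokio the analogue would be dropping the `mpsc::Sender` (the receiver then returns `None` once the buffer is empty) and awaiting the task's `JoinHandle`, possibly under a timeout.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawns a handler that processes requests until every sender is dropped.
/// Returns the sender and a handle yielding how many requests were handled.
pub fn spawn_handler() -> (mpsc::Sender<String>, thread::JoinHandle<usize>) {
    let (tx, rx) = mpsc::channel::<String>();
    let handle = thread::spawn(move || {
        let mut handled = 0;
        // Iteration ends only after all senders are gone AND the queue is empty,
        // so requests already sent are never dropped.
        for _request in rx {
            handled += 1; // a real handler would dispatch the tool call here
        }
        handled
    });
    (tx, handle)
}
```

Usage: send requests, `drop(tx)` when the script finishes, then `handle.join()` returns after every queued request has been processed.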

Copilot · 2025-12-15T19:48:09Z

crates/goose/src/agents/code_execution_extension.rs

+                        - Call: toolName({ param1: value, param2: value })
+                        - All calls are synchronous, return strings
+                        - Last expression is the result
+                        - No comments in code


The instruction 'No comments in code' contradicts common JavaScript practices and may confuse users. If comments should be avoided for a technical reason (e.g., parser limitations), this should be explained. Otherwise, consider removing this restriction.

Suggested change

- No comments in code


alexhancock requested review from DOsinga, Copilot and michaelneale December 9, 2025 21:51

Copilot started reviewing on behalf of alexhancock December 9, 2025 21:52

Copilot AI reviewed Dec 9, 2025

michaelneale self-assigned this Dec 10, 2025

alexhancock force-pushed the alexhancock/code-mode-mcp branch from fce3e6c to 77f0131 December 10, 2025 01:19

Copilot AI review requested due to automatic review settings December 11, 2025 20:12

alexhancock force-pushed the alexhancock/code-mode-mcp branch from 77f0131 to c104cef December 11, 2025 20:12

alexhancock changed the title ~~feat: platform extension for code execution~~ feat(mcp): platform extension for "code mode" MCP tool calling Dec 11, 2025

Copilot started reviewing on behalf of alexhancock December 11, 2025 20:12

Copilot AI reviewed Dec 11, 2025

Copilot AI review requested due to automatic review settings December 12, 2025 06:25

Copilot started reviewing on behalf of michaelneale December 12, 2025 06:26

Copilot AI reviewed Dec 12, 2025

Copilot AI review requested due to automatic review settings December 12, 2025 16:10

alexhancock force-pushed the alexhancock/code-mode-mcp branch from dede50c to 0cd5d8b December 12, 2025 16:10

Copilot started reviewing on behalf of alexhancock December 12, 2025 16:11

Copilot AI reviewed Dec 12, 2025

Copilot AI review requested due to automatic review settings December 15, 2025 14:35

alexhancock force-pushed the alexhancock/code-mode-mcp branch from ebd7225 to 315eb13 December 15, 2025 14:35

Copilot started reviewing on behalf of alexhancock December 15, 2025 14:36

Copilot AI reviewed Dec 15, 2025

DOsinga reviewed Dec 15, 2025

block deleted a comment from Copilot AI Dec 15, 2025

Copilot AI review requested due to automatic review settings December 15, 2025 17:19

block deleted a comment from Copilot AI Dec 15, 2025

alexhancock force-pushed the alexhancock/code-mode-mcp branch from d4bbbdc to b00beb4 December 15, 2025 18:11

block deleted a comment from Copilot AI Dec 15, 2025

DOsinga approved these changes Dec 15, 2025

Copilot AI review requested due to automatic review settings December 15, 2025 19:46

alexhancock force-pushed the alexhancock/code-mode-mcp branch from b00beb4 to f0f5352 December 15, 2025 19:46

Copilot AI reviewed Dec 15, 2025

block deleted a comment from Copilot AI Dec 15, 2025

alexhancock force-pushed the alexhancock/code-mode-mcp branch from f0f5352 to e7e6109 December 15, 2025 20:44

alexhancock and others added 3 commits December 15, 2025 15:44

feat(mcp): platform extension for "code mode" MCP tool calling (b8093d0)

pull extensions from prompt when in code mode and add search_modules tool (492debd)

chore(code-mode): addressing PR feedback (0b0380f)

alexhancock force-pushed the alexhancock/code-mode-mcp branch from e7e6109 to 0b0380f December 15, 2025 20:44

alexhancock merged commit 948ff49 into main Dec 15, 2025
18 checks passed

alexhancock deleted the alexhancock/code-mode-mcp branch December 15, 2025 21:47

alexhancock mentioned this pull request Dec 15, 2025

docs: blog for code mode MCP #6126

Merged

github-actions bot mentioned this pull request Dec 16, 2025

chore(release): release version 1.17.0 (minor) #6131

Closed

fbalicchia pushed a commit to fbalicchia/goose that referenced this pull request Dec 16, 2025

feat(mcp): platform extension for "code mode" MCP tool calling (block…

bf618ff

…#6030) Co-authored-by: Michael Neale <michael.neale@gmail.com>

johnmatthewtennant mentioned this pull request Dec 18, 2025

dynamic subdirectory hint loading #5759

Closed

10 tasks

alexhancock mentioned this pull request Jan 5, 2026

fix: clean up result recording for code mode #6343

Merged
