
feat(mcp): platform extension for "code mode" MCP tool calling #6030

Merged
alexhancock merged 3 commits into main from alexhancock/code-mode-mcp
Dec 15, 2025


@alexhancock
Collaborator

@alexhancock alexhancock commented Dec 9, 2025

Implements the idea of "code mode" or "sandbox mode" for MCP

Refs
https://blog.cloudflare.com/code-mode/
https://www.anthropic.com/engineering/code-execution-with-mcp
#5899

Architecture

  • New code_execution platform extension
  • When enabled, this extension makes all other tools invisible to the model in the traditional sense (they are no longer listed as directly callable tools)
  • Generates a programmatic API to all enabled MCP server tools
  • Gives the model two tools:
    • read_module, which reads the source code implementing one tool so the model knows how to call it
    • execute_code, which runs code the model writes to call tool(s), and carries instructions to the model on how that code should be written
  • Publishes the tree of available modules, in the format servers/:server_name/:tool_name.js, to the model via get_moim
  • Dispatches tool calls in a separate async thread, because the Boa NativeFunctions running on the main JS thread are !Send
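A minimal sketch of the dispatch pattern in the last bullet, assuming only the standard library: std threads and `mpsc` channels stand in for the async runtime and Boa, and an `Rc` stands in for the `!Send` engine state. `ToolCall`, `dispatch`, `run_script` and `run_demo` are hypothetical names, not the extension's code. Each request carries its own reply channel and the script thread blocks on it, which is how a tool call can look synchronous from inside the script.

```rust
use std::rc::Rc;
use std::sync::mpsc;
use std::thread;

/// A tool call sent from the script thread to the dispatcher (hypothetical type).
struct ToolCall {
    full_tool_name: String,
    args_json: String,
    /// Per-call reply channel, playing the role of a oneshot sender.
    reply: mpsc::Sender<Result<String, String>>,
}

/// Stand-in for the extension manager; a real dispatcher would call the MCP server.
fn dispatch(full_tool_name: &str, args_json: &str) -> Result<String, String> {
    match full_tool_name {
        "developer__shell" => Ok(format!("shell called with {args_json}")),
        other => Err(format!("unknown tool: {other}")),
    }
}

/// Runs on its own thread and owns a !Send value, as a Boa Context would.
fn run_script(call_tx: mpsc::Sender<ToolCall>) -> Vec<Result<String, String>> {
    // Rc<T> is !Send, so this value can never leave the script thread.
    let not_send = Rc::new("script-local state");
    let mut results = Vec::new();
    for name in ["developer__shell", "missing__tool"] {
        let (reply_tx, reply_rx) = mpsc::channel();
        let call = ToolCall {
            full_tool_name: name.to_string(),
            args_json: r#"{"command":"ls"}"#.to_string(),
            reply: reply_tx,
        };
        call_tx.send(call).expect("dispatcher is running");
        // Blocking on the reply makes the call look synchronous to the script.
        results.push(reply_rx.recv().expect("dispatcher replied"));
    }
    drop(not_send);
    results
}

/// Wires both sides together and returns what the script observed.
fn run_demo() -> Vec<Result<String, String>> {
    let (call_tx, call_rx) = mpsc::channel::<ToolCall>();
    let dispatcher = thread::spawn(move || {
        // The loop ends once the only Sender (owned by the script thread) is dropped.
        for call in call_rx {
            let _ = call.reply.send(dispatch(&call.full_tool_name, &call.args_json));
        }
    });
    let script = thread::spawn(move || run_script(call_tx));
    let results = script.join().expect("script thread panicked");
    dispatcher.join().expect("dispatcher thread panicked");
    results
}
```

Calling `run_demo()` yields an `Ok` for the known tool and an `Err` for the unknown one; because every call waits for its reply, the dispatcher has nothing left in flight when the script returns.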

[Diagram: tool call dispatching in its present state]

Contributor

Copilot AI left a comment


Pull request overview

This PR implements a JavaScript code execution platform extension that enables the model to execute JS code with synchronous access to all MCP tools. The implementation uses the Boa JavaScript engine and provides a "sandbox mode" where tools are auto-generated as JS functions that the model can call within a single code block.

Key Changes:

  • New code_execution platform extension with execute_code tool for running JS code
  • Tool handler architecture using channels to bridge Boa's !Send context with async runtime
  • Extension manager refactored to collect platform clients without holding locks
  • Preamble generation system that converts MCP tools into JavaScript function stubs

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.

  • crates/goose/src/agents/code_execution_extension.rs: New 566-line extension implementing JS code execution with Boa engine, including tool binding generation and async tool dispatch
  • crates/goose/src/agents/extension_manager.rs: Added get_prefixed_tools_excluding() method and refactored collect_moim() to avoid holding lock while calling get_moim()
  • crates/goose/src/agents/extension.rs: Registered new code_execution extension in platform extensions registry
  • crates/goose/src/agents/mod.rs: Added module declaration for code_execution_extension
  • crates/goose/Cargo.toml: Added boa_engine 0.21.0 and boa_gc 0.21 dependencies
  • Cargo.lock: Dependency lock file updates for Boa engine and transitive dependencies

Comment on lines 332 to 336
let execute_schema = serde_json::to_value(schema_for!(ExecuteCodeParams))
.expect("schema")
.as_object()
.expect("object")
.clone();

Copilot AI Dec 9, 2025


Using expect on schema generation will panic if generation fails. Since this is called during tool listing, consider handling the error gracefully by returning an error result instead.

@michaelneale michaelneale self-assigned this Dec 10, 2025
@alexhancock alexhancock force-pushed the alexhancock/code-mode-mcp branch from fce3e6c to 77f0131 December 10, 2025 01:19
@alexhancock
Collaborator Author

This implementation saves on intermediate tool results not flowing to the model, but doesn't yet address progressive discovery of the interfaces themselves (via a tree of files, resources, etc). I will look into this tomorrow and push an update.

@michaelneale
Collaborator

nice - also for compatibility, in this mode I think enabled: true in the config just means that it is available to the code mode environment, not the LLM (functionally the same, but implemented differently). can even start trying what it is like with all possible extensions "turned on" to see how well it works and how efficient it can be!

@domdomegg

This implementation saves on intermediate tool results not flowing to the model, but doesn't yet address progressive discovery of the interfaces themselves (via a tree of files, resources, etc). I will look into this tomorrow and push an update.

I think for this we've found one of the following works well:

  • real or virtual filesystem, with servers as folders and tools as files
  • telling the model in system prompt what servers are available (and maybe tool names), and then having a tool that is like get_tools(server_name) and get_tool(server_name, tool_name)
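Both options amount to exposing tools as a browsable tree. A minimal std-only sketch of the filesystem variant, assuming tool names are namespaced as `server__tool` (the convention mentioned later in this review); `module_tree`, `render_paths` and `get_tools` are hypothetical helpers, and the rendered paths mirror the `servers/:server_name/:tool_name.js` layout from the PR description.

```rust
use std::collections::BTreeMap;

/// Groups "server__tool" names into a server -> tools map (sorted for stable output).
fn module_tree(full_tool_names: &[&str]) -> BTreeMap<String, Vec<String>> {
    let mut tree: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for full in full_tool_names {
        // Assumption: the first "__" separates server from tool; other names are skipped.
        if let Some((server, tool)) = full.split_once("__") {
            tree.entry(server.to_string()).or_default().push(tool.to_string());
        }
    }
    for tools in tree.values_mut() {
        tools.sort();
    }
    tree
}

/// Renders the tree as file paths: servers as folders, tools as files.
fn render_paths(tree: &BTreeMap<String, Vec<String>>) -> Vec<String> {
    tree.iter()
        .flat_map(|(server, tools)| {
            tools.iter().map(move |tool| format!("servers/{server}/{tool}.js"))
        })
        .collect()
}

/// The get_tools(server_name) idea: list one server's tools, or None if unknown.
fn get_tools<'a>(tree: &'a BTreeMap<String, Vec<String>>, server_name: &str) -> Option<&'a [String]> {
    tree.get(server_name).map(|tools| tools.as_slice())
}
```

Only server names (or these paths) need to go into the prompt up front; per-tool details are fetched on demand.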

Copilot AI review requested due to automatic review settings December 11, 2025 20:12
@alexhancock alexhancock force-pushed the alexhancock/code-mode-mcp branch from 77f0131 to c104cef December 11, 2025 20:12
@alexhancock alexhancock changed the title feat: platform extension for code execution feat(mcp): platform extension for "code mode" MCP tool calling Dec 11, 2025
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 12 comments.

Comment on lines 395 to 402
Ok(content) => Ok(content
.iter()
.filter_map(|c| match &c.raw {
RawContent::Text(t) => Some(t.text.clone()),
_ => None,
})
.collect::<Vec<_>>()
.join("\n")),

Copilot AI Dec 11, 2025


This response handling intentionally discards all non-text content from tool results. If a tool returns images, resources, or other content types, that information will be silently lost.

Consider either including non-text content in a structured way, or documenting that only text content is supported in code execution mode.

Suggested change
Ok(content) => {
let mut non_text_found = false;
let texts = content
.iter()
.filter_map(|c| match &c.raw {
RawContent::Text(t) => Some(t.text.clone()),
_ => {
non_text_found = true;
None
}
})
.collect::<Vec<_>>();
if non_text_found {
Err("Tool returned non-text content (e.g., image or resource), which is not supported in code execution mode. Only text content is supported.".to_string())
} else {
Ok(texts.join("\n"))
}
},
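The comment's first option, keeping non-text content visible in a structured way instead of dropping it or failing the whole call, could look roughly like this. `Content` is a simplified, hypothetical stand-in for the MCP content variants (the real code matches on `RawContent`), and the placeholder wording is an assumption.

```rust
/// Simplified stand-in for MCP raw content variants (hypothetical shape).
enum Content {
    Text(String),
    Image { mime_type: String },
    Resource { uri: String },
}

/// Keeps text verbatim and replaces non-text items with a short placeholder,
/// so the script can see that something was returned instead of losing it silently.
fn render_tool_output(content: &[Content]) -> String {
    content
        .iter()
        .map(|c| match c {
            Content::Text(t) => t.clone(),
            Content::Image { mime_type } => format!("[image omitted: {mime_type}]"),
            Content::Resource { uri } => format!("[resource omitted: {uri}]"),
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Compared with returning an error, this keeps the text that did come back usable while still flagging what was left out.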

Copilot AI review requested due to automatic review settings December 12, 2025 06:25
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 7 comments.

Copilot AI review requested due to automatic review settings December 12, 2025 16:10
@alexhancock alexhancock force-pushed the alexhancock/code-mode-mcp branch from dede50c to 0cd5d8b December 12, 2025 16:10
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

Copilot AI review requested due to automatic review settings December 15, 2025 14:35
@alexhancock alexhancock force-pushed the alexhancock/code-mode-mcp branch from ebd7225 to 315eb13 December 15, 2025 14:35
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

Collaborator

@DOsinga DOsinga left a comment


some comments about the plumbing. I did look a bit at the meat of the thing. looks clever! will give it another go after a walk, but maybe we can talk about these minor things

}

/// Get extensions info
/// Get extensions info for building the system prompt.
Collaborator


we also call this function for some reason when generating a recipe. either way, do we need to modify this at all? to me it looks like if we have code execution on, we don't even insert the extensions in the prompt (or we should).

content.push('\n');
content.push_str(&moim_content);
}
let platform_clients: Vec<(String, McpClientBox)> = {
Collaborator


presumably this is to avoid hanging on to the lock? nice.

does MOIM work with code execution?

Collaborator Author


presumably this is to avoid hanging on to the lock? nice.

yes, exactly
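The pattern confirmed here (snapshot what you need under the lock, release it, then make the slow calls) can be sketched with `std::sync::Mutex` in place of the async mutex the real code awaits. `Client`, `Extension`, the body of `get_moim` and `collect_moim` are hypothetical simplifications; cloning an `Arc` handle is what keeps the locked section short.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical client; get_moim stands in for a potentially slow call.
struct Client {
    name: String,
}

impl Client {
    fn get_moim(&self) -> Option<String> {
        Some(format!("moim from {}", self.name))
    }
}

struct Extension {
    is_platform: bool,
    client: Arc<Client>,
}

fn collect_moim(extensions: &Mutex<HashMap<String, Extension>>) -> Vec<String> {
    // Clone cheap Arc handles inside a block so the guard drops at the closing brace.
    let platform_clients: Vec<(String, Arc<Client>)> = {
        let guard = extensions.lock().expect("lock poisoned");
        guard
            .iter()
            .filter(|(_, ext)| ext.is_platform)
            .map(|(name, ext)| (name.clone(), Arc::clone(&ext.client)))
            .collect()
    };
    // The lock is released here, so get_moim can take as long as it needs.
    let mut parts: Vec<String> = platform_clients
        .iter()
        .filter_map(|(_, client)| client.get_moim())
        .collect();
    parts.sort(); // HashMap order is unspecified; sort for stable output
    parts
}
```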

Collaborator Author


can you clarify the question about moim? i use it in a certain way for this change, but want to make sure I answer the right question.

Collaborator


let me have a closer look at the rest of the code. I mostly wondered if we should block the Moim for the rest of the extensions or not.

.lock()
.await
let extensions = self.extensions.lock().await;
let code_exec_enabled = extensions.contains_key(CODE_EXECUTION_NAME);
Collaborator


we try to determine this in three different ways, we should probably only use the utility function below.

})
.collect();

// Detect code_execution mode: when enabled, only code_execution extension is passed
Collaborator


why do we need to detect that here? aren't we injecting this in the builder?

@block block deleted a comment from Copilot AI Dec 15, 2025
Copilot AI review requested due to automatic review settings December 15, 2025 17:19
@block block deleted a comment from Copilot AI Dec 15, 2025
@alexhancock alexhancock force-pushed the alexhancock/code-mode-mcp branch from d4bbbdc to b00beb4 December 15, 2025 18:11
@block block deleted a comment from Copilot AI Dec 15, 2025
Collaborator

@DOsinga DOsinga left a comment


this is awesome. I have many thoughts, left some as comments, but we should try and ship this quickly

at this point it is up to the user to use code mode by enabling that extension and then the other extensions are only accessible through this. have we tried allowing both paths?


#[derive(Debug, Serialize, Deserialize, JsonSchema)]
struct ReadModuleParams {
/// Module path: "server" for all tools, "server/tool" for one tool.
Collaborator


I can't believe I am saying this, but assuming this comment becomes a doc string in the tool, can we expand this a bit better? I had to read it twice before I understood it.

#[derive(Debug, Serialize, Deserialize, JsonSchema)]
struct ReadModuleParams {
/// Module path: "server" for all tools, "server/tool" for one tool.
path: String,
Collaborator


and maybe just call this module_path then
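Applying both suggestions, the field could be renamed `module_path` and its doc comment spelled out with examples. A minimal std-only sketch: the serde/schemars derives are omitted so it compiles on its own, and `ModuleTarget` and `parse_module_path` are hypothetical helpers, not code from the PR.

```rust
/// Parameters for read_module (hypothetical expanded version).
struct ReadModuleParams {
    /// Which module to read. Pass a server name such as "github" to get the
    /// signatures of every tool on that server, or "server/tool" such as
    /// "github/create_issue" to get the signature of a single tool.
    module_path: String,
}

#[derive(Debug, PartialEq)]
enum ModuleTarget<'a> {
    Server(&'a str),
    Tool { server: &'a str, tool: &'a str },
}

/// Interprets module_path, rejecting empty segments and extra slashes.
fn parse_module_path(params: &ReadModuleParams) -> Result<ModuleTarget<'_>, String> {
    let path = params.module_path.trim_matches('/');
    match path.split_once('/') {
        None if !path.is_empty() => Ok(ModuleTarget::Server(path)),
        Some((server, tool)) if !server.is_empty() && !tool.is_empty() && !tool.contains('/') => {
            Ok(ModuleTarget::Tool { server, tool })
        }
        _ => Err(format!(
            "invalid module_path {:?}: expected \"server\" or \"server/tool\"",
            params.module_path
        )),
    }
}
```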

let ty = prop.and_then(|p| p.get("type")?.as_str()).unwrap_or("any");
(name.clone(), ty.to_string(), required)
})
.collect();
Collaborator


is this the best way to parse the json? do we not have a rust object that matches this that we can use to read into?

let tool_data: Vec<(String, String)> = server_tools
.iter()
.map(|t| (t.tool_name.clone(), t.full_name.clone()))
.collect();
Collaborator


nit: you could avoid looping over this twice and unzip in one go
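The nit refers to `Iterator::unzip`, which splits an iterator of pairs into two collections in a single pass. `ServerTool` and `split_names` are hypothetical stand-ins holding only the two fields used in the snippet; the surrounding code is not shown in the review, so this illustrates the idiom rather than the exact fix.

```rust
/// Minimal stand-in for the per-tool data iterated in the PR (hypothetical).
struct ServerTool {
    tool_name: String,
    full_name: String,
}

/// One pass over the tools, split into two vectors with Iterator::unzip.
fn split_names(server_tools: &[ServerTool]) -> (Vec<String>, Vec<String>) {
    server_tools
        .iter()
        .map(|t| (t.tool_name.clone(), t.full_name.clone()))
        .unzip()
}
```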

)
}

fn create_tool_function(full_name: String) -> NativeFunction {
Collaborator


maybe call this full_tool_name to indicate that this is server__tool ?

format!("{before}\n__result__ = {};", last.trim_end_matches(';'))
})
}
};
Collaborator


this code looks fragile. it makes a good effort, but there's also a lot of corner cases to be covered here. have we considered to just inject a record_result(...) function and tell the LLM to call that to, eh, record a result?
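To make the concern concrete, here is a hypothetical reconstruction of a "treat the last line as the result" rewrite. Only the final `format!` call appears in the review; splitting on the last newline is an assumption, and `wrap_last_line` is not the PR's function. It handles a simple trailing expression but emits invalid JavaScript when the code ends with a closing brace or a comment line.

```rust
/// Hypothetical "last line is the result" rewrite, for illustration only.
fn wrap_last_line(code: &str) -> String {
    let trimmed = code.trim_end();
    match trimmed.rsplit_once('\n') {
        // Everything before the last newline is kept; the last line is assigned.
        Some((before, last)) => format!("{before}\n__result__ = {};", last.trim_end_matches(';')),
        None => format!("__result__ = {};", trimmed.trim_end_matches(';')),
    }
}
```

For `"const a = 1;\na + 1"` this yields `__result__ = a + 1;` as intended, but a trailing `}` becomes `__result__ = };` and a trailing `// done` comment becomes `__result__ = // done;`, both syntax errors. An injected `record_result(...)` function, as suggested, sidesteps source rewriting entirely.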


Use read_module("name") to see tool signatures before calling unfamiliar tools.
"#},
server_list.join(", ")
Collaborator


I wonder whether we need to rely on read_module and search when the number of tools is small. LLMs can very easily consume large interfaces in one go without getting confused. We could even supply it directly with the interface that we inject, so it can call that instead of:

  • search for the function
  • import the function
  • call the function

Collaborator Author


i did this at the beginning but @michaelneale's instinct was to move more towards a search/read/execute approach to pull as much as we could out of the context window initially

given we've done more manual testing with this approach I am going to stick with this for now. but we can iterate on this if there is a clear change we find to make that is beneficial.

Collaborator


yeah - could have it add in a small number, but even some popular MCPs (like github) overload it right from the start so unless we special case some built in ones... seems consistent to let it discover? (but maybe that is worth it if we note it being an issue?)

Collaborator

@michaelneale michaelneale Dec 15, 2025


main one would be say shell/editor ones so it intrinsically knows, which would save (sometimes) one search/lookup up front (but that is all)

The main thing is we don't load up an unbounded set of tools (extension names are usually reasonable and modest, but tools in MCP world do not seem to be reasonable)
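The trade-off in this thread could be expressed as a simple threshold: inline every tool when the total is small, otherwise list only server names (which are usually modest) and let the model discover tools. This is a hypothetical policy sketch; `prompt_listing` and `max_inline_tools` do not exist in the PR, which keeps the search/read/execute approach for now.

```rust
/// Builds the tool listing for the system prompt (hypothetical policy).
/// At or below max_inline_tools, every tool is listed so the model can call it directly;
/// above it, only server names are shown and the model uses read_module to discover tools.
fn prompt_listing(servers: &[(&str, &[&str])], max_inline_tools: usize) -> String {
    let total: usize = servers.iter().map(|(_, tools)| tools.len()).sum();
    if total <= max_inline_tools {
        servers
            .iter()
            .flat_map(|(server, tools)| tools.iter().map(move |tool| format!("{server}/{tool}")))
            .collect::<Vec<_>>()
            .join("\n")
    } else {
        let names: Vec<&str> = servers.iter().map(|(server, _)| *server).collect();
        format!("Available servers: {}", names.join(", "))
    }
}
```

The cap bounds prompt size regardless of how many tools a single MCP server exposes.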

self.context.extension_manager.clone(),
));

let js_result = tokio::task::spawn_blocking(move || run_js_module(&code, &tools, call_tx))
Collaborator


if the agent writes an infinite loop, are we toast here? for external MCP servers I think we timeout, but this is all in process so it hangs our agent I think

Collaborator Author


yes, but I also think that as the model writes more and more complex code, where it could potentially wait on async things etc. in one run, a timeout is a little different than with a single tool call.

instinct on what a good timeout would be?

Collaborator


this really wouldn't be that different to say using shell to do writes with sed and awk or even bash type of thing is it? did that have a timeout?
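One std-only way to bound the wait on the agent side is to run the script on a worker thread and stop waiting after a deadline. This is a sketch under assumptions: `run_with_timeout` is hypothetical, and the real code uses `tokio::task::spawn_blocking`, where wrapping the join handle in `tokio::time::timeout` would play a similar role. It also shows the limitation raised above: timing out ends the wait, not the loop.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs job on a worker thread and waits at most limit for its result.
/// Caveat: on timeout the worker is NOT stopped; std threads cannot be killed,
/// so a runaway script keeps running until the engine itself interrupts it.
fn run_with_timeout<F>(job: F, limit: Duration) -> Result<String, String>
where
    F: FnOnce() -> String + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore the send error: the receiver is gone if we already timed out.
        let _ = tx.send(job());
    });
    match rx.recv_timeout(limit) {
        Ok(output) => Ok(output),
        Err(mpsc::RecvTimeoutError::Timeout) => {
            Err(format!("code execution timed out after {limit:?}"))
        }
        // The sender was dropped without sending, i.e. the job panicked.
        Err(mpsc::RecvTimeoutError::Disconnected) => {
            Err("code execution thread panicked".to_string())
        }
    }
}
```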

Copilot AI review requested due to automatic review settings December 15, 2025 19:46
@alexhancock alexhancock force-pushed the alexhancock/code-mode-mcp branch from b00beb4 to f0f5352 December 15, 2025 19:46
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

Comment on lines +1274 to +1286
let platform_clients: Vec<(String, McpClientBox)> = {
let extensions = self.extensions.lock().await;
extensions
.iter()
.filter_map(|(name, extension)| {
if let ExtensionConfig::Platform { .. } = &extension.config {
Some((name.clone(), extension.get_client()))
} else {
None
}
})
.collect()
};

Copilot AI Dec 15, 2025


The lock on extensions is held during the entire collection process, including calls to extension.get_client(). Consider collecting the data in two steps: first gather references under the lock, then call get_client() after releasing it to reduce lock contention.

.await
.map_err(|e| format!("JS execution task failed: {e}"))?;

tool_handler.abort();

Copilot AI Dec 15, 2025


Aborting the tool handler task immediately after JS execution completes may drop in-flight tool calls. Consider waiting for pending operations with a timeout or tracking outstanding requests.

- Call: toolName({ param1: value, param2: value })
- All calls are synchronous, return strings
- Last expression is the result
- No comments in code

Copilot AI Dec 15, 2025


The instruction 'No comments in code' contradicts common JavaScript practices and may confuse users. If comments should be avoided for a technical reason (e.g., parser limitations), this should be explained. Otherwise, consider removing this restriction.

Suggested change
- No comments in code

@block block deleted a comment from Copilot AI Dec 15, 2025
@alexhancock alexhancock force-pushed the alexhancock/code-mode-mcp branch from f0f5352 to e7e6109 December 15, 2025 20:44
@alexhancock alexhancock force-pushed the alexhancock/code-mode-mcp branch from e7e6109 to 0b0380f December 15, 2025 20:44
@alexhancock alexhancock merged commit 948ff49 into main Dec 15, 2025
18 checks passed
@alexhancock alexhancock deleted the alexhancock/code-mode-mcp branch December 15, 2025 21:47
fbalicchia pushed a commit to fbalicchia/goose that referenced this pull request Dec 16, 2025
…#6030)

Co-authored-by: Michael Neale <michael.neale@gmail.com>
aharvard added a commit that referenced this pull request Dec 16, 2025
…erer

* origin/main: (26 commits)
  Don't persist ephemeral extensions when resuming sessions (#5974)
  chore(deps): bump mdast-util-to-hast from 13.2.0 to 13.2.1 in /ui/desktop (#5939)
  chore(deps): bump node-forge from 1.3.1 to 1.3.2 in /documentation (#5898)
  Add Scorecard supply-chain security workflow (#5810)
  Don't show subagent tool when we're a subagent (#6125)
  Fix keyboard shortcut conflict for Focus Goose Window (#5809)
  feat(goose-cli): add feature to disable update (#5886)
  workflow: enable docs-update-recipe-ref (#6132)
  fix: filter tools in Ollama streaming when chat mode is enabled (#6118)
  feat(mcp): platform extension for "code mode" MCP tool calling (#6030)
  workflow: auto-update recipe-reference on release (#5988)
  Document recipe slash commands feature (#6075)
  docs: add GitHub Copilot device flow authentication details (#6123)
  Disallow subagents with no extensions (#5825)
  chore(deps): bump js-yaml in /documentation (#6093)
  feat: external goosed server (#5978)
  fix: Make datetime info message more explicit to prevent LLM confusion about current year (#6101)
  refactor: unify subagent and subrecipe tools into single tool (#5893)
  goose repo is too big for the issue solver workflow worker (#6099)
  fix: use system not developer role in db (#6098)
  ...
zanesq added a commit that referenced this pull request Dec 16, 2025
* 'main' of github.com:block/goose: (22 commits)
  OpenRouter & Xai streaming (#5873)
  fix: resolve mcp-hermit cleanup path expansion issue (#5953)
  feat: add goose PR reviewer workflow (#6124)
  perf: Avoid repeated MCP queries during streaming responses (#6138)
  Fix YAML serialization for recipes with special characters (#5796)
  Add more posthog analytics (privacy aware) (#6122)
  docs: add Sugar MCP server to extensions registry (#6077)
  Fix tokenState loading on new sessions (#6129)
  bump bedrock dep versions (#6090)
  Don't persist ephemeral extensions when resuming sessions (#5974)
  chore(deps): bump mdast-util-to-hast from 13.2.0 to 13.2.1 in /ui/desktop (#5939)
  chore(deps): bump node-forge from 1.3.1 to 1.3.2 in /documentation (#5898)
  Add Scorecard supply-chain security workflow (#5810)
  Don't show subagent tool when we're a subagent (#6125)
  Fix keyboard shortcut conflict for Focus Goose Window (#5809)
  feat(goose-cli): add feature to disable update (#5886)
  workflow: enable docs-update-recipe-ref (#6132)
  fix: filter tools in Ollama streaming when chat mode is enabled (#6118)
  feat(mcp): platform extension for "code mode" MCP tool calling (#6030)
  workflow: auto-update recipe-reference on release (#5988)
  ...

# Conflicts:
#	ui/desktop/src/App.tsx
#	ui/desktop/src/api/sdk.gen.ts
#	ui/desktop/src/components/ChatInput.tsx
#	ui/desktop/src/components/recipes/RecipesView.tsx

4 participants