feat(mcp): platform extension for "code mode" MCP tool calling#6030
feat(mcp): platform extension for "code mode" MCP tool calling#6030alexhancock merged 3 commits intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR implements a JavaScript code execution platform extension that enables the model to execute JS code with synchronous access to all MCP tools. The implementation uses the Boa JavaScript engine and provides a "sandbox mode" where tools are auto-generated as JS functions that the model can call within a single code block.
Key Changes:
- New
code_executionplatform extension withexecute_codetool for running JS code - Tool handler architecture using channels to bridge Boa's
!Sendcontext with async runtime - Extension manager refactored to collect platform clients without holding locks
- Preamble generation system that converts MCP tools into JavaScript function stubs
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
crates/goose/src/agents/code_execution_extension.rs |
New 566-line extension implementing JS code execution with Boa engine, including tool binding generation and async tool dispatch |
crates/goose/src/agents/extension_manager.rs |
Added get_prefixed_tools_excluding() method and refactored collect_moim() to avoid holding lock while calling get_moim() |
crates/goose/src/agents/extension.rs |
Registered new code_execution extension in platform extensions registry |
crates/goose/src/agents/mod.rs |
Added module declaration for code_execution_extension |
crates/goose/Cargo.toml |
Added boa_engine 0.21.0 and boa_gc 0.21 dependencies |
Cargo.lock |
Dependency lock file updates for Boa engine and transitive dependencies |
| let execute_schema = serde_json::to_value(schema_for!(ExecuteCodeParams)) | ||
| .expect("schema") | ||
| .as_object() | ||
| .expect("object") | ||
| .clone(); |
There was a problem hiding this comment.
Using expect on schema generation will panic. Since this is called during tool listing, consider handling the error gracefully by returning an error result instead.
fce3e6c to
77f0131
Compare
|
This implementation saves on intermediate tool results not flowing to the model, but doesn't yet address progressive discovery of the interfaces themselves (via a tree of files, resources, etc). I will look into this tomorrow and push an update. |
|
nice - also for compatibility, in this mode I think |
I think for this we've found one of the following works well:
|
77f0131 to
c104cef
Compare
| Ok(content) => Ok(content | ||
| .iter() | ||
| .filter_map(|c| match &c.raw { | ||
| RawContent::Text(t) => Some(t.text.clone()), | ||
| _ => None, | ||
| }) | ||
| .collect::<Vec<_>>() | ||
| .join("\n")), |
There was a problem hiding this comment.
This error response intentionally discards all non-text content from tool results. If a tool returns images, resources, or other content types, that information will be silently lost.
Consider either including non-text content in a structured way, or documenting that only text content is supported in code execution mode.
| Ok(content) => Ok(content | |
| .iter() | |
| .filter_map(|c| match &c.raw { | |
| RawContent::Text(t) => Some(t.text.clone()), | |
| _ => None, | |
| }) | |
| .collect::<Vec<_>>() | |
| .join("\n")), | |
| Ok(content) => { | |
| let mut non_text_found = false; | |
| let texts = content | |
| .iter() | |
| .filter_map(|c| match &c.raw { | |
| RawContent::Text(t) => Some(t.text.clone()), | |
| _ => { | |
| non_text_found = true; | |
| None | |
| } | |
| }) | |
| .collect::<Vec<_>>(); | |
| if non_text_found { | |
| Err("Tool returned non-text content (e.g., image or resource), which is not supported in code execution mode. Only text content is supported.".to_string()) | |
| } else { | |
| Ok(texts.join("\n")) | |
| } | |
| }, |
dede50c to
0cd5d8b
Compare
ebd7225 to
315eb13
Compare
DOsinga
left a comment
There was a problem hiding this comment.
some comments about the plumbing. I did look a bit at the meat of the thing. looks clever! will give it anohter go after a walk, but maybe we can talk about these minor things
| } | ||
|
|
||
| /// Get extensions info | ||
| /// Get extensions info for building the system prompt. |
There was a problem hiding this comment.
we also call this function for some reason when generating a recipe. either way, do we need to modify this at all? to me it looks like if we have code execution on, we don't even insert the extensions in the prompt (or we should).
| content.push('\n'); | ||
| content.push_str(&moim_content); | ||
| } | ||
| let platform_clients: Vec<(String, McpClientBox)> = { |
There was a problem hiding this comment.
presmably this is to avoid hanging on to the lock? nice.
does MOIM work with code execution?
There was a problem hiding this comment.
presmably this is to avoid hanging on to the lock? nice.
yes, exactly
There was a problem hiding this comment.
can you clarify the question about moim? i use it in a certain way for this change, but want to make sure I answer the right question.
There was a problem hiding this comment.
let me have a closer look at the rest of the code. I mostly wondered if we should block the Moim for the rest of the extensions or not.
| .lock() | ||
| .await | ||
| let extensions = self.extensions.lock().await; | ||
| let code_exec_enabled = extensions.contains_key(CODE_EXECUTION_NAME); |
There was a problem hiding this comment.
we try to determine this in three different ways, we should probably only use the utility function below.
| }) | ||
| .collect(); | ||
|
|
||
| // Detect code_execution mode: when enabled, only code_execution extension is passed |
There was a problem hiding this comment.
why do we need to detect that here? arent we injecting this in the builder?
d4bbbdc to
b00beb4
Compare
DOsinga
left a comment
There was a problem hiding this comment.
this is awesome. I have many thoughts, left some as comments, but we should try and ship this quickly
at this point it is up to the user to use code mode by enabling that extension and then the other extensions are only accessible through this. have we tried allowing both paths?
|
|
||
| #[derive(Debug, Serialize, Deserialize, JsonSchema)] | ||
| struct ReadModuleParams { | ||
| /// Module path: "server" for all tools, "server/tool" for one tool. |
There was a problem hiding this comment.
I can't believe I am saying this, but assuming this comment becomes a doc string in the tool, can we expand this a bit better? I had to read it twice before I understood it.
| #[derive(Debug, Serialize, Deserialize, JsonSchema)] | ||
| struct ReadModuleParams { | ||
| /// Module path: "server" for all tools, "server/tool" for one tool. | ||
| path: String, |
There was a problem hiding this comment.
and maybe just call this module_path then
| let ty = prop.and_then(|p| p.get("type")?.as_str()).unwrap_or("any"); | ||
| (name.clone(), ty.to_string(), required) | ||
| }) | ||
| .collect(); |
There was a problem hiding this comment.
is this the best way to parse the json? do we not have a rust object that matches this that we can use to read into?
| let tool_data: Vec<(String, String)> = server_tools | ||
| .iter() | ||
| .map(|t| (t.tool_name.clone(), t.full_name.clone())) | ||
| .collect(); |
There was a problem hiding this comment.
nit: you could avoid looping over this twice and unzip in one go
| ) | ||
| } | ||
|
|
||
| fn create_tool_function(full_name: String) -> NativeFunction { |
There was a problem hiding this comment.
maybe call this full_tool_name to indicate that this is server__tool ?
| format!("{before}\n__result__ = {};", last.trim_end_matches(';')) | ||
| }) | ||
| } | ||
| }; |
There was a problem hiding this comment.
this code looks fragile. it makes a good effort, but there's also a lot of corner cases to be covered here. have we considered to just inject a record_result(...) function and tell the LLM to call that to, eh, record a result?
|
|
||
| Use read_module("name") to see tool signatures before calling unfamiliar tools. | ||
| "#}, | ||
| server_list.join(", ") |
There was a problem hiding this comment.
I wonder if we should rely on read_module and search if the number of tools is not that many. LLMs can very easily consume large interfaces in one go without getting confused. We could even supply it with the interface that we inject directly so it can call that instead of:
- search for the function
- import the function
- call the function
There was a problem hiding this comment.
i did this at the beginning but @michaelneale's instinct was to move more towards a search/read/execute approach to pull as much as we could out of the context window initially
given we've done more manual testing with this approach I am going to stick with this for now. but we can iterate on this if there is a clear change we find to make that is beneficial.
There was a problem hiding this comment.
yeah - could have it add in a small number, but even some popular MCPs (like github) overload it right from the start so unless we special case some built in ones... seems consistent to let it discover? (but maybe that is worth it if we note it being an issue?)
There was a problem hiding this comment.
main one would be say shell/editor ones so it intrinsically knows, which would save (sometimes) one search/lookup up front (but that is all)
The main thing is we don't load up an unbounded set of tools (extension names are usually reasonable and modest, but tools in MCP world do not seem to be reasonable)
| self.context.extension_manager.clone(), | ||
| )); | ||
|
|
||
| let js_result = tokio::task::spawn_blocking(move || run_js_module(&code, &tools, call_tx)) |
There was a problem hiding this comment.
if the agent writes an infinite loop, are we toast here? for external MCP servers I think we timeout, but this is all in process so it hangs our agent I think
There was a problem hiding this comment.
yes, but I also think as the writes more and more complex code where it potentially could wait for async things etc in one run, having a timeout is a little different than with a single tool call.
instinct on what a good timeout would be?
There was a problem hiding this comment.
this really wouldn't be that different to say using shell to do writes with sed and awk or even bash type of thing is it? did that have a timeout?
b00beb4 to
f0f5352
Compare
| let platform_clients: Vec<(String, McpClientBox)> = { | ||
| let extensions = self.extensions.lock().await; | ||
| extensions | ||
| .iter() | ||
| .filter_map(|(name, extension)| { | ||
| if let ExtensionConfig::Platform { .. } = &extension.config { | ||
| Some((name.clone(), extension.get_client())) | ||
| } else { | ||
| None | ||
| } | ||
| }) | ||
| .collect() | ||
| }; |
There was a problem hiding this comment.
The lock on extensions is held during the entire collection process, including calls to extension.get_client(). Consider collecting the data in two steps: first gather references under the lock, then call get_client() after releasing it to reduce lock contention.
| .await | ||
| .map_err(|e| format!("JS execution task failed: {e}"))?; | ||
|
|
||
| tool_handler.abort(); |
There was a problem hiding this comment.
Aborting the tool handler task immediately after JS execution completes may drop in-flight tool calls. Consider waiting for pending operations with a timeout or tracking outstanding requests.
| - Call: toolName({ param1: value, param2: value }) | ||
| - All calls are synchronous, return strings | ||
| - Last expression is the result | ||
| - No comments in code |
There was a problem hiding this comment.
The instruction 'No comments in code' contradicts common JavaScript practices and may confuse users. If comments should be avoided for a technical reason (e.g., parser limitations), this should be explained. Otherwise, consider removing this restriction.
| - No comments in code |
f0f5352 to
e7e6109
Compare
e7e6109 to
0b0380f
Compare
…#6030) Co-authored-by: Michael Neale <michael.neale@gmail.com>
…erer * origin/main: (26 commits) Don't persist ephemeral extensions when resuming sessions (#5974) chore(deps): bump mdast-util-to-hast from 13.2.0 to 13.2.1 in /ui/desktop (#5939) chore(deps): bump node-forge from 1.3.1 to 1.3.2 in /documentation (#5898) Add Scorecard supply-chain security workflow (#5810) Don't show subagent tool when we're a subagent (#6125) Fix keyboard shortcut conflict for Focus Goose Window (#5809) feat(goose-cli): add feature to disable update (#5886) workflow: enable docs-update-recipe-ref (#6132) fix: filter tools in Ollama streaming when chat mode is enabled (#6118) feat(mcp): platform extension for "code mode" MCP tool calling (#6030) workflow: auto-update recipe-reference on release (#5988) Document recipe slash commands feature (#6075) docs: add GitHub Copilot device flow authentication details (#6123) Disallow subagents with no extensions (#5825) chore(deps): bump js-yaml in /documentation (#6093) feat: external goosed server (#5978) fix: Make datetime info message more explicit to prevent LLM confusion about current year (#6101) refactor: unify subagent and subrecipe tools into single tool (#5893) goose repo is too big for the issue solver workflow worker (#6099) fix: use system not developer role in db (#6098) ...
* 'main' of github.com:block/goose: (22 commits) OpenRouter & Xai streaming (#5873) fix: resolve mcp-hermit cleanup path expansion issue (#5953) feat: add goose PR reviewer workflow (#6124) perf: Avoid repeated MCP queries during streaming responses (#6138) Fix YAML serialization for recipes with special characters (#5796) Add more posthog analytics (privacy aware) (#6122) docs: add Sugar MCP server to extensions registry (#6077) Fix tokenState loading on new sessions (#6129) bump bedrock dep versions (#6090) Don't persist ephemeral extensions when resuming sessions (#5974) chore(deps): bump mdast-util-to-hast from 13.2.0 to 13.2.1 in /ui/desktop (#5939) chore(deps): bump node-forge from 1.3.1 to 1.3.2 in /documentation (#5898) Add Scorecard supply-chain security workflow (#5810) Don't show subagent tool when we're a subagent (#6125) Fix keyboard shortcut conflict for Focus Goose Window (#5809) feat(goose-cli): add feature to disable update (#5886) workflow: enable docs-update-recipe-ref (#6132) fix: filter tools in Ollama streaming when chat mode is enabled (#6118) feat(mcp): platform extension for "code mode" MCP tool calling (#6030) workflow: auto-update recipe-reference on release (#5988) ... # Conflicts: # ui/desktop/src/App.tsx # ui/desktop/src/api/sdk.gen.ts # ui/desktop/src/components/ChatInput.tsx # ui/desktop/src/components/recipes/RecipesView.tsx
Implements the idea of "code mode" or "sandbox mode" for MCP
Refs
https://blog.cloudflare.com/code-mode/
https://www.anthropic.com/engineering/code-execution-with-mcp
#5899
Architecture
code_executionplatform extensionread_moduleto read a tool's implementation code to know how to call it, andexecute_codeto send code to run to call tool(s)read_modulewith the ability to read the source code implementing one tool callexecute_codewith instructions to the model on how it should write codeservers/:server_name/:tool_name.jsto the model viaget_moim!SendDiagram for tool call dispatching in present state
