Use Port of Context (pctx) for code mode by eliasposen · Pull Request #6765 · block/goose

eliasposen · 2026-01-28T04:23:15Z

Summary

This change replaces boa with pctx for CodeMode.

pctx uses a custom Deno runtime for type-checking and code execution. It also provides type generation, MCP registration, and Rust callbacks out of the box.
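Here is how type generation works in outline. Each MCP tool's JSON Schema input is turned into a TypeScript declaration. The model writes code against that declaration, and the type checker validates the code against it. The sketch below is a hypothetical, heavily simplified illustration and is not pctx's generator. `schemaToTs`, `toolToDeclaration`, the camelCase naming rule, and the `Github` namespace are assumptions, and only a small subset of JSON Schema is handled.

```typescript
// Subset of JSON Schema assumed for this sketch (real MCP schemas can be richer).
type JsonSchema = {
  type?: "string" | "number" | "integer" | "boolean" | "array" | "object";
  properties?: Record<string, JsonSchema>;
  required?: string[];
  items?: JsonSchema;
};

// Hypothetical MCP tool listing entry: a name plus its input schema.
interface McpTool {
  name: string;
  inputSchema: JsonSchema;
}

// Recursively map a JSON Schema node to a TypeScript type expression.
function schemaToTs(schema: JsonSchema): string {
  switch (schema.type) {
    case "string":
      return "string";
    case "number":
    case "integer":
      return "number";
    case "boolean":
      return "boolean";
    case "array":
      return `Array<${schema.items ? schemaToTs(schema.items) : "unknown"}>`;
    case "object": {
      const required = new Set(schema.required ?? []);
      const fields = Object.entries(schema.properties ?? {}).map(
        ([key, value]) => `${key}${required.has(key) ? "" : "?"}: ${schemaToTs(value)}`,
      );
      return `{ ${fields.join("; ")} }`;
    }
    default:
      return "unknown";
  }
}

// snake_case tool names become camelCase functions (an assumed convention).
function toCamel(name: string): string {
  return name.replace(/_([a-z])/g, (_match: string, c: string) => c.toUpperCase());
}

// Emit one namespaced declaration. The return type is `unknown` because
// many MCP servers publish no outputSchema.
function toolToDeclaration(namespace: string, tool: McpTool): string {
  const args = schemaToTs(tool.inputSchema);
  return `declare namespace ${namespace} { function ${toCamel(tool.name)}(args: ${args}): Promise<unknown>; }`;
}

const decl = toolToDeclaration("Github", {
  name: "search_issues",
  inputSchema: {
    type: "object",
    properties: { query: { type: "string" }, per_page: { type: "integer" } },
    required: ["query"],
  },
});
console.log(decl);
```

Unsupported schema constructs fall back to `unknown`. This keeps the declaration valid at the cost of precision.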

Some thoughts for further development:

A limitation of CodeMode is that many MCP servers do not document an outputSchema. pctx is currently exploring ways of mutating and caching a tool's output schema as the tool is used, so that the generated CodeMode interface improves over time.
As more benchmarking is done on which scenarios favor CodeMode versus regular tool calling, one might want to manually customize which tools the code executor is allowed to handle (with some default).
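The output-schema refinement described above could work roughly like this. A coarse shape is inferred from every result a tool returns and merged into a per-tool cache. Fields missing from some results are marked optional. This is a hypothetical sketch of the idea, not how pctx does it. `infer`, `merge`, `observe`, and the shape format are illustrative assumptions.

```typescript
// Coarse shape of an observed JSON value (an assumed format for this sketch).
type Field = { shape: Shape; optional: boolean };
type Shape =
  | { kind: "primitive"; name: "string" | "number" | "boolean" | "null" }
  | { kind: "array"; item: Shape | null } // null: only empty arrays seen so far
  | { kind: "object"; fields: Map<string, Field> }
  | { kind: "unknown" };

// Infer a shape from one tool result.
function infer(value: unknown): Shape {
  if (value === null) return { kind: "primitive", name: "null" };
  if (Array.isArray(value)) {
    let item: Shape | null = null;
    for (const v of value) item = item ? merge(item, infer(v)) : infer(v);
    return { kind: "array", item };
  }
  if (typeof value === "string") return { kind: "primitive", name: "string" };
  if (typeof value === "number") return { kind: "primitive", name: "number" };
  if (typeof value === "boolean") return { kind: "primitive", name: "boolean" };
  if (typeof value === "object") {
    const fields = new Map<string, Field>();
    for (const [key, v] of Object.entries(value as Record<string, unknown>)) {
      fields.set(key, { shape: infer(v), optional: false });
    }
    return { kind: "object", fields };
  }
  return { kind: "unknown" };
}

// Merge two observations of the same tool's output.
function merge(a: Shape, b: Shape): Shape {
  if (a.kind === "primitive" && b.kind === "primitive" && a.name === b.name) return a;
  if (a.kind === "array" && b.kind === "array") {
    const item = a.item && b.item ? merge(a.item, b.item) : a.item ?? b.item;
    return { kind: "array", item };
  }
  if (a.kind === "object" && b.kind === "object") {
    const fields = new Map<string, Field>();
    const keys = Array.from(new Set(Array.from(a.fields.keys()).concat(Array.from(b.fields.keys()))));
    for (const key of keys) {
      const fa = a.fields.get(key);
      const fb = b.fields.get(key);
      if (fa && fb) {
        fields.set(key, { shape: merge(fa.shape, fb.shape), optional: fa.optional || fb.optional });
      } else {
        // Missing from some observations: keep the field but mark it optional.
        const only = (fa ?? fb) as Field;
        fields.set(key, { shape: only.shape, optional: true });
      }
    }
    return { kind: "object", fields };
  }
  // Conflicting observations (e.g. string vs number): give up on this node.
  return { kind: "unknown" };
}

// Render a shape as a TypeScript type expression.
function toTs(s: Shape): string {
  switch (s.kind) {
    case "primitive":
      return s.name;
    case "array":
      return `Array<${s.item ? toTs(s.item) : "unknown"}>`;
    case "object": {
      const parts: string[] = [];
      s.fields.forEach((f, key) => parts.push(`${key}${f.optional ? "?" : ""}: ${toTs(f.shape)}`));
      return `{ ${parts.join("; ")} }`;
    }
    default:
      return "unknown";
  }
}

// Per-tool cache of the merged shape observed so far.
const outputShapes = new Map<string, Shape>();

function observe(tool: string, output: unknown): string {
  const seen = outputShapes.get(tool);
  const next = seen ? merge(seen, infer(output)) : infer(output);
  outputShapes.set(tool, next);
  return toTs(next);
}

observe("Github.searchIssues", { total: 2, items: [{ number: 1, title: "a" }] });
const refined = observe("Github.searchIssues", { total: 0, items: [], incomplete: false });
console.log(refined);
// { total: number; items: Array<{ number: number; title: string }>; incomplete?: boolean }
```

Conflicting observations collapse to `unknown` rather than a union. This is a deliberately conservative choice for the sketch. The tool name and sample results are made up.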

Type of Change

AI Assistance

This PR was created or reviewed with AI assistance

Testing

Manual testing to ensure CodeMode tools are generated and used correctly.
test_providers.sh

Related Issues

Relates to #ISSUE_ID
Discussion: Discord

Screenshots/Demos (for UX changes)

Before:

After:

alexhancock · 2026-01-28T17:27:07Z

@eliasposen the core code looks good, but it looks like there are a bunch of extra commits in the fork for some reason! If we can remove them, I can run the workflows and we can iterate on fixing the tests that have issues.

Signed-off-by: Elias Posen <elias@posen.ch>

eliasposen · 2026-01-28T19:25:07Z

@alexhancock I was able to hard-reset to the most recent main and cherry-pick my commits.

alexhancock

The new implementation in the extension looks mostly good to me as a net deletion and complexity reduction. I asked a couple of questions, though, because I want to preserve ToolGraph.

ui/desktop/src/components/ToolCallWithResponse.tsx

crates/goose-cli/src/session/output.rs

alexhancock · 2026-01-28T19:29:17Z

crates/goose/src/agents/code_execution_extension.rs

-                    "execute_code".to_string(),
+                    "list_functions".to_string(),
                    indoc! {r#"
-                        Batch multiple MCP tool calls into ONE execution. This is the primary purpose of this tool.


I'm curious: does it still do a good job of batching them without these kinds of instructions?

I'm afraid I don't have any benchmarks on this yet, but I have seen batching during testing. There is still a batching instruction in the extension's InitializeResult and moim, so I was just syncing these tool instructions with what currently exists in pctx.
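"Batching" means the model submits one script that makes several calls and reduces the results before returning them, rather than spending one turn per tool call. The sketch below shows a hypothetical script of that kind and assumes the runtime allows concurrent awaits. The `Github.searchIssues` stub, its `repo`/`page` arguments, and the `Issue` fields are made up for illustration.

```typescript
// Hypothetical issue record and stub binding, standing in for what the
// runtime would route to the GitHub MCP server. Fields and args are assumed.
interface Issue {
  number: number;
  title: string;
  linkedPrs: number;
  createdAt: string;
}

const Github = {
  async searchIssues(args: { repo: string; page: number }): Promise<Issue[]> {
    // Fake data: two issues per page, the second of which has a linked PR.
    return [
      { number: args.page * 10 + 1, title: `bug ${args.page}`, linkedPrs: 0, createdAt: `2026-01-0${args.page}` },
      { number: args.page * 10 + 2, title: `feat ${args.page}`, linkedPrs: 1, createdAt: `2026-01-0${args.page}` },
    ];
  },
};

// The kind of script a model might submit: one execution, several tool calls.
async function run(): Promise<Array<Pick<Issue, "number" | "title">>> {
  // Batch: issue all page requests concurrently inside a single execution.
  const pages = await Promise.all(
    [1, 2, 3].map((page) => Github.searchIssues({ repo: "block/goose", page })),
  );
  // Reduce in the sandbox so only a compact result returns to the model.
  return pages
    .flat()
    .filter((issue) => issue.linkedPrs === 0)
    .sort((a, b) => b.createdAt.localeCompare(a.createdAt))
    .map(({ number, title }) => ({ number, title }));
}

run().then((issues) => console.log(issues));
```

The filtering step matters as much as the batching. Whatever the script returns goes back into the model's context, so trimming it inside the sandbox keeps that context small.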


alexhancock

Code LGTM now. Will try it out!

alexhancock

This LGTM

I like the diff in the code_execution_extension

303 insertions(+), 1094 deletions(-)

@michaelneale may also want to have a go

michaelneale · 2026-01-29T05:01:45Z

Taking a look, as this is right up my alley!

michaelneale · 2026-01-29T06:29:40Z

crates/goose/src/agents/extension.rs

            PlatformExtensionDef {
                name: code_execution_extension::EXTENSION_NAME,
-                description: "Execute JavaScript code in a sandboxed environment",
+                description: "Execute TypeScript code in a sandboxed environment",


We can probably improve this description to explain what it is really for, something like "Uses a sandbox to work with extensions".

michaelneale · 2026-01-29T06:41:28Z

OK, this is fantastic: it feels faster and better (no data, just feel!)

So I really think this is the path forward, @alexhancock, and yeah, much less code to maintain. Some thoughts:

This may not be related, but with some tool calls (e.g. todo and others) I saw the context size go down, not up. I wonder if tool results are being compacted somehow (possibly unrelated, but as this is very new I was on the lookout for oddness).
Can we move this to a branch so we can run the full battery of tests? (@eliasposen we can give you write access so you can still contribute there, with no change to the commits etc.)
How can we test it with more models against the old implementation? I am thinking of smaller models (perhaps local ones, not frontier) to see whether TypeScript is still acceptable for them to generate (Ollama-type models, qwen3-coder and such). Perhaps some manual poking around before/after?
We have some other code that already bypasses code mode (i.e. subagents, and also skills), so on the goose side we can control which things are direct tool calls (this seems to work well). Is that still OK with this change?
Does pctx (@eliasposen) plan to do, or already do, any kind of script caching/LRU? As it is used, it could look back and reuse scripts it generated before instead of generating them fresh (i.e. treating them as functions already in the context, or as something like skills), even in other sessions.
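Below is a minimal sketch of the script cache under discussion. It assumes an exact-match key built from the normalized task text plus a fingerprint of the available tools, which mirrors the "same extensions/tools" condition raised in the reply. `ScriptCache` and its methods are hypothetical and not part of pctx or goose. Real reuse would likely need fuzzier matching than an exact key.

```typescript
// Hypothetical LRU cache of previously generated scripts. The key pairs a
// normalized task string with a fingerprint of the available tools, so a
// script is only reused when the same tools are present.
class ScriptCache {
  private readonly capacity: number;
  private readonly entries = new Map<string, string>(); // Map iterates in insertion order

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  private static key(task: string, tools: string[]): string {
    const fingerprint = tools.slice().sort().join(",");
    return `${task.trim().toLowerCase()}|${fingerprint}`;
  }

  get(task: string, tools: string[]): string | undefined {
    const k = ScriptCache.key(task, tools);
    const script = this.entries.get(k);
    if (script !== undefined) {
      // Refresh recency by re-inserting the entry at the end.
      this.entries.delete(k);
      this.entries.set(k, script);
    }
    return script;
  }

  set(task: string, tools: string[], script: string): void {
    const k = ScriptCache.key(task, tools);
    this.entries.delete(k);
    this.entries.set(k, script);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry: the first key in iteration order.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}

const cache = new ScriptCache(2);
cache.set("List open issues", ["Github.searchIssues"], "script-a");
cache.set("Summarize repo", ["Github.getRepo"], "script-b");
cache.get("list open issues ", ["Github.searchIssues"]); // hit, becomes most recent
cache.set("Triage issues", ["Github.searchIssues"], "script-c"); // evicts "Summarize repo"
```

A JavaScript `Map` remembers insertion order. That makes it a compact LRU: deleting and re-inserting an entry marks it as recently used, and the first key is always the eviction candidate.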

Otherwise, if we can get this into a branch, updated and passing, I think we should go with this. It replaces a whole lot of fiddly code with a library.

Really great stuff, @eliasposen.

eliasposen · 2026-01-29T16:09:51Z

Thank you @alexhancock & @michaelneale for your reviews! In response to the points above:

I wonder if this has something to do with the execute instructions, which tell the model to do as much filtering/data reduction as possible in the TypeScript code before returning results.
Yes, very happy to move this to a branch for more testing!
We are currently evaluating different benchmarking frameworks. We haven't settled on one yet, but we like the look of Apple's ToolSandbox given its focus on tool usage against stateful APIs. Happy to hear any suggestions on this front!
I kept the way the code execution extension loads active extensions, so the code mode bypass should still work 👍
Yes, we are planning exactly that! I think it might also be useful to allow such a cache between sessions (assuming the available extensions/tools are the same). As these improvements to pctx are planned and released, I'd be happy to keep you both in the loop on Discord and keep this goose extension up to date.
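The allowlist idea from the PR description and the existing bypass for subagents and skills both come down to a per-tool routing decision. The sketch below shows a hypothetical policy with a default route plus per-extension and per-tool overrides. `RoutingPolicy`, `routeFor`, and the extension/tool names are illustrative assumptions, not goose's actual configuration.

```typescript
// Hypothetical routing policy: which tools are reachable only through the
// code execution extension and which stay as direct tool calls.
type Route = "code_mode" | "direct";

interface RoutingPolicy {
  defaultRoute: Route;
  // Keys are either "extension" or "extension.tool"; names are illustrative.
  overrides: Partial<Record<string, Route>>;
}

function routeFor(policy: RoutingPolicy, extension: string, tool: string): Route {
  // Most specific rule wins: tool-level, then extension-level, then the default.
  return policy.overrides[`${extension}.${tool}`] ?? policy.overrides[extension] ?? policy.defaultRoute;
}

const policy: RoutingPolicy = {
  defaultRoute: "code_mode",
  overrides: { subagent: "direct", skills: "direct", "github.create_issue": "direct" },
};

console.log(routeFor(policy, "github", "search_issues")); // → code_mode
console.log(routeFor(policy, "github", "create_issue")); // → direct
```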

jamadeo · 2026-01-29T16:13:11Z

@eliasposen, is there any plan to allow other runtimes with Port of Context? It seems like a clear improvement for goose to adopt pctx's MCP-to-TS bridge, but the switch from boa to Deno seems to increase the goose binary size by quite a lot (113 MB to 190 MB on my system). Probably not a deal-breaker, but something to consider.

eliasposen · 2026-01-29T21:40:57Z

@jamadeo We are considering it. These were our main considerations when choosing Deno:

TypeScript support and transpiling: the ability to create a type-check runtime that validates the whole script before execution. Any type errors are passed back to the model, which gives it a chance to fix the script before it runs. In boa these would be runtime errors, potentially surfacing after a tool call has already changed state.
Battle-tested: boa is a relatively new project that still calls itself "experimental", so we felt Deno was the safer option for the long term.
Runtime snapshotting, which gives faster startup times with all dependencies and Rust-bound functions baked in.
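The first point is about ordering: the whole script is validated before any tool call can change state. The sketch below shows that flow with injected `check` and `execute` functions. It is hypothetical. A toy substring check stands in for the real Deno type checker, and none of these names are pctx APIs.

```typescript
// Hypothetical two-phase pipeline: type-check the whole script, and only run
// it when there are no diagnostics. `check` and `execute` are injected
// stand-ins for the runtime; they are not pctx APIs.
interface Diagnostic {
  line: number;
  message: string;
}

type Outcome =
  | { status: "type_error"; diagnostics: Diagnostic[] } // sent back to the model to fix
  | { status: "ok"; output: unknown };

async function runScript(
  source: string,
  check: (src: string) => Promise<Diagnostic[]>,
  execute: (src: string) => Promise<unknown>,
): Promise<Outcome> {
  const diagnostics = await check(source);
  if (diagnostics.length > 0) {
    // Nothing has executed yet, so no tool call has changed any state.
    return { status: "type_error", diagnostics };
  }
  return { status: "ok", output: await execute(source) };
}

// Toy stand-ins: a substring "checker" and an executor that counts runs.
let executions = 0;
const fakeCheck = async (src: string): Promise<Diagnostic[]> =>
  src.includes("searchIssue(")
    ? [{ line: 1, message: "Property 'searchIssue' does not exist. Did you mean 'searchIssues'?" }]
    : [];
const fakeExecute = async (_src: string): Promise<unknown> => {
  executions += 1;
  return "done";
};

runScript("await Github.searchIssue({})", fakeCheck, fakeExecute).then((r) => console.log(r.status));
```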


michaelneale · 2026-02-03T00:48:23Z

OK, this is looking good. Apologies for the confusing report, but you can probably trust me that I am running various tests with open models of various sizes (and of course frontier models), and pctx seems to consistently outperform boa (my suite is here: https://github.com/michaelneale/open-model-gym, but ignore that for now). In the report, "release" is pctx and "default" is boa.

More correct and/or faster, so this is a big yes for me!

Let's go!

codefromthecrypt · 2026-02-03T02:31:06Z

I will fix the ACP thing. The fixtures were supposed to give a good error on mismatch, not panic. This is a bug I can sort out here.

rabi · 2026-02-03T04:05:35Z

My 2 cents: I did a few rounds of testing with vllm/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 to see how this works with smaller models.

A. code_execution with this PR (pctx)

It keeps calling get_function_details in a loop and can't progress to the execute step.

starting session | provider: custom_vllm model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
    session id: 20260203_2
    working directory: /tmp/goose

goose is running! Enter your instructions, or try asking what goose can do.

Context: ○○○○○○○○○○ 0% (0/128000 tokens)
( O)> get last 30 issues from block/goose that don't have a PR linked sort by created

─── get_function_details |  ──────────────────────────
functions: ["Github.searchIssues"]


─── list_functions |  ──────────────────────────


─── get_function_details |  ──────────────────────────
functions: ["Github.searchIssues"]


─── get_function_details |  ──────────────────────────
functions: "Github.searchIssues"


─── get_function_details |  ──────────────────────────
functions: ["Github.searchIssues"]


─── get_function_details |  ──────────────────────────
functions: ["Github.searchIssues"]


─── get_function_details |  ──────────────────────────
functions: ["Github.searchIssues"]


─── get_function_details |  ──────────────────────────

B. Existing boa implementation with the improvement PR #6497

starting session | provider: custom_vllm model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
    session id: 20260203_3
    working directory: /tmp/goose

goose is running! Enter your instructions, or try asking what goose can do.

Context: ○○○○○○○○○○ 0% (0/128000 tokens)
( O)> get last 30 issues from block/goose that don't have a PR linked sort by created

─── search_modules | code_execution ──────────────────────────
terms: ["search_issues","github"]


─── read_module | code_execution ──────────────────────────
module_path: github/search_issues


─── 1 tool call | execute_code ──────────────────────────
  1. github/search_issues: Search GitHub issues for block/goose without linked PRs, sorted by creation date, retrieving up to 30 results

## Quick Summary of the Recent Goose Issues (≈30 most recent)
...
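One rough way to quantify the difference between the two runs is the longest streak of consecutive identical tool calls. The sketch below is hypothetical. `ToolCall`, `signature`, and `longestRepeat` are illustrative names, and the call list is transcribed by hand from the output above.

```typescript
// Hypothetical record of one tool call as shown in a goose transcript.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

// Transcript A passes `functions` both as an array and as a bare string, so
// wrap bare strings in an array before comparing calls.
function normalize(value: unknown): unknown {
  return typeof value === "string" ? [value] : value;
}

// Stable signature: tool name plus sorted, normalized arguments.
function signature(call: ToolCall): string {
  const parts = Object.keys(call.args)
    .sort()
    .map((key) => `${key}=${JSON.stringify(normalize(call.args[key]))}`);
  return `${call.tool}(${parts.join(",")})`;
}

// Length of the longest run of consecutive identical calls.
function longestRepeat(calls: ToolCall[]): number {
  let best = 0;
  let run = 0;
  let prev: string | null = null;
  for (const call of calls) {
    const sig = signature(call);
    run = sig === prev ? run + 1 : 1;
    prev = sig;
    best = Math.max(best, run);
  }
  return best;
}

// Calls from transcripts A and B above (A's final, argument-less call and
// B's execute_code arguments are omitted).
const details = (functions: unknown): ToolCall => ({ tool: "get_function_details", args: { functions } });
const transcriptA: ToolCall[] = [
  details(["Github.searchIssues"]),
  { tool: "list_functions", args: {} },
  details(["Github.searchIssues"]),
  details("Github.searchIssues"),
  details(["Github.searchIssues"]),
  details(["Github.searchIssues"]),
  details(["Github.searchIssues"]),
];
const transcriptB: ToolCall[] = [
  { tool: "search_modules", args: { terms: ["search_issues", "github"] } },
  { tool: "read_module", args: { module_path: "github/search_issues" } },
  { tool: "execute_code", args: {} },
];

console.log(longestRepeat(transcriptA), longestRepeat(transcriptB)); // 5 1
```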

codefromthecrypt · 2026-02-03T06:27:37Z

@eliasposen I pushed a copy of this branch to https://github.com/block/goose/compare/pctx. It includes the test expectation fix and some polish to the fixtures, which had hidden the underlying mismatch. Code mode is a little different. Those fixes are in commit 9002a2e0bc17214311d9238b225fa2c181fb47aa.

So, you should do this:

git fetch https://github.com/block/goose.git pctx
git cherry-pick 9002a2e0bc17214311d9238b225fa2c181fb47aa

After that, rebase onto the latest block/goose main.

Hope this helps!

Signed-off-by: Adrian Cole <adrian@tetrate.io>


Signed-off-by: Elias Posen <elias@posen.ch> Signed-off-by: Adrian Cole <adrian@tetrate.io> Co-authored-by: Adrian Cole <adrian@tetrate.io> Signed-off-by: Harrison <hcstebbins@gmail.com>

Signed-off-by: Elias Posen <elias@posen.ch> Signed-off-by: Adrian Cole <adrian@tetrate.io> Co-authored-by: Adrian Cole <adrian@tetrate.io>

eliasposen requested a review from a team as a code owner January 28, 2026 04:23

eliasposen force-pushed the pctx branch from 21fcd2a to 4ec9a1a January 28, 2026 16:44

DOsinga requested a review from alexhancock January 28, 2026 17:24

DOsinga assigned alexhancock Jan 28, 2026

eliasposen added 2 commits January 28, 2026 14:23

use pctx for code mode

fb9ab96

Signed-off-by: Elias Posen <elias@posen.ch>

fix upstream native-tls dependency

8c4b014

Signed-off-by: Elias Posen <elias@posen.ch>

eliasposen force-pushed the pctx branch from 579e96b to 8c4b014 January 28, 2026 19:23

alexhancock reviewed Jan 28, 2026


re-introduce tool_graph for rendering & render code mode inputs/outputs with markdown

2bbbbc7

Signed-off-by: Elias Posen <elias@posen.ch>

alexhancock reviewed Jan 28, 2026


alexhancock self-requested a review January 28, 2026 21:38

alexhancock reviewed Jan 28, 2026


michaelneale self-assigned this Jan 29, 2026

michaelneale reviewed Jan 29, 2026


rabi mentioned this pull request Feb 2, 2026

fix: improve code execution mode tool handling #6497

Closed

10 tasks

resolve merge conflicts

d0f46d2

Signed-off-by: Elias Posen <elias@posen.ch>

codefromthecrypt mentioned this pull request Feb 3, 2026

fix(acp): fixtures now raise content mismatch errors #6912

Merged

3 tasks

codefromthecrypt and others added 2 commits February 3, 2026 10:09

acp: adjusts diff in replay tests

ee1e083

Signed-off-by: Adrian Cole <adrian@tetrate.io>

Merge remote-tracking branch 'upstream/main' into pctx

c9adc6e

Signed-off-by: Elias Posen <elias@posen.ch>

alexhancock self-requested a review February 3, 2026 17:15

alexhancock approved these changes Feb 3, 2026


alexhancock merged commit 8631caa into block:main Feb 3, 2026
20 checks passed

kuccello pushed a commit to kuccello/goose that referenced this pull request Feb 7, 2026

Use Port of Context (pctx) for code mode (block#6765)

76d6265

Signed-off-by: Elias Posen <elias@posen.ch> Signed-off-by: Adrian Cole <adrian@tetrate.io> Co-authored-by: Adrian Cole <adrian@tetrate.io>

lifeizhou-ap mentioned this pull request Feb 9, 2026

fix: switch to windows msvc #7080

Merged

9 tasks

github-actions bot mentioned this pull request Feb 10, 2026

chore(release): release version 1.24.0 (minor) #7102

Closed

BrewTestBot mentioned this pull request Feb 12, 2026

block-goose-cli 1.24.0 Homebrew/homebrew-core#267289

Merged

6 participants
