Use Port of Context (pctx) for code mode#6765

Merged
alexhancock merged 6 commits into block:main from eliasposen:pctx
Feb 3, 2026

@eliasposen
Contributor

Summary

This change replaces boa with pctx for CodeMode.

pctx uses a custom deno runtime for type-checking and code execution. It also comes with type generation, mcp registration, and rust callbacks out of the box.

Some thoughts for further development:

  • A limitation of CodeMode is the lack of a documented outputSchema for many MCPs. pctx is currently exploring ways of mutating & caching a tool's output schema as it is used so the generated CodeMode interface can get better over time.
  • As more benchmarking is done on which scenarios favor CodeMode vs regular tool calling, one might want to manually customize which tools the code executor is allowed to service (with some default).

Type of Change

  • Feature
  • Bug fix
  • Refactor / Code quality
  • Performance improvement
  • Documentation
  • Tests
  • Security fix
  • Build / Release
  • Other (specify below)

AI Assistance

  • This PR was created or reviewed with AI assistance

Testing

  • Manual testing to ensure CodeMode tools are generated and used correctly.
  • test_providers.sh

Related Issues

Relates to #ISSUE_ID
Discussion: Discord

Screenshots/Demos (for UX changes)

Before:

After:

@alexhancock
Collaborator

@eliasposen the core code looks good, but looks like there are a bunch of extra commits in the fork for some reason! if we can remove them then I can run the workflows and we can iterate on fixing the tests that have an issue

Signed-off-by: Elias Posen <elias@posen.ch>
Signed-off-by: Elias Posen <elias@posen.ch>
@eliasposen
Contributor Author

@alexhancock was able to hard reset to most recent main & cherry pick my commits

Collaborator

@alexhancock alexhancock left a comment

The new impl in the extension looks mostly good to me as a net-deletion and complexity reduction - asked a couple Qs though because I want to preserve ToolGraph

"execute_code".to_string(),
"list_functions".to_string(),
indoc! {r#"
Batch multiple MCP tool calls into ONE execution. This is the primary purpose of this tool.
Collaborator

I'm curious if it still does a good job batching them without these kinds of instructions?

Contributor Author

@eliasposen eliasposen Jan 28, 2026

I'm afraid I don't have any benchmarking on this yet, but I have seen batching during testing. There is still a batching instruction in the InitializeResult & moim of the extension, so I was just syncing these tool instructions to what currently exists in pctx.
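
For readers unfamiliar with what "batching" means here, a minimal sketch of a script that makes two tool calls in one execution and returns only a reduced result. The functions `searchIssues` and `getRepoStars` are hypothetical stand-ins with canned data, not pctx's generated bindings or any real MCP tool.

```typescript
// Hypothetical stand-ins for generated tool bindings (assumed names and data).
type Issue = { number: number; title: string; hasLinkedPr: boolean };

// Counts how many tool calls the single script execution performs.
let toolCalls = 0;

async function searchIssues(repo: string): Promise<Issue[]> {
  toolCalls += 1;
  if (repo !== "block/goose") return [];
  return [
    { number: 1, title: "Crash on startup", hasLinkedPr: false },
    { number: 2, title: "Docs typo", hasLinkedPr: true },
  ];
}

async function getRepoStars(repo: string): Promise<number> {
  toolCalls += 1;
  return repo === "block/goose" ? 42 : 0;
}

// One script, submitted as a single execution: both tool calls run together,
// and only a compact summary is returned instead of two raw tool payloads.
async function run(): Promise<{ stars: number; withoutPr: number[] }> {
  const [issues, stars] = await Promise.all([
    searchIssues("block/goose"),
    getRepoStars("block/goose"),
  ]);
  return {
    stars,
    withoutPr: issues.filter((i) => !i.hasLinkedPr).map((i) => i.number),
  };
}
```

Calling `run()` performs both stubbed tool calls in one go, which is the behavior the tool description is trying to encourage, as opposed to one model round trip per tool call.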

…ts with markdown

Signed-off-by: Elias Posen <elias@posen.ch>
Collaborator

@alexhancock alexhancock left a comment

Code LGTM now. Will try it out now!

@alexhancock alexhancock self-requested a review January 28, 2026 21:38
Collaborator

@alexhancock alexhancock left a comment

This LGTM

I like the diff in the code_execution_extension

303 insertions(+), 1094 deletions(-)

@michaelneale may also want to have a go

@michaelneale michaelneale self-assigned this Jan 29, 2026
@michaelneale
Collaborator

taking a look as this is right up my alley!

PlatformExtensionDef {
name: code_execution_extension::EXTENSION_NAME,
- description: "Execute JavaScript code in a sandboxed environment",
+ description: "Execute TypeScript code in a sandboxed environment",
Collaborator

we can probably improve this description to explain what it is really for, something like "Uses a sandbox to work with extensions"

@michaelneale
Collaborator

ok this is fantastic, feels faster and better (no data- just feel!)

so I really think this is the path forward @alexhancock and yeah - much less code to maintain. Some thoughts:

  1. may not be related to this, but with some tool calls I saw context size go down, not up (ie todo and other things), which I wonder is compaction of tool results somehow (maybe unrelated, but as this is very new I was on the lookout for oddness)

  2. can we move this to a branch so we can run the full battery of tests (@eliasposen we can give you write access so you can still contribute there, no change to the commits etc)?

  3. how can we test it with more models vs the old, I am thinking smaller models (perhaps local ones, not frontier) to see if typescript is still acceptable for them to generate (ollama type of ones, qwen3-coder and such) perhaps some manual poking around before/after?

  4. we have some other code which bypasses codemode (ie subagents, and also skills) already - so on the goose side we can control what things are direct tool calls (seems to work well) - so that is ok still with this?

  5. Does pctx (@eliasposen) plan to, or do, any kind of script caching/LRU so that as it is used, it can look back and reuse scripts it may have generated before vs generate them fresh (ie they are functions already in the context, or something like skills?), even in other sessions?

otherwise - if we can get this into a branch, updated and passing, I think we should go with this. Saves a whole lot of fiddly code for a library.

really great stuff @eliasposen

@eliasposen
Contributor Author

Thank you @alexhancock & @michaelneale for your reviews! In response to the points above:

  1. I wonder if this has something to do with the execute instructions including instructions to do as much filtering/data reduction in the typescript code before returning results.
  2. Yes very happy to move this to a branch for more testing!
  3. We are currently evaluating different benchmarking frameworks. We haven't settled on one yet, but we like the look of Apple's ToolSandbox given its focus on tool usage against stateful APIs. Happy to hear any suggestions on this front!
  4. I maintained the way in which the code execution extension loads active extensions so the codemode bypass should still function 👍
  5. Yes, we are planning on exactly that! I think it might also be useful to allow for such a cache between sessions (assuming the available extensions/tools are the same). As these improvements to pctx are planned/released, I'd be happy to keep you both in the loop on Discord and keep this goose extension up to date.
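
To make point 5 concrete, here is one way a script cache with LRU eviction could look. This is only an illustrative sketch: `ScriptCache` and its key format are hypothetical and do not describe anything pctx has implemented or committed to. The key includes the sorted tool names so a cached script is only reused when the same tools are available, matching the "assuming the available extensions/tools are the same" condition.

```typescript
// Hypothetical sketch of script reuse with LRU eviction; not a pctx API.
class ScriptCache {
  private entries = new Map<string, string>();
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Build a key from the available tool names plus the task, so a script is
  // never reused against a different tool set. Sorting makes order irrelevant.
  static key(toolNames: string[], task: string): string {
    return [...toolNames].sort().join(",") + "|" + task;
  }

  get(key: string): string | undefined {
    const script = this.entries.get(key);
    if (script === undefined) return undefined;
    // Map preserves insertion order, so re-inserting marks the entry as
    // most recently used.
    this.entries.delete(key);
    this.entries.set(key, script);
    return script;
  }

  set(key: string, script: string): void {
    this.entries.delete(key);
    this.entries.set(key, script);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```

Persisting such a map between sessions would additionally need some storage layer, which is left out here.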

@jamadeo
Collaborator

jamadeo commented Jan 29, 2026

@eliasposen is there any plan to allow for other runtimes with port of context? It seems like a clear improvement for goose to adopt pctx's mcp-to-ts bridge, but the switch from boa to deno seems to increase the goose binary by quite a lot (113mb -> 190mb on my system). Not a deal-breaker probably, but something to consider.

@eliasposen
Contributor Author

@jamadeo We are considering it. These were our main considerations when choosing deno:

  1. Typescript support & transpiling - ability to create a type-check runtime to validate the whole script before execution. Any type errors will be passed back to the model which gives it a chance to fix the script before execution. In boa these would be runtime errors, potentially after changing state with a tool call.
  2. Battle tested - boa is a relatively new project and they still call themselves "experimental", so we felt deno was a safer option for the long term.
  3. Runtime snapshotting, leading to faster startup times with all dependencies & rust-bound functions baked in.
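
Point 1 can be illustrated with a small sketch. The `createIssue` function below is a hypothetical tool binding with a side effect, not part of pctx or goose, and the `any` cast is only there to simulate what an untyped runtime would allow.

```typescript
// Hypothetical tool binding with a side effect (assumed for illustration).
type Issue = { number: number; title: string };
const created: Issue[] = [];

function createIssue(title: string): Issue {
  const issue = { number: created.length + 1, title };
  created.push(issue); // the state change happens here
  return issue;
}

// The `any` cast simulates an untyped runtime: the bad property access is
// only discovered after createIssue has already run.
function untypedScript(): void {
  const issue: any = createIssue("Fix flaky test");
  issue.labels.push("bug"); // labels is undefined, so this throws a TypeError
}

let runtimeError = "";
try {
  untypedScript();
} catch (e) {
  runtimeError = e instanceof TypeError ? "TypeError" : "other";
}

// At this point the script has failed, yet `created` already holds one issue.
// Without the `any` cast, the TypeScript compiler rejects `issue.labels`
// (no such property on Issue) before any tool call is made.
```

The failure mode shown is the one described above: the error surfaces only after state has changed, whereas a type-check pass would report it before the script runs at all.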

Signed-off-by: Elias Posen <elias@posen.ch>
@michaelneale
Collaborator

michaelneale commented Feb 3, 2026

ok this is looking good. Apologies for the confusing report, but you can probably trust me that I am running various tests with open models of various sizes (and of course frontier) and pctx seems to consistently outperform (my suite here: https://github.com/michaelneale/open-model-gym - but ignore that for now) "release" -- pctx, "default" -- boa

image

more correct and/or faster, so this is a big yes for me!

lets go!

@codefromthecrypt
Collaborator

I will fix the ACP thing. The fixtures were supposed to give a good error on mismatch, not panic. This is a bug I can sort out here.

@rabi
Contributor

rabi commented Feb 3, 2026

My 2 cents. I did a few rounds of testing with vllm/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 to see how this works with smaller models.

A. code_execution with this PR (pctx)

It keeps calling get_function_details in a loop and can't progress to the execute step.

starting session | provider: custom_vllm model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
    session id: 20260203_2
    working directory: /tmp/goose

goose is running! Enter your instructions, or try asking what goose can do.

Context: ○○○○○○○○○○ 0% (0/128000 tokens)
( O)> get last 30 issues from block/goose that don't have a PR linked sort by created

─── get_function_details |  ──────────────────────────
functions: ["Github.searchIssues"]


─── list_functions |  ──────────────────────────


─── get_function_details |  ──────────────────────────
functions: ["Github.searchIssues"]


─── get_function_details |  ──────────────────────────
functions: "Github.searchIssues"


─── get_function_details |  ──────────────────────────
functions: ["Github.searchIssues"]


─── get_function_details |  ──────────────────────────
functions: ["Github.searchIssues"]


─── get_function_details |  ──────────────────────────
functions: ["Github.searchIssues"]


─── get_function_details |  ──────────────────────────

B. Existing boa implementation with the improvement PR #6497

starting session | provider: custom_vllm model: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
    session id: 20260203_3
    working directory: /tmp/goose

goose is running! Enter your instructions, or try asking what goose can do.

Context: ○○○○○○○○○○ 0% (0/128000 tokens)
( O)> get last 30 issues from block/goose that don't have a PR linked sort by created

─── search_modules | code_execution ──────────────────────────
terms: ["search_issues","github"]


─── read_module | code_execution ──────────────────────────
module_path: github/search_issues


─── 1 tool call | execute_code ──────────────────────────
  1. github/search_issues: Search GitHub issues for block/goose without linked PRs, sorted by creation date, retrieving up to 30 results

## Quick Summary of the Recent Goose Issues (≈30 most recent)
...

@codefromthecrypt
Collaborator

codefromthecrypt commented Feb 3, 2026

@eliasposen I pushed a copy of this branch to https://github.com/block/goose/compare/pctx and this includes the test expectation fix and also some polish to the fixtures which hid the underlying mismatch. code mode is a little different. those fixes are in this commit 9002a2e0bc17214311d9238b225fa2c181fb47aa

So, you should do this:

git fetch https://github.com/block/goose.git pctx
git cherry-pick 9002a2e0bc17214311d9238b225fa2c181fb47aa

Then rebase on the latest block/goose main after you do the above.

Hope this helps!

codefromthecrypt and others added 2 commits February 3, 2026 10:09
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Signed-off-by: Elias Posen <elias@posen.ch>
@alexhancock alexhancock self-requested a review February 3, 2026 17:15
@alexhancock alexhancock merged commit 8631caa into block:main Feb 3, 2026
20 checks passed
stebbins pushed a commit to stebbins/goose that referenced this pull request Feb 4, 2026
Signed-off-by: Elias Posen <elias@posen.ch>
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Co-authored-by: Adrian Cole <adrian@tetrate.io>
Signed-off-by: Harrison <hcstebbins@gmail.com>
kuccello pushed a commit to kuccello/goose that referenced this pull request Feb 7, 2026
Signed-off-by: Elias Posen <elias@posen.ch>
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Co-authored-by: Adrian Cole <adrian@tetrate.io>
@lifeizhou-ap lifeizhou-ap mentioned this pull request Feb 9, 2026
9 tasks
Tyler-Hardin pushed a commit to Tyler-Hardin/goose that referenced this pull request Feb 11, 2026
Signed-off-by: Elias Posen <elias@posen.ch>
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Co-authored-by: Adrian Cole <adrian@tetrate.io>
6 participants