Description
Please explain the motivation behind the feature request.
The goose-acp crate has test fixtures that allow it to run in-process (no build step) with real HTTP requests (via a proxy) and handle edge cases such as propagation parity and the "session rename" request, which is async and can happen in any order.
At a high level, this fakes the OpenAI endpoint and, given a request pattern, returns real HTTP responses previously captured from goose.
Tests look like this; match/body pairs can be re-used across tests that use the same prompt but exercise different side effects, such as session resume.
https://github.com/block/goose/blob/main/crates/goose-acp/tests/common_tests/mod.rs
```rust
let expected_session_id = ExpectedSessionId::default();
let openai = OpenAiFixture::new(
    vec![(
        r#"</info-msg>\nwhat is 1+1""#.into(), // expect req body to contain this
        include_str!("../test_data/openai_basic_response.txt"), // and return this
    )],
    expected_session_id.clone(),
)
.await;
```

This request-match approach was chosen instead of a JSON-equality check, which would fail on any diff of the body. It not only reduces the size of files checked into git, but also avoids accidentally capturing ephemeral data. With a body-equality check, even after scrubbing whitespace, minor differences mean rewriting the entire request file.
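As a rough illustration of the matching idea (not the fixture's actual internals; `body_matches` is a name invented for this sketch): each expected pattern only needs to appear somewhere in the request body, so unrelated fields can change without invalidating the test.

```rust
/// Hypothetical sketch of the substring-style matching described above;
/// the real fixture lives under crates/goose-acp/tests.
fn body_matches(body: &str, pattern: &str) -> bool {
    body.contains(pattern)
}

fn main() {
    // The pattern only has to appear inside the (much larger) request body.
    let body = r#"{"messages":[{"role":"user","content":"</info-msg>\nwhat is 1+1"}]}"#;
    assert!(body_matches(body, r#"</info-msg>\nwhat is 1+1"#));
}
```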
While this approach is simple to read, it has plenty of warts, especially when writing new tests that use MCP or when re-recording.
It currently requires manual steps:
- Run the same fake MCP Lookup server the tests use, but listening on a predefined port
- Temporarily patch `crates/goose/src/providers/utils.rs` in `stream_openai_compat` to dump HTTP responses to /tmp/raw_sse.txt (see the sketch after this list)
- Run the same scenario as a test with a compiled binary
- Study logs for request patterns, e.g. `~/.local/state/goose/logs/` for `llm_request.*.jsonl`
- Match them to responses in /tmp/raw_sse.txt
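For the patch step above, something small like the following is enough. This is only a sketch of the temporary dump; `dump_raw_sse` is a made-up helper name, not anything in `stream_openai_compat` today.

```rust
use std::fs::OpenOptions;
use std::io::Write;

/// Hypothetical temporary helper: append each raw SSE chunk to /tmp/raw_sse.txt
/// so the responses can later be paired with the request patterns in the logs.
fn dump_raw_sse(chunk: &[u8]) {
    if let Ok(mut f) = OpenOptions::new()
        .create(true)
        .append(true)
        .open("/tmp/raw_sse.txt")
    {
        let _ = f.write_all(chunk);
    }
}
```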
Example run:
```bash
# Start fake MCP server
pkill -f fake_mcp_server; cargo run -p goose --example fake_mcp_server > /tmp/fake_mcp_server.log 2>&1 &

# For test_acp_with_builtin_and_mcp:
rm -rf /tmp/result.txt /tmp/raw_sse.txt ~/.local/state/goose/logs/
cargo run -p goose-cli -- run \
  --with-builtin code_execution,developer \
  --with-streamable-http-extension http://127.0.0.1:9753/mcp \
  -t 'Search for getCode and textEditor tools. Use them to save the code to /tmp/result.txt.'
```

Describe the solution you'd like
This design was never meant to be permanent; a VCR pattern is far easier to maintain.
In a VCR pattern, you define the scenario as a header so that request/response pairs can be recorded together. The test infrastructure then writes testdata/test_acp_with_builtin_and_mcp.yaml: when an env var like XXX_RECORD_MODE=true is set, all requests/responses are re-recorded; when it isn't set, any request that fails to match returns a 417 with a JSON diff.
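As a hedged sketch of how that record/replay switch could look (names like `cassette_path` and `record_mode` are invented for this example; XXX_RECORD_MODE is the placeholder env var above):

```rust
use std::path::PathBuf;

/// Where the cassette for a given test would live, e.g.
/// testdata/test_acp_with_builtin_and_mcp.yaml.
fn cassette_path(test_name: &str) -> PathBuf {
    PathBuf::from("testdata").join(format!("{test_name}.yaml"))
}

/// Re-record everything when the env var is set; otherwise replay, and return
/// a 417 with a JSON diff on the first request that fails to match.
fn record_mode() -> bool {
    std::env::var("XXX_RECORD_MODE").as_deref() == Ok("true")
}
```

With that in place, the per-test fixture setup collapses to a single key: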
```rust
let expected_session_id = ExpectedSessionId::default();
let openai = OpenAiFixture::new(
    "test_acp_with_builtin_and_mcp", // <-- key for all requests in this test
    expected_session_id.clone(),
)
.await;
```

Describe alternatives you've considered
An alternative would be to automate the data-collection part so that the existing match/response patterns can be derived more easily, without redoing the whole thing as VCR, which is a substantial amount of code and would lead to a large number of JSON blobs checked in, due to repeated system prompts in every test.
Additional context
This VCR pattern was implemented by the same author (@codefromthecrypt) in the Python version of goose, and he actively maintains the same pattern in envoy ai-gateway. Doing this over in Rust is tricky, and there are aspects to special-case now, such as the session rename request, which does not arrive in a predictable order. It was less code to put an interim solution together as a stop-gap.
- I have verified this does not duplicate an existing feature request