Lifei/smoke test for developer by lifeizhou-ap · Pull Request #7174 · block/goose

lifeizhou-ap · 2026-02-12T06:48:42Z

Summary

Type of Change

AI Assistance

This PR was created or reviewed with AI assistance

Testing

Copilot

Pull request overview

Refactors provider smoke-testing scripts into a shared library and splits “normal” vs “code_execution” test modes into separate entrypoints, updating the PR smoke-test workflow to use the new code-exec script.

Changes:

Introduces scripts/test_providers_lib.sh to centralize provider/model config, env/CLI availability checks, parallel execution, and result reporting.
Updates scripts/test_providers.sh to use the shared library and changes the normal-mode test prompt/verification logic.
Adds scripts/test_providers_code_exec.sh and updates .github/workflows/pr-smoke-test.yml to call it (and remove agentic/Node setup from the code-exec job).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
scripts/test_providers_lib.sh	New shared library for provider smoke tests (config, filtering, parallel runner, reporting).
scripts/test_providers_code_exec.sh	New code-execution-mode smoke test runner using the shared library.
scripts/test_providers.sh	Refactored normal-mode provider tests to use the shared library and a new prompt/assertion strategy.
.github/workflows/pr-smoke-test.yml	Adjusts the code-exec smoke-test job to run the new script and removes unnecessary Node/agentic setup there.

Copilot · 2026-02-12T06:56:04Z

scripts/test_providers_lib.sh

+    [[ "$line" =~ ^#.*$ || -z "$line" ]] && continue
+    local provider="${line%% -> *}"
+    if is_provider_available "$provider"; then
+      providers+=("$(eval echo "\"$line\"")")


Using eval to expand the provider config line allows command substitution via environment variables (e.g., a malicious AZURE_OPENAI_DEPLOYMENT_NAME), so this should be replaced with a non-eval expansion (handle azure_openai explicitly or do safe ${VAR} substitution).

Suggested change

-      providers+=("$(eval echo "\"$line\"")")
+      local expanded_line="${line//\$\{AZURE_OPENAI_DEPLOYMENT_NAME\}/${AZURE_OPENAI_DEPLOYMENT_NAME}}"
+      providers+=("$expanded_line")
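
A minimal sketch of the difference, assuming a config line of the form shown in the diff. The deployment name is a made-up hostile value (with a harmless echo standing in for an injected command), and expand_line is a hypothetical helper, not a function from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical value; the $(...) stands in for anything an attacker might inject.
AZURE_OPENAI_DEPLOYMENT_NAME='my-deploy$(echo INJECTED)'

# Single quotes keep the placeholder literal until we substitute it ourselves.
line='azure_openai -> ${AZURE_OPENAI_DEPLOYMENT_NAME}'

# Hypothetical helper: replace only the known placeholder, treating the value as plain text.
expand_line() {
  local raw="$1"
  local placeholder='${AZURE_OPENAI_DEPLOYMENT_NAME}'
  local value="${AZURE_OPENAI_DEPLOYMENT_NAME:-}"
  # Quoting the pattern makes bash match it literally instead of as a glob.
  local out=${raw//"$placeholder"/$value}
  printf '%s\n' "$out"
}

expanded="$(expand_line "$line")"
echo "safe:   $expanded"

# For contrast: once the value is embedded in the line, eval re-parses it and runs the $(...).
already_expanded="azure_openai -> $AZURE_OPENAI_DEPLOYMENT_NAME"
unsafe="$(eval echo "\"$already_expanded\"")"
echo "unsafe: $unsafe"
```

The safe variant keeps the $(...) as inert text, while the eval variant executes it.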

Copilot · 2026-02-12T06:56:04Z

scripts/test_providers_lib.sh

+  local test_fn="$1"
+
+  RESULTS_DIR=$(mktemp -d)
+  trap "rm -rf $RESULTS_DIR ${CLEANUP_DIR:-}" EXIT


The EXIT trap runs rm -rf $RESULTS_DIR ${CLEANUP_DIR:-} without quoting, so a value containing spaces or globs can delete the wrong paths. Quote the variables and guard against empty or unset values before calling rm -rf.

Suggested change

-  trap "rm -rf $RESULTS_DIR ${CLEANUP_DIR:-}" EXIT
+  trap 'if [ -n "${RESULTS_DIR:-}" ]; then rm -rf -- "$RESULTS_DIR"; fi; if [ -n "${CLEANUP_DIR:-}" ]; then rm -rf -- "$CLEANUP_DIR"; fi' EXIT
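
The same guard can be easier to read as a function. This is a sketch only: cleanup is a hypothetical helper name, while RESULTS_DIR and CLEANUP_DIR are the variables from the diff.

```shell
#!/usr/bin/env bash
# Hypothetical cleanup helper; RESULTS_DIR and CLEANUP_DIR are the names used in the diff.
cleanup() {
  local dir
  for dir in "${RESULTS_DIR:-}" "${CLEANUP_DIR:-}"; do
    # Skip empty values so rm -rf never runs without a real target.
    if [ -n "$dir" ] && [ -d "$dir" ]; then
      rm -rf -- "$dir"
    fi
  done
}

RESULTS_DIR="$(mktemp -d)"
# Single quotes: the trap body is stored as-is and evaluated when the shell exits.
trap 'cleanup' EXIT
```

Because the variables are read inside the function at exit time, a CLEANUP_DIR that is assigned after the trap is installed is still picked up.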

Copilot · 2026-02-12T06:56:04Z

scripts/test_providers.sh

+    if grep -q "TEST-CONTENT-ABC123" "$output_file"; then
+      echo "success|model read and uppercased file content" > "$result_file"
    else
-      echo "✗ FAILED: Test failed - $FAILURE_MSG"
-      RESULTS+=("✗ ${provider}: ${model}")
-      HARD_FAILURES+=("${provider}: ${model}")
+      echo "failure|model did not return uppercased file content" > "$result_file"
    fi


This non-agentic check can false-pass if the model simply prints the known constant (without using tools), since the expected output is deterministic; consider using per-run random content and/or asserting the text_editor | developer tool-call marker in the output.
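
One way to do the per-run random content, sketched under assumptions: the file name test-content.txt and the check_output helper are hypothetical and not taken from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: derive the expected string from per-run random content.
testdir="$(mktemp -d)"

# $RANDOM is a bash builtin; two draws make a repeat across runs unlikely.
token="test-content-${RANDOM}-${RANDOM}"
printf '%s\n' "$token" > "$testdir/test-content.txt"

# The assertion looks for the uppercased form, which the model can only
# produce by actually reading the file.
expected="$(printf '%s' "$token" | tr '[:lower:]' '[:upper:]')"

# Hypothetical verifier: succeeds only if the output file contains the uppercased token.
check_output() {
  grep -qF -- "$expected" "$1"
}
```

A call such as check_output "$output_file" would then replace the grep for the fixed constant.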

Copilot · 2026-02-12T06:56:05Z

scripts/test_providers.sh

    export GOOSE_PROVIDER="$provider"
    export GOOSE_MODEL="$model"
-    cd "$testdir" && "$SCRIPT_DIR/target/debug/goose" run --text "$prompt" --with-builtin "$BUILTINS" 2>&1
+    export PATH=""
+    cd "$testdir" && "$GOOSE_BIN" run --text "$prompt" --with-builtin "$BUILTINS" 2>&1


Setting PATH to empty here can make agentic providers fail to resolve their CLI executables (since resolution depends on PATH + extra search paths), which can cause environment-dependent flakes; keep PATH or set a minimal PATH like /usr/local/bin:/usr/bin:/bin.

the PATH="" was a mistake
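
For reference, a sketch of the minimal-PATH alternative the review comment mentions. run_with_minimal_path is a hypothetical helper, and the directory list is the one from the comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the ( ... ) body runs in a subshell, so the caller's PATH is untouched.
run_with_minimal_path() (
  export PATH="/usr/local/bin:/usr/bin:/bin"
  "$@"
)
```

It could wrap the existing invocation, e.g. run_with_minimal_path "$GOOSE_BIN" run --text "$prompt" --with-builtin "$BUILTINS".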

* main:
  fix text editor view broken (#7167)
  docs: White label guide (#6857)
  Add PATH detection back to developer extension (#7161)
  docs: pin version in ci/cd (#7168)
  Desktop: - No Custom Headers field for custom OpenAI-compatible providers (#6681)
  feat: edit model and extensions of a recipe from GUI (#6804)
  feat: MCP support for agentic CLI providers (#6972)

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Copilot · 2026-02-12T08:43:58Z

scripts/test_providers_lib.sh

+PROVIDER_CONFIG="
+openrouter -> google/gemini-2.5-pro|anthropic/claude-sonnet-4.5|qwen/qwen3-coder:exacto|z-ai/glm-4.6:exacto|nvidia/nemotron-3-nano-30b-a3b
+xai -> grok-3
+openai -> gpt-4o|gpt-4o-mini|gpt-3.5-turbo|gpt-5
+anthropic -> claude-sonnet-4-5-20250929|claude-opus-4-5-20251101
+google -> gemini-2.5-pro|gemini-2.5-flash|gemini-3-pro-preview|gemini-3-flash-preview
+tetrate -> claude-sonnet-4-20250514
+databricks -> databricks-claude-sonnet-4|gemini-2-5-flash|gpt-4o
+azure_openai -> ${AZURE_OPENAI_DEPLOYMENT_NAME}
+aws_bedrock -> us.anthropic.claude-sonnet-4-5-20250929-v1:0


PROVIDER_CONFIG expands ${AZURE_OPENAI_DEPLOYMENT_NAME} before .env is loaded, so a local run that relies on .env leaves the Azure model empty and may produce no Azure test cases even when the env vars are set. Load .env before building PROVIDER_CONFIG, or defer the expansion (e.g., build the Azure line after the env load).
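
A sketch of the "load first, build second" ordering. Both helpers are hypothetical, the two config lines are a shortened stand-in for the full list, and the .env file is assumed to use shell-compatible KEY=value lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: source .env (if present) with auto-export turned on.
load_env() {
  local env_file="${1:-.env}"
  if [ -f "$env_file" ]; then
    set -a
    # shellcheck disable=SC1090
    . "$env_file"
    set +a
  fi
}

# Hypothetical helper: build the config only after the environment is final.
build_provider_config() {
  printf '%s\n' "xai -> grok-3"
  # Emit the Azure line only when a deployment name is actually set.
  if [ -n "${AZURE_OPENAI_DEPLOYMENT_NAME:-}" ]; then
    printf '%s\n' "azure_openai -> ${AZURE_OPENAI_DEPLOYMENT_NAME}"
  fi
}
```

Calling load_env and then PROVIDER_CONFIG="$(build_provider_config)" gives the Azure line the value from .env, and drops the line entirely when no deployment name is configured.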

Copilot · 2026-02-12T08:43:58Z

scripts/test_providers_code_exec.sh

+  echo "hello" > "$testdir/hello.txt"
+  local prompt="Run 'ls' to list files in the current directory."
+
+  # Run goose
+  (
+    export GOOSE_PROVIDER="$provider"
+    export GOOSE_MODEL="$model"
+    cd "$testdir" && "$GOOSE_BIN" run --text "$prompt" --with-builtin "$BUILTINS" 2>&1
+  ) > "$output_file" 2>&1
+
+  # Verify: code_execution tool must be called
+  # Matches: "execute | code_execution", "get_function_details | code_execution",
+  #           "tool call | execute", "tool calls | execute"
+  if grep -qE "(execute \| code_execution)|(get_function_details \| code_execution)|(tool calls? \| execute)" "$output_file"; then


The code-exec smoke test prompt doesn't explicitly require the code_execution/execute tool, so a model may respond with a textual explanation (or use a different tool) and cause flaky CI failures. Make the prompt force a tool call (similar to the normal-mode script’s “Do not ask for confirmation” style) and/or assert on the ls output as well as the tool-call log.

the session only has code_execution tool, so it is fine
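
If the stricter assertion from the review comment were wanted anyway, it might look like the sketch below. verify_code_exec is a hypothetical helper; the regex is the one from the diff, and hello.txt is the file the test creates.

```shell
#!/usr/bin/env bash
# Hypothetical verifier: require both the tool-call marker and the file name that ls would print.
verify_code_exec() {
  local output_file="$1"
  local marker='(execute \| code_execution)|(get_function_details \| code_execution)|(tool calls? \| execute)'
  grep -qE "$marker" "$output_file" && grep -qF "hello.txt" "$output_file"
}
```

The prompt in the diff never mentions hello.txt, so the second grep can only match if the listing actually ran or the model otherwise saw the directory contents.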

DOsinga

this is a good refactor!

I do wonder though whether at some point we should split the provider test into something we want to run for all providers and then a longer one that does specific and more complicated things but only for one reliable provider. The more we ask of slightly flaky providers, the flakier they will be, plus of course it will take longer.

* origin/main: (33 commits)
  fix: replace panic with proper error handling in get_tokenizer (#7175)
  Lifei/smoke test for developer (#7174)
  fix text editor view broken (#7167)
  docs: White label guide (#6857)
  Add PATH detection back to developer extension (#7161)
  docs: pin version in ci/cd (#7168)
  Desktop: - No Custom Headers field for custom OpenAI-compatible providers (#6681)
  feat: edit model and extensions of a recipe from GUI (#6804)
  feat: MCP support for agentic CLI providers (#6972)
  docs: keyring fallback to secrets.yaml (#7165)
  feat: load provider/model specified inside the recipe config (#6884)
  fix ask-ai bot hitting tool call limits (#7162)
  fix flatpak icon (#7154)
  [docs] Skills Marketplace UI Improvements (#7158)
  More no-window flags (#7122)
  feat: Allow overriding default bat themes using environment variables (#7140)
  Make the system prompt smaller (#6991)
  Pre release script (#7145)
  Spelling (#7137)
  feat(mcp): upgrade rmcp to 0.15.0 and advertise MCP Apps UI extension capability (#6927)
  ...

…provenance
* origin/main: (68 commits)
  Upgraded npm packages for latest security updates (#7183)
  docs: reasoning effort levels for Codex provider (#6798)
  Fix speech local (#7181)
  chore: add .gooseignore to .gitignore (#6826)
  Improve error message logging from electron (#7130)
  chore(deps): bump jsonwebtoken from 9.3.1 to 10.3.0 (#6924)
  docs: standalone mcp apps and apps extension (#6791)
  workflow: auto-update cli-commands on release (#6755)
  feat(apps): Integrate AppRenderer from @mcp-ui/client SDK (#7013)
  fix(MCP): decode resource content (#7155)
  feat: reasoning_content in API for reasoning models (#6322)
  Fix/configure add provider custom headers (#7157)
  fix: handle keyring fallback as success (#7177)
  Update process-wrap to 9.0.3 (9.0.2 is yanked) (#7176)
  feat: support extra field in chatcompletion tool_calls for gemini openai compat (#6184)
  fix: replace panic with proper error handling in get_tokenizer (#7175)
  Lifei/smoke test for developer (#7174)
  fix text editor view broken (#7167)
  docs: White label guide (#6857)
  Add PATH detection back to developer extension (#7161)
  ...
# Conflicts:
#	.github/workflows/nightly.yml

lifeizhou-ap added 2 commits February 12, 2026 15:36

refactored the test_providers scripts

162c73c

added test for develop editor view

520c53a

Copilot AI review requested due to automatic review settings February 12, 2026 06:48

Copilot started reviewing on behalf of lifeizhou-ap February 12, 2026 06:49

Copilot AI reviewed Feb 12, 2026

lifeizhou-ap added 2 commits February 12, 2026 18:25

fixed the test

0a1a58e

Copilot AI review requested due to automatic review settings February 12, 2026 08:11

Copilot started reviewing on behalf of lifeizhou-ap February 12, 2026 08:12

Copilot AI reviewed Feb 12, 2026

lifeizhou-ap added 2 commits February 12, 2026 19:37

tune the prompt and clean up

95125c8

more cleanup

f1baebd

Copilot AI review requested due to automatic review settings February 12, 2026 08:38

Copilot started reviewing on behalf of lifeizhou-ap February 12, 2026 08:38

Copilot AI reviewed Feb 12, 2026

DOsinga approved these changes Feb 12, 2026

lifeizhou-ap added this pull request to the merge queue Feb 12, 2026

Merged via the queue into main with commit 0e11d18 Feb 12, 2026
25 checks passed

lifeizhou-ap deleted the lifei/smoke-test-for-developer branch February 12, 2026 10:22

github-actions bot mentioned this pull request Feb 12, 2026

chore(release): release version 1.24.0 (minor) #7102

Closed

github-actions bot mentioned this pull request Feb 17, 2026

chore(release): release version 1.25.0 (minor) #7263

Open

lifeizhou-ap merged 6 commits into main from lifei/smoke-test-for-developer
