Skip to content

Lifei/smoke test for developer#7174

Merged
lifeizhou-ap merged 6 commits intomainfrom
lifei/smoke-test-for-developer
Feb 12, 2026
Merged

Lifei/smoke test for developer#7174
lifeizhou-ap merged 6 commits intomainfrom
lifei/smoke-test-for-developer

Conversation

@lifeizhou-ap
Copy link
Collaborator

Summary

Type of Change

  • Feature
  • Bug fix
  • Refactor / Code quality
  • Performance improvement
  • Documentation
  • Tests
  • Security fix
  • Build / Release
  • Other (specify below)

AI Assistance

  • This PR was created or reviewed with AI assistance

Testing

Related Issues

Relates to #ISSUE_ID
Discussion: LINK (if any)

Screenshots/Demos (for UX changes)

Before:

After:

Copilot AI review requested due to automatic review settings February 12, 2026 06:48
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors provider smoke-testing scripts into a shared library and splits “normal” vs “code_execution” test modes into separate entrypoints, updating the PR smoke-test workflow to use the new code-exec script.

Changes:

  • Introduces scripts/test_providers_lib.sh to centralize provider/model config, env/CLI availability checks, parallel execution, and result reporting.
  • Updates scripts/test_providers.sh to use the shared library and changes the normal-mode test prompt/verification logic.
  • Adds scripts/test_providers_code_exec.sh and updates .github/workflows/pr-smoke-test.yml to call it (and remove agentic/Node setup from the code-exec job).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
scripts/test_providers_lib.sh New shared library for provider smoke tests (config, filtering, parallel runner, reporting).
scripts/test_providers_code_exec.sh New code-execution-mode smoke test runner using the shared library.
scripts/test_providers.sh Refactored normal-mode provider tests to use the shared library and a new prompt/assertion strategy.
.github/workflows/pr-smoke-test.yml Adjusts the code-exec smoke-test job to run the new script and removes unnecessary Node/agentic setup there.

[[ "$line" =~ ^#.*$ || -z "$line" ]] && continue
local provider="${line%% -> *}"
if is_provider_available "$provider"; then
providers+=("$(eval echo "\"$line\"")")
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using eval to expand the provider config line allows command substitution via environment variables (e.g., a malicious AZURE_OPENAI_DEPLOYMENT_NAME), so this should be replaced with a non-eval expansion (handle azure_openai explicitly or do safe ${VAR} substitution).

Suggested change
providers+=("$(eval echo "\"$line\"")")
local expanded_line="${line//\$\{AZURE_OPENAI_DEPLOYMENT_NAME\}/${AZURE_OPENAI_DEPLOYMENT_NAME}}"
providers+=("$expanded_line")

Copilot uses AI. Check for mistakes.
local test_fn="$1"

RESULTS_DIR=$(mktemp -d)
trap "rm -rf $RESULTS_DIR ${CLEANUP_DIR:-}" EXIT
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EXIT trap runs rm -rf $RESULTS_DIR ${CLEANUP_DIR:-} without quoting, so a value containing spaces/globs (or an empty/unset var edge case) can delete the wrong paths; quote variables and guard against empty values before rm -rf.

Suggested change
trap "rm -rf $RESULTS_DIR ${CLEANUP_DIR:-}" EXIT
trap 'if [ -n "${RESULTS_DIR:-}" ]; then rm -rf -- "$RESULTS_DIR"; fi; if [ -n "${CLEANUP_DIR:-}" ]; then rm -rf -- "$CLEANUP_DIR"; fi' EXIT

Copilot uses AI. Check for mistakes.
Comment on lines 45 to 49
if grep -q "TEST-CONTENT-ABC123" "$output_file"; then
echo "success|model read and uppercased file content" > "$result_file"
else
echo "✗ FAILED: Test failed - $FAILURE_MSG"
RESULTS+=("✗ ${provider}: ${model}")
HARD_FAILURES+=("${provider}: ${model}")
echo "failure|model did not return uppercased file content" > "$result_file"
fi
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This non-agentic check can false-pass if the model simply prints the known constant (without using tools), since the expected output is deterministic; consider using per-run random content and/or asserting the text_editor | developer tool-call marker in the output.

Copilot uses AI. Check for mistakes.
Comment on lines 32 to 35
export GOOSE_PROVIDER="$provider"
export GOOSE_MODEL="$model"
cd "$testdir" && "$SCRIPT_DIR/target/debug/goose" run --text "$prompt" --with-builtin "$BUILTINS" 2>&1
export PATH=""
cd "$testdir" && "$GOOSE_BIN" run --text "$prompt" --with-builtin "$BUILTINS" 2>&1
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting PATH to empty here can make agentic providers fail to resolve their CLI executables (since resolution depends on PATH + extra search paths), which can cause environment-dependent flakes; keep PATH or set a minimal PATH like /usr/local/bin:/usr/bin:/bin.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the PATH="" was a mistake

* main:
  fix text editor view broken (#7167)
  docs: White label guide (#6857)
  Add PATH detection back to developer extension (#7161)
  docs: pin version in ci/cd (#7168)
  Desktop: - No Custom Headers field for custom OpenAI-compatible providers  (#6681)
  feat: edit model and extensions of a recipe from GUI (#6804)
  feat: MCP support for agentic CLI providers (#6972)
Copilot AI review requested due to automatic review settings February 12, 2026 08:11
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings February 12, 2026 08:38
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment on lines +3 to +12
PROVIDER_CONFIG="
openrouter -> google/gemini-2.5-pro|anthropic/claude-sonnet-4.5|qwen/qwen3-coder:exacto|z-ai/glm-4.6:exacto|nvidia/nemotron-3-nano-30b-a3b
xai -> grok-3
openai -> gpt-4o|gpt-4o-mini|gpt-3.5-turbo|gpt-5
anthropic -> claude-sonnet-4-5-20250929|claude-opus-4-5-20251101
google -> gemini-2.5-pro|gemini-2.5-flash|gemini-3-pro-preview|gemini-3-flash-preview
tetrate -> claude-sonnet-4-20250514
databricks -> databricks-claude-sonnet-4|gemini-2-5-flash|gpt-4o
azure_openai -> ${AZURE_OPENAI_DEPLOYMENT_NAME}
aws_bedrock -> us.anthropic.claude-sonnet-4-5-20250929-v1:0
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PROVIDER_CONFIG expands ${AZURE_OPENAI_DEPLOYMENT_NAME} before .env is loaded, so running locally with .env will leave the Azure model empty (and may produce no Azure test cases even when the env vars are set); load .env before building PROVIDER_CONFIG, or defer expansion (e.g., build the Azure line after env load).

Copilot uses AI. Check for mistakes.
Comment on lines +21 to +34
echo "hello" > "$testdir/hello.txt"
local prompt="Run 'ls' to list files in the current directory."

# Run goose
(
export GOOSE_PROVIDER="$provider"
export GOOSE_MODEL="$model"
cd "$testdir" && "$GOOSE_BIN" run --text "$prompt" --with-builtin "$BUILTINS" 2>&1
) > "$output_file" 2>&1

# Verify: code_execution tool must be called
# Matches: "execute | code_execution", "get_function_details | code_execution",
# "tool call | execute", "tool calls | execute"
if grep -qE "(execute \| code_execution)|(get_function_details \| code_execution)|(tool calls? \| execute)" "$output_file"; then
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code-exec smoke test prompt doesn't explicitly require using the code_execution/execute tool, so models may respond with a textual explanation (or use a different tool) and cause flaky CI failures; make the prompt force a tool call (similar to the normal-mode script’s “Do not ask for confirmation” style) and/or assert on the ls output as well as the tool-call log.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the session only has code_execution tool, so it is fine

Copy link
Collaborator

@DOsinga DOsinga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good refactor!

I do wonder though whether at some point we should split the provider test into something we want to run for all providers and then a longer one that does specific and more complicated things but only for one reliable provider. the more we ask for slightly flaky providers, the more they will be flaky, plus of course it will take longer.

@lifeizhou-ap lifeizhou-ap added this pull request to the merge queue Feb 12, 2026
Merged via the queue into main with commit 0e11d18 Feb 12, 2026
25 checks passed
@lifeizhou-ap lifeizhou-ap deleted the lifei/smoke-test-for-developer branch February 12, 2026 10:22
jh-block added a commit that referenced this pull request Feb 12, 2026
* origin/main: (33 commits)
  fix: replace panic with proper error handling in get_tokenizer (#7175)
  Lifei/smoke test for developer (#7174)
  fix text editor view broken (#7167)
  docs: White label guide (#6857)
  Add PATH detection back to developer extension (#7161)
  docs: pin version in ci/cd (#7168)
  Desktop: - No Custom Headers field for custom OpenAI-compatible providers  (#6681)
  feat: edit model and extensions of a recipe from GUI (#6804)
  feat: MCP support for agentic CLI providers (#6972)
  docs: keyring fallback to secrets.yaml (#7165)
  feat: load provider/model specified inside the recipe config (#6884)
  fix ask-ai bot hitting tool call limits (#7162)
  fix flatpak icon (#7154)
  [docs] Skills Marketplace UI Improvements (#7158)
  More no-window flags (#7122)
  feat: Allow overriding default bat themes using environment variables (#7140)
  Make the system prompt smaller (#6991)
  Pre release script (#7145)
  Spelling (#7137)
  feat(mcp): upgrade rmcp to 0.15.0 and advertise MCP Apps UI extension capability (#6927)
  ...
tlongwell-block added a commit that referenced this pull request Feb 12, 2026
…provenance

* origin/main: (68 commits)
  Upgraded npm packages for latest security updates (#7183)
  docs: reasoning effort levels for Codex provider (#6798)
  Fix speech local (#7181)
  chore: add .gooseignore to .gitignore (#6826)
  Improve error message logging from electron (#7130)
  chore(deps): bump jsonwebtoken from 9.3.1 to 10.3.0 (#6924)
  docs: standalone mcp apps and apps extension (#6791)
  workflow: auto-update cli-commands on release (#6755)
  feat(apps): Integrate AppRenderer from @mcp-ui/client SDK (#7013)
  fix(MCP): decode resource content (#7155)
  feat: reasoning_content in API for reasoning models (#6322)
  Fix/configure add provider custom headers (#7157)
  fix: handle keyring fallback as success (#7177)
  Update process-wrap to 9.0.3 (9.0.2 is yanked) (#7176)
  feat: support extra field in chatcompletion tool_calls for gemini openai compat (#6184)
  fix: replace panic with proper error handling in get_tokenizer (#7175)
  Lifei/smoke test for developer (#7174)
  fix text editor view broken (#7167)
  docs: White label guide (#6857)
  Add PATH detection back to developer extension (#7161)
  ...

# Conflicts:
#	.github/workflows/nightly.yml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants