More providers for testing #6849

Merged
jamadeo merged 4 commits into main from all-providers-test on Feb 3, 2026

jamadeo (Collaborator) commented Jan 30, 2026

Also run them in parallel

Copilot AI review requested due to automatic review settings January 30, 2026 21:03
Copilot AI (Contributor) left a comment

Pull request overview

This PR expands the set of providers covered by scripts/test_providers.sh and changes the test harness to run provider/model combinations in parallel for faster feedback. It also adds conditional inclusion of many providers based on environment variables or installed CLIs.

Changes:

  • Extend the PROVIDERS list and add conditional blocks for Databricks, Azure OpenAI, AWS Bedrock, GCP Vertex AI, Snowflake, Venice, LiteLLM, Ollama, SageMaker TGI, GitHub Copilot, ChatGPT Codex, and several CLI-based providers.
  • Refactor the test runner to build a job list, execute tests in parallel with a configurable MAX_PARALLEL, and collect results from per-job temporary directories.
  • Preserve and adapt the existing success/allowed-failure logic to work with the new parallel execution model.
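The parallel runner described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code from the PR: `run_one`, the sample job list, and the per-job file names (`name`, `status`, `output.log`) are hypothetical; only `MAX_PARALLEL` and `RESULTS_DIR` are names that appear in this conversation. It throttles in simple batches rather than keeping exactly `MAX_PARALLEL` jobs in flight.

```shell
#!/bin/sh
# Sketch only: run_one and the job list are hypothetical stand-ins.
MAX_PARALLEL="${MAX_PARALLEL:-2}"
RESULTS_DIR="$(mktemp -d)"

# Stand-in for a real provider test: succeeds unless the model is "bad-model".
run_one() {
  echo "testing $1 / $2"
  [ "$2" != "bad-model" ]
}

total_jobs=0
while IFS='|' read -r provider model; do
  job_dir="$RESULTS_DIR/job_$total_jobs"
  mkdir -p "$job_dir"
  echo "$provider|$model" > "$job_dir/name"
  (
    # Each job writes only inside its own directory, so jobs never share files.
    if run_one "$provider" "$model" > "$job_dir/output.log" 2>&1; then
      echo pass > "$job_dir/status"
    else
      echo fail > "$job_dir/status"
    fi
  ) &
  total_jobs=$((total_jobs + 1))
  # Batch throttle: after every MAX_PARALLEL launches, wait for the whole batch.
  if [ $((total_jobs % MAX_PARALLEL)) -eq 0 ]; then
    wait
  fi
done <<'EOF'
provider-a|model-1
provider-b|model-2
provider-c|bad-model
EOF
wait

# Collect results in launch order once every job has finished.
passed=0
failed=0
i=0
while [ "$i" -lt "$total_jobs" ]; do
  job_dir="$RESULTS_DIR/job_$i"
  name="$(cat "$job_dir/name")"
  if [ "$(cat "$job_dir/status")" = "pass" ]; then
    passed=$((passed + 1))
    echo "PASS $name"
  else
    failed=$((failed + 1))
    echo "FAIL $name"
  fi
  i=$((i + 1))
done
rm -rf "$RESULTS_DIR"
echo "passed=$passed failed=$failed"
```

Collecting from per-job directories after the final `wait` keeps the report in a stable order even though the jobs finish in an arbitrary one.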

Comment on lines +294 to +295
meta_file="$RESULTS_DIR/meta_$idx"
echo "$provider|$model" > "$meta_file"
Copilot AI Jan 30, 2026

meta_file and the corresponding write to $RESULTS_DIR/meta_$idx are never read anywhere, so this extra file creation is dead code and can be removed to simplify the parallel runner loop.

Suggested change (remove these lines):
meta_file="$RESULTS_DIR/meta_$idx"
echo "$provider|$model" > "$meta_file"

jamadeo (Collaborator, Author) commented Feb 2, 2026

As of now, this is what's added but not run:

⚠️  Skipping Databricks tests (DATABRICKS_HOST and DATABRICKS_TOKEN required)
⚠️  Skipping Azure OpenAI tests (AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_DEPLOYMENT_NAME required)
⚠️  Skipping AWS Bedrock tests (AWS_REGION and AWS_PROFILE or AWS credentials required)
⚠️  Skipping GCP Vertex AI tests (GCP_PROJECT_ID required)
⚠️  Skipping Snowflake tests (SNOWFLAKE_HOST and SNOWFLAKE_TOKEN required)
⚠️  Skipping Venice tests (VENICE_API_KEY required)
⚠️  Skipping LiteLLM tests (LITELLM_API_KEY required)
⚠️  Skipping Ollama tests (OLLAMA_HOST required or ollama must be installed)
⚠️  Skipping SageMaker TGI tests (SAGEMAKER_ENDPOINT_NAME and AWS_REGION required)
⚠️  Skipping GitHub Copilot tests (OAuth setup required - run 'goose configure' first)
⚠️  Skipping ChatGPT Codex tests (OAuth setup required - run 'goose configure' first)
⚠️  Skipping Claude Code CLI tests ('claude' CLI tool required)
⚠️  Skipping Codex CLI tests ('codex' CLI tool required)
⚠️  Skipping Gemini CLI tests ('gemini' CLI tool required)
⚠️  Skipping Cursor Agent tests ('cursor-agent' CLI tool required)
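Skip messages like these typically come from guards that add a provider only when its prerequisites are present. The sketch below shows one way to write such guards for an environment-gated and a CLI-gated provider. It is an illustration, not the script's actual code: the helper functions and the provider IDs `databricks` and `claude-code` are assumptions.

```shell
#!/bin/sh
# Sketch only: the provider IDs ("databricks", "claude-code") and helper names
# are assumptions, not taken from scripts/test_providers.sh.

PROVIDERS=""

# Append one provider ID to the space-separated PROVIDERS list.
add_provider() {
  PROVIDERS="${PROVIDERS}${PROVIDERS:+ }$1"
}

# Succeed if the given provider ID is currently in PROVIDERS.
has_provider() {
  case " $PROVIDERS " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

build_providers() {
  PROVIDERS=""

  # Environment-gated provider: both variables must be set and non-empty.
  if [ -n "${DATABRICKS_HOST:-}" ] && [ -n "${DATABRICKS_TOKEN:-}" ]; then
    add_provider databricks
  else
    echo "Skipping Databricks tests (DATABRICKS_HOST and DATABRICKS_TOKEN required)"
  fi

  # CLI-gated provider: the tool must be on PATH.
  if command -v claude >/dev/null 2>&1; then
    add_provider claude-code
  else
    echo "Skipping Claude Code CLI tests ('claude' CLI tool required)"
  fi
}

build_providers
echo "Providers to test: ${PROVIDERS:-<none>}"
```

Printing the reason for each skip makes it obvious from the log which credentials or tools a given machine is missing.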

echo ""

# Run first test sequentially if any jobs exist
if [ $total_jobs -gt 0 ]; then
Collaborator Author

Running one alone before the rest concurrently is kind of silly, but solves two small pain points. If you immediately start N concurrent sessions, then:

  • locally, you'll get the keychain password prompt N times, even if you click "always allow"
  • in CI, they all try to create the SQLite database at the same time, and that leads to things breaking. Maybe we should fix that, but it also seems highly unlikely to happen in the real world.
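The warm-up pattern described above can be sketched as follows. This is a hypothetical illustration rather than the PR's code: `run_job`, `STATE_DIR`, and the sample jobs are assumptions, and the "database" is simulated with a plain file. The first job runs alone and performs the one-time setup, so the jobs launched concurrently afterwards find it already in place.

```shell
#!/bin/sh
# Sketch only: run_job, STATE_DIR, and the job names are hypothetical.
STATE_DIR="$(mktemp -d)"

run_job() {
  # Simulated one-time setup (stand-in for creating a shared database).
  if [ ! -f "$STATE_DIR/db" ]; then
    echo "init by $1" >> "$STATE_DIR/db"
  fi
  echo "$1" >> "$STATE_DIR/started"
}

# Use the positional parameters as the job list.
set -- "provider-a|model-1" "provider-b|model-2" "provider-c|model-3"
total_jobs=$#

# Run the first job alone so shared state exists before anything runs concurrently.
if [ "$total_jobs" -gt 0 ]; then
  run_job "$1"
  shift
fi

# Run the remaining jobs concurrently and wait for all of them.
for job in "$@"; do
  run_job "$job" &
done
wait

init_count="$(wc -l < "$STATE_DIR/db" | tr -d ' ')"
first_started="$(head -n 1 "$STATE_DIR/started")"
started_count="$(wc -l < "$STATE_DIR/started" | tr -d ' ')"
rm -rf "$STATE_DIR"
echo "setup ran $init_count time(s); first job: $first_started; jobs run: $started_count"
```

A side benefit is that a failure in the first job surfaces before the remaining jobs are launched.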

Collaborator

It's also useful: if you break something, it is most likely broken for all providers, so this way it tells you immediately.

Collaborator

Yeah, the faster inner loop is reasonable, I think.

michaelneale (Collaborator) left a comment

Nice. The proof of the pudding is in the eating; let's get this in.

Also note the time diff:

Image

Code mode helping a lot?

jamadeo (Collaborator, Author) commented Feb 3, 2026

> Code mode helping a lot?

I think the main difference there is that the other one also runs a compaction test, which takes 4-5 min right now. Maybe that should be split out of the sequence.

@jamadeo jamadeo added this pull request to the merge queue Feb 3, 2026
Merged via the queue into main with commit eae5a47 Feb 3, 2026
18 checks passed
@jamadeo jamadeo deleted the all-providers-test branch February 3, 2026 16:27
stebbins pushed a commit to stebbins/goose that referenced this pull request Feb 4, 2026
Signed-off-by: Harrison <hcstebbins@gmail.com>
kuccello pushed a commit to kuccello/goose that referenced this pull request Feb 7, 2026
Tyler-Hardin pushed a commit to Tyler-Hardin/goose that referenced this pull request Feb 11, 2026
Tyler-Hardin pushed a commit to Tyler-Hardin/goose that referenced this pull request Feb 11, 2026
3 participants