Conversation
There was a problem hiding this comment.
Pull request overview
This PR expands the set of providers covered by scripts/test_providers.sh and changes the test harness to run provider/model combinations in parallel for faster feedback. It also adds conditional inclusion of many providers based on environment variables or installed CLIs.
Changes:
- Extend the
PROVIDERSlist and add conditional blocks for Databricks, Azure OpenAI, AWS Bedrock, GCP Vertex AI, Snowflake, Venice, LiteLLM, Ollama, SageMaker TGI, GitHub Copilot, ChatGPT Codex, and several CLI-based providers. - Refactor the test runner to build a job list, execute tests in parallel with a configurable
MAX_PARALLEL, and collect results from per-job temporary directories. - Preserve and adapt the existing success/allowed-failure logic to work with the new parallel execution model.
| meta_file="$RESULTS_DIR/meta_$idx" | ||
| echo "$provider|$model" > "$meta_file" |
There was a problem hiding this comment.
meta_file and the corresponding write to $RESULTS_DIR/meta_$idx are never read anywhere, so this extra file creation is dead code and can be removed to simplify the parallel runner loop.
| meta_file="$RESULTS_DIR/meta_$idx" | |
| echo "$provider|$model" > "$meta_file" |
|
As of now, this is what's added but not run: |
| echo "" | ||
|
|
||
| # Run first test sequentially if any jobs exist | ||
| if [ $total_jobs -gt 0 ]; then |
There was a problem hiding this comment.
Running one alone before the rest concurrently is kind of silly, but solves two small pain points. If you immediately start N concurrent sessions, then:
- locally, you'll get the keychain password prompt N times, even if you click "always allow"
- in CI, they all try to create the sqlite database at the same time, and that leads to things breaking. Maybe we should fix that but it also seems highly unlikely to happen in the real world
There was a problem hiding this comment.
it's also useful since most likely if you break something it is broken for all providers. this way it tells you that immediately maybe
There was a problem hiding this comment.
yeah the faster inner loop - I think is reasonable
I think the main difference there is that the other one also runs a compaction test which takes 4-5 min right now. Maybe that should be split out of sequence |
Signed-off-by: Harrison <hcstebbins@gmail.com>

Also run them in parallel