chore(tools): POC to consolidate immutable ddl tools while preserving the accuracy #354

Draft
wants to merge 54 commits into base: chore/issue-307-proposal-2

Commits (54)
ad1b340
chore: LangChain based accuracy tests
himanshusinghs Jun 28, 2025
2cc3323
chore: use vercel AI SDK instead of langchain
himanshusinghs Jun 30, 2025
75fccd7
chore: integrate capturing accuracy snapshots
himanshusinghs Jun 30, 2025
a4f0246
chore: correct env names
himanshusinghs Jun 30, 2025
38f1fa4
chore: more consolidated prompt tests
himanshusinghs Jun 30, 2025
4fc7322
chore: add a few more tests and some more models
himanshusinghs Jun 30, 2025
86ca1e0
chore: add AzureOpenAI model in the model list
himanshusinghs Jul 1, 2025
7225e6e
chore: use ListDatabasesTool response creator for tests
himanshusinghs Jul 1, 2025
908622f
chore: use ListCollectionsTool response creators in tests
himanshusinghs Jul 1, 2025
7f22de4
chore: tests for collection-indexes tool
himanshusinghs Jul 1, 2025
57e5c76
modify prompt for list-collections prompt and log tools provided
himanshusinghs Jul 1, 2025
579739e
chore: have mock generators return Promise of ToolResult as well
himanshusinghs Jul 1, 2025
02ed178
chore: tests for collection-schema tool
himanshusinghs Jul 1, 2025
4658a35
chore: do not fail tests on dropped accuracy
himanshusinghs Jul 1, 2025
77f0777
chore: added tests for find tool
himanshusinghs Jul 1, 2025
7bf63d8
chore: tests for insert-many tool
himanshusinghs Jul 3, 2025
3f4ab89
chore: tests for delete-many tool
himanshusinghs Jul 3, 2025
3c80f85
chore: add oepnai provider
himanshusinghs Jul 3, 2025
874a85f
chore: fixes accuracy scorer for position independent matching
himanshusinghs Jul 4, 2025
46997f7
chore: replace mock mcp client with real (mockable) mcp client
himanshusinghs Jul 4, 2025
9b2f461
chore: moved all existing tests to vercel mcp client
himanshusinghs Jul 6, 2025
ffbf57c
chore: adds tests for the rest of the tools
himanshusinghs Jul 7, 2025
d523545
chore: adds missed out tests for tools
himanshusinghs Jul 7, 2025
165620b
chore: MongoDB based snapshot storage for accuracy runs
himanshusinghs Jul 8, 2025
90dd193
chore: remove file based snapshot
himanshusinghs Jul 8, 2025
ec41a4e
wip: snapshot summary generator
himanshusinghs Jul 8, 2025
dc2b45c
chore: single entry point for running accuracy tests with different c…
himanshusinghs Jul 8, 2025
fa9cd94
chore: reformat
himanshusinghs Jul 8, 2025
2ff8130
chore: lint fixes
himanshusinghs Jul 8, 2025
ac9b6d0
chore: simplified toolCallingAccuracy calculation
himanshusinghs Jul 8, 2025
15f264b
chore: account for types moved around
himanshusinghs Jul 8, 2025
9c7abca
chore: adds accuracyRunStatus to snapshot entries
himanshusinghs Jul 8, 2025
9a2eb52
chore: add disk based accuracy storage for local runs
himanshusinghs Jul 8, 2025
2d75862
chore: revert changes done to any of the src files
himanshusinghs Jul 8, 2025
4067264
chore: handle test failures and appropriately mark them as failed in …
himanshusinghs Jul 8, 2025
e66b1e1
chore: make snapshot storage independent of accuracyRunId and commitSHA
himanshusinghs Jul 9, 2025
baf18da
chore: bail on first failure and add some explanation for update-accu…
himanshusinghs Jul 9, 2025
7258fb7
chore: refactor to make tests writing simpler and other QOL improveme…
himanshusinghs Jul 9, 2025
f89701a
chore: generate accuracy test summary post test
himanshusinghs Jul 10, 2025
90ca213
chore: add Github workflow to trigger test runs
himanshusinghs Jul 10, 2025
df31977
chore: fix permissions issue
himanshusinghs Jul 10, 2025
1f9083a
chore: bring back packages post merge
himanshusinghs Jul 10, 2025
16d87b8
chore: update report generation to include comparison with baseline a…
himanshusinghs Jul 10, 2025
ba49e17
Update .github/workflows/accuracy-tests.yml
himanshusinghs Jul 11, 2025
ac4ea1b
Update .github/workflows/accuracy-tests.yml
himanshusinghs Jul 11, 2025
848e771
Update .github/workflows/accuracy-tests.yml
himanshusinghs Jul 11, 2025
68d4cb0
Update .github/workflows/accuracy-tests.yml
himanshusinghs Jul 11, 2025
c04ed9a
chore: secrets as per conventions
himanshusinghs Jul 11, 2025
802d8a8
chore: updated how we store accuracy result
himanshusinghs Jul 13, 2025
1cc93f2
chore: move accuracy scripts inside accuracy
himanshusinghs Jul 13, 2025
666e20c
chore: addresses more PR feedback
himanshusinghs Jul 13, 2025
5ff622f
chore: use @ai-sdk/google
himanshusinghs Jul 13, 2025
919b26f
chore: quick consolidation for ddl tools
himanshusinghs Jul 10, 2025
be45e70
chore: schema def mods for better LLM responses
himanshusinghs Jul 10, 2025
50 changes: 50 additions & 0 deletions .github/workflows/accuracy-tests.yml
@@ -0,0 +1,50 @@
name: Accuracy Tests

on:
  workflow_dispatch:
  push:
    branches:
      - main
  pull_request:
    types:
      - labeled

jobs:
  run-accuracy-tests:
    name: Run Accuracy Tests
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    if: |
      github.event_name == 'workflow_dispatch' ||
      (github.event_name == 'pull_request' && github.event.label.name == 'accuracy-tests')
    env:
      MDB_OPEN_AI_API_KEY: ${{ secrets.ACCURACY_OPEN_AI_API_KEY }}
      MDB_GEMINI_API_KEY: ${{ secrets.ACCURACY_GEMINI_API_KEY }}
      MDB_AZURE_OPEN_AI_API_KEY: ${{ secrets.ACCURACY_AZURE_OPEN_AI_API_KEY }}
      MDB_AZURE_OPEN_AI_API_URL: ${{ secrets.ACCURACY_AZURE_OPEN_AI_API_URL }}
      MDB_ACCURACY_MDB_URL: ${{ secrets.ACCURACY_MDB_CONNECTION_STRING }}
      MDB_ACCURACY_BASELINE_COMMIT: ${{ github.event.pull_request.base.sha || '' }}
    steps:
      - uses: GitHubSecurityLab/actions-permissions/monitor@v1
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version-file: package.json
          cache: "npm"
      - name: Install dependencies
        run: npm ci
      - name: Run accuracy tests
        run: ./scripts/run-accuracy-tests.sh
      - name: Upload accuracy test summary
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: accuracy-test-summary
          path: .accuracy/tests-summary.html
      - name: Comment summary on PR
        if: github.event_name == 'pull_request' && github.event.label.name == 'accuracy-tests'
        uses: marocchino/sticky-pull-request-comment@d2ad0de260ae8b0235ce059e63f2949ba9e05943 # v2
        with:
          path: .accuracy/tests-summary.html
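
Reviewer note: the job-level `if:` in the workflow above decides whether the accuracy tests run at all, so it may help to spell out which events pass it. The sketch below restates that condition as a plain TypeScript predicate. The `WorkflowEvent` type and the `shouldRunAccuracyTests` name are hypothetical and exist only for illustration; they are not part of this PR or of GitHub Actions. String comparison is made case-insensitive on the assumption that it should mirror how GitHub expression `==` treats strings.

```typescript
// Hypothetical model of the two context values the job-level `if:` reads.
interface WorkflowEvent {
  eventName: string; // stands in for github.event_name
  labelName?: string; // stands in for github.event.label.name (absent on non-label events)
}

// Case-insensitive equality, assumed here to match GitHub expression `==` on strings.
function sameString(a: string | undefined, b: string): boolean {
  return a !== undefined && a.toLowerCase() === b.toLowerCase();
}

// Illustrative restatement of:
//   github.event_name == 'workflow_dispatch' ||
//   (github.event_name == 'pull_request' && github.event.label.name == 'accuracy-tests')
function shouldRunAccuracyTests(event: WorkflowEvent): boolean {
  return (
    sameString(event.eventName, "workflow_dispatch") ||
    (sameString(event.eventName, "pull_request") &&
      sameString(event.labelName, "accuracy-tests"))
  );
}

// A few representative events and whether the job would run for each.
const examples: Array<[string, WorkflowEvent]> = [
  ["manual dispatch", { eventName: "workflow_dispatch" }],
  ["PR labeled accuracy-tests", { eventName: "pull_request", labelName: "accuracy-tests" }],
  ["PR labeled something else", { eventName: "pull_request", labelName: "bug" }],
  ["push to main", { eventName: "push" }],
];

for (const [description, event] of examples) {
  console.log(`${description}: ${shouldRunAccuracyTests(event)}`);
}
```

Read this way, a `push` to `main` still triggers the workflow through the `on:` block, but the condition as written would skip the job for that event. If runs on `main` are intended (for example, to record a baseline), the `if:` would presumably need a clause for `push`.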
2 changes: 2 additions & 0 deletions .gitignore
@@ -11,3 +11,5 @@ state.json

tests/tmp
coverage
# Generated assets by accuracy runs
.accuracy