feat: Add CLI-based Search Quality Testing Framework #144
Summary
Implements a comprehensive CLI tool for testing and diagnosing RAG search quality as specified in #131.
Changes
Features
🔧 CLI Commands
- search test - Test single queries with detailed metrics and verbose output
- search batch-test - Run quality tests on multiple queries with aggregated reporting
- search test-components - Test individual RAG pipeline components for debugging
📊 Quality Metrics
🎯 Additional Features
Files Added/Modified
- backend/cli/ - Core CLI implementation modules
- backend/search_cli.py - Entry point script
- backend/test_data/search_queries.json - Sample test queries
- backend/tests/cli/ - Unit tests for CLI functionality
- Makefile - Added search-test, search-batch, search-components targets
Testing
✅ CLI help commands validated
✅ Utility functions tested (metrics calculation, quality evaluation)
✅ Makefile integration verified
✅ Lazy loading prevents configuration errors for help commands
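The lazy-loading point above can be illustrated with a minimal sketch. It is not the code from this PR: `load_search_service`, `cmd_test`, and the `LOADED` list are hypothetical names, and the real CLI may be built on a different argument-parsing library. The idea is only that anything depending on configuration is loaded inside the command handler, so `--help` can run without it.

```python
import argparse

# Hypothetical marker list: records when the expensive setup actually runs.
LOADED = []


def load_search_service():
    """Stand-in for importing settings and building the real search service."""
    LOADED.append("search_service")
    return lambda query: f"results for {query!r}"


def cmd_test(args):
    # Deferred: configuration is only touched once a command executes.
    service = load_search_service()
    return service(args.query)


def build_parser():
    # Building the parser needs no configuration, so help output always works.
    parser = argparse.ArgumentParser(prog="search")
    sub = parser.add_subparsers(dest="command", required=True)
    test = sub.add_parser("test", help="Test a single query")
    test.add_argument("query")
    test.set_defaults(func=cmd_test)
    return parser


def main(argv):
    args = build_parser().parse_args(argv)
    return args.func(args)
```

With this layout, `main(["--help"])` prints usage and exits before `load_search_service` is ever called, while `main(["test", "some query"])` triggers the load on demand.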
Usage Examples
Documentation
Comprehensive documentation added in backend/cli/README.md.
Closes #131