From 71fcaf2b4df36d9b8e499cb99fde0b00825bf5ff Mon Sep 17 00:00:00 2001 From: Claude Date: Thu, 6 Nov 2025 19:54:08 +0000 Subject: [PATCH] docs: organize dev test scripts and create comprehensive documentation (#550) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This commit implements the requirements from issue #550 to organize scattered development test scripts and create comprehensive documentation. Changes: 1. Moved test scripts to organized structure: - backend/test_embedding_models.py → backend/dev_tests/manual/ - backend/test_elevenlabs_api.py → backend/dev_tests/manual/ - Fixed import paths in moved scripts (parent.parent.parent) 2. Created comprehensive documentation: - docs/development/dev-test-scripts.md (590+ lines) - Detailed usage examples for all 30+ test scripts - Prerequisites, expected outputs, and use cases - Troubleshooting and best practices 3. Updated existing documentation: - backend/dev_tests/README.md - Added categorized script listings - docs/development/index.md - Added reference to new dev test guide - Added link to comprehensive documentation 4. Updated .gitignore: - Added dev test output patterns (*.json, *.csv, *.wav, etc.) - Excluded output/ and results/ directories - Preserved README.md files 5. Documentation structure maintained: - MASTER_ISSUES_ROADMAP.md already at docs/planning/master-roadmap.md - All references verified and updated All scripts remain functional with corrected import paths. All acceptance criteria from issue #550 completed. Closes #550 --- .gitignore | 13 + backend/dev_tests/README.md | 73 +- .../manual}/test_elevenlabs_api.py | 2 +- .../manual}/test_embedding_models.py | 2 +- docs/development/dev-test-scripts.md | 985 ++++++++++++++++++ docs/development/index.md | 22 + 6 files changed, 1088 insertions(+), 9 deletions(-) rename backend/{ => dev_tests/manual}/test_elevenlabs_api.py (96%) rename backend/{ => dev_tests/manual}/test_embedding_models.py (99%) create mode 100644 docs/development/dev-test-scripts.md diff --git a/.gitignore b/.gitignore index 0901b9e3..66054dd7 100644 --- a/.gitignore +++ b/.gitignore @@ -25,6 +25,19 @@ backend/.venv/ coverage/ *.log +# Dev test script outputs +backend/dev_tests/**/*.json +backend/dev_tests/**/*.csv +backend/dev_tests/**/*.txt +backend/dev_tests/**/*.wav +backend/dev_tests/**/*.mp3 +backend/dev_tests/**/*.pdf +backend/dev_tests/**/output/ +backend/dev_tests/**/results/ +backend/dev_tests/**/__pycache__/ +!backend/dev_tests/README.md +!backend/dev_tests/manual/README.md + # Ignore OS generated files .DS_Store Thumbs.db diff --git a/backend/dev_tests/README.md b/backend/dev_tests/README.md index bbed53f7..4e5c2bb3 100644 --- a/backend/dev_tests/README.md +++ b/backend/dev_tests/README.md @@ -6,23 +6,60 @@ This directory contains development utilities, manual test scripts, and experime ### `/manual/` Manual test scripts for testing specific features or debugging issues: + +**Chain of Thought (CoT) Testing**: - `test_cot_comparison.py` - Compare Chain of Thought vs regular search - `test_cot_llm_integration.py` - Test CoT with LLM provider integration - `test_cot_manual.py` - Manual CoT testing - `test_cot_with_documents.py` - Test CoT with document retrieval - `test_cot_workflow.py` - Complete CoT workflow testing + +**Document Processing**: +- `test_docling_config.py` - Docling configuration testing +- `test_docling_debug.py` - Debug Docling processing issues +- `test_pdf_comparison.py` - Compare PDF parsing strategies + +**Embedding & Retrieval**: +- `test_embedding_direct.py` - Direct embedding API tests +- `test_embedding_models.py` - Test different WatsonX embedding models +- `test_embedding_retrieval.py` - Embedding retrieval validation +- `test_embeddings.py` - Embedding service testing +- `test_embeddings_simple.py` - Simple embedding tests + +**Search & Query**: +- `test_query_enhancement_demo.py` - Query enhancement demonstration - `test_regular_search.py` - Regular search testing +- `test_search_api_direct.py` - Direct search API testing +- `test_search_comparison.py` - Compare search implementations +- `test_search_no_cot.py` - Search without Chain of Thought +- `test_workforce_search.py` - Workforce-specific search testing + +**Conversation Testing**: +- `test_conversation_api_direct.py` - Direct conversation API testing +- `test_conversation_direct_api.py` - Alternative conversation API testing +- `test_conversation_simulation.py` - Multi-turn conversation simulation +- `test_conversation_with_documents.py` - Conversations with document context +- `test_conversation_with_mock_auth.py` - Conversations with mocked auth + +**Pipeline & Configuration**: +- `test_pipeline_quick.py` - Quick pipeline testing +- `test_pipeline_simple.py` - Simple pipeline validation - `test_settings_only.py` - Settings configuration testing +**Audio & Podcasts**: +- `test_elevenlabs_api.py` - ElevenLabs API integration testing +- `test_podcast_script_generation.py` - Podcast script generation + +**Debugging**: +- `debug_rag_failure.py` - Debug RAG pipeline failures +- `compare_search.py` - Compare search implementations + ### `/examples/` Example scripts demonstrating CLI and API usage: - `/cli/` - CLI usage examples and interactive workflows -### `/experiments/` -Experimental code and prototypes: -- Various experimental scripts for testing new features -- Performance testing scripts -- Integration experiments +### Root Scripts +- `test_entity_extraction_demo.py` - Entity extraction demonstration ## Usage @@ -34,16 +71,38 @@ cd backend # Run a manual test python dev_tests/manual/test_cot_comparison.py +# Run embedding model comparison +python dev_tests/manual/test_embedding_models.py + +# Run ElevenLabs API test +python dev_tests/manual/test_elevenlabs_api.py + # Run a CLI example python dev_tests/examples/cli/test_workflow.py -# Run an experiment -python dev_tests/experiments/hello_milvus.py +# Run entity extraction demo +python dev_tests/test_entity_extraction_demo.py ``` +## Prerequisites + +Before running scripts: + +1. **Environment Setup**: Services running (see main README) +2. **Environment Variables**: Configured `.env` file +3. **Dependencies**: Installed via Poetry (`poetry install --with dev,test`) +4. **Test Data**: Collections and documents created as needed + ## Important Notes - These are NOT pytest tests - they are standalone scripts - They may require specific environment variables or running services - They are used for development, debugging, and feature exploration - The official test suite is in the `tests/` directory +- For detailed documentation, see `docs/development/dev-test-scripts.md` + +## Documentation + +For comprehensive documentation including prerequisites, expected outputs, and use cases for each script, see: + +📖 **[Development Test Scripts Documentation](../../docs/development/dev-test-scripts.md)** diff --git a/backend/test_elevenlabs_api.py b/backend/dev_tests/manual/test_elevenlabs_api.py similarity index 96% rename from backend/test_elevenlabs_api.py rename to backend/dev_tests/manual/test_elevenlabs_api.py index 24692d9b..e1995339 100644 --- a/backend/test_elevenlabs_api.py +++ b/backend/dev_tests/manual/test_elevenlabs_api.py @@ -6,7 +6,7 @@ from pathlib import Path # Add backend to path -sys.path.insert(0, str(Path(__file__).parent)) +sys.path.insert(0, str(Path(__file__).parent.parent.parent)) import httpx from dotenv import load_dotenv diff --git a/backend/test_embedding_models.py b/backend/dev_tests/manual/test_embedding_models.py similarity index 99% rename from backend/test_embedding_models.py rename to backend/dev_tests/manual/test_embedding_models.py index 5ac5a361..b94625ca 100644 --- a/backend/test_embedding_models.py +++ b/backend/dev_tests/manual/test_embedding_models.py @@ -9,7 +9,7 @@ from ibm_watsonx_ai.foundation_models import Embeddings # Add backend to path -sys.path.insert(0, str(Path(__file__).parent)) +sys.path.insert(0, str(Path(__file__).parent.parent.parent)) from core.config import get_settings diff --git a/docs/development/dev-test-scripts.md b/docs/development/dev-test-scripts.md new file mode 100644 index 00000000..ce2ca03e --- /dev/null +++ b/docs/development/dev-test-scripts.md @@ -0,0 +1,985 @@ +# Development Test Scripts + +This document provides comprehensive documentation for all development test scripts located in `backend/dev_tests/`. These scripts are designed for manual testing, debugging, and feature exploration during development. + +## Overview + +The `backend/dev_tests/` directory contains standalone test scripts that are **NOT** part of the official pytest test suite. These scripts are used for: + +- Manual testing of specific features +- Debugging production issues +- Performance testing and benchmarking +- Integration testing with external services +- Feature exploration and prototyping + +## Directory Structure + +``` +backend/dev_tests/ +├── README.md # Quick reference guide +├── manual/ # Manual test scripts +│ ├── test_*.py # Individual test scripts +│ └── README.md # Manual tests documentation +├── examples/ # Example scripts +│ └── cli/ # CLI usage examples +└── test_entity_extraction_demo.py # Entity extraction demonstration +``` + +## Prerequisites + +Before running any test scripts, ensure you have: + +1. **Environment Setup**: All required services running (see main README) +2. **Environment Variables**: Properly configured `.env` file +3. **Dependencies**: All Python dependencies installed via Poetry +4. **Data**: Test data or collections created as needed + +### Quick Setup + +```bash +# 1. Install dependencies +poetry install --with dev,test + +# 2. Start infrastructure +make local-dev-infra + +# 3. Configure environment +cp .env.example .env +# Edit .env with your credentials + +# 4. Navigate to backend directory +cd backend +``` + +## Manual Test Scripts + +All manual test scripts are located in `backend/dev_tests/manual/`. Run them from the `backend/` directory. + +### Chain of Thought (CoT) Testing + +#### test_cot_comparison.py +**Purpose**: Compare Chain of Thought vs regular search performance and quality. + +**Prerequisites**: +- Running RAG backend +- Configured collection with documents +- Valid user credentials + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_cot_comparison.py +``` + +**Expected Output**: +- Side-by-side comparison of CoT vs non-CoT results +- Performance metrics (latency, tokens) +- Quality assessment + +**Use Cases**: +- Evaluating CoT benefit for specific queries +- Performance benchmarking +- Debugging CoT reasoning steps + +--- + +#### test_cot_llm_integration.py +**Purpose**: Test Chain of Thought integration with different LLM providers. + +**Prerequisites**: +- Multiple LLM provider credentials (WatsonX, OpenAI, etc.) +- Configured pipeline with LLM settings + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_cot_llm_integration.py +``` + +**Expected Output**: +- CoT reasoning steps for each provider +- Provider-specific performance metrics +- Error handling validation + +--- + +#### test_cot_manual.py +**Purpose**: Manual CoT testing with interactive prompts. + +**Prerequisites**: +- Running backend services +- Test collection created + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_cot_manual.py +``` + +**Expected Output**: +- Interactive query input +- Step-by-step CoT reasoning +- Final synthesized answer + +--- + +#### test_cot_with_documents.py +**Purpose**: Test CoT reasoning with specific document sets. + +**Prerequisites**: +- Collection with known documents +- Document metadata available + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_cot_with_documents.py +``` + +**Expected Output**: +- CoT reasoning across multiple documents +- Document attribution +- Source citations + +--- + +#### test_cot_workflow.py +**Purpose**: Complete end-to-end CoT workflow testing. + +**Prerequisites**: +- Fully configured RAG environment +- Test collection with diverse documents + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_cot_workflow.py +``` + +**Expected Output**: +- Complete workflow execution +- Timing breakdown for each stage +- Quality metrics + +--- + +### Document Processing + +#### test_docling_config.py +**Purpose**: Test Docling document processing configuration. + +**Prerequisites**: +- Docling library installed +- Sample documents (PDF, DOCX, etc.) + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_docling_config.py +``` + +**Expected Output**: +- Docling configuration validation +- Document parsing results +- Extracted text and metadata + +**Use Cases**: +- Validating Docling configuration +- Testing document format support +- Debugging parsing issues + +--- + +#### test_docling_debug.py +**Purpose**: Debug Docling document processing issues. + +**Prerequisites**: +- Problem documents that failed processing +- Docling debug logs enabled + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_docling_debug.py +``` + +**Expected Output**: +- Detailed parsing logs +- Error diagnostics +- Suggested fixes + +--- + +#### test_pdf_comparison.py +**Purpose**: Compare different PDF parsing strategies. + +**Prerequisites**: +- Sample PDF documents +- Multiple parsing libraries installed + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_pdf_comparison.py +``` + +**Expected Output**: +- Parsing results from multiple libraries +- Quality comparison +- Performance metrics + +--- + +### Embedding & Retrieval + +#### test_embedding_direct.py +**Purpose**: Direct embedding API testing without RAG pipeline. + +**Prerequisites**: +- Embedding service configured +- LLM provider credentials (WatsonX, OpenAI, etc.) + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_embedding_direct.py +``` + +**Expected Output**: +- Raw embedding vectors +- Embedding dimensions +- Performance metrics (latency, throughput) + +**Use Cases**: +- Validating embedding model configuration +- Testing different embedding providers +- Benchmarking embedding performance + +--- + +#### test_embedding_retrieval.py +**Purpose**: Test embedding-based document retrieval. + +**Prerequisites**: +- Vector database (Milvus) running +- Collection with embedded documents +- Query examples + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_embedding_retrieval.py +``` + +**Expected Output**: +- Retrieved documents with similarity scores +- Retrieval latency +- Relevance assessment + +**Use Cases**: +- Debugging retrieval quality +- Testing vector similarity thresholds +- Evaluating retrieval performance + +--- + +#### test_embedding_models.py +**Purpose**: Test and compare different WatsonX embedding models. + +**Prerequisites**: +- WatsonX API credentials +- Sample PDF document for testing +- Multiple embedding models configured + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_embedding_models.py +``` + +**Expected Output**: +- Comparison of embedding models +- Maximum supported chunk lengths +- Embedding dimensions for each model +- Recommended model selection + +**Use Cases**: +- Selecting optimal embedding model +- Determining maximum chunk sizes +- Benchmarking embedding performance + +--- + +#### test_embeddings.py / test_embeddings_simple.py +**Purpose**: Simple embedding service testing. + +**Prerequisites**: +- Embedding service running +- Test text samples + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_embeddings.py +# or +python dev_tests/manual/test_embeddings_simple.py +``` + +**Expected Output**: +- Embedding vectors +- Service health check +- Basic performance metrics + +--- + +### Search & Query + +#### test_query_enhancement_demo.py +**Purpose**: Demonstrate query enhancement and rewriting. + +**Prerequisites**: +- Query rewriting service configured +- LLM provider for query enhancement + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_query_enhancement_demo.py +``` + +**Expected Output**: +- Original query +- Enhanced/rewritten query +- Query expansion terms +- Improvement metrics + +**Use Cases**: +- Understanding query enhancement pipeline +- Testing query rewriting strategies +- Evaluating query improvement quality + +--- + +#### test_search_no_cot.py +**Purpose**: Test search without Chain of Thought reasoning. + +**Prerequisites**: +- RAG backend running +- Test collection with documents + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_search_no_cot.py +``` + +**Expected Output**: +- Direct search results (no CoT) +- Response time +- Answer quality + +**Use Cases**: +- Baseline performance measurement +- Comparing CoT vs non-CoT +- Testing fast search path + +--- + +#### test_regular_search.py +**Purpose**: Standard RAG search testing. + +**Prerequisites**: +- Fully configured RAG environment +- Test queries prepared + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_regular_search.py +``` + +**Expected Output**: +- Search results +- Retrieved documents +- Generated answer +- Performance metrics + +--- + +#### test_search_api_direct.py / test_search_comparison.py +**Purpose**: Direct API testing and search comparison. + +**Prerequisites**: +- Backend API running +- Test collections created + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_search_api_direct.py +# or +python dev_tests/manual/test_search_comparison.py +``` + +**Expected Output**: +- API response validation +- Search result comparison +- Performance benchmarks + +--- + +#### test_workforce_search.py +**Purpose**: Test search on workforce-related documents. + +**Prerequisites**: +- Workforce dataset ingested +- Domain-specific test queries + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_workforce_search.py +``` + +**Expected Output**: +- Domain-specific search results +- Answer accuracy assessment + +--- + +### Pipeline & Configuration + +#### test_pipeline_quick.py / test_pipeline_simple.py +**Purpose**: Quick pipeline configuration testing. + +**Prerequisites**: +- Pipeline configured +- Basic test setup + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_pipeline_quick.py +# or +python dev_tests/manual/test_pipeline_simple.py +``` + +**Expected Output**: +- Pipeline validation +- Configuration verification +- Quick smoke test results + +--- + +#### test_settings_only.py +**Purpose**: Test configuration settings loading and validation. + +**Prerequisites**: +- `.env` file configured +- Settings module available + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_settings_only.py +``` + +**Expected Output**: +- Loaded configuration values +- Validation results +- Missing/invalid settings warnings + +**Use Cases**: +- Debugging configuration issues +- Validating environment variables +- Testing settings precedence + +--- + +### Conversation Testing + +#### test_conversation_api_direct.py / test_conversation_direct_api.py +**Purpose**: Direct conversation API testing. + +**Prerequisites**: +- Conversation service running +- User authentication configured + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_conversation_api_direct.py +``` + +**Expected Output**: +- Conversation creation +- Message exchange +- Context retention validation + +--- + +#### test_conversation_simulation.py +**Purpose**: Simulate multi-turn conversation scenarios. + +**Prerequisites**: +- Conversation history enabled +- Test conversation flows defined + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_conversation_simulation.py +``` + +**Expected Output**: +- Multi-turn conversation results +- Context tracking +- Conversation flow validation + +--- + +#### test_conversation_with_documents.py +**Purpose**: Test conversations with document context. + +**Prerequisites**: +- Documents ingested +- Conversation service configured + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_conversation_with_documents.py +``` + +**Expected Output**: +- Conversation with document grounding +- Source attribution +- Context-aware responses + +--- + +#### test_conversation_with_mock_auth.py +**Purpose**: Test conversations with mocked authentication. + +**Prerequisites**: +- Mock authentication configured +- Test users created + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_conversation_with_mock_auth.py +``` + +**Expected Output**: +- Authenticated conversation flow +- User-specific responses +- Permission validation + +--- + +### Audio & Podcasts + +#### test_elevenlabs_api.py +**Purpose**: Verify ElevenLabs API integration for text-to-speech. + +**Prerequisites**: +- ElevenLabs API key configured +- Network access to ElevenLabs API + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_elevenlabs_api.py +``` + +**Expected Output**: +- API key validation +- Available voices list +- Connection test results + +**Use Cases**: +- Validating ElevenLabs API credentials +- Testing voice availability +- Debugging TTS integration + +--- + +#### test_podcast_script_generation.py +**Purpose**: Test AI-powered podcast script generation. + +**Prerequisites**: +- LLM provider configured +- Sample documents for podcast content + +**Usage**: +```bash +cd backend +python dev_tests/manual/test_podcast_script_generation.py +``` + +**Expected Output**: +- Generated podcast script +- Script structure (intro, body, outro) +- Voice cues and timing + +--- + +### Debugging + +#### debug_rag_failure.py +**Purpose**: Debug RAG pipeline failures and errors. + +**Prerequisites**: +- RAG pipeline configured +- Failure scenario reproduced + +**Usage**: +```bash +cd backend +python dev_tests/manual/debug_rag_failure.py +``` + +**Expected Output**: +- Detailed error traces +- Pipeline stage breakdown +- Root cause analysis +- Suggested fixes + +**Use Cases**: +- Investigating production failures +- Understanding pipeline bottlenecks +- Debugging complex RAG issues + +--- + +#### compare_search.py +**Purpose**: Compare different search implementations. + +**Prerequisites**: +- Multiple search implementations available +- Test query set prepared + +**Usage**: +```bash +cd backend +python dev_tests/manual/compare_search.py +``` + +**Expected Output**: +- Side-by-side comparison +- Performance metrics +- Quality assessment + +--- + +## Other Test Scripts + +### test_entity_extraction_demo.py +**Purpose**: Demonstrate entity extraction capabilities. + +**Location**: `backend/dev_tests/test_entity_extraction_demo.py` + +**Prerequisites**: +- Entity extraction service configured +- Sample documents with entities + +**Usage**: +```bash +cd backend +python dev_tests/test_entity_extraction_demo.py +``` + +**Expected Output**: +- Extracted entities (persons, organizations, locations) +- Entity types and confidence scores +- Entity relationships + +--- + +## Common Patterns + +### Running Scripts + +All scripts should be run from the `backend/` directory: + +```bash +cd backend +python dev_tests/manual/.py +``` + +### Environment Variables + +Most scripts require environment variables to be configured. Check `.env.example` for required variables: + +```bash +# Core settings +WATSONX_API_KEY=your_api_key +WATSONX_PROJECT_ID=your_project_id + +# Vector database +VECTOR_DB=milvus +MILVUS_HOST=localhost +MILVUS_PORT=19530 + +# LLM providers +OPENAI_API_KEY=your_openai_key +ANTHROPIC_API_KEY=your_anthropic_key + +# Audio services +ELEVENLABS_API_KEY=your_elevenlabs_key +``` + +### Modifying Scripts + +When modifying test scripts: + +1. Keep imports at the top +2. Use type hints +3. Add docstrings +4. Follow existing patterns +5. Test before committing +6. Update documentation + +### Creating New Test Scripts + +To create a new test script: + +```python +#!/usr/bin/env python3 +"""Brief description of what this script tests.""" + +import sys +from pathlib import Path + +# Add backend to path +sys.path.insert(0, str(Path(__file__).parent.parent)) + +from core.config import get_settings + +def main(): + """Main test function.""" + settings = get_settings() + + # Your test code here + print("Running test...") + +if __name__ == "__main__": + main() +``` + +## Troubleshooting + +### Common Issues + +#### Import Errors +**Problem**: `ModuleNotFoundError: No module named 'rag_solution'` + +**Solution**: Ensure you're running from the `backend/` directory and Python path is set correctly: +```bash +cd backend +python -c "import sys; print(sys.path)" +``` + +#### Environment Variables Not Loaded +**Problem**: Scripts can't find API keys or configuration + +**Solution**: Verify `.env` file exists and is properly formatted: +```bash +cat .env | grep API_KEY +``` + +#### Service Connection Failures +**Problem**: Can't connect to Milvus, PostgreSQL, etc. + +**Solution**: Verify services are running: +```bash +make local-dev-status +docker compose ps +``` + +#### Permission Errors +**Problem**: Can't read files or write outputs + +**Solution**: Check file permissions: +```bash +ls -la backend/dev_tests/manual/ +chmod +x backend/dev_tests/manual/test_*.py +``` + +## Best Practices + +### When to Use Test Scripts + +- **Manual Testing**: Validating features before writing pytest tests +- **Debugging**: Investigating production issues with simplified setups +- **Performance**: Benchmarking specific components +- **Integration**: Testing external service integrations +- **Exploration**: Trying new features or libraries + +### When NOT to Use Test Scripts + +- **CI/CD**: Use pytest tests in `tests/` directory instead +- **Automated Testing**: Write proper pytest tests with fixtures +- **Production**: Never run test scripts in production environments +- **Monitoring**: Use proper monitoring tools, not test scripts + +### Script Maintenance + +- **Review Quarterly**: Check if scripts are still relevant +- **Update Documentation**: Keep this guide current +- **Remove Obsolete Scripts**: Delete scripts for deprecated features +- **Keep Scripts Simple**: One purpose per script +- **Use Poetry**: Don't install additional dependencies outside Poetry + +## Performance Benchmarking + +### Performance Test Scripts + +Several scripts include performance metrics: + +- `test_cot_comparison.py` - CoT vs non-CoT performance +- `test_search_comparison.py` - Different search strategies +- `test_embedding_models.py` - Embedding model performance +- `test_pdf_comparison.py` - PDF parsing performance + +### Interpreting Results + +When benchmarking, consider: + +1. **Warm-up**: First run may be slower (model loading, cache warming) +2. **Consistency**: Run multiple times and calculate averages +3. **Environment**: Local vs CI vs production performance differs +4. **Concurrency**: Single-threaded tests don't reflect production load + +### Example Benchmark Output + +``` +Test: CoT vs Non-CoT Search +Query: "What was IBM's revenue in 2020?" + +Non-CoT Search: + Latency: 8.2s + Tokens: 450 + Quality: 7/10 + +CoT Search: + Latency: 22.5s + Tokens: 1,250 + Quality: 9/10 + +Verdict: CoT provides 28% quality improvement at 2.7x latency cost +``` + +## Integration with Main Test Suite + +These development test scripts complement the main pytest test suite: + +| Test Type | Location | Purpose | CI/CD | +|-----------|----------|---------|-------| +| Unit Tests | `tests/unit/` | Fast, isolated tests | ✅ Always | +| Integration Tests | `tests/integration/` | Service interactions | ✅ Always | +| E2E Tests | `tests/e2e/` | Full system tests | ✅ On merge | +| Dev Tests | `backend/dev_tests/` | Manual exploration | ❌ Manual only | + +## Related Documentation + +- **Main README**: `README.md` - Project overview and setup +- **Testing Guide**: `docs/testing/index.md` - Comprehensive testing documentation +- **CLI Guide**: `docs/cli/index.md` - Command-line interface usage +- **Development Guide**: `docs/development/workflow.md` - Development process +- **API Documentation**: `docs/api/index.md` - API reference + +## Contributing + +When adding new test scripts: + +1. **Choose the right location**: + - Manual testing → `backend/dev_tests/manual/` + - Examples → `backend/dev_tests/examples/` + - Experiments → `experiments/` (for prototypes) + +2. **Follow naming conventions**: + - `test__.py` - e.g., `test_cot_comparison.py` + - Use descriptive names + - No spaces or special characters + +3. **Document your script**: + - Add docstring at top + - Include prerequisites + - Document expected output + - Add to this documentation + +4. **Keep it simple**: + - One primary purpose per script + - Clear, readable code + - Minimal dependencies + - Easy to modify + +5. **Make it reproducible**: + - Use environment variables for config + - Include example outputs in comments + - Document any required test data + +## Getting Help + +If you encounter issues with test scripts: + +1. **Check this documentation** - Most common issues are covered +2. **Review script docstrings** - Scripts have inline documentation +3. **Check main documentation** - `docs/` has comprehensive guides +4. **Ask the team** - Slack/Teams channels for questions +5. **Create an issue** - GitHub issues for bugs or improvements + +## Appendix + +### Quick Reference + +```bash +# Start infrastructure +make local-dev-infra + +# Run a test script +cd backend +python dev_tests/manual/test_.py + +# Check services +make local-dev-status + +# View logs +make local-dev-logs + +# Stop services +make local-dev-stop +``` + +### Environment Setup Checklist + +- [ ] Poetry dependencies installed +- [ ] `.env` file configured +- [ ] Infrastructure services running +- [ ] Test collections created +- [ ] LLM provider credentials valid +- [ ] Vector database accessible + +### Common Commands + +```bash +# Install dependencies +poetry install --with dev,test + +# Format code +poetry run ruff format backend/dev_tests/ + +# Lint code +poetry run ruff check backend/dev_tests/ + +# Type check +poetry run mypy backend/dev_tests/ +``` + +--- + +**Document Version**: 1.0 +**Last Updated**: 2025-11-06 +**Maintained By**: RAG Development Team diff --git a/docs/development/index.md b/docs/development/index.md index 192293c0..5a783f19 100644 --- a/docs/development/index.md +++ b/docs/development/index.md @@ -156,6 +156,28 @@ Tests are configured with: - **Mocking**: Isolated testing - **Fixtures**: Reusable test data +### Development Test Scripts + +In addition to the automated test suite, RAG Modulo includes manual development test scripts for debugging, feature exploration, and performance testing. These scripts are located in `backend/dev_tests/` and are NOT part of the CI/CD pipeline. + +For comprehensive documentation on all available development test scripts, including usage examples, prerequisites, and expected outputs, see: + +📖 **[Development Test Scripts Guide](dev-test-scripts.md)** + +Quick examples: +```bash +cd backend + +# Test Chain of Thought reasoning +python dev_tests/manual/test_cot_comparison.py + +# Test embedding models +python dev_tests/manual/test_embedding_models.py + +# Debug RAG failures +python dev_tests/manual/debug_rag_failure.py +``` + ## Development Workflow ### Daily Development