-
Couldn't load subscription status.
- Fork 3
Description
Phase 1: Complete Advanced RAG - Reranking, Compression & Query Enhancement
Parent Epic: #256 - RAG Modulo Evolution - Naive → Advanced → Modular RAG Architecture
Timeline: 4-6 weeks
Priority: High
Complexity: Medium
Overview
Phase 1 focuses on completing the Advanced RAG implementation by adding critical post-retrieval components (reranking, compression) and enhancing query processing capabilities. These are high-impact, relatively quick wins that will immediately improve retrieval precision and answer quality.
Current State Analysis
Existing Implementation
Retrieval Pipeline (backend/rag_solution/retrieval/retriever.py):
- ✅
VectorRetriever: Basic vector similarity search - ✅
KeywordRetriever: TF-IDF based keyword search - ✅
HybridRetriever: Combines vector + keyword with simple scoring - ❌ No reranking after retrieval
- ❌ No chunk filtering or compression
Query Processing (backend/rag_solution/query_rewriting/query_rewriter.py):
- ✅
SimpleQueryRewriter: Basic query expansion - ✅
HypotheticalDocumentEmbedding: HyDE implementation ⚠️ Limited to single query, no multi-query generation⚠️ No semantic query understanding
Chunking (backend/rag_solution/data_ingestion/chunking.py):
- ✅ Simple chunking with overlap
- ✅ Semantic chunking with embeddings
- ✅ Token-based chunking
- ❌ No hierarchical chunking (sections → paragraphs)
- ❌ No structure preservation (headings, lists)
Search Service (backend/rag_solution/services/search_service.py):
- ✅ Orchestrates retrieval → generation
- ✅ Integrates with CoT for complex queries
- ❌ No post-retrieval processing (reranking, filtering)
- ❌ Context assembly is basic (no compression)
Goals and Success Criteria
Goals
- Implement cross-encoder reranking for top-k results
- Add chunk compression to reduce context window usage
- Enhance query processing with multi-query generation
- Implement hierarchical chunking for better structure preservation
- Improve hybrid retrieval with reciprocal rank fusion
Success Criteria
Quantitative Metrics:
- Retrieval Precision@10: Increase from baseline to >0.80 (+20-30%)
- Context Relevance: >85% of retrieved chunks relevant to query
- Context Compression: Reduce tokens by 30-40% without quality loss
- Query Diversity: Generate 3-5 diverse query reformulations
- Latency: Reranking adds <500ms per query
Testing Requirements:
- Unit tests for each new component (>90% coverage)
- Integration tests with SearchService pipeline
- Performance benchmarks on test corpus
- A/B comparison with current implementation
Implementation Details
See full implementation specification in the issue description above.
New Files to Create
-
backend/rag_solution/retrieval/reranker.py(~300 lines)- BaseReranker abstract class
- CrossEncoderReranker using sentence-transformers
- CohereReranker using Cohere API
- RerankFactory for instantiation
-
backend/rag_solution/retrieval/compression.py(~200 lines)- ContextCompressor class
- Relevance filtering, deduplication, token limiting
-
Test files (~750 lines total):
backend/tests/unit/test_reranker.pybackend/tests/unit/test_compression.pybackend/tests/unit/test_multi_query.pybackend/tests/unit/test_hierarchical_chunking.pybackend/tests/unit/test_hybrid_rrf.pybackend/tests/integration/test_reranking_integration.pybackend/tests/integration/test_phase1_pipeline.pybackend/tests/performance/test_phase1_benchmarks.py
Files to Modify
-
backend/core/config.py- Add Phase 1 settings:- Reranking: enable_reranking, reranker_type, rerank_model, rerank_top_k
- Compression: enable_compression, compression_relevance_threshold, compression_max_chunks, compression_max_tokens
- Multi-query: enable_multi_query, multi_query_count
- Hierarchical: parent_chunk_size, child_chunk_size
-
backend/rag_solution/services/search_service.py- Integrate all Phase 1 components:- Add reranker property and integration
- Add compressor property and integration
- Add multi_query_generator and result merging
- Update search() method with new pipeline steps
-
backend/rag_solution/query_rewriting/query_rewriter.py- Add MultiQueryGenerator class -
backend/rag_solution/data_ingestion/chunking.py- Add hierarchical_chunking function -
backend/rag_solution/data_ingestion/pdf_processor.py- Support hierarchical chunks -
backend/rag_solution/retrieval/retriever.py- Enhance HybridRetriever with RRF -
backend/rag_solution/schemas/pipeline_schema.py- Add HybridConfig -
backend/pyproject.toml- Add dependencies:sentence-transformers = "^2.2.2" cohere = {version = "^5.0.0", optional = true}
Implementation Timeline
Week 1-2: Post-Retrieval Enhancement
- Implement reranker module with CrossEncoder and Cohere support
- Implement compression module with filtering, deduplication, token limiting
- Add settings configuration
- Write unit tests
- Integrate with SearchService behind feature flags
Week 3-4: Enhanced Query Processing
- Implement MultiQueryGenerator class
- Add hierarchical chunking function
- Update PDF processor for hierarchical support
- Write unit tests
- Integration testing
Week 5-6: Hybrid Retrieval & Testing
- Implement Reciprocal Rank Fusion for HybridRetriever
- Complete integration tests
- Run performance benchmarks
- A/B testing on production data
- Documentation and rollout preparation
Testing Strategy
Unit Tests (>90% coverage)
- Test each component in isolation
- Mock external dependencies (LLM calls, embeddings)
- Edge cases: empty inputs, errors, fallbacks
Integration Tests
- Full pipeline test with all Phase 1 features enabled
- Test feature flags (enable/disable each component)
- Test component interactions (rerank → compress → generate)
Performance Benchmarks
- Reranking latency < 500ms
- Retrieval Precision@10 > 0.80
- Compression achieves 30-40% token reduction
- Multi-query generates 3-5 diverse queries
A/B Testing
- Compare Phase 1 pipeline vs baseline
- Measure improvements in retrieval quality
- Track latency increases
Rollout Plan
- Week 1-2: Foundation (reranker, compression)
- Week 3-4: Query enhancement (multi-query, hierarchical chunking)
- Week 5-6: Refinement, testing, and staged rollout
- Week 6: Production release (10% → 50% → 100%)
Acceptance Criteria
- All unit tests passing (>90% coverage)
- All integration tests passing
- Performance benchmarks meet targets
- A/B testing shows 20-30% improvement in retrieval precision
- Documentation complete
- No regression in existing functionality
Related Issues
- Parent: Epic: RAG Modulo Evolution - Naive → Advanced → Modular RAG Architecture #256 - RAG Modulo Evolution Epic
- Dependency: Enhancement: Integrate IBM Docling for Advanced Document Processing #255 - Docling Integration (optional)
- Related: 🧠 Implement Chain of Thought (CoT) Reasoning for Enhanced RAG Search Quality #136 - Chain of Thought Implementation
- Related: Improve Pipeline Association Architecture for Better UX and Flexibility #222 - Automatic Pipeline Resolution
Next Phase
After Phase 1 completion, proceed to:
- Phase 2: Early Modular RAG - Routing & Orchestration (#TBD)