Skip to content

Phase 1: Complete Advanced RAG - Reranking, Compression & Query Enhancement #257

@manavgup

Description

@manavgup

Phase 1: Complete Advanced RAG - Reranking, Compression & Query Enhancement

Parent Epic: #256 - RAG Modulo Evolution - Naive → Advanced → Modular RAG Architecture

Timeline: 4-6 weeks
Priority: High
Complexity: Medium

Overview

Phase 1 focuses on completing the Advanced RAG implementation by adding critical post-retrieval components (reranking, compression) and enhancing query processing capabilities. These are high-impact, relatively quick wins that will immediately improve retrieval precision and answer quality.

Current State Analysis

Existing Implementation

Retrieval Pipeline (backend/rag_solution/retrieval/retriever.py):

  • VectorRetriever: Basic vector similarity search
  • KeywordRetriever: TF-IDF based keyword search
  • HybridRetriever: Combines vector + keyword with simple scoring
  • ❌ No reranking after retrieval
  • ❌ No chunk filtering or compression

Query Processing (backend/rag_solution/query_rewriting/query_rewriter.py):

  • SimpleQueryRewriter: Basic query expansion
  • HypotheticalDocumentEmbedding: HyDE implementation
  • ⚠️ Limited to single query, no multi-query generation
  • ⚠️ No semantic query understanding

Chunking (backend/rag_solution/data_ingestion/chunking.py):

  • ✅ Simple chunking with overlap
  • ✅ Semantic chunking with embeddings
  • ✅ Token-based chunking
  • ❌ No hierarchical chunking (sections → paragraphs)
  • ❌ No structure preservation (headings, lists)

Search Service (backend/rag_solution/services/search_service.py):

  • ✅ Orchestrates retrieval → generation
  • ✅ Integrates with CoT for complex queries
  • ❌ No post-retrieval processing (reranking, filtering)
  • ❌ Context assembly is basic (no compression)

Goals and Success Criteria

Goals

  1. Implement cross-encoder reranking for top-k results
  2. Add chunk compression to reduce context window usage
  3. Enhance query processing with multi-query generation
  4. Implement hierarchical chunking for better structure preservation
  5. Improve hybrid retrieval with reciprocal rank fusion

Success Criteria

Quantitative Metrics:

  • Retrieval Precision@10: Increase from baseline to >0.80 (+20-30%)
  • Context Relevance: >85% of retrieved chunks relevant to query
  • Context Compression: Reduce tokens by 30-40% without quality loss
  • Query Diversity: Generate 3-5 diverse query reformulations
  • Latency: Reranking adds <500ms per query

Testing Requirements:

  • Unit tests for each new component (>90% coverage)
  • Integration tests with SearchService pipeline
  • Performance benchmarks on test corpus
  • A/B comparison with current implementation

Implementation Details

See full implementation specification in the issue description above.

New Files to Create

  1. backend/rag_solution/retrieval/reranker.py (~300 lines)

    • BaseReranker abstract class
    • CrossEncoderReranker using sentence-transformers
    • CohereReranker using Cohere API
    • RerankFactory for instantiation
  2. backend/rag_solution/retrieval/compression.py (~200 lines)

    • ContextCompressor class
    • Relevance filtering, deduplication, token limiting
  3. Test files (~750 lines total):

    • backend/tests/unit/test_reranker.py
    • backend/tests/unit/test_compression.py
    • backend/tests/unit/test_multi_query.py
    • backend/tests/unit/test_hierarchical_chunking.py
    • backend/tests/unit/test_hybrid_rrf.py
    • backend/tests/integration/test_reranking_integration.py
    • backend/tests/integration/test_phase1_pipeline.py
    • backend/tests/performance/test_phase1_benchmarks.py

Files to Modify

  1. backend/core/config.py - Add Phase 1 settings:

    • Reranking: enable_reranking, reranker_type, rerank_model, rerank_top_k
    • Compression: enable_compression, compression_relevance_threshold, compression_max_chunks, compression_max_tokens
    • Multi-query: enable_multi_query, multi_query_count
    • Hierarchical: parent_chunk_size, child_chunk_size
  2. backend/rag_solution/services/search_service.py - Integrate all Phase 1 components:

    • Add reranker property and integration
    • Add compressor property and integration
    • Add multi_query_generator and result merging
    • Update search() method with new pipeline steps
  3. backend/rag_solution/query_rewriting/query_rewriter.py - Add MultiQueryGenerator class

  4. backend/rag_solution/data_ingestion/chunking.py - Add hierarchical_chunking function

  5. backend/rag_solution/data_ingestion/pdf_processor.py - Support hierarchical chunks

  6. backend/rag_solution/retrieval/retriever.py - Enhance HybridRetriever with RRF

  7. backend/rag_solution/schemas/pipeline_schema.py - Add HybridConfig

  8. backend/pyproject.toml - Add dependencies:

    sentence-transformers = "^2.2.2"
    cohere = {version = "^5.0.0", optional = true}

Implementation Timeline

Week 1-2: Post-Retrieval Enhancement

  • Implement reranker module with CrossEncoder and Cohere support
  • Implement compression module with filtering, deduplication, token limiting
  • Add settings configuration
  • Write unit tests
  • Integrate with SearchService behind feature flags

Week 3-4: Enhanced Query Processing

  • Implement MultiQueryGenerator class
  • Add hierarchical chunking function
  • Update PDF processor for hierarchical support
  • Write unit tests
  • Integration testing

Week 5-6: Hybrid Retrieval & Testing

  • Implement Reciprocal Rank Fusion for HybridRetriever
  • Complete integration tests
  • Run performance benchmarks
  • A/B testing on production data
  • Documentation and rollout preparation

Testing Strategy

Unit Tests (>90% coverage)

  • Test each component in isolation
  • Mock external dependencies (LLM calls, embeddings)
  • Edge cases: empty inputs, errors, fallbacks

Integration Tests

  • Full pipeline test with all Phase 1 features enabled
  • Test feature flags (enable/disable each component)
  • Test component interactions (rerank → compress → generate)

Performance Benchmarks

  • Reranking latency < 500ms
  • Retrieval Precision@10 > 0.80
  • Compression achieves 30-40% token reduction
  • Multi-query generates 3-5 diverse queries

A/B Testing

  • Compare Phase 1 pipeline vs baseline
  • Measure improvements in retrieval quality
  • Track latency increases

Rollout Plan

  1. Week 1-2: Foundation (reranker, compression)
  2. Week 3-4: Query enhancement (multi-query, hierarchical chunking)
  3. Week 5-6: Refinement, testing, and staged rollout
  4. Week 6: Production release (10% → 50% → 100%)

Acceptance Criteria

  • All unit tests passing (>90% coverage)
  • All integration tests passing
  • Performance benchmarks meet targets
  • A/B testing shows 20-30% improvement in retrieval precision
  • Documentation complete
  • No regression in existing functionality

Related Issues

Next Phase

After Phase 1 completion, proceed to:

  • Phase 2: Early Modular RAG - Routing & Orchestration (#TBD)

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestinfrastructureInfrastructure and deployment

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions