Epic: RAG Modulo Evolution - Naive → Advanced → Modular RAG Architecture
Executive Summary
This epic tracks the evolution of RAG Modulo from its current state (roughly 70% Advanced RAG, 30% Modular RAG) to a fully featured Modular RAG architecture. The roadmap is divided into 3 phases over 18-24 weeks and prioritizes high-impact improvements to retrieval quality, answer accuracy, and system intelligence.
Current State Assessment
✅ Implemented Features
Indexing (Modular-level)
- Document processors for PDF, DOCX, XLSX, TXT (backend/rag_solution/data_ingestion/)
- Multiple chunking strategies: simple, semantic, token-based (backend/rag_solution/data_ingestion/chunking.py)
- Batch embedding generation (backend/rag_solution/data_ingestion/ingestion.py)
- Multiple vector DB support: Milvus, Elasticsearch, Pinecone, Weaviate, ChromaDB
Pre-Retrieval (Advanced-level - Partial)
- Query rewriting with HyDE (backend/rag_solution/query_rewriting/query_rewriter.py)
- SimpleQueryRewriter for query expansion
Retrieval (Hybrid approach)
- Vector, Keyword, Hybrid retriever types (backend/rag_solution/retrieval/retriever.py)
- Pipeline configuration schema with retriever selection (backend/rag_solution/schemas/pipeline_schema.py)
- Multiple LLM providers: WatsonX, OpenAI, Anthropic (backend/rag_solution/generation/providers/)
Generation (Modular-level - Partial)
- Chain of Thought (CoT) reasoning (Issue #136)
  - Question decomposition (backend/rag_solution/services/question_decomposer.py)
  - Answer synthesis (backend/rag_solution/services/answer_synthesizer.py)
  - Source attribution (backend/rag_solution/services/source_attribution_service.py)
- Token tracking service (backend/rag_solution/services/token_tracking_service.py)
- Automatic pipeline resolution (Issue #222)
❌ Missing Critical Components
Post-Retrieval
- No reranking (commented-out code exists in backend/rag_solution/evaluation/metrics.py)
- No chunk compression/selection
- Basic context management without filtering
Generation
- No answer verification
- No hallucination detection
- No external knowledge integration
Orchestration
- No semantic routing for pipeline selection
- No hard/soft prompt distinction
- No adaptive retrieval strategies
Architecture Target
Based on the Modular RAG reference architecture, we aim to implement:
- Advanced Query Processing: Multi-query, decomposition, semantic understanding
- Intelligent Retrieval: Hybrid search with reranking and compression
- Smart Orchestration: Semantic routing, scheduling, adaptive strategies
- Verified Generation: Answer verification, hallucination detection, confidence scoring
- Knowledge Enhancement: Knowledge graphs, multi-hop reasoning, external knowledge integration
Roadmap Overview
Phase 1: Complete Advanced RAG (4-6 weeks)
Goal: Implement missing Advanced RAG components for immediate quality improvements
- Post-retrieval reranking (cross-encoder, Cohere Rerank)
- Chunk compression and selection
- Enhanced query expansion and decomposition
- Structural organization (hierarchical chunking)
- Hybrid retrieval refinement
Expected Impact: 20-30% improvement in retrieval precision, 15-25% improvement in answer quality
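A minimal sketch of the post-retrieval reranking step listed for Phase 1, assuming a two-stage flow in which first-pass retrieval results are rescored against the query and truncated to the top k. The names `ScoredChunk`, `token_overlap_score`, and `rerank` are hypothetical and do not exist in the codebase; the token-overlap scorer is only a stand-in for a cross-encoder or a hosted rerank API, which would be passed in as `score_fn`.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class ScoredChunk:
    """A retrieved chunk paired with its rerank score (hypothetical type)."""
    text: str
    score: float


def token_overlap_score(query: str, chunk: str) -> float:
    """Toy relevance score: fraction of query tokens that appear in the chunk.

    Placeholder for a real cross-encoder or rerank API call.
    """
    query_tokens = set(query.lower().split())
    if not query_tokens:
        return 0.0
    chunk_tokens = set(chunk.lower().split())
    return len(query_tokens & chunk_tokens) / len(query_tokens)


def rerank(
    query: str,
    chunks: Sequence[str],
    top_k: int = 3,
    score_fn: Callable[[str, str], float] = token_overlap_score,
) -> List[ScoredChunk]:
    """Score every candidate against the query and keep the top_k best."""
    scored = [ScoredChunk(text=c, score=score_fn(query, c)) for c in chunks]
    # sorted() is stable, so ties keep their first-pass retrieval order.
    return sorted(scored, key=lambda s: s.score, reverse=True)[:top_k]


candidates = [
    "milvus stores vectors",
    "the cat sat",
    "reranking improves retrieval precision",
]
top = rerank("retrieval precision", candidates, top_k=2)
```

Keeping the scorer injectable means the same function could sit behind a feature flag and switch between scoring backends without changing callers.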
Phase 2: Early Modular RAG (6-8 weeks)
Goal: Build foundational Modular RAG modules
- Semantic routing and orchestration
- Query scheduling (hard vs soft prompts)
- Answer verification and hallucination detection
- External knowledge integration
- Knowledge graph foundation
Expected Impact: Intelligent pipeline selection (>85% routing accuracy), fewer hallucinations (>90% factually correct responses)
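One way semantic routing for pipeline selection could work is to compare the query against a few example utterances per pipeline and pick the closest match, falling back to a default when nothing is similar enough. The sketch below assumes this approach; the route names, example utterances, and threshold are illustrative, and the bag-of-words vectors stand in for real embeddings.

```python
import math
from collections import Counter
from typing import Dict, List


def bag_of_words(text: str) -> Counter:
    """Toy 'embedding': lowercase token counts (stand-in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical routes: each pipeline is described by example utterances.
ROUTES: Dict[str, List[str]] = {
    "no_retrieval": ["hello", "thanks for the help"],
    "simple_rag": ["what is the refund policy", "who wrote this document"],
    "cot_rag": ["compare revenue across both reports and explain the difference"],
}


def route(
    query: str,
    routes: Dict[str, List[str]] = ROUTES,
    threshold: float = 0.2,
    default: str = "simple_rag",
) -> str:
    """Return the route whose best example is most similar to the query."""
    q = bag_of_words(query)
    best_name, best_score = default, 0.0
    for name, examples in routes.items():
        score = max(cosine(q, bag_of_words(e)) for e in examples)
        if score > best_score:
            best_name, best_score = name, score
    # Below the threshold the match is too weak to trust, so use the default.
    return best_name if best_score >= threshold else default
```

The `no_retrieval` route illustrates the "avoid unnecessary retrieval" goal: greetings and similar inputs can skip the retriever entirely.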
Phase 3: Full Modular RAG (8-10 weeks)
Goal: Complete Modular RAG implementation with advanced capabilities
- Retriever fine-tuning (LM-supervised)
- Advanced indexing strategies (multi-index, chunk optimization)
- Full orchestration with dynamic pipeline assembly
- Continuous learning and auto-optimization
Expected Impact: Production-ready Modular RAG with adaptive intelligence, continuous improvement
Success Metrics
Quantitative
- Retrieval Precision@10: Increase from baseline to >0.85
- Answer Accuracy: >90% factually correct responses
- Hallucination Rate: <5% hallucinated facts
- Routing Accuracy: >85% correct pipeline selection
- Latency: <2s for simple queries, <5s for complex CoT queries
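The first and third targets above can be computed with small helper functions. The sketch below uses the standard definition of Precision@k and assumes that hallucination rate means the share of generated claims that are not supported by the retrieved sources; the function names and that definition are assumptions, not part of the existing evaluation code.

```python
from typing import Iterable, Sequence, Set


def precision_at_k(retrieved_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / k


def hallucination_rate(claim_supported: Iterable[bool]) -> float:
    """Share of claims flagged as unsupported (assumed definition).

    Each element is True when a claim is backed by a retrieved source.
    """
    flags = list(claim_supported)
    if not flags:
        return 0.0
    return sum(1 for supported in flags if not supported) / len(flags)


p_at_4 = precision_at_k(["a", "b", "c", "d"], {"a", "c"}, k=4)
rate = hallucination_rate([True, True, True, False])
```

In this example two of the four retrieved documents are relevant and one of the four claims is unsupported, so `p_at_4` is 0.5 and `rate` is 0.25.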
Qualitative
- Improved handling of complex multi-part questions
- Better table and structured data retrieval
- Reduced false positives in search results
- Smarter resource allocation (avoid unnecessary retrieval)
Implementation Phases
This epic is broken down into the following child issues:
- #TBD - Phase 1: Complete Advanced RAG (4-6 weeks)
- #TBD - Phase 2: Early Modular RAG (6-8 weeks)
- #TBD - Phase 3: Full Modular RAG (8-10 weeks)
Dependencies
- Docling integration (Issue #255) - Enhances document ingestion quality
- Existing CoT implementation (Issue #136) - Foundation for advanced reasoning
- Automatic pipeline resolution (Issue #222) - Foundation for orchestration
Technical Approach
Design Principles
- Incremental: Each phase delivers standalone value
- Backward Compatible: Existing functionality remains unchanged
- Configurable: All new features behind feature flags
- Tested: Comprehensive unit, integration, and performance tests
- Documented: Clear documentation for each component
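The "Configurable" principle could look something like the following sketch, which reads boolean feature flags from environment variables and defaults every new feature to off so that existing behavior is unchanged. The class name `RagFeatureFlags` and the `ENABLE_*` variable names are hypothetical; the project's actual settings mechanism may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _parse_bool(raw: Optional[str]) -> bool:
    """Interpret common truthy strings; anything else (or unset) is False."""
    return raw is not None and raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class RagFeatureFlags:
    """Hypothetical flags for the new Phase 1 and Phase 2 features."""
    enable_reranking: bool = False
    enable_chunk_compression: bool = False
    enable_semantic_routing: bool = False

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "RagFeatureFlags":
        # Accept an explicit mapping so tests do not depend on os.environ.
        source = os.environ if env is None else env
        return cls(
            enable_reranking=_parse_bool(source.get("ENABLE_RERANKING")),
            enable_chunk_compression=_parse_bool(source.get("ENABLE_CHUNK_COMPRESSION")),
            enable_semantic_routing=_parse_bool(source.get("ENABLE_SEMANTIC_ROUTING")),
        )


flags = RagFeatureFlags.from_env({"ENABLE_RERANKING": "true"})
```

Because unset flags resolve to `False`, turning a flag off restores the previous code path, which is what makes flags usable for rollback.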
Architecture Pattern
- Service-based: Follow existing service architecture (backend/rag_solution/services/)
- Dependency Injection: Use settings and dependency injection patterns
- Abstract Interfaces: Define base classes for extensibility
- Factory Pattern: Use factories for component instantiation
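As an illustration of the abstract-interface and factory points, the sketch below defines a base class for a chunk compression component, two simple implementations, and a factory function that selects one by name. All names (`BaseCompressor`, `NoOpCompressor`, `SentenceFilterCompressor`, `create_compressor`) are hypothetical, and the sentence filter is a deliberately naive heuristic rather than a proposed algorithm.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class BaseCompressor(ABC):
    """Interface every chunk compressor is expected to implement."""

    @abstractmethod
    def compress(self, query: str, chunks: List[str]) -> List[str]:
        """Return a reduced set of chunks for the given query."""


class NoOpCompressor(BaseCompressor):
    """Backward-compatible default: passes chunks through unchanged."""

    def compress(self, query: str, chunks: List[str]) -> List[str]:
        return list(chunks)


class SentenceFilterCompressor(BaseCompressor):
    """Keeps only sentences that share at least one token with the query."""

    def compress(self, query: str, chunks: List[str]) -> List[str]:
        query_tokens = set(query.lower().split())
        compressed = []
        for chunk in chunks:
            sentences = [s.strip() for s in chunk.split(".") if s.strip()]
            kept = [s for s in sentences if query_tokens & set(s.lower().split())]
            if kept:
                compressed.append(". ".join(kept) + ".")
        return compressed


_COMPRESSORS: Dict[str, Type[BaseCompressor]] = {
    "none": NoOpCompressor,
    "sentence_filter": SentenceFilterCompressor,
}


def create_compressor(name: str) -> BaseCompressor:
    """Factory: instantiate a compressor by its configured name."""
    try:
        return _COMPRESSORS[name]()
    except KeyError:
        raise ValueError(f"Unknown compressor: {name!r}") from None


compressor = create_compressor("sentence_filter")
result = compressor.compress(
    "reranking latency",
    ["Reranking adds latency. The weather is nice.", "Unrelated text."],
)
```

Callers depend only on `BaseCompressor`, so a new strategy can be added by registering another class in the mapping without touching the pipeline code.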
Risk Mitigation
| Risk | Impact | Mitigation |
|---|---|---|
| Performance degradation | High | Benchmark each phase, feature flags for rollback |
| Complexity creep | Medium | Strict scope control, MVP approach per phase |
| Breaking changes | High | Comprehensive testing, staged rollout |
| LLM cost increase | Medium | Token budgets, caching, smart routing |
Resources Required
- Development: 18-24 weeks total (can be shortened by parallelizing work across multiple developers)
- Testing: Benchmark datasets, evaluation framework
- Infrastructure: No additional infrastructure for Phases 1-2, potential GPU for Phase 3 fine-tuning
Timeline
Weeks 1-6: Phase 1 - Complete Advanced RAG
Weeks 7-14: Phase 2 - Early Modular RAG
Weeks 15-24: Phase 3 - Full Modular RAG
References
- Modular RAG Architecture (reference diagram provided)
- RAG Survey Paper
- LangChain Retrieval Strategies
- LlamaIndex Advanced Retrieval Patterns
Related Issues
- Enhancement: Integrate IBM Docling for Advanced Document Processing #255 - Docling Integration for Enhanced Document Processing
- 🧠 Implement Chain of Thought (CoT) Reasoning for Enhanced RAG Search Quality #136 - Chain of Thought Reasoning Implementation
- Improve Pipeline Association Architecture for Better UX and Flexibility #222 - Simplified Pipeline Resolution
Note: This is a living epic that will be updated as phases progress. Each phase issue will contain detailed implementation specifications, file changes, and testing criteria.
Child Issues
This epic is broken down into the following implementation phases:
- #257 - Phase 1: Complete Advanced RAG (4-6 weeks) - Reranking, Compression & Query Enhancement
- #258 - Phase 2: Early Modular RAG (6-8 weeks) - Routing, Orchestration & Verification
- #259 - Phase 3: Full Modular RAG (8-10 weeks) - Advanced Indexing, Fine-Tuning & Auto-Optimization
Total Timeline: 18-24 weeks to complete all phases
Current Priority: Phase 1 (#257)