A production-ready, modular Retrieval-Augmented Generation (RAG) platform with Chain of Thought reasoning, multi-LLM support, and enterprise-grade features
Quick Start • Documentation • Development • Features • Contributing
RAG Modulo is a production-ready Retrieval-Augmented Generation platform that provides enterprise-grade document processing, intelligent search, and AI-powered question answering with Chain of Thought (CoT) reasoning. It is designed for scalability and supports multiple vector databases (Milvus, Elasticsearch, Pinecone, Weaviate, ChromaDB), multiple LLM providers (WatsonX, OpenAI, Anthropic), and a range of document formats, with enhanced processing through IBM Docling integration.
AI-Powered | Advanced Search | Interactive UI | Production Ready |
---|---|---|---|
Chain of Thought reasoning | Vector similarity search | Modern React interface | Docker + GHCR images |
Automatic pipeline resolution | Hybrid search strategies | Real-time document upload | Multi-stage CI/CD |
Multi-LLM provider support | Intelligent source attribution | Podcast generation | Security scanning |
Token tracking & monitoring | Auto-generated suggestions | Voice preview features | 947 automated tests |
- Modern UI: React 18 with Tailwind CSS for responsive, accessible design
- Enhanced Search: Interactive chat interface with Chain of Thought reasoning visualization
- Document Management: Real-time file upload with drag-and-drop support
- Smart Display: Document source attribution with chunk-level page references
- Podcast Generation: AI-powered podcast creation with voice preview
- Question Suggestions: Intelligent query recommendations based on collection content
Component | Status | Details |
---|---|---|
Infrastructure | Production Ready | Docker + GHCR + Cloud Deployment |
Testing | Comprehensive | 947 tests (atomic, unit, integration, API, E2E) |
Core Services | Fully Operational | 26+ services with DI pattern |
Documentation | Extensive | API docs, guides, deployment |
Development | Optimized | Containerless local dev workflow |
Security | Hardened | Multi-layer scanning (Trivy, Bandit, Gitleaks, Semgrep) |
Feature | Description | Benefit |
---|---|---|
Chain of Thought | Automatic question decomposition with step-by-step reasoning | 40%+ better answer quality on complex queries |
Auto Pipeline Resolution | Zero-config search - backend handles pipeline selection | Simplified API, reduced client complexity |
Security Hardening | Multi-layer scanning (Trivy, Bandit, Gitleaks, Semgrep) | Production-grade security posture |
Containerless Dev | Local development without containers | 10x faster iteration, instant hot-reload |
IBM Docling | Enhanced document processing for complex formats | Better PDF/DOCX/XLSX handling |
Podcast Generation | AI-powered podcast creation with voice preview | Interactive content from documents |
Smart Suggestions | Auto-generated relevant questions | Improved user experience and discovery |
GHCR Images | Pre-built production images | Faster deployments, consistent environments |
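As a rough illustration of the Chain of Thought row above, the loop below decomposes a question into sub-questions and answers them in order, feeding earlier answers forward as context. This is a minimal sketch, not RAG Modulo's implementation: `ReasoningStep`, `run_chain_of_thought`, and the `decompose`/`answer` callables are hypothetical names, and the two `fake_*` functions stand in for LLM-backed components so the example runs without any provider configured.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReasoningStep:
    """One sub-question and the answer produced for it (hypothetical type)."""

    sub_question: str
    answer: str


def run_chain_of_thought(
    question: str,
    decompose: Callable[[str], list[str]],
    answer: Callable[[str, list[str]], str],
) -> list[ReasoningStep]:
    """Answer sub-questions in order, passing earlier answers on as context."""
    context: list[str] = []
    steps: list[ReasoningStep] = []
    for sub_question in decompose(question):
        result = answer(sub_question, list(context))
        steps.append(ReasoningStep(sub_question, result))
        context.append(result)  # later steps can build on this answer
    return steps


# Stand-in for an LLM call that splits a compound question.
def fake_decompose(question: str) -> list[str]:
    return [part.strip() + "?" for part in question.rstrip("?").split(" and ")]


# Stand-in for an LLM call that answers one sub-question given prior answers.
def fake_answer(sub_question: str, context: list[str]) -> str:
    return f"answer to '{sub_question}' using {len(context)} prior step(s)"


steps = run_chain_of_thought(
    "What is Milvus and how is it queried?", fake_decompose, fake_answer
)
for step in steps:
    print(step.sub_question, "->", step.answer)
```

Keeping each step as its own record is what makes it possible to show the reasoning trail to the user rather than only the final answer.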
Requirement | Version | Purpose |
---|---|---|
Python | 3.12+ | Backend development |
Poetry | Latest | Python dependency management |
Node.js | 18+ | Frontend development |
Docker | Latest | Infrastructure services |
Docker Compose | V2 | Orchestration |
Best for: Daily development, feature work, rapid iteration
# 1. Clone repository
git clone https://github.com/manavgup/rag-modulo.git
cd rag-modulo
# 2. Set up environment
cp env.example .env
# Edit .env with your API keys (WatsonX, OpenAI, etc.)
# 3. Install dependencies
make local-dev-setup # Installs both backend (Poetry) and frontend (npm)
# 4. Start infrastructure (Postgres, Milvus, MinIO, MLFlow)
make local-dev-infra
# 5. Start backend (Terminal 1)
make local-dev-backend
# 6. Start frontend (Terminal 2)
make local-dev-frontend
# OR start everything in background
make local-dev-all
Access Points:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000/docs (Swagger UI)
- MLFlow: http://localhost:5001
- MinIO Console: http://localhost:9001
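After starting the stack, it can be handy to confirm that each access point is actually listening before opening a browser. The script below is a small sketch using only the standard library; the port numbers come from the list above, while the helper names (`is_port_open`, `check_access_points`) are made up for this example and are not part of the project.

```python
import socket

# Ports taken from the access points listed above.
ACCESS_POINTS = {
    "Frontend": 3000,
    "Backend API": 8000,
    "MLFlow": 5001,
    "MinIO Console": 9001,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_access_points(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in ACCESS_POINTS.items()}


if __name__ == "__main__":
    for name, up in check_access_points().items():
        print(f"{name:<14} {'up' if up else 'DOWN'}")
```

A successful TCP connection only shows that something is listening on the port; it does not prove the service behind it is healthy.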
Benefits:
- Instant reload - Python/React changes are reflected immediately (no container rebuilds)
- Native debugging - Use the PyCharm or VS Code debugger with breakpoints
- Local caching - Poetry/npm caches work natively for faster dependency installs
- Fast iteration - Pre-commit hooks are tiered (fast on commit, comprehensive on push)
When to use:
- Daily development work
- Feature development and bug fixes
- Rapid iteration and testing
- Debugging with breakpoints
Best for: Production-like testing, deployment validation
# Clone repository
git clone https://github.com/manavgup/rag-modulo.git
cd rag-modulo
# Set up environment
cp env.example .env
# Edit .env with your API keys
# Start with pre-built images from GHCR
make run-ghcr
# OR build and run locally
make build-all-local
docker compose up -d
When to use:
- Testing production configurations
- Validating Docker builds
- Deployment rehearsal
- Performance benchmarking
Best for: Quick experimentation, onboarding, cloud development
- Go to the repository → "Code" → "Codespaces"
- Click "Create codespace" on your branch
- Start coding in browser-based VS Code
- Run: make venv && make run-infra
When to use:
- No local setup required
- Consistent development environment
- Work from any device
- Team onboarding
RAG Modulo follows a modern, service-based architecture with clear separation of concerns:
graph TB
subgraph "Frontend Layer"
UI[React Web UI]
CLI[Command Line Interface]
end
subgraph "API Layer"
API[FastAPI Backend]
AUTH[OIDC Authentication]
end
subgraph "Service Layer"
SEARCH[Search Service]
CONV[Conversation Service]
TOKEN[Token Tracking]
COT[Chain of Thought]
end
subgraph "Data Layer"
VDB[(Vector Database)]
PG[(PostgreSQL)]
MINIO[(MinIO Storage)]
end
subgraph "External Services"
LLM[LLM Providers]
EMB[Embedding Models]
end
UI --> API
CLI --> API
API --> SEARCH
API --> CONV
API --> TOKEN
API --> COT
SEARCH --> VDB
SEARCH --> PG
CONV --> LLM
TOKEN --> PG
COT --> LLM
API --> MINIO
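In the diagram above, the Search Service talks to a Vector Database box rather than to a specific product. One way to get that separation is to code the service against a small interface and inject the concrete store. The sketch below is illustrative only: `VectorStore`, `Hit`, `InMemoryStore`, and `SearchService` are hypothetical names, not RAG Modulo's actual classes, and the toy store uses a plain dot product in place of a real vector database.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Hit:
    chunk_id: str
    score: float


class VectorStore(Protocol):
    """Hypothetical common interface; any backend with these methods fits."""

    def add(self, chunk_id: str, vector: list[float]) -> None: ...

    def query(self, vector: list[float], top_k: int) -> list[Hit]: ...


class InMemoryStore:
    """Toy backend using dot-product similarity, standing in for a real store."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}

    def add(self, chunk_id: str, vector: list[float]) -> None:
        self._vectors[chunk_id] = vector

    def query(self, vector: list[float], top_k: int) -> list[Hit]:
        hits = [
            Hit(cid, sum(a * b for a, b in zip(vector, stored)))
            for cid, stored in self._vectors.items()
        ]
        return sorted(hits, key=lambda h: h.score, reverse=True)[:top_k]


class SearchService:
    """Receives its store through the constructor (dependency injection)."""

    def __init__(self, store: VectorStore) -> None:
        self._store = store

    def search(self, query_vector: list[float], top_k: int = 3) -> list[str]:
        return [hit.chunk_id for hit in self._store.query(query_vector, top_k)]


store = InMemoryStore()
store.add("doc1#0", [1.0, 0.0])
store.add("doc2#0", [0.0, 1.0])
service = SearchService(store)
print(service.search([0.9, 0.1], top_k=1))
```

Because `SearchService` only depends on the interface, a test can pass in the in-memory store while a deployment passes in a real backend.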
Philosophy: Develop locally without containers for maximum speed, deploy with containers for production.
# Morning setup (once per day)
cd rag-modulo
source backend/.venv/bin/activate # Activate Python environment
make run-infra # Start infrastructure (Postgres, Milvus, etc.)
# Terminal 1: Backend with auto-reload
cd backend
uvicorn main:app --reload --port 8000
# Terminal 2: Frontend with HMR
cd frontend
npm run dev
# Development cycle
# 1. Make code changes
# 2. See changes instantly (auto-reload)
# 3. Test manually via http://localhost:3000
# 4. Run quick checks before commit
make quick-check
# End of day cleanup
make local-dev-stop # Stop infrastructure containers
deactivate # Deactivate Python venv
Command | Description | When to Use |
---|---|---|
make local-dev-setup | Install all dependencies (backend + frontend) | First time setup |
make local-dev-infra | Start infrastructure containers only | Daily (Postgres, Milvus, MinIO, MLFlow) |
make local-dev-backend | Start backend with hot-reload | Development (Terminal 1) |
make local-dev-frontend | Start frontend with HMR | Development (Terminal 2) |
make local-dev-all | Start everything in background | Quick full stack startup |
make quick-check | Fast lint + format check | Pre-commit validation |
make test-unit-fast | Run unit tests locally | Rapid testing without containers |
make local-dev-stop | Stop all services | Clean shutdown |
# Fast local testing (no containers)
source backend/.venv/bin/activate
cd backend
pytest tests/unit/ -v # Unit tests only
pytest tests/integration/ -v # Integration tests
# Or use Makefile targets
make test-unit-fast # Fast unit tests
make test-integration # Integration tests (needs infra)
# Quality checks
make quick-check # Fast: format + lint
make lint # All linters
make format # Auto-fix formatting
make security-check # Security scans
make coverage # Test coverage report
Use containers only for production-like testing or deployment validation:
# Build production images
make build-backend
make build-frontend
# Start production environment
make prod-start
# Or use pre-built GHCR images
make run-ghcr
- Chain of Thought Reasoning: Automatic question decomposition with step-by-step reasoning, iterative context building, and transparent reasoning visualization
- Automatic Pipeline Resolution: Zero-config search experience - backend automatically selects and creates pipelines based on user context
- Token Tracking & Monitoring: Real-time usage tracking across all LLM interactions with detailed breakdowns
- Multi-LLM Support: Seamless switching between WatsonX, OpenAI, and Anthropic with provider-specific optimizations
- IBM Docling Integration: Enhanced document processing for complex formats (PDF, DOCX, XLSX)
- Question Suggestions: AI-generated relevant questions based on document collection content
- Vector Databases: Pluggable support for Milvus (default), Elasticsearch, Pinecone, Weaviate, ChromaDB via common interface
- Hybrid Search: Combines semantic vector similarity with keyword search strategies
- Source Attribution: Granular document source tracking with chunk-level page references across reasoning steps
- Advanced Chunking: Hierarchical chunking strategies with configurable size and overlap
- Conversation History: Context-aware search with conversation memory for multi-turn interactions
- Service-Based Design: 26+ services with clean separation of concerns and dependency injection pattern
- Repository Pattern: Data access abstraction layer for improved testability and maintainability
- Asynchronous Operations: Async/await throughout for efficient concurrent request handling
- Production Deployment: Docker + GHCR images, multi-stage builds, cloud-ready (AWS, Azure, GCP, IBM Cloud)
- Modular Design: Pluggable components for vector DBs, LLM providers, embedding models
- Comprehensive Test Suite: 947 automated tests across all layers (atomic, unit, integration, API, E2E)
- Multi-Layer Testing Strategy:
  - Atomic tests for schemas and data structures
  - Unit tests for business logic and services
  - Integration tests for service interactions
  - API tests for endpoint validation
- Security Scanning: Multi-layer security with Trivy (containers), Bandit (Python), Gitleaks (secrets), Semgrep (SAST)
- Code Quality: Ruff linting, MyPy type checking, Pylint analysis, pre-commit hooks
- CI/CD Pipeline: Multi-stage GitHub Actions with test isolation, builds, and comprehensive integration testing
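The Hybrid Search item above says that semantic and keyword results are combined, but this README does not say how. The sketch below uses reciprocal rank fusion (RRF), a common merging technique, purely as an assumption; the function name, the chunk IDs, and the constant `k = 60` are illustrative and not taken from the project.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; each list contributes 1 / (k + rank) per item."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_hits = ["chunk-a", "chunk-b", "chunk-c"]   # from semantic similarity
keyword_hits = ["chunk-b", "chunk-d", "chunk-a"]  # from keyword matching
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

Items that appear in both lists (`chunk-b` and `chunk-a` here) rise to the top, which is the usual reason for fusing the two strategies.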
- Full Documentation - Comprehensive guides and API reference
- Getting Started - Quick start guide
- Development Guide - Development workflow and best practices
- Testing Guide - Testing strategies and execution
- Deployment Guide - Production deployment instructions
- Configuration Guide - Environment setup and configuration
- API Reference - Complete API documentation
- CLI Documentation - Command-line interface guide
RAG Modulo includes a powerful CLI for interacting with the system:
# After installation, use the CLI commands:
rag-cli --help # Main CLI help
rag-search # Search operations
rag-admin # Administrative tasks
# Example: Search a collection
rag-cli search query <collection-id> "your question here"
# Create a collection
rag-cli collection create --name "My Docs"
# Upload documents
rag-cli collection upload <collection-id> path/to/documents/
RAG Modulo supports multiple deployment strategies:
# Start production environment (all containers)
make prod-start
# Check status
make prod-status
# View logs
make prod-logs
# Stop production environment
make prod-stop
# Pull and run latest images from GitHub Container Registry
make run-ghcr
Available Images:
ghcr.io/manavgup/rag_modulo/backend:latest
ghcr.io/manavgup/rag_modulo/frontend:latest
# Build local images
make build-all
# Start services
make run-app
AWS Deployment
- ECS (Elastic Container Service): Use docker-compose.production.yml
- EKS (Kubernetes): Deploy with Kubernetes manifests
- EC2: Docker Compose or standalone containers
- Lambda: Serverless functions for specific services
Azure Deployment
- Azure Container Instances: Quick container deployment
- AKS (Azure Kubernetes Service): Production-grade orchestration
- Azure Container Apps: Serverless container hosting
Google Cloud Deployment
- Cloud Run: Fully managed serverless platform
- GKE (Google Kubernetes Engine): Kubernetes orchestration
- Compute Engine: VM-based deployment with Docker
IBM Cloud Deployment
- Code Engine: Serverless container platform
- IKS (IBM Kubernetes Service): Enterprise Kubernetes
- Red Hat OpenShift: Advanced container platform
# Apply Kubernetes manifests
kubectl apply -f deployment/k8s/
# Or deploy with Helm (if charts exist)
helm install rag-modulo ./charts/rag-modulo
RAG Modulo uses a comprehensive CI/CD pipeline with multiple stages:
Triggers: Push to main, Pull Requests
Stages:
- Lint and Unit Tests (no infrastructure)
  - Ruff linting (120 char line length)
  - MyPy type checking
  - Unit tests with pytest
  - Fast feedback (~5-10 minutes)
- Build Docker Images
  - Backend image build
  - Frontend image build
  - Push to GitHub Container Registry (GHCR)
  - Tagged with: latest, sha-<commit>, branch name
- Integration Tests
  - Full stack deployment
  - PostgreSQL, Milvus, MLFlow, MinIO
  - API tests, integration tests
  - End-to-end validation
Triggers: Push to main, Pull Requests, Weekly schedule
Scans:
- Trivy: Container vulnerability scanning
- Bandit: Python security linting
- Gitleaks: Secret detection
- Safety: Python dependency vulnerabilities
- Semgrep: SAST code analysis
Triggers: Push to main, Pull Requests to docs/
Actions:
- Build MkDocs site
- Deploy to GitHub Pages
- API documentation generation
Test the CI pipeline locally before pushing:
# Run same checks as CI
make ci-local
# Validate CI workflows
make validate-ci
# Security checks
make security-check
make scan-secrets
Optimized for developer velocity:
On Commit (fast, 5-10 sec):
- Ruff formatting
- Trailing whitespace
- YAML syntax
- File size limits
On Push (slow, 30-60 sec):
- MyPy type checking
- Pylint analysis
- Security scans
- Strangler pattern checks
In CI (comprehensive):
- All checks run regardless
- Ensures quality gates
GitHub Container Registry (GHCR):
- Automatic image builds on push
- Multi-architecture support (amd64, arm64)
- Image signing and verification
- Retention policies
Image Tags:
- latest: Latest main branch build
- sha-<commit>: Specific commit
- <branch>: Branch-specific builds
- v<version>: Release tags
Category | Count | Description | Command |
---|---|---|---|
Atomic Tests | 100+ | Schema validation, data structures | pytest -m atomic |
Unit Tests | 83+ | Service logic, business rules | make test-unit-fast |
Integration Tests | 43+ | Service interactions, DB integration | make test-integration |
API Tests | 21+ | Endpoint validation, request/response | pytest -m api |
E2E Tests | 22+ | Full workflow scenarios | pytest -m e2e |
Total | 947 | Complete test coverage | make test-all |
# Fast local testing (no containers, recommended for development)
make test-unit-fast
# Specific test categories
make test-atomic # Schema and data structure tests
make test-integration # Service integration tests (requires infrastructure)
make test-api # API endpoint tests
# Full test suite with coverage
make coverage
# Run specific test file
make test testfile=tests/unit/test_search_service.py
We welcome contributions! Please see our Contributing Guide for details.
- Service Layer Architecture - Follow service-based patterns
- Code Quality - Use type hints, comprehensive docstrings, PEP 8
- Testing - Write tests for all new features
- Documentation - Update docs for any changes
- Fork and Clone the repository
- Create Feature Branch from main
- Make Changes following our guidelines
- Run Tests and ensure they pass
- Submit Pull Request with clear description
- Service-based architecture with 26+ services
- Comprehensive test infrastructure (947 tests)
- Multi-LLM provider support (WatsonX, OpenAI, Anthropic)
- Vector database abstraction layer
- CI/CD pipeline with security scanning
- Chain of Thought (CoT) reasoning system
- Automatic pipeline resolution
- Token tracking and monitoring
- IBM Docling integration
- Podcast generation with voice preview
- Question suggestion system
- Containerless local development workflow
- Production deployment with GHCR images
- Multi-stage Docker builds
- Security hardening (Trivy, Bandit, Gitleaks, Semgrep)
- Enhanced monitoring and observability
- Performance optimization and caching
- Authentication system improvements (OIDC)
- Multi-tenant support
- Advanced analytics and dashboards
- Batch processing capabilities
- API rate limiting and quotas
- Advanced caching strategies
- Multi-modal support (image, audio)
- Agentic AI workflows
- Real-time collaborative features
- Advanced reasoning strategies
- Federated learning support
Virtual Environment Issues
Problem: Dependencies not installing
# Use the Makefile (recommended)
make local-dev-setup
# OR manually:
cd backend
poetry config virtualenvs.in-project true
poetry install --with dev,test
source .venv/bin/activate
# Frontend
cd ../frontend
npm install
Problem: Wrong tool versions (e.g., Ruff 0.5.7 instead of 0.14.0)
# Ensure you're in the Poetry virtual environment
cd backend
source .venv/bin/activate
which python # Should show backend/.venv/bin/python
ruff --version # Should show 0.14.0
Problem: poetry install fails
# Update Poetry and retry
poetry self update
poetry cache clear . --all
poetry install --with dev,test --sync
Docker Issues
Problem: Infrastructure services fail to start
# Use Makefile commands (recommended)
make local-dev-stop # Stop everything
make local-dev-infra # Restart infrastructure
# OR manually:
docker compose -f docker-compose-infra.yml down
docker compose -f docker-compose-infra.yml up -d
# Check logs
make logs
Problem: Port already in use
# Find what's using the port
lsof -i :8000 # Backend
lsof -i :3000 # Frontend
lsof -i :5432 # Postgres
# Stop all services
make local-dev-stop
# OR kill specific service
kill $(lsof -t -i:8000)
Authentication Issues
Problem: Login attempts fail
- Ensure OIDC configuration is correct in .env
- Check IBM Cloud credentials
- Verify redirect URLs match your setup
Development Mode: Use mock authentication
# In .env or .env.dev
SKIP_AUTH=true
DEVELOPMENT_MODE=true
ENABLE_MOCK_AUTH=true
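Since these flags bypass real authentication, it is worth checking that none of them leak into a production environment. The helper below is a sketch of such a check: the flag names come from the settings above, but the function name and the set of values treated as "true" are assumptions, and the backend may parse these variables differently.

```python
import os
from collections.abc import Mapping

# Flag names from the .env snippet above.
DEV_ONLY_FLAGS = ("SKIP_AUTH", "DEVELOPMENT_MODE", "ENABLE_MOCK_AUTH")

# Assumed truthy spellings; the backend's own parsing may differ.
TRUTHY = {"1", "true", "yes", "on"}


def enabled_dev_flags(env: Mapping[str, str]) -> list[str]:
    """Return the development-only flags that are switched on in env."""
    return [
        name
        for name in DEV_ONLY_FLAGS
        if env.get(name, "").strip().lower() in TRUTHY
    ]


if __name__ == "__main__":
    active = enabled_dev_flags(os.environ)
    print("dev-only flags enabled:", ", ".join(active) or "none")
```

Running this as a pre-deployment step and failing when the list is non-empty is one way to keep mock authentication confined to local development.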
Test Failures
Problem: Tests failing locally
# Ensure you're in venv
source backend/.venv/bin/activate
# Run specific test
cd backend
pytest tests/unit/test_example.py -v
# Run with more details
pytest tests/unit/test_example.py -vv -s
# Check test dependencies
poetry install --with test --sync
Dependency Issues
Problem: Import errors or missing modules
# Reinstall all dependencies
cd backend
poetry install --with dev,test --sync
# Check what's installed
poetry show
# Verify Python path
python -c "import sys; print(sys.path)"
- Check Documentation: Full docs
- Report Issues: GitHub Issues
- Discussions: GitHub Discussions
- See: IMMEDIATE_FIX.md for common development issues
This project is licensed under the MIT License - see the LICENSE file for details.
- IBM Docling - Advanced document processing and understanding
- IBM WatsonX - Enterprise AI foundation models
- FastAPI - Modern, high-performance web framework
- React - Powerful UI library for building interactive interfaces
- Milvus - High-performance vector database
- Docker - Containerization and deployment platform
- All Contributors - Thank you for your contributions!