From 536256fb70013d758af7c3da7db5b8d5573016a3 Mon Sep 17 00:00:00 2001
From: manavgup
Date: Thu, 27 Nov 2025 13:05:52 -0500
Subject: [PATCH] docs: Add agentic RAG architecture documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add comprehensive architecture documentation for the Agentic RAG Platform:

- agentic-ui-architecture.md: React component hierarchy, state management, and API integration for agent features
- backend-architecture-diagram.md: Overall backend architecture with Mermaid diagrams showing service layers and data flow
- mcp-integration-architecture.md: MCP client/server integration strategy, PR comparison (#671 vs #684), and Context Forge integration
- rag-modulo-mcp-server-architecture.md: Exposing RAG capabilities as MCP server with tools (rag_search, rag_ingest, etc.) and resources
- search-agent-hooks-architecture.md: 3-stage agent pipeline (pre-search, post-search, response) with database schema and execution flow
- system-architecture.md: Complete system architecture overview with technology stack and data flows

These documents guide implementation of:
- PR #695 (SPIFFE/SPIRE agent identity)
- PR #671 (MCP Gateway client)
- Issue #697 (Agent execution hooks)
- Issue #698 (MCP Server)
- Issue #699 (Agentic UI)

Closes #696

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude
---
 docs/architecture/agentic-ui-architecture.md | 1470 +++++++++++++++++
 .../backend-architecture-diagram.md          |  517 ++++++
 .../mcp-integration-architecture.md          |  200 +++
 .../rag-modulo-mcp-server-architecture.md    |  689 ++++++++
 .../search-agent-hooks-architecture.md       |  416 +++++
 docs/architecture/system-architecture.md     |  425 +++++
 6 files changed, 3717 insertions(+)
 create mode 100644 docs/architecture/agentic-ui-architecture.md
 create mode 100644 docs/architecture/backend-architecture-diagram.md
 create mode 100644 docs/architecture/mcp-integration-architecture.md
 create mode 100644 docs/architecture/rag-modulo-mcp-server-architecture.md
 create mode 100644 docs/architecture/search-agent-hooks-architecture.md
 create mode 100644 docs/architecture/system-architecture.md

diff --git a/docs/architecture/agentic-ui-architecture.md b/docs/architecture/agentic-ui-architecture.md
new file mode 100644
index 00000000..ceabf6ec
--- /dev/null
+++ b/docs/architecture/agentic-ui-architecture.md
@@ -0,0 +1,1470 @@
+# Agentic UI Architecture
+
+**Date**: November 2025
+**Status**: Architecture Design
+**Version**: 1.0
+**Related Documents**:
+
+- [MCP Integration Architecture](./mcp-integration-architecture.md)
+- [SearchService Agent Hooks Architecture](./search-agent-hooks-architecture.md)
+- [RAG Modulo MCP Server Architecture](./rag-modulo-mcp-server-architecture.md)
+
+## Overview
+
+This document describes the frontend architecture for transforming RAG Modulo into a fully
+agentic RAG solution. It covers the React component hierarchy, state management, user
+interactions, and integration patterns needed to support:
+
+1. **Agent Configuration** - Per-collection agent assignment and configuration
+2. **Artifact Display** - Rendering and downloading agent-generated artifacts
+3. **Execution Visibility** - Real-time pipeline stage and agent status indicators
+4.
**Agent Management** - Dashboard for managing user's agents and viewing analytics + +## Current Frontend Architecture + +### Existing Components (Reference) + +``` +frontend/src/components/ +├── agents/ +│ └── LightweightAgentOrchestration.tsx # Existing workflow-focused agent UI +├── search/ +│ ├── LightweightSearchInterface.tsx # Main search chat interface +│ ├── ChainOfThoughtAccordion.tsx # CoT reasoning display +│ ├── SourcesAccordion.tsx # Document sources +│ ├── CitationsAccordion.tsx # Citation display +│ └── TokenAnalysisAccordion.tsx # Token usage metrics +├── collections/ +│ ├── LightweightCollections.tsx # Collection list +│ └── LightweightCollectionDetail.tsx # Collection settings +└── ui/ + ├── Card.tsx, Button.tsx, Modal.tsx # Reusable UI components + └── ... +``` + +### Design System + +- **Framework**: React 18 with TypeScript +- **Styling**: Tailwind CSS with Carbon Design System colors +- **Icons**: Heroicons (@heroicons/react) +- **State**: React hooks + Context (NotificationContext) +- **Routing**: React Router DOM + +## New Component Architecture + +### Component Hierarchy + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ App Layout │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ LightweightLayout (existing) │ │ +│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Routes │ │ │ +│ │ │ │ │ │ +│ │ │ /search │ │ │ +│ │ │ └── LightweightSearchInterface (ENHANCED) │ │ │ +│ │ │ ├── SearchInput │ │ │ +│ │ │ ├── MessageList │ │ │ +│ │ │ │ └── MessageCard │ │ │ +│ │ │ │ ├── ChainOfThoughtAccordion │ │ │ +│ │ │ │ ├── SourcesAccordion │ │ │ +│ │ │ │ ├── AgentArtifactsPanel (NEW) │ │ │ +│ │ │ │ │ └── ArtifactCard (NEW) │ │ │ +│ │ │ │ └── AgentExecutionIndicator (NEW) │ │ │ +│ │ │ └── AgentPipelineStatus (NEW) │ │ │ +│ │ │ │ │ │ +│ │ │ /collections/:id/settings │ │ │ +│ │ │ └── LightweightCollectionDetail (ENHANCED) │ │ │ +│ │ │ └── 
CollectionAgentsTab (NEW) │ │ │ +│ │ │ ├── AgentList (NEW) │ │ │ +│ │ │ ├── AgentConfigModal (NEW) │ │ │ +│ │ │ └── AgentMarketplace (NEW) │ │ │ +│ │ │ │ │ │ +│ │ │ /agents │ │ │ +│ │ │ └── AgentDashboard (NEW) │ │ │ +│ │ │ ├── MyAgentsPanel (NEW) │ │ │ +│ │ │ ├── AgentAnalytics (NEW) │ │ │ +│ │ │ └── AgentAuditLog (NEW) │ │ │ +│ │ │ │ │ │ +│ │ │ /agents/marketplace │ │ │ +│ │ │ └── AgentMarketplacePage (NEW) │ │ │ +│ │ │ ├── AgentCatalog (NEW) │ │ │ +│ │ │ └── AgentDetailModal (NEW) │ │ │ +│ │ └─────────────────────────────────────────────────────────────────┘ │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### File Structure + +``` +frontend/src/ +├── components/ +│ ├── agents/ +│ │ ├── LightweightAgentOrchestration.tsx # Existing (keep for workflows) +│ │ ├── AgentDashboard.tsx # NEW: Main agent management page +│ │ ├── MyAgentsPanel.tsx # NEW: User's configured agents +│ │ ├── AgentAnalytics.tsx # NEW: Agent usage stats +│ │ ├── AgentAuditLog.tsx # NEW: Execution history +│ │ ├── AgentMarketplacePage.tsx # NEW: Browse available agents +│ │ ├── AgentCatalog.tsx # NEW: Grid of available agents +│ │ ├── AgentDetailModal.tsx # NEW: Agent info and add button +│ │ ├── CollectionAgentsTab.tsx # NEW: Collection settings tab +│ │ ├── AgentList.tsx # NEW: Agents for a collection +│ │ ├── AgentConfigModal.tsx # NEW: Configure agent settings +│ │ └── AgentPriorityDragDrop.tsx # NEW: Drag to reorder priority +│ │ +│ ├── search/ +│ │ ├── LightweightSearchInterface.tsx # ENHANCED: Add artifact support +│ │ ├── AgentArtifactsPanel.tsx # NEW: Container for artifacts +│ │ ├── ArtifactCard.tsx # NEW: Single artifact display +│ │ ├── ArtifactPreviewModal.tsx # NEW: Preview images/PDFs +│ │ ├── AgentExecutionIndicator.tsx # NEW: Per-message agent badges +│ │ └── AgentPipelineStatus.tsx # NEW: Real-time pipeline stages +│ │ +│ └── ui/ +│ ├── ProgressSteps.tsx # 
NEW: Pipeline stage indicator
+│       └── FileDownloadButton.tsx         # NEW: Base64 download handler
+│
+├── services/
+│   ├── apiClient.ts                       # ENHANCED: Add agent API methods
+│   └── agentApiClient.ts                  # NEW: Agent-specific API calls
+│
+├── types/
+│   └── agent.ts                           # NEW: Agent TypeScript interfaces
+│
+└── contexts/
+    └── AgentContext.tsx                   # NEW: Agent state management
+```
+
+## New Components Specification
+
+### 1. Search Interface Enhancements
+
+#### AgentArtifactsPanel
+
+Container for displaying agent-generated artifacts within search results.
+
+```typescript
+// frontend/src/components/search/AgentArtifactsPanel.tsx
+
+interface AgentArtifact {
+  agent_id: string;
+  type: 'pptx' | 'pdf' | 'png' | 'mp3' | 'html' | 'txt';
+  data: string; // base64 encoded
+  filename: string;
+  metadata: Record<string, any>;
+}
+
+interface AgentArtifactsPanelProps {
+  artifacts: AgentArtifact[];
+  isLoading?: boolean;
+}
+
+const AgentArtifactsPanel: React.FC<AgentArtifactsPanelProps> = ({
+  artifacts,
+  isLoading
+}) => {
+  if (!artifacts?.length && !isLoading) return null;
+
+  return (
+
+
+ +

+ Generated Artifacts ({artifacts.length}) +

+
+ + {isLoading ? ( +
+ {[1, 2].map(i => ( + + ))} +
+ ) : ( +
+ {artifacts.map((artifact, index) => ( + + ))} +
+ )} +
+  );
+};
+```
+
+#### ArtifactCard
+
+Individual artifact display with preview and download actions.
+
+```typescript
+// frontend/src/components/search/ArtifactCard.tsx
+
+interface ArtifactCardProps {
+  artifact: AgentArtifact;
+}
+
+const ArtifactCard: React.FC<ArtifactCardProps> = ({ artifact }) => {
+  const [previewOpen, setPreviewOpen] = useState(false);
+
+  const getIcon = () => {
+    switch (artifact.type) {
+      case 'pptx': return ;
+      case 'pdf': return ;
+      case 'png': return ;
+      case 'mp3': return ;
+      case 'html': return ;
+      default: return ;
+    }
+  };
+
+  const getLabel = () => {
+    switch (artifact.type) {
+      case 'pptx': return 'PowerPoint';
+      case 'pdf': return 'PDF Report';
+      case 'png': return 'Chart';
+      case 'mp3': return 'Audio';
+      case 'html': return 'HTML';
+      default: return 'File';
+    }
+  };
+
+  const canPreview = ['png', 'pdf'].includes(artifact.type);
+
+  const handleDownload = () => {
+    const mimeTypes: Record<AgentArtifact['type'], string> = {
+      pptx: 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
+      pdf: 'application/pdf',
+      png: 'image/png',
+      mp3: 'audio/mpeg',
+      html: 'text/html',
+      txt: 'text/plain'
+    };
+
+    // base64ToBlob is assumed to be a helper shared with FileDownloadButton.tsx;
+    // it is not defined in this document.
+    const blob = base64ToBlob(artifact.data, mimeTypes[artifact.type]);
+    const url = URL.createObjectURL(blob);
+    const a = document.createElement('a');
+    a.href = url;
+    a.download = artifact.filename;
+    a.click();
+    // Defer revocation so the browser can start the download first;
+    // revoking synchronously can cancel it in some browsers.
+    setTimeout(() => URL.revokeObjectURL(url), 0);
+  };
+
+  return (
+    <>
+
+
+
+ {getIcon()} +
+
+

+ {getLabel()} +

+

+ {artifact.filename} +

+
+
+ +
+ {canPreview && ( + + )} + +
+ + {artifact.metadata && ( +

+ {artifact.metadata.slides && `${artifact.metadata.slides} slides`} + {artifact.metadata.width && `${artifact.metadata.width}x${artifact.metadata.height}`} +

+ )} +
+
+      {previewOpen && (
+          setPreviewOpen(false)}
+        />
+      )}
+
+  );
+};
+```
+
+#### AgentPipelineStatus
+
+Real-time pipeline stage indicator shown during search.
+
+```typescript
+// frontend/src/components/search/AgentPipelineStatus.tsx
+
+type PipelineStage = 'pre_search' | 'search' | 'post_search' | 'generation' | 'response_agents' | 'complete';
+
+interface AgentPipelineStatusProps {
+  currentStage: PipelineStage;
+  stages: {
+    id: PipelineStage;
+    label: string;
+    agentCount: number;
+    status: 'pending' | 'running' | 'completed' | 'error';
+    duration?: number;
+  }[];
+  isVisible: boolean;
+}
+
+const AgentPipelineStatus: React.FC<AgentPipelineStatusProps> = ({
+  currentStage,
+  stages,
+  isVisible
+}) => {
+  if (!isVisible) return null;
+
+  return (
+
+
+ + + Agent Pipeline Processing + +
+ +
+ {stages.map((stage, index) => ( + +
+
+ {stage.status === 'completed' ? ( + + ) : stage.status === 'running' ? ( + + ) : ( + stage.agentCount + )} +
+ + {stage.label} + + {stage.duration && ( + + {stage.duration}ms + + )} +
+ + {index < stages.length - 1 && ( +
+ )} + + ))} +
+
+  );
+};
+```
+
+#### AgentExecutionIndicator
+
+Badge showing which agents processed a response.
+
+```typescript
+// frontend/src/components/search/AgentExecutionIndicator.tsx
+
+interface AgentExecution {
+  agent_id: string;
+  agent_name: string;
+  stage: 'pre_search' | 'post_search' | 'response';
+  duration_ms: number;
+  success: boolean;
+}
+
+interface AgentExecutionIndicatorProps {
+  executions: AgentExecution[];
+}
+
+const AgentExecutionIndicator: React.FC<AgentExecutionIndicatorProps> = ({
+  executions
+}) => {
+  // Hooks must run on every render, so declare state before any early return.
+  const [expanded, setExpanded] = useState(false);
+
+  if (!executions?.length) return null;
+
+  const successCount = executions.filter(e => e.success).length;
+  const totalDuration = executions.reduce((sum, e) => sum + e.duration_ms, 0);
+
+  return (
+
+ + + {expanded && ( +
+ {executions.map((exec, index) => ( +
+ + {exec.agent_name} + ({exec.stage}) + {exec.duration_ms}ms +
+ ))} +
+ )} +
+  );
+};
+```
+
+### 2. Collection Agent Configuration
+
+#### CollectionAgentsTab
+
+Tab component for collection settings page to configure agents.
+
+```typescript
+// frontend/src/components/agents/CollectionAgentsTab.tsx
+
+interface CollectionAgentsTabProps {
+  collectionId: string;
+}
+
+const CollectionAgentsTab: React.FC<CollectionAgentsTabProps> = ({
+  collectionId
+}) => {
+  const [agents, setAgents] = useState<CollectionAgent[]>([]);
+  const [availableAgents, setAvailableAgents] = useState<AgentManifest[]>([]);
+  const [isLoading, setIsLoading] = useState(true);
+  const [showAddModal, setShowAddModal] = useState(false);
+  const [editingAgent, setEditingAgent] = useState<CollectionAgent | null>(null);
+  const { addNotification } = useNotification();
+
+  useEffect(() => {
+    loadAgents();
+  }, [collectionId]);
+
+  const loadAgents = async () => {
+    setIsLoading(true);
+    try {
+      const [collectionAgents, allAgents] = await Promise.all([
+        agentApiClient.getCollectionAgents(collectionId),
+        agentApiClient.getAvailableAgents()
+      ]);
+      setAgents(collectionAgents);
+      setAvailableAgents(allAgents);
+    } catch (error) {
+      addNotification('error', 'Error', 'Failed to load agents');
+    } finally {
+      setIsLoading(false);
+    }
+  };
+
+  const handleToggleAgent = async (agentConfigId: string, enabled: boolean) => {
+    try {
+      await agentApiClient.updateAgentConfig(agentConfigId, { enabled });
+      setAgents(prev => prev.map(a =>
+        a.id === agentConfigId ? { ...a, enabled } : a
+      ));
+    } catch (error) {
+      addNotification('error', 'Error', 'Failed to update agent');
+    }
+  };
+
+  // Called with the reordered agents of a single stage section.
+  const handleReorderAgents = async (reorderedAgents: CollectionAgent[]) => {
+    // Update priorities based on the new order within that stage
+    const updates = reorderedAgents.map((agent, index) => ({
+      id: agent.id,
+      priority: index
+    }));
+    try {
+      await agentApiClient.batchUpdatePriorities(updates);
+      // Merge into the full list: replacing it outright would drop
+      // the agents that belong to the other stages.
+      const reorderedIds = new Set(reorderedAgents.map(a => a.id));
+      setAgents(prev => [
+        ...prev.filter(a => !reorderedIds.has(a.id)),
+        ...reorderedAgents.map((agent, index) => ({ ...agent, priority: index }))
+      ]);
+    } catch (error) {
+      addNotification('error', 'Error', 'Failed to reorder agents');
+    }
+  };
+
+  return (
+
+ {/* Header */} +
+
+

Collection Agents

+

+ Configure AI agents that enhance search and generate artifacts +

+
+ +
+ + {/* Agent List by Stage */} + {isLoading ? ( +
+ {[1, 2, 3].map(i => )} +
+ ) : ( + <> + {/* Pre-Search Agents */} + a.trigger_stage === 'pre_search')} + onToggle={handleToggleAgent} + onEdit={setEditingAgent} + onReorder={handleReorderAgents} + /> + + {/* Post-Search Agents */} + a.trigger_stage === 'post_search')} + onToggle={handleToggleAgent} + onEdit={setEditingAgent} + onReorder={handleReorderAgents} + /> + + {/* Response Agents */} + a.trigger_stage === 'response')} + onToggle={handleToggleAgent} + onEdit={setEditingAgent} + onReorder={handleReorderAgents} + /> + + )} + + {/* Add Agent Modal */} + {showAddModal && ( + { + loadAgents(); + setShowAddModal(false); + }} + onClose={() => setShowAddModal(false)} + /> + )} + + {/* Edit Agent Modal */} + {editingAgent && ( + { + loadAgents(); + setEditingAgent(null); + }} + onClose={() => setEditingAgent(null)} + /> + )} +
+ ); +}; +``` + +#### AgentStageSection + +Section component for agents at a specific pipeline stage. + +```typescript +// frontend/src/components/agents/AgentStageSection.tsx + +interface AgentStageSectionProps { + title: string; + description: string; + stage: 'pre_search' | 'post_search' | 'response'; + agents: CollectionAgent[]; + onToggle: (id: string, enabled: boolean) => void; + onEdit: (agent: CollectionAgent) => void; + onReorder: (agents: CollectionAgent[]) => void; +} + +const AgentStageSection: React.FC = ({ + title, + description, + stage, + agents, + onToggle, + onEdit, + onReorder +}) => { + const stageIcons = { + pre_search: , + post_search: , + response: + }; + + const stageColors = { + pre_search: 'bg-yellow-10 text-yellow-60', + post_search: 'bg-blue-10 text-blue-60', + response: 'bg-purple-10 text-purple-60' + }; + + return ( +
+
+
+ {stageIcons[stage]} +
+
+

{title}

+

{description}

+
+
+ + {agents.length === 0 ? ( +
+ +

No agents configured for this stage

+
+ ) : ( + { + if (!result.destination) return; + const items = Array.from(agents); + const [reordered] = items.splice(result.source.index, 1); + items.splice(result.destination.index, 0, reordered); + onReorder(items); + }}> + + {(provided) => ( +
+ {agents.map((agent, index) => ( + + {(provided, snapshot) => ( +
+
+ +
+ +
+

{agent.name}

+

{agent.description}

+
+ +
+ + Priority: {agent.priority} + + + onToggle(agent.id, enabled)} + className={` + ${agent.enabled ? 'bg-green-50' : 'bg-gray-30'} + relative inline-flex h-5 w-9 items-center rounded-full + `} + > + + + + +
+
+ )} +
+ ))} + {provided.placeholder} +
+ )} +
+
+ )} +
+  );
+};
+```
+
+#### AgentConfigModal
+
+Modal for configuring agent-specific settings.
+
+```typescript
+// frontend/src/components/agents/AgentConfigModal.tsx
+
+interface AgentConfigModalProps {
+  agent: CollectionAgent;
+  onSave: () => void;
+  onClose: () => void;
+}
+
+const AgentConfigModal: React.FC<AgentConfigModalProps> = ({
+  agent,
+  onSave,
+  onClose
+}) => {
+  const [config, setConfig] = useState(agent.config);
+  const [isSaving, setIsSaving] = useState(false);
+  const { addNotification } = useNotification();
+
+  // Generate form fields from the agent's config schema
+  const renderConfigField = (key: string, schema: any) => {
+    const value = config.settings?.[key] ?? schema.default;
+
+    switch (schema.type) {
+      case 'integer':
+        return (
+
+ + setConfig({ + ...config, + settings: { ...config.settings, [key]: parseInt(e.target.value) } + })} + className="input-field w-full" + /> + {schema.description && ( +

{schema.description}

+ )} +
+ ); + + case 'boolean': + return ( +
+
+ + {schema.description && ( +

{schema.description}

+ )} +
+ setConfig({ + ...config, + settings: { ...config.settings, [key]: checked } + })} + /> +
+ ); + + case 'string': + if (schema.enum) { + return ( +
+ + +
+ ); + } + return ( +
+ + setConfig({ + ...config, + settings: { ...config.settings, [key]: e.target.value } + })} + className="input-field w-full" + /> +
+ ); + + default: + return null; + } + }; + + const handleSave = async () => { + setIsSaving(true); + try { + await agentApiClient.updateAgentConfig(agent.id, { config }); + addNotification('success', 'Saved', 'Agent configuration updated'); + onSave(); + } catch (error) { + addNotification('error', 'Error', 'Failed to save configuration'); + } finally { + setIsSaving(false); + } + }; + + return ( + +
+

+ Configure {agent.name} +

+ +
+ {/* Agent info */} +
+

{agent.description}

+
+ Stage: {agent.trigger_stage} + Type: {agent.config.type} +
+
+ + {/* Dynamic config fields */} + {agent.config_schema?.properties && ( +
+ {Object.entries(agent.config_schema.properties).map(([key, schema]) => + renderConfigField(key, schema) + )} +
+ )} +
+ +
+ + +
+
+
+ ); +}; +``` + +### 3. Agent Management Dashboard + +#### AgentDashboard + +Main page for managing user's agents across all collections. + +```typescript +// frontend/src/components/agents/AgentDashboard.tsx + +const AgentDashboard: React.FC = () => { + const [activeTab, setActiveTab] = useState<'my-agents' | 'analytics' | 'audit'>('my-agents'); + + return ( +
+
+ {/* Header */} +
+

Agent Management

+

+ Configure and monitor AI agents for your document collections +

+
+ + {/* Tabs */} +
+ +
+ + {/* Tab Content */} + {activeTab === 'my-agents' && } + {activeTab === 'analytics' && } + {activeTab === 'audit' && } +
+
+  );
+};
+```
+
+### 4. Agent Marketplace
+
+#### AgentMarketplacePage
+
+Browse and discover available agents.
+
+```typescript
+// frontend/src/components/agents/AgentMarketplacePage.tsx
+
+interface AgentManifest {
+  agent_id: string;
+  name: string;
+  version: string;
+  description: string;
+  capabilities: string[];
+  config_schema: Record<string, any>;
+  input_schema: Record<string, any>;
+  output_schema: Record<string, any>;
+  category: 'pre_search' | 'post_search' | 'response';
+  icon?: string;
+  author?: string;
+  downloads?: number;
+}
+
+const AgentMarketplacePage: React.FC = () => {
+  const [agents, setAgents] = useState<AgentManifest[]>([]);
+  const [filter, setFilter] = useState('all');
+  const [search, setSearch] = useState('');
+  const [selectedAgent, setSelectedAgent] = useState<AgentManifest | null>(null);
+
+  useEffect(() => {
+    loadAgents();
+  }, []);
+
+  const loadAgents = async () => {
+    const data = await agentApiClient.getAvailableAgents();
+    setAgents(data);
+  };
+
+  const filteredAgents = agents.filter(agent => {
+    const matchesFilter = filter === 'all' || agent.category === filter;
+    const matchesSearch = !search ||
+      agent.name.toLowerCase().includes(search.toLowerCase()) ||
+      agent.description.toLowerCase().includes(search.toLowerCase());
+    return matchesFilter && matchesSearch;
+  });
+
+  const categories = [
+    { id: 'all', label: 'All Agents' },
+    { id: 'pre_search', label: 'Pre-Search' },
+    { id: 'post_search', label: 'Post-Search' },
+    { id: 'response', label: 'Response' },
+  ];
+
+  return (
+
+
+ {/* Header */} +
+

Agent Marketplace

+

+ Discover and add AI agents to enhance your RAG workflows +

+
+ + {/* Filters */} +
+
+ + setSearch(e.target.value)} + className="input-field w-full pl-10" + /> +
+ +
+ {categories.map(cat => ( + + ))} +
+
+ + {/* Agent Grid */} +
+ {filteredAgents.map(agent => ( +
setSelectedAgent(agent)} + > +
+
+ +
+
+

{agent.name}

+

v{agent.version}

+
+
+ +

+ {agent.description} +

+ +
+ + {agent.category.replace('_', '-')} + + + +
+
+ ))} +
+ + {/* Agent Detail Modal */} + {selectedAgent && ( + setSelectedAgent(null)} + /> + )} +
+
+  );
+};
+```
+
+## API Integration
+
+### Agent API Client
+
+```typescript
+// frontend/src/services/agentApiClient.ts
+
+import apiClient from './apiClient';
+
+export interface AgentManifest {
+  agent_id: string;
+  name: string;
+  version: string;
+  description: string;
+  capabilities: string[];
+  category: 'pre_search' | 'post_search' | 'response';
+  config_schema: Record<string, any>;
+}
+
+export interface CollectionAgent {
+  id: string;
+  agent_id: string;
+  name: string;
+  description: string;
+  config: {
+    type: 'mcp' | 'builtin';
+    context_forge_tool_id?: string;
+    settings: Record<string, any>;
+  };
+  config_schema?: Record<string, any>;
+  enabled: boolean;
+  trigger_stage: 'pre_search' | 'post_search' | 'response';
+  priority: number;
+}
+
+export interface AgentExecution {
+  id: string;
+  agent_id: string;
+  agent_name: string;
+  collection_id: string;
+  trigger_stage: string;
+  success: boolean;
+  duration_ms: number;
+  error?: string;
+  created_at: string;
+}
+
+const agentApiClient = {
+  // Available agents
+  getAvailableAgents: async (): Promise<AgentManifest[]> => {
+    const response = await apiClient.get('/api/v1/agents/');
+    return response.data;
+  },
+
+  getAgentsByCapability: async (capability: string): Promise<AgentManifest[]> => {
+    const response = await apiClient.get(`/api/v1/agents/capabilities/${capability}`);
+    return response.data;
+  },
+
+  // User's agent configurations
+  getUserAgentConfigs: async (): Promise<CollectionAgent[]> => {
+    const response = await apiClient.get('/api/v1/agents/configs');
+    return response.data;
+  },
+
+  createAgentConfig: async (config: Partial<CollectionAgent>): Promise<CollectionAgent> => {
+    const response = await apiClient.post('/api/v1/agents/configs', config);
+    return response.data;
+  },
+
+  updateAgentConfig: async (
+    configId: string,
+    updates: Partial<CollectionAgent>
+  ): Promise<CollectionAgent> => {
+    const response = await apiClient.patch(`/api/v1/agents/configs/${configId}`, updates);
+    return response.data;
+  },
+
+  deleteAgentConfig: async (configId: string): Promise<void> => {
+    await apiClient.delete(`/api/v1/agents/configs/${configId}`);
+  },
+
+  // Collection agents
+  getCollectionAgents: async (collectionId: string): Promise<CollectionAgent[]> => {
+    const response = await apiClient.get(`/api/v1/agents/collections/${collectionId}/agents`);
+    return response.data;
+  },
+
+  addAgentToCollection: async (
+    collectionId: string,
+    agentConfigId: string
+  ): Promise<void> => {
+    await apiClient.post(`/api/v1/agents/collections/${collectionId}/agents`, {
+      agent_config_id: agentConfigId
+    });
+  },
+
+  removeAgentFromCollection: async (
+    collectionId: string,
+    agentConfigId: string
+  ): Promise<void> => {
+    await apiClient.delete(
+      `/api/v1/agents/collections/${collectionId}/agents/${agentConfigId}`
+    );
+  },
+
+  batchUpdatePriorities: async (
+    updates: { id: string; priority: number }[]
+  ): Promise<void> => {
+    await apiClient.patch('/api/v1/agents/configs/priorities', { updates });
+  },
+
+  // Analytics (the response shape is not specified in this document)
+  getAgentAnalytics: async (
+    agentConfigId?: string,
+    dateRange?: { start: string; end: string }
+  ): Promise<unknown> => {
+    const params = new URLSearchParams();
+    if (agentConfigId) params.append('agent_config_id', agentConfigId);
+    if (dateRange) {
+      params.append('start', dateRange.start);
+      params.append('end', dateRange.end);
+    }
+    const response = await apiClient.get(`/api/v1/agents/analytics?${params}`);
+    return response.data;
+  },
+
+  // Audit log
+  getAgentExecutions: async (
+    options?: {
+      agentConfigId?: string;
+      collectionId?: string;
+      limit?: number;
+      offset?: number;
+    }
+  ): Promise<AgentExecution[]> => {
+    const params = new URLSearchParams();
+    if (options?.agentConfigId) params.append('agent_config_id', options.agentConfigId);
+    if (options?.collectionId) params.append('collection_id', options.collectionId);
+    if (options?.limit) params.append('limit', options.limit.toString());
+    if (options?.offset) params.append('offset', options.offset.toString());
+    const response = await apiClient.get(`/api/v1/agents/executions?${params}`);
+    return response.data;
+  },
+};
+
+export default agentApiClient;
+```
+
+### Enhanced Search Response Schema
+
+```typescript
+// frontend/src/types/search.ts
+
+export interface SearchResponse {
+  answer: string;
+  sources: Source[];
+  cot_steps?: CotStep[];
+
+  // NEW: Agent-related fields
+  agent_artifacts?: AgentArtifact[];
+  agent_executions?: AgentExecution[];
+  pipeline_metadata?: {
+    pre_search_agents: number;
+    post_search_agents: number;
+    response_agents: number;
+    total_agent_time_ms: number;
+  };
+}
+
+export interface AgentArtifact {
+  agent_id: string;
+  type: 'pptx' | 'pdf' | 'png' | 'mp3' | 'html' | 'txt';
+  data: string;
+  filename: string;
+  metadata: Record<string, any>;
+}
+
+export interface AgentExecution {
+  agent_id: string;
+  agent_name: string;
+  stage: 'pre_search' | 'post_search' | 'response';
+  duration_ms: number;
+  success: boolean;
+  error?: string;
+}
+```
+
+## State Management
+
+### AgentContext
+
+Context for managing agent-related state across the application.
+
+```typescript
+// frontend/src/contexts/AgentContext.tsx
+
+interface AgentState {
+  availableAgents: AgentManifest[];
+  userConfigs: CollectionAgent[];
+  isLoading: boolean;
+  error: string | null;
+}
+
+interface AgentContextType extends AgentState {
+  loadAvailableAgents: () => Promise<void>;
+  loadUserConfigs: () => Promise<void>;
+  createConfig: (config: Partial<CollectionAgent>) => Promise<void>;
+  updateConfig: (id: string, updates: Partial<CollectionAgent>) => Promise<void>;
+  deleteConfig: (id: string) => Promise<void>;
+}
+
+const AgentContext = createContext<AgentContextType | null>(null);
+
+export const AgentProvider: React.FC<{ children: React.ReactNode }> = ({ children }) => {
+  const [state, setState] = useState<AgentState>({
+    availableAgents: [],
+    userConfigs: [],
+    isLoading: false,
+    error: null
+  });
+
+  const loadAvailableAgents = async () => {
+    // Clear any stale error from a previous failed load
+    setState(s => ({ ...s, isLoading: true, error: null }));
+    try {
+      const agents = await agentApiClient.getAvailableAgents();
+      setState(s => ({ ...s, availableAgents: agents, isLoading: false }));
+    } catch (error) {
+      setState(s => ({ ...s, error: 'Failed to load agents', isLoading: false }));
+    }
+  };
+
+  const loadUserConfigs = async () => {
+    setState(s => ({ ...s, isLoading: true, error: null }));
+    try {
+      const configs = await agentApiClient.getUserAgentConfigs();
+      setState(s => ({ ...s, userConfigs: configs, isLoading: false }));
+    } catch (error) {
+      setState(s => ({ ...s, error: 'Failed to load configs', isLoading: false }));
+    }
+  };
+
+  // ... other methods
+
+  return (
+
+      {children}
+
+  );
+};
+
+export const useAgents = () => {
+  const context = useContext(AgentContext);
+  if (!context) {
+    throw new Error('useAgents must be used within AgentProvider');
+  }
+  return context;
+};
+```
+
+## Accessibility
+
+### Keyboard Navigation
+
+- All agent cards and buttons are focusable
+- Drag-and-drop has keyboard alternatives (up/down arrow keys)
+- Modal focus trapping implemented
+- Screen reader announcements for status changes
+
+### ARIA Labels
+
+```tsx
+// Example: Artifact card
+
+ +
+ +// Example: Pipeline status +
+ ... +
+```
+
+## Responsive Design
+
+### Breakpoints
+
+| Breakpoint | Width | Layout Changes |
+|------------|-------|----------------|
+| Mobile | < 640px | Single column, stacked artifacts |
+| Tablet | 640-1024px | 2-column grid, collapsible panels |
+| Desktop | > 1024px | 3-column grid, full sidebar |
+
+### Mobile Considerations
+
+- Artifact preview uses full-screen modal on mobile
+- Drag-and-drop replaced with move up/down buttons on touch
+- Pipeline status collapses to minimal indicator
+- Agent config modal is full-screen on mobile
+
+## Performance
+
+### Lazy Loading
+
+- Agent marketplace loads agents in pages of 20
+- Artifact preview images loaded on-demand
+- Audit log uses virtual scrolling for large lists
+
+### Caching
+
+- Available agents cached for 5 minutes
+- User configs cached with SWR for real-time updates
+- Artifact data not cached (too large)
+
+### Bundle Optimization
+
+- Agent components code-split by route
+- react-beautiful-dnd loaded only when drag-drop needed
+- Large icons tree-shaken
+
+## Related Documents
+
+- [MCP Integration Architecture](./mcp-integration-architecture.md)
+- [SearchService Agent Hooks Architecture](./search-agent-hooks-architecture.md)
+- [RAG Modulo MCP Server Architecture](./rag-modulo-mcp-server-architecture.md)
diff --git a/docs/architecture/backend-architecture-diagram.md b/docs/architecture/backend-architecture-diagram.md
new file mode 100644
index 00000000..0cd94bb3
--- /dev/null
+++ b/docs/architecture/backend-architecture-diagram.md
@@ -0,0 +1,517 @@
+# RAG Modulo Backend Architecture
+
+This document provides a comprehensive architecture diagram and description of the RAG Modulo
+backend system.
+
+## Architecture Overview
+
+The RAG Modulo backend is a FastAPI-based application that implements a Retrieval-Augmented
+Generation (RAG) system with a modular, stage-based pipeline architecture. The system supports
+multiple LLM providers, vector databases, and document processing strategies.
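The stage-based pipeline described above can be illustrated with a short sketch. It is written in TypeScript for consistency with the rest of this patch, although the backend itself is a Python/FastAPI service. Only the six stage names and their order come from the component diagram; the `SearchContext` fields, the `PipelineStage` interface, the `makeStage` helper, and the toy stage bodies are hypothetical.

```typescript
// Hypothetical sketch of a stage-based RAG pipeline. Stage names follow the
// backend diagram; the interfaces and toy logic are illustrative only.

interface SearchContext {
  question: string;
  collectionId: string;
  rewrittenQuery?: string;
  documents: string[];
  reasoning: string[];
  answer?: string;
  trace: string[]; // names of the stages that have run, in order
}

interface PipelineStage {
  name: string;
  execute(ctx: SearchContext): Promise<SearchContext>;
}

class PipelineExecutor {
  private readonly stages: PipelineStage[];

  constructor(stages: PipelineStage[]) {
    this.stages = stages;
  }

  // Runs the stages in order, threading the context through and recording a trace.
  async run(initial: SearchContext): Promise<SearchContext> {
    let ctx = initial;
    for (const stage of this.stages) {
      ctx = await stage.execute(ctx);
      ctx = { ...ctx, trace: [...ctx.trace, stage.name] };
    }
    return ctx;
  }
}

// Wraps a synchronous update function as an async stage that returns a new context.
function makeStage(
  name: string,
  update: (ctx: SearchContext) => Partial<SearchContext>,
): PipelineStage {
  return {
    name,
    async execute(ctx: SearchContext): Promise<SearchContext> {
      return { ...ctx, ...update(ctx) };
    },
  };
}

// Toy stand-ins for the six stages; real stages would call retrievers, rerankers and LLMs.
const stages: PipelineStage[] = [
  makeStage('PipelineResolutionStage', () => ({})),
  makeStage('QueryEnhancementStage', (ctx) => ({ rewrittenQuery: ctx.question.toLowerCase() })),
  makeStage('RetrievalStage', (ctx) => ({
    documents: [`doc about ${ctx.rewrittenQuery}`, 'unrelated doc'],
  })),
  makeStage('RerankingStage', (ctx) => ({
    documents: ctx.documents.filter((d) => d.includes(ctx.rewrittenQuery ?? '')),
  })),
  makeStage('ReasoningStage', (ctx) => ({
    reasoning: [`${ctx.documents.length} relevant document(s)`],
  })),
  makeStage('GenerationStage', (ctx) => ({
    answer: `Answer based on: ${ctx.documents.join('; ')}`,
  })),
];

new PipelineExecutor(stages)
  .run({ question: 'What is RAG?', collectionId: 'demo', documents: [], reasoning: [], trace: [] })
  .then((ctx) => console.log(`${ctx.trace.join(' -> ')}\n${ctx.answer}`));
```

In this sketch every stage returns a new context rather than mutating the one it receives, and the executor knows nothing about individual stages, so a stage can be added, removed, or reordered by editing the `stages` array alone. That decoupling is what a modular, stage-based design is meant to provide; it is an assumption about, not a description of, the actual `PipelineExecutor`.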
+
+## Component Architecture Diagram
+
+```mermaid
+graph TB
+    subgraph "Client Layer"
+        WEB[Web Frontend]
+        CLI[CLI Client]
+        API_CLIENT[API Clients]
+    end
+
+    subgraph "API Gateway Layer"
+        FASTAPI[FastAPI Application<br/>main.py]
+
+        subgraph "Middleware Stack"
+            CORS[LoggingCORSMiddleware]
+            SESSION[SessionMiddleware]
+            AUTH[AuthenticationMiddleware<br/>SPIFFE/OIDC Support]
+        end
+    end
+
+    subgraph "Router Layer"
+        AUTH_R[Auth Router]
+        SEARCH_R[Search Router]
+        COLLECTION_R[Collection Router]
+        CHAT_R[Chat Router]
+        CONV_R[Conversation Router]
+        PODCAST_R[Podcast Router]
+        VOICE_R[Voice Router]
+        AGENT_R[Agent Router]
+        USER_R[User Router]
+        TEAM_R[Team Router]
+        DASH_R[Dashboard Router]
+        HEALTH_R[Health Router]
+        WS_R[WebSocket Router]
+    end
+
+    subgraph "Service Layer"
+        SEARCH_SVC[SearchService]
+        CONV_SVC[ConversationService]
+        MSG_ORCH[MessageProcessingOrchestrator]
+        COLLECTION_SVC[CollectionService]
+        FILE_SVC[FileManagementService]
+        PODCAST_SVC[PodcastService]
+        VOICE_SVC[VoiceService]
+        AGENT_SVC[AgentService]
+        USER_SVC[UserService]
+        TEAM_SVC[TeamService]
+        DASH_SVC[DashboardService]
+        PIPELINE_SVC[PipelineService]
+        COT_SVC[ChainOfThoughtService]
+        ANSWER_SYNTH[AnswerSynthesizer]
+        CITATION_SVC[CitationAttributionService]
+    end
+
+    subgraph "Pipeline Architecture"
+        PIPELINE_EXEC[PipelineExecutor]
+
+        subgraph "Pipeline Stages"
+            STAGE1[PipelineResolutionStage]
+            STAGE2[QueryEnhancementStage]
+            STAGE3[RetrievalStage]
+            STAGE4[RerankingStage]
+            STAGE5[ReasoningStage]
+            STAGE6[GenerationStage]
+        end
+
+        SEARCH_CTX[SearchContext]
+    end
+
+    subgraph "Data Ingestion Pipeline"
+        DOC_STORE[DocumentStore]
+        DOC_PROC[DocumentProcessor]
+
+        subgraph "Document Processors"
+            PDF_PROC[PdfProcessor]
+            DOCLING_PROC[DoclingProcessor]
+            WORD_PROC[WordProcessor]
+            EXCEL_PROC[ExcelProcessor]
+            TXT_PROC[TxtProcessor]
+        end
+
+        CHUNKING[Chunking Strategies<br/>Sentence/Semantic/Hierarchical]
+    end
+
+    subgraph "Retrieval Layer"
+        RETRIEVER[Retriever]
+        RERANKER[Reranker]
+        QUERY_REWRITER[QueryRewriter]
+    end
+
+    subgraph "Generation Layer"
+        LLM_FACTORY[LLMProviderFactory]
+
+        subgraph "LLM Providers"
+            WATSONX[WatsonX Provider]
+            OPENAI[OpenAI Provider]
+            ANTHROPIC[Anthropic Provider]
+        end
+
+        AUDIO_FACTORY[AudioFactory]
+
+        subgraph "Audio Providers"
+            ELEVENLABS[ElevenLabs Audio]
+            OPENAI_AUDIO[OpenAI Audio]
+            OLLAMA_AUDIO[Ollama Audio]
+        end
+    end
+
+    subgraph "Repository Layer"
+        USER_REPO[UserRepository]
+        COLLECTION_REPO[CollectionRepository]
+        FILE_REPO[FileRepository]
+        CONV_REPO[ConversationRepository]
+        AGENT_REPO[AgentRepository]
+        PODCAST_REPO[PodcastRepository]
+        VOICE_REPO[VoiceRepository]
+        TEAM_REPO[TeamRepository]
+        PIPELINE_REPO[PipelineRepository]
+        LLM_REPO[LLMProviderRepository]
+    end
+
+    subgraph "Data Persistence"
+        POSTGRES[(PostgreSQL<br/>Metadata & Config)]
+        VECTOR_DB[(Vector Database)]
+
+        subgraph "Vector DB Implementations"
+            MILVUS[Milvus]
+            PINECONE[Pinecone]
+            WEAVIATE[Weaviate]
+            ELASTICSEARCH[Elasticsearch]
+            CHROMA[Chroma]
+        end
+    end
+
+    subgraph "External Services"
+        SPIRE[SPIRE Server<br/>SPIFFE Identity]
+        OIDC[OIDC Provider<br/>IBM AppID]
+        MINIO[MinIO
Object Storage] + end + + subgraph "Core Infrastructure" + CONFIG[Settings/Config] + LOGGING[Logging Utils] + IDENTITY[Identity Service] + EXCEPTIONS[Custom Exceptions] + end + + %% Client to API Gateway + WEB --> FASTAPI + CLI --> FASTAPI + API_CLIENT --> FASTAPI + + %% Middleware Flow + FASTAPI --> CORS + CORS --> SESSION + SESSION --> AUTH + + %% Router Registration + AUTH --> AUTH_R + AUTH --> SEARCH_R + AUTH --> COLLECTION_R + AUTH --> CHAT_R + AUTH --> CONV_R + AUTH --> PODCAST_R + AUTH --> VOICE_R + AUTH --> AGENT_R + AUTH --> USER_R + AUTH --> TEAM_R + AUTH --> DASH_R + AUTH --> HEALTH_R + AUTH --> WS_R + + %% Router to Service + SEARCH_R --> SEARCH_SVC + CHAT_R --> CONV_SVC + CONV_R --> CONV_SVC + CONV_SVC --> MSG_ORCH + MSG_ORCH --> SEARCH_SVC + COLLECTION_R --> COLLECTION_SVC + COLLECTION_SVC --> FILE_SVC + PODCAST_R --> PODCAST_SVC + VOICE_R --> VOICE_SVC + AGENT_R --> AGENT_SVC + USER_R --> USER_SVC + TEAM_R --> TEAM_SVC + DASH_R --> DASH_SVC + + %% Search Service to Pipeline + SEARCH_SVC --> PIPELINE_EXEC + PIPELINE_EXEC --> STAGE1 + STAGE1 --> STAGE2 + STAGE2 --> STAGE3 + STAGE3 --> STAGE4 + STAGE4 --> STAGE5 + STAGE5 --> STAGE6 + PIPELINE_EXEC --> SEARCH_CTX + + %% Pipeline Stages to Services + STAGE1 --> PIPELINE_SVC + STAGE2 --> PIPELINE_SVC + STAGE3 --> PIPELINE_SVC + STAGE4 --> PIPELINE_SVC + STAGE5 --> COT_SVC + STAGE6 --> ANSWER_SYNTH + + %% Pipeline Service to Retrieval + PIPELINE_SVC --> RETRIEVER + PIPELINE_SVC --> RERANKER + PIPELINE_SVC --> QUERY_REWRITER + + %% Retrieval to Vector DB + RETRIEVER --> VECTOR_DB + VECTOR_DB --> MILVUS + VECTOR_DB --> PINECONE + VECTOR_DB --> WEAVIATE + VECTOR_DB --> ELASTICSEARCH + VECTOR_DB --> CHROMA + + %% Generation Layer + ANSWER_SYNTH --> LLM_FACTORY + LLM_FACTORY --> WATSONX + LLM_FACTORY --> OPENAI + LLM_FACTORY --> ANTHROPIC + PODCAST_SVC --> LLM_FACTORY + VOICE_SVC --> AUDIO_FACTORY + AUDIO_FACTORY --> ELEVENLABS + AUDIO_FACTORY --> OPENAI_AUDIO + AUDIO_FACTORY --> OLLAMA_AUDIO + + %% Data 
Ingestion + FILE_SVC --> DOC_STORE + DOC_STORE --> DOC_PROC + DOC_PROC --> PDF_PROC + DOC_PROC --> DOCLING_PROC + DOC_PROC --> WORD_PROC + DOC_PROC --> EXCEL_PROC + DOC_PROC --> TXT_PROC + DOC_PROC --> CHUNKING + DOC_STORE --> VECTOR_DB + + %% Service to Repository + USER_SVC --> USER_REPO + COLLECTION_SVC --> COLLECTION_REPO + FILE_SVC --> FILE_REPO + CONV_SVC --> CONV_REPO + AGENT_SVC --> AGENT_REPO + PODCAST_SVC --> PODCAST_REPO + VOICE_SVC --> VOICE_REPO + TEAM_SVC --> TEAM_REPO + PIPELINE_SVC --> PIPELINE_REPO + PIPELINE_SVC --> LLM_REPO + + %% Repository to Database + USER_REPO --> POSTGRES + COLLECTION_REPO --> POSTGRES + FILE_REPO --> POSTGRES + CONV_REPO --> POSTGRES + AGENT_REPO --> POSTGRES + PODCAST_REPO --> POSTGRES + VOICE_REPO --> POSTGRES + TEAM_REPO --> POSTGRES + PIPELINE_REPO --> POSTGRES + LLM_REPO --> POSTGRES + + %% Authentication + AUTH --> SPIRE + AUTH --> OIDC + AGENT_SVC --> SPIRE + + %% Storage + FILE_SVC --> MINIO + PODCAST_SVC --> MINIO + VOICE_SVC --> MINIO + + %% Core Infrastructure + FASTAPI --> CONFIG + FASTAPI --> LOGGING + AUTH --> IDENTITY + SEARCH_SVC --> EXCEPTIONS + CONV_SVC --> EXCEPTIONS + + style FASTAPI fill:#4A90E2 + style PIPELINE_EXEC fill:#50C878 + style VECTOR_DB fill:#FF6B6B + style POSTGRES fill:#4ECDC4 + style LLM_FACTORY fill:#FFD93D + style DOC_STORE fill:#9B59B6 +``` + +## Architecture Layers + +### 1. API Gateway Layer + +**FastAPI Application (`main.py`)** + +- Entry point for all HTTP requests +- Manages application lifespan (startup/shutdown) +- Configures middleware stack +- Registers all routers +- Initializes database and LLM providers + +**Middleware Stack:** + +- **LoggingCORSMiddleware**: Handles CORS and request/response logging +- **SessionMiddleware**: Manages user sessions +- **AuthenticationMiddleware**: Validates user authentication via SPIFFE/OIDC + +### 2. 
Router Layer + +The router layer provides RESTful API endpoints organized by domain: + +- **Auth Router**: User authentication and authorization +- **Search Router**: RAG search operations +- **Collection Router**: Document collection management +- **Chat Router**: Conversational interface +- **Conversation Router**: Conversation history and context +- **Podcast Router**: AI-powered podcast generation +- **Voice Router**: Voice synthesis operations +- **Agent Router**: SPIFFE-based agent management +- **User Router**: User profile management +- **Team Router**: Team collaboration features +- **Dashboard Router**: Analytics and metrics +- **Health Router**: System health checks +- **WebSocket Router**: Real-time updates + +### 3. Service Layer + +Business logic services that orchestrate operations: + +- **SearchService**: Coordinates RAG search operations +- **ConversationService**: Manages conversation sessions and messages +- **MessageProcessingOrchestrator**: Orchestrates message processing with context +- **CollectionService**: Manages document collections +- **FileManagementService**: Handles file uploads and processing +- **PodcastService**: Generates podcasts from documents +- **VoiceService**: Manages voice synthesis +- **AgentService**: Manages AI agents with SPIFFE identity +- **PipelineService**: Executes RAG pipeline stages +- **ChainOfThoughtService**: Implements reasoning capabilities +- **AnswerSynthesizer**: Generates final answers from retrieved context +- **CitationAttributionService**: Attributes sources to answers + +### 4. Pipeline Architecture + +**Stage-Based RAG Pipeline:** + +The system uses a modular, stage-based pipeline architecture: + +1. **PipelineResolutionStage**: Resolves user's default pipeline configuration +2. **QueryEnhancementStage**: Rewrites/enhances queries for better retrieval +3. **RetrievalStage**: Retrieves documents from vector database +4. **RerankingStage**: Reranks results for relevance +5. 
**ReasoningStage**: Applies Chain of Thought reasoning if needed +6. **GenerationStage**: Generates final answer using LLM + +**PipelineExecutor**: Orchestrates stage execution with context passing + +**SearchContext**: Maintains state across pipeline stages + +### 5. Data Ingestion Pipeline + +**DocumentStore**: Manages document ingestion workflow + +**DocumentProcessor**: Routes documents to appropriate processors: + +- **PdfProcessor**: PDF extraction with OCR support +- **DoclingProcessor**: Advanced document processing (tables, images) +- **WordProcessor**: Microsoft Word documents +- **ExcelProcessor**: Spreadsheet processing +- **TxtProcessor**: Plain text files + +**Chunking Strategies**: + +- Sentence-based (recommended) +- Semantic chunking +- Hierarchical chunking +- Token-based chunking +- Fixed-size chunking + +### 6. Retrieval Layer + +- **Retriever**: Performs vector similarity search +- **Reranker**: Reranks results for better relevance +- **QueryRewriter**: Enhances queries for better retrieval + +### 7. Generation Layer + +**LLMProviderFactory**: Factory for creating LLM provider instances + +- **WatsonX Provider**: IBM WatsonX integration +- **OpenAI Provider**: OpenAI API integration +- **Anthropic Provider**: Claude API integration + +**AudioFactory**: Factory for audio generation + +- **ElevenLabs Audio**: Voice synthesis +- **OpenAI Audio**: TTS integration +- **Ollama Audio**: Local TTS + +### 8. 
Repository Layer + +Data access layer using Repository pattern: + +- **UserRepository**: User data operations +- **CollectionRepository**: Collection management +- **FileRepository**: File metadata operations +- **ConversationRepository**: Conversation data (unified, optimized) +- **AgentRepository**: Agent management +- **PodcastRepository**: Podcast metadata +- **VoiceRepository**: Voice configuration +- **TeamRepository**: Team operations +- **PipelineRepository**: Pipeline configuration +- **LLMProviderRepository**: LLM provider settings + +### 9. Data Persistence + +**PostgreSQL**: + +- Stores metadata (users, collections, files, conversations) +- Manages configuration (pipelines, LLM settings) +- Handles relationships and transactions + +**Vector Database** (Abstracted via VectorStore interface): + +- **Milvus**: Primary vector database +- **Pinecone**: Cloud vector database +- **Weaviate**: GraphQL vector database +- **Elasticsearch**: Search engine with vector support +- **Chroma**: Lightweight vector database + +### 10. External Services + +- **SPIRE Server**: SPIFFE workload identity for agent authentication +- **OIDC Provider**: IBM AppID for user authentication +- **MinIO**: Object storage for files and audio + +### 11. Core Infrastructure + +- **Settings/Config**: Centralized configuration management +- **Logging Utils**: Structured logging with context +- **Identity Service**: User/agent identity management +- **Custom Exceptions**: Domain-specific error handling + +## Data Flow + +### Search Request Flow + +1. **Client** → FastAPI → **Search Router** +2. **Search Router** → **SearchService** +3. **SearchService** → **PipelineExecutor** +4. **PipelineExecutor** executes stages: + - Pipeline Resolution → Query Enhancement → Retrieval → Reranking → Reasoning → Generation +5. **RetrievalStage** → **Retriever** → **Vector Database** +6. **GenerationStage** → **AnswerSynthesizer** → **LLM Provider** +7. 
Response flows back through layers to client + +### Document Ingestion Flow + +1. **Client** → **Collection Router** → **CollectionService** → **FileManagementService** +2. **FileManagementService** → **DocumentStore** +3. **DocumentStore** → **DocumentProcessor** → **Specific Processor** (PDF/Word/etc.) +4. **Processor** → **Chunking Strategy** → **Document Chunks** +5. **DocumentStore** → **Vector Database** (embeddings + metadata) +6. **FileManagementService** → **FileRepository** → **PostgreSQL** (metadata) + +### Conversation Flow + +1. **Client** → **Conversation Router** → **ConversationService** +2. **ConversationService** → **MessageProcessingOrchestrator** +3. **MessageProcessingOrchestrator** → **SearchService** (with context) +4. **SearchService** executes pipeline with conversation context +5. Response saved via **ConversationRepository** → **PostgreSQL** + +## Key Design Patterns + +1. **Repository Pattern**: Data access abstraction +2. **Factory Pattern**: LLM and Vector DB instantiation +3. **Strategy Pattern**: Chunking strategies, LLM providers +4. **Pipeline Pattern**: Stage-based RAG processing +5. **Dependency Injection**: Services and repositories +6. 
**Middleware Pattern**: Cross-cutting concerns (auth, logging, CORS) + +## Scalability Considerations + +- **Stateless Services**: Services are stateless for horizontal scaling +- **Database Connection Pooling**: SQLAlchemy connection management +- **Async/Await**: Asynchronous operations for I/O-bound tasks +- **Vector DB Abstraction**: Easy switching between vector databases +- **LLM Provider Abstraction**: Support for multiple LLM providers +- **Modular Pipeline**: Stages can be optimized independently + +## Security Features + +- **SPIFFE/SPIRE**: Machine-to-machine authentication for agents +- **OIDC**: User authentication via IBM AppID +- **Session Management**: Secure session handling +- **CORS**: Controlled cross-origin access +- **Input Validation**: Pydantic schemas for request validation +- **Error Handling**: Secure error messages without information leakage + +## Configuration Management + +- **Environment Variables**: `.env` file support +- **Pydantic Settings**: Type-safe configuration +- **Runtime Configuration**: Dynamic configuration updates +- **User-Specific Settings**: Per-user LLM and pipeline configuration diff --git a/docs/architecture/mcp-integration-architecture.md b/docs/architecture/mcp-integration-architecture.md new file mode 100644 index 00000000..e4be1eb8 --- /dev/null +++ b/docs/architecture/mcp-integration-architecture.md @@ -0,0 +1,200 @@ +# MCP Integration Architecture + +**Date**: November 2025 +**Status**: Architecture Design +**Version**: 1.0 +**Related PRs**: #671, #684, #695 + +## Overview + +This document describes the architecture for integrating Model Context Protocol (MCP) into +RAG Modulo. The integration enables bidirectional MCP communication: + +1. **RAG Modulo as MCP Client**: Consuming external MCP tools (PowerPoint generation, charts, translation) +2. 
**RAG Modulo as MCP Server**: Exposing RAG capabilities to external AI tools (Claude Desktop, workflow systems) + +## PR Comparison and Decision + +### PR #671 vs #684 Analysis + +| Aspect | PR #671 | PR #684 | Decision | +|--------|---------|---------|----------| +| **File Organization** | `mcp/` dedicated directory | `services/` directory | #684 naming preferred | +| **Lines Changed** | 2,502 | 2,846 | Similar | +| **Test Functions** | 63 | 50 | #671 has more tests | +| **Mergeable** | Yes | Unknown | #671 confirmed | + +### Decision: Adopt #684 File Naming with #671 Test Coverage + +We will use #684's file naming convention (`mcp_gateway_client.py`, `search_result_enricher.py`) +placed in the `services/` directory, as this follows the existing service-based architecture +pattern. However, we should incorporate the additional test coverage from #671. + +## High-Level Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ MCP Context Forge │ +│ (Central Gateway/Registry) │ +│ │ +│ Registered Servers: │ +│ ┌──────────────────────────────────────────────────────────────────────┐ │ +│ │ Internal (RAG Modulo consumes): │ │ +│ │ • ppt-generator-mcp (PowerPoint) │ │ +│ │ • chart-generator-mcp (Visualizations) │ │ +│ │ • translator-mcp (Language translation) │ │ +│ │ • web-enricher-mcp (Real-time data) │ │ +│ └──────────────────────────────────────────────────────────────────────┘ │ +│ ┌──────────────────────────────────────────────────────────────────────┐ │ +│ │ External (RAG Modulo exposes): │ │ +│ │ • rag-modulo-mcp (search, ingest, podcast, collections) │ │ +│ └──────────────────────────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────────────────────────┘ + ▲ ▲ + │ │ + │ RAG Modulo calls External tools call + │ external MCP tools RAG Modulo MCP server + │ │ + ▼ ▼ +┌─────────────────────────────────────────────────────────────────────────────┐ +│ RAG Modulo 
Backend │ +│ │ +│ ┌───────────────────────────────────────────────────────────────────────┐ │ +│ │ MCP Client │ MCP Server │ │ +│ │ services/mcp_gateway_client.py │ mcp_server/server.py │ │ +│ │ services/search_result_enricher.py│ mcp_server/tools.py │ │ +│ │ │ │ │ +│ │ Consumes: ppt-generator, │ Exposes: rag_search, │ │ +│ │ chart-generator, etc. │ rag_ingest, rag_podcast │ │ +│ └───────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────────┐│ +│ │ Core Services ││ +│ │ SearchService, DocumentService, PodcastService, CollectionService ││ +│ └─────────────────────────────────────────────────────────────────────────┘│ +└─────────────────────────────────────────────────────────────────────────────┘ + ▲ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────────┐ +│ RAG Modulo Frontend │ +│ │ +│ • Triggers searches → gets artifacts back │ +│ • Configures which agents run per collection │ +│ • Downloads/previews generated artifacts │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## File Structure + +``` +backend/rag_solution/ +├── services/ +│ ├── mcp_gateway_client.py # Client to call external MCP tools +│ ├── search_result_enricher.py # Post-search enrichment agent +│ └── ... (existing services) +│ +├── mcp_server/ # RAG Modulo as MCP server +│ ├── __init__.py +│ ├── server.py # MCP server setup, transport handling +│ ├── tools.py # Tool definitions (rag_search, rag_ingest, etc.) +│ ├── resources.py # MCP resources (collection metadata, etc.) +│ └── auth.py # SPIFFE/Bearer token validation +│ +├── schemas/ +│ ├── mcp_schema.py # Schemas for MCP requests/responses +│ └── ... +│ +└── router/ + ├── mcp_router.py # REST endpoints for MCP management + └── ... 
+ +tests/unit/ +├── services/ +│ ├── test_mcp_gateway_client.py +│ └── test_search_result_enricher.py +├── router/ +│ └── test_mcp_router.py +└── mcp_server/ + ├── test_server.py + └── test_tools.py +``` + +## MCP Client Components + +### MCPGatewayClient + +A thin wrapper that applies the circuit breaker pattern when calling external MCP tools via Context Forge. + +**Key Features**: + +- Circuit breaker: 5-failure threshold, 60-second recovery timeout +- Health checks: 5-second timeout +- Default timeout: 30 seconds on all calls +- Graceful degradation on failures + +### SearchResultEnricher + +Implements the Content Enricher pattern to augment search results with external data. + +**Capabilities**: + +- Real-time data enrichment (stock prices, weather, etc.) +- External knowledge base queries +- Document metadata enhancement + +## MCP Server Components + +RAG Modulo exposes its capabilities as MCP tools for external consumption. + +### Exposed Tools + +| Tool | Description | Parameters | +|------|-------------|------------| +| `rag_search` | Search documents in a collection | `collection_id`, `query`, `top_k`, `use_cot` | +| `rag_ingest` | Add documents to a collection | `collection_id`, `documents` | +| `rag_list_collections` | List accessible collections | `include_stats` | +| `rag_generate_podcast` | Generate a podcast from a collection | `collection_id`, `topic`, `duration_minutes` | +| `rag_smart_questions` | Get suggested follow-up questions | `collection_id`, `context`, `count` | +| `rag_get_document` | Retrieve a document's content and metadata | `document_id` | + +### Exposed Resources + +| Resource URI | Description | +|--------------|-------------| +| `rag://collection/{id}/documents` | Document metadata for a collection | +| `rag://collection/{id}/stats` | Collection statistics | +| `rag://search/{query}/results` | Cached search results | + +### Authentication + +- **SPIFFE JWT-SVID** (PR #695): For agent-to-agent calls +- **Bearer token**: For user-delegated access from Claude Desktop, etc.
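One way to act on the two token types above is to inspect the `sub` claim after the token has been verified. A SPIFFE ID such as `spiffe://rag-modulo.example.com/agent/search-enricher/abc123` (the format used in the RAG Modulo MCP Server Architecture document) identifies an agent, and any other subject is treated as a delegated user. The following is a minimal sketch under that assumption. `Principal` and `principal_from_claims` are hypothetical names. Signature, expiry, and audience verification (for example, against the SPIRE trust bundle or the OIDC provider's keys) is assumed to have happened already and is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class Principal:
    """Caller identity handed to MCP tool handlers (hypothetical type)."""

    kind: str  # "agent" or "user"
    subject: str  # full SPIFFE ID or user ID
    trust_domain: Optional[str] = None
    agent_type: Optional[str] = None
    agent_id: Optional[str] = None


def principal_from_claims(claims: dict[str, Any]) -> Principal:
    """Classify ALREADY-VERIFIED JWT claims as an agent or a user principal.

    Assumption: signature, expiry, and audience were checked before this call.
    """
    sub = claims.get("sub")
    if not isinstance(sub, str) or not sub:
        raise PermissionError("token has no subject")

    if sub.startswith("spiffe://"):
        parsed = urlparse(sub)
        # Assumed agent path shape: /agent/{type}/{id}
        parts = parsed.path.strip("/").split("/")
        if len(parts) != 3 or parts[0] != "agent" or not all(parts):
            raise PermissionError(f"unexpected SPIFFE ID path: {parsed.path!r}")
        return Principal(
            kind="agent",
            subject=sub,
            trust_domain=parsed.netloc,
            agent_type=parts[1],
            agent_id=parts[2],
        )

    # Any other subject is treated as a user acting through a delegated token.
    return Principal(kind="user", subject=sub)


agent = principal_from_claims(
    {
        "sub": "spiffe://rag-modulo.example.com/agent/search-enricher/abc123",
        "aud": ["rag-modulo-mcp"],
        "exp": 1732800000,
    }
)
user = principal_from_claims({"sub": "user-42"})
print(agent.kind, agent.agent_type, agent.agent_id)  # agent search-enricher abc123
print(user.kind, user.subject)  # user user-42
```

Keeping classification separate from verification lets both token types share one code path until agent capabilities or user scopes are looked up.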
+ +## Integration with Context Forge + +IBM's MCP Context Forge serves as the central gateway providing: + +- Protocol translation (stdio, SSE, WebSocket, HTTP) +- Tool registry and discovery +- Bearer token auth with JWT + RBAC +- Rate limiting with Redis backing +- OpenTelemetry integration +- Admin UI for management +- Redis-backed federation for distributed deployment + +## Security Considerations + +1. **Network Isolation**: Context Forge runs in same VPC as RAG Modulo backend +2. **JWT Authentication**: Secure token-based auth for all API calls +3. **RBAC**: Team-based access control for sensitive tools +4. **Secrets Management**: MCP server credentials managed by Context Forge +5. **Audit Logging**: All tool invocations logged via OpenTelemetry +6. **Capability Validation**: SPIFFE capabilities mapped to MCP tool permissions + +## Related Documents + +- [SearchService Agent Hooks Architecture](./search-agent-hooks-architecture.md) +- [RAG Modulo MCP Server Architecture](./rag-modulo-mcp-server-architecture.md) +- [SPIRE Integration Architecture](./spire-integration-architecture.md) +- [Agent MCP Architecture Design](../design/agent-mcp-architecture.md) +- [MCP Context Forge Integration Design](../design/mcp-context-forge-integration.md) diff --git a/docs/architecture/rag-modulo-mcp-server-architecture.md b/docs/architecture/rag-modulo-mcp-server-architecture.md new file mode 100644 index 00000000..4bbff346 --- /dev/null +++ b/docs/architecture/rag-modulo-mcp-server-architecture.md @@ -0,0 +1,689 @@ +# RAG Modulo MCP Server Architecture + +**Date**: November 2025 +**Status**: Architecture Design +**Version**: 1.0 +**Related Documents**: [MCP Integration Architecture](./mcp-integration-architecture.md), [SPIRE Integration Architecture](./spire-integration-architecture.md) + +## Overview + +This document describes the architecture for exposing RAG Modulo's capabilities as an MCP +(Model Context Protocol) server. 
This enables external AI tools like Claude Desktop, workflow +automation systems, and other MCP clients to interact with RAG Modulo's search, ingestion, +and content generation features. + +## Use Cases + +### External MCP Clients + +| Client | Use Case | +|--------|----------| +| **Claude Desktop** | User asks Claude to search their company documents | +| **n8n/Zapier** | Workflow automation: ingest email attachments, search on triggers | +| **Custom AI Bots** | Slack/Teams bots that query document collections | +| **Agent Frameworks** | LangChain, AutoGPT agents using RAG Modulo as knowledge source | + +### Example Scenarios + +**Scenario 1: Claude Desktop** + +``` +User in Claude Desktop: +"Search my company's financial documents for Q4 projections" + +Claude Desktop: +1. Discovers rag_search tool via MCP +2. Calls rag_search(collection_id="...", query="Q4 projections") +3. Receives answer + sources from RAG Modulo +4. Presents to user with citations +``` + +**Scenario 2: Workflow Automation** + +``` +Trigger: New email received with attachment +Action 1: Extract attachment, upload to temp storage +Action 2: Call rag_ingest to add document to collection +Action 3: Call rag_search to check for related content +Action 4: Send Slack notification with summary +``` + +**Scenario 3: Multi-Agent System** + +``` +Orchestrator Agent: +1. Calls rag_list_collections to find relevant collection +2. Calls rag_search to gather information +3. Calls rag_generate_podcast to create audio summary +4. 
Combines results for final user response +``` + +## Architecture + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ EXTERNAL MCP CLIENTS │ +│ │ +│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ +│ │ Claude Desktop │ │ Custom AI Bot │ │ Workflow Tool │ │ +│ │ │ │ │ │ (n8n, Zapier) │ │ +│ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │ +│ │ │ │ │ +│ └────────────────────┼────────────────────┘ │ +│ │ │ +│ ▼ MCP Protocol (stdio/SSE/HTTP) │ +└─────────────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────────┐ +│ RAG Modulo Native MCP Server │ +│ backend/rag_solution/mcp_server/ │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────────┐│ +│ │ Tools ││ +│ │ ││ +│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ││ +│ │ │ rag_search │ │ rag_ingest │ │ rag_list_colls │ ││ +│ │ └─────────────────┘ └─────────────────┘ └─────────────────┘ ││ +│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ││ +│ │ │ rag_gen_podcast │ │ rag_smart_q's │ │ rag_get_doc │ ││ +│ │ └─────────────────┘ └─────────────────┘ └─────────────────┘ ││ +│ └─────────────────────────────────────────────────────────────────────────┘│ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────────┐│ +│ │ Resources ││ +│ │ ││ +│ │ rag://collection/{id}/documents - Document metadata ││ +│ │ rag://collection/{id}/stats - Collection statistics ││ +│ │ rag://search/{query}/results - Cached search results ││ +│ └─────────────────────────────────────────────────────────────────────────┘│ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────────┐│ +│ │ Authentication ││ +│ │ ││ +│ │ • SPIFFE JWT-SVID (agent-to-agent) ◀── PR #695 ││ +│ │ • Bearer token (user-delegated access) ││ +│ │ • API key (service accounts) ││ +│ 
└─────────────────────────────────────────────────────────────────────────┘│ +└─────────────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────────┐ +│ RAG Modulo Backend Services │ +│ (SearchService, DocumentService, PodcastService, etc.) │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Exposed Tools + +### rag_search + +Search documents in a RAG Modulo collection. + +```yaml +name: rag_search +description: Search documents in a RAG Modulo collection using semantic search with optional Chain-of-Thought reasoning + +parameters: + collection_id: + type: string + description: UUID of the collection to search + required: true + query: + type: string + description: Natural language search query + required: true + top_k: + type: integer + description: Number of results to return + required: false + default: 5 + use_cot: + type: boolean + description: Enable Chain-of-Thought reasoning for complex queries + required: false + default: false + +returns: + answer: + type: string + description: Synthesized answer from retrieved documents + sources: + type: array + description: List of source documents with titles and relevance scores + cot_steps: + type: array + description: Reasoning steps (if use_cot=true) +``` + +### rag_ingest + +Add documents to a collection. + +```yaml +name: rag_ingest +description: Add one or more documents to a RAG Modulo collection + +parameters: + collection_id: + type: string + description: UUID of the target collection + required: true + documents: + type: array + description: List of documents to ingest + required: true + items: + type: object + properties: + title: + type: string + description: Document title + content: + type: string + description: Document content (text) + metadata: + type: object + description: Optional metadata (author, date, tags, etc.) 
+ +returns: + ingested_count: + type: integer + description: Number of documents successfully ingested + document_ids: + type: array + description: UUIDs of ingested documents + errors: + type: array + description: Any errors encountered during ingestion +``` + +### rag_list_collections + +List collections accessible to the authenticated agent/user. + +```yaml +name: rag_list_collections +description: List document collections the authenticated agent can access + +parameters: + include_stats: + type: boolean + description: Include document counts and last updated timestamps + required: false + default: false + +returns: + collections: + type: array + items: + type: object + properties: + id: + type: string + description: Collection UUID + name: + type: string + description: Collection name + description: + type: string + description: Collection description + document_count: + type: integer + description: Number of documents (if include_stats=true) + last_updated: + type: string + description: ISO timestamp of last update (if include_stats=true) +``` + +### rag_generate_podcast + +Generate an audio podcast from collection content. + +```yaml +name: rag_generate_podcast +description: Generate an AI-powered audio podcast from collection documents + +parameters: + collection_id: + type: string + description: UUID of the source collection + required: true + topic: + type: string + description: Focus topic for the podcast (optional - uses all content if not specified) + required: false + duration_minutes: + type: integer + description: Target podcast duration in minutes + required: false + default: 5 + minimum: 1 + maximum: 30 + +returns: + audio_url: + type: string + description: URL to download the generated audio file + transcript: + type: string + description: Full text transcript of the podcast + duration: + type: number + description: Actual duration in seconds +``` + +### rag_smart_questions + +Get AI-suggested follow-up questions based on context. 
+ +```yaml +name: rag_smart_questions +description: Generate intelligent follow-up questions based on collection content and conversation context + +parameters: + collection_id: + type: string + description: UUID of the collection + required: true + context: + type: string + description: Current conversation context or recent query + required: false + count: + type: integer + description: Number of questions to generate + required: false + default: 3 + minimum: 1 + maximum: 10 + +returns: + questions: + type: array + items: + type: string + description: List of suggested follow-up questions +``` + +### rag_get_document + +Retrieve a specific document's content and metadata. + +```yaml +name: rag_get_document +description: Retrieve full content and metadata for a specific document + +parameters: + document_id: + type: string + description: UUID of the document + required: true + +returns: + id: + type: string + description: Document UUID + title: + type: string + description: Document title + content: + type: string + description: Full document text content + metadata: + type: object + description: Document metadata + collection_id: + type: string + description: Parent collection UUID + created_at: + type: string + description: ISO timestamp of document creation +``` + +## Exposed Resources + +MCP resources provide read-only access to RAG Modulo data. + +### rag://collection/{id}/documents + +Document metadata for a collection. + +```json +{ + "uri": "rag://collection/abc123/documents", + "name": "Collection Documents", + "description": "List of documents in the collection", + "mimeType": "application/json" +} +``` + +**Content**: + +```json +{ + "collection_id": "abc123", + "documents": [ + { + "id": "doc1", + "title": "Q4 Financial Report", + "created_at": "2024-10-15T10:00:00Z", + "word_count": 5000, + "metadata": { "author": "Finance Team" } + } + ], + "total_count": 150 +} +``` + +### rag://collection/{id}/stats + +Collection statistics. 
+ +```json +{ + "uri": "rag://collection/abc123/stats", + "name": "Collection Statistics", + "description": "Usage statistics for the collection", + "mimeType": "application/json" +} +``` + +**Content**: + +```json +{ + "collection_id": "abc123", + "document_count": 150, + "total_words": 500000, + "total_chunks": 2500, + "last_ingestion": "2024-11-20T14:30:00Z", + "query_count_30d": 1250, + "avg_query_time_ms": 450 +} +``` + +### rag://search/{query}/results + +Cached search results (for efficiency when the same query is repeated). + +```json +{ + "uri": "rag://search/q4+projections/results", + "name": "Cached Search Results", + "description": "Cached results for recent search query", + "mimeType": "application/json" +} +``` + +## Authentication + +### SPIFFE JWT-SVID (Agent-to-Agent) + +For AI agents authenticated via SPIFFE/SPIRE (PR #695): + +``` +Authorization: Bearer <JWT-SVID> + +JWT Claims: +{ + "sub": "spiffe://rag-modulo.example.com/agent/search-enricher/abc123", + "aud": ["rag-modulo-mcp"], + "exp": 1732800000 +} +``` + +The MCP server validates the JWT-SVID and extracts: + +- Agent SPIFFE ID +- Capabilities (from the agents table) +- Owner user ID (for collection access) + +### Bearer Token (User-Delegated) + +For external clients acting on behalf of users: + +``` +Authorization: Bearer <user-access-token> +``` + +User tokens are issued via the existing OAuth flow and include: + +- User ID +- Scopes (read, write, admin) +- Expiration + +### API Key (Service Accounts) + +For service-to-service integration: + +``` +X-API-Key: <api-key> +``` + +API keys are associated with: + +- Service account user +- Allowed collections +- Rate limits + +## Authorization + +### Capability-Based Access Control + +SPIFFE agents have capabilities that map to MCP tool permissions: + +| Capability | Allowed Tools | +|------------|---------------| +| `search:read` | `rag_search`, `rag_list_collections`, `rag_get_document` | +| `search:write` | `rag_ingest` | +| `llm:invoke` | `rag_generate_podcast`, `rag_smart_questions` | +
`collection:read` | All read operations on owned collections | +| `collection:write` | Create/modify collections | + +### Collection Access + +An agent can access a collection only if at least one of the following holds: + +1. The collection is owned by the agent's `owner_user_id` +2. The collection is shared with the agent's `team_id` +3. The collection is marked as public + +## File Structure + +``` +backend/rag_solution/mcp_server/ +├── __init__.py +├── server.py # MCP server setup, transport handling +├── tools.py # Tool definitions and implementations +├── resources.py # Resource definitions +├── auth.py # SPIFFE/Bearer/API key validation +└── schemas.py # Request/response schemas + +tests/unit/mcp_server/ +├── __init__.py +├── test_server.py +├── test_tools.py +├── test_resources.py +└── test_auth.py +``` + +## Server Implementation + +### Transport Options + +| Transport | Use Case | Port | +|-----------|----------|------| +| **stdio** | Claude Desktop, local CLI | N/A | +| **SSE** | Web clients, real-time updates | 8010 | +| **HTTP** | REST-like integration | 8010 | + +### Example Server Setup + +```python +# backend/rag_solution/mcp_server/server.py + +from mcp import Server, Tool, Resource +from mcp.transports import StdioTransport, SSETransport + +from .tools import ( + rag_search, + rag_ingest, + rag_list_collections, + rag_generate_podcast, + rag_smart_questions, + rag_get_document, +) +from .resources import collection_documents, collection_stats, search_results +from .auth import validate_auth + +server = Server("rag-modulo") + +# Register tools +server.register_tool(rag_search) +server.register_tool(rag_ingest) +server.register_tool(rag_list_collections) +server.register_tool(rag_generate_podcast) +server.register_tool(rag_smart_questions) +server.register_tool(rag_get_document) + +# Register resources +server.register_resource(collection_documents) +server.register_resource(collection_stats) +server.register_resource(search_results) + +# Auth middleware +server.use(validate_auth) + +# Run server +if
__name__ == "__main__": + transport = StdioTransport() # Or SSETransport(port=8010) + server.run(transport) +``` + +## Integration with Context Forge + +Register RAG Modulo MCP server with Context Forge for federation: + +```bash +curl -X POST http://localhost:8001/api/v1/servers \ + -H "Authorization: Bearer $CONTEXT_FORGE_TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "name": "RAG Modulo", + "type": "mcp", + "endpoint": "http://rag-modulo-backend:8010", + "config": { + "protocol": "sse", + "auth_required": true + } + }' +``` + +## SPIFFE + MCP Coexistence + +### Identity Flow + +``` +┌─────────────────────────────────────────────────────────────────────────────┐ +│ Identity Architecture │ +│ │ +│ ┌─────────────────────────────────────────────────────────────────────────┐│ +│ │ Human Users ││ +│ │ - Authenticate via OIDC/OAuth (existing auth) ││ +│ │ - JWT with user claims ││ +│ │ - Access collections they own ││ +│ └─────────────────────────────────────────────────────────────────────────┘│ +│ ▲ │ +│ │ Creates & owns │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────────┐│ +│ │ AI Agents (PR #695 SPIFFE) ││ +│ │ ││ +│ │ SPIFFE ID: spiffe://rag-modulo.example.com/agent/{type}/{id} ││ +│ │ ││ +│ │ Agent Record: ││ +│ │ - id: UUID ││ +│ │ - spiffe_id: Full SPIFFE ID ││ +│ │ - agent_type: search-enricher, cot-reasoning, etc. ││ +│ │ - owner_user_id: UUID (who created/owns this agent) ││ +│ │ - capabilities: [search:read, llm:invoke, etc.] ││ +│ │ - status: active, suspended, revoked, pending ││ +│ │ ││ +│ │ Auth Flow: ││ +│ │ 1. Agent presents JWT-SVID from SPIRE ││ +│ │ 2. MCP Server validates via SpiffeAuthenticator ││ +│ │ 3. Creates AgentPrincipal with capabilities ││ +│ │ 4. 
CBAC (Capability-Based Access Control) ││ +│ └─────────────────────────────────────────────────────────────────────────┘│ +│ ▲ │ +│ │ Invokes via MCP │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────────┐│ +│ │ MCP Tools ││ +│ │ ││ +│ │ MCP Server handles: ││ +│ │ - Protocol translation (stdio, SSE, HTTP) ││ +│ │ - Tool discovery and invocation ││ +│ │ - Rate limiting and circuit breakers ││ +│ │ ││ +│ │ Identity Propagation: ││ +│ │ - Agent's SPIFFE ID passed in X-Spiffe-Id header ││ +│ │ - MCP tools validate agent capabilities ││ +│ │ - Audit log includes agent identity ││ +│ └─────────────────────────────────────────────────────────────────────────┘│ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +### Example Flow + +```python +# Agent executes MCP tool with SPIFFE identity + +# 1. Agent authenticates with SPIFFE JWT-SVID +agent_principal = await spiffe_authenticator.validate_svid(jwt_token) +# AgentPrincipal(spiffe_id="spiffe://rag-modulo.example.com/agent/search-enricher/abc123", +# capabilities=["search:read", "llm:invoke"], owner_user_id=...) + +# 2. Agent calls MCP tool +response = await mcp_server.invoke_tool( + tool_name="rag_search", + arguments={"collection_id": "...", "query": "Q4 projections"}, + auth_context=agent_principal +) + +# 3. MCP tool validates capability +if "search:read" not in agent_principal.capabilities: + raise PermissionDenied("Agent lacks search:read capability") + +# 4. Audit log captures full chain +logger.info( + "MCP tool invoked", + agent_spiffe_id=agent_principal.spiffe_id, + tool="rag_search", + owner_user_id=str(agent_principal.owner_user_id) +) +``` + +## Security Considerations + +1. **Authentication Required**: All MCP endpoints require authentication +2. **Capability Validation**: Every tool invocation checks agent capabilities +3. **Collection Scoping**: Agents can only access authorized collections +4. **Rate Limiting**: Per-agent rate limits prevent abuse +5. 
**Audit Logging**: All tool invocations logged with identity context +6. **Token Expiration**: JWT-SVIDs have short lifetimes (15 minutes) +7. **Revocation**: Agents can be suspended/revoked immediately + +## Observability + +- OpenTelemetry spans for all MCP operations +- Metrics: tool invocation counts, latency, error rates +- Structured logging with agent identity context +- Integration with Context Forge admin UI + +## Related Documents + +- [MCP Integration Architecture](./mcp-integration-architecture.md) +- [SearchService Agent Hooks Architecture](./search-agent-hooks-architecture.md) +- [SPIRE Integration Architecture](./spire-integration-architecture.md) diff --git a/docs/architecture/search-agent-hooks-architecture.md b/docs/architecture/search-agent-hooks-architecture.md new file mode 100644 index 00000000..8ee862a9 --- /dev/null +++ b/docs/architecture/search-agent-hooks-architecture.md @@ -0,0 +1,416 @@ +# SearchService Agent Hooks Architecture + +**Date**: November 2025 +**Status**: Architecture Design +**Version**: 1.0 +**Related Documents**: [MCP Integration Architecture](./mcp-integration-architecture.md) + +## Overview + +This document describes the three-stage agent execution hook system integrated into +SearchService. Agents can be injected at strategic points in the search pipeline to enhance, +transform, or augment the search process. + +## Pipeline Flow + +``` +User Query: "What are the revenue projections for Q4?" 
+ │ + ▼ +┌─────────────────────────────────────────────────────────────────────────────┐ +│ STAGE 1: PRE-SEARCH AGENTS │ +│ │ +│ Purpose: Enhance/transform the query BEFORE vector search │ +│ │ +│ Example agents: │ +│ ┌────────────────────────────────────────────────────────────────────────┐ │ +│ │ • Query Expander: "revenue projections Q4" → │ │ +│ │ "revenue projections Q4 2024 2025 forecast financial outlook" │ │ +│ │ │ │ +│ │ • Language Detector/Translator: Detect non-English, translate to EN │ │ +│ │ │ │ +│ │ • Acronym Resolver: "Q4" → "fourth quarter, Q4, Oct-Dec" │ │ +│ │ │ │ +│ │ • Intent Classifier: Tag as "financial_analysis" for routing │ │ +│ └────────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ Input: { query: "What are the revenue projections for Q4?" } │ +│ Output: { query: "revenue projections Q4 2024 forecast...", metadata: {} } │ +└─────────────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────────┐ +│ CORE RAG SEARCH (existing logic - unchanged) │ +│ │ +│ • Vector embedding of (enhanced) query │ +│ • Milvus similarity search │ +│ • Document retrieval │ +│ • Optional: Chain-of-Thought reasoning │ +│ │ +│ Output: 10 ranked documents with scores │ +└─────────────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────────┐ +│ STAGE 2: POST-SEARCH AGENTS │ +│ │ +│ Purpose: Process/filter/augment retrieved documents BEFORE answer gen │ +│ │ +│ Example agents: │ +│ ┌────────────────────────────────────────────────────────────────────────┐ │ +│ │ • Re-ranker: Use cross-encoder to re-score documents for relevance │ │ +│ │ │ │ +│ │ • Deduplicator: Remove near-duplicate content across documents │ │ +│ │ │ │ +│ │ • Fact Checker: Validate claims against trusted sources │ │ +│ │ │ │ +│ │ • PII Redactor: Remove sensitive info before showing 
to user │ │ +│ │ │ │ +│ │ • External Enricher: Add real-time stock prices, weather, etc. │ │ +│ │ (This is what SearchResultEnricher does) │ │ +│ └────────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ Input: { documents: [...10 docs...], query: "..." } │ +│ Output: { documents: [...8 docs, reordered, enriched...] } │ +└─────────────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────────┐ +│ ANSWER GENERATION (existing logic - unchanged) │ +│ │ +│ • LLM synthesizes answer from documents │ +│ • Source attribution │ +│ • CoT reasoning steps (if enabled) │ +│ │ +│ Output: { answer: "Based on the documents...", sources: [...] } │ +└─────────────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────────┐ +│ STAGE 3: RESPONSE AGENTS │ +│ │ +│ Purpose: Generate artifacts/transformations from the final answer │ +│ │ +│ Example agents: │ +│ ┌────────────────────────────────────────────────────────────────────────┐ │ +│ │ • PowerPoint Generator: Create slides from answer + sources │ │ +│ │ Output: { type: "pptx", data: "base64...", filename: "Q4.pptx" } │ │ +│ │ │ │ +│ │ • PDF Report Generator: Formatted document with citations │ │ +│ │ Output: { type: "pdf", data: "base64...", filename: "report.pdf" } │ │ +│ │ │ │ +│ │ • Chart Generator: Visualize numerical data from answer │ │ +│ │ Output: { type: "png", data: "base64...", filename: "chart.png" } │ │ +│ │ │ │ +│ │ • Audio Summary: Text-to-speech of key findings │ │ +│ │ Output: { type: "mp3", data: "base64...", filename: "summary.mp3" } │ │ +│ │ │ │ +│ │ • Email Draft: Format answer for email sharing │ │ +│ │ Output: { type: "html", data: "...", subject: "Q4 Summary" } │ │ +│ └────────────────────────────────────────────────────────────────────────┘ │ +│ │ +│ These run in PARALLEL since they're 
independent transformations │ +└─────────────────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────────────────┐ +│ FINAL RESPONSE │ +│ │ +│ { │ +│ "answer": "Based on the financial documents, Q4 revenue is...", │ +│ "sources": [ │ +│ { "document_id": "...", "title": "Q4 Forecast", "score": 0.92 } │ +│ ], │ +│ "cot_steps": [...], // If CoT enabled │ +│ "agent_artifacts": [ // NEW - from response agents │ +│ { │ +│ "agent_id": "ppt_generator", │ +│ "type": "pptx", │ +│ "data": "UEsDBBQAAAAIAH...", // base64 │ +│ "filename": "Q4_Revenue_Projections.pptx", │ +│ "metadata": { "slides": 5 } │ +│ }, │ +│ { │ +│ "agent_id": "chart_generator", │ +│ "type": "png", │ +│ "data": "iVBORw0KGgo...", // base64 │ +│ "filename": "revenue_chart.png", │ +│ "metadata": { "width": 800, "height": 600 } │ +│ } │ +│ ] │ +│ } │ +└─────────────────────────────────────────────────────────────────────────────┘ +``` + +## Agent Stages + +### Stage 1: Pre-Search Agents + +**Purpose**: Transform or enhance the query before vector search. + +**Execution**: Sequential by priority (results chain to next agent). 
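The sequential, priority-ordered chaining described above (and reused by Stage 2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SearchService implementation: the `PreSearchAgent` protocol, the `run_pre_search` helper, the toy agents, and the `agents_applied` metadata key are all hypothetical. The 30-second default timeout and the skip-on-failure behavior follow the Error Handling section of this document.

```python
import asyncio
import logging
from dataclasses import dataclass, field
from typing import Any, Protocol

logger = logging.getLogger("pre_search")


@dataclass
class PreSearchOutput:
    query: str
    metadata: dict[str, Any] = field(default_factory=dict)
    skip_search: bool = False


class PreSearchAgent(Protocol):
    """Hypothetical agent interface: lower `priority` values run first."""

    agent_id: str
    priority: int

    async def run(self, query: str, metadata: dict[str, Any]) -> PreSearchOutput: ...


async def run_pre_search(
    agents: list[PreSearchAgent], query: str, timeout_s: float = 30.0
) -> PreSearchOutput:
    """Run agents one at a time; each one sees the query produced by the previous one."""
    metadata: dict[str, Any] = {}
    applied: list[str] = []
    skip = False
    for agent in sorted(agents, key=lambda a: a.priority):
        try:
            out = await asyncio.wait_for(agent.run(query, dict(metadata)), timeout=timeout_s)
        except Exception as exc:  # includes TimeoutError: log, skip, keep the pipeline going
            logger.warning("pre-search agent %s failed: %r", agent.agent_id, exc)
            continue
        query = out.query  # chain the rewritten query into the next agent
        metadata.update(out.metadata)
        applied.append(agent.agent_id)
        if out.skip_search:
            skip = True
            break
    metadata["agents_applied"] = applied  # hypothetical bookkeeping key
    return PreSearchOutput(query, metadata, skip_search=skip)


# Toy agents for illustration only.
@dataclass
class AcronymResolver:
    agent_id: str = "acronym_resolver"
    priority: int = 10

    async def run(self, query: str, metadata: dict[str, Any]) -> PreSearchOutput:
        return PreSearchOutput(query.replace("Q4", "fourth quarter Q4"))


@dataclass
class QueryExpander:
    agent_id: str = "query_expander"
    priority: int = 20

    async def run(self, query: str, metadata: dict[str, Any]) -> PreSearchOutput:
        return PreSearchOutput(f"{query} forecast", {"expanded": True})


@dataclass
class BrokenAgent:
    agent_id: str = "broken"
    priority: int = 0

    async def run(self, query: str, metadata: dict[str, Any]) -> PreSearchOutput:
        raise RuntimeError("upstream tool unavailable")


if __name__ == "__main__":
    # Registration order does not matter; priority decides execution order.
    result = asyncio.run(
        run_pre_search([QueryExpander(), BrokenAgent(), AcronymResolver()], "revenue Q4")
    )
    print(result.query)  # revenue fourth quarter Q4 forecast
```

Each agent receives a shallow copy of the accumulated metadata. Only the runner merges outputs back, so merges always happen in priority order, and a failing agent simply drops out of the chain.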
+ +| Agent Type | Description | Use Case | +|------------|-------------|----------| +| Query Expander | Adds synonyms and related terms | Improve recall | +| Language Detector | Identifies query language | Multi-language support | +| Translator | Translates non-English queries | Internationalization | +| Acronym Resolver | Expands abbreviations | Domain-specific search | +| Intent Classifier | Tags query intent | Routing and filtering | +| Spell Checker | Corrects typos | User experience | + +**Input Schema**: + +```python +class PreSearchInput: + query: str + collection_id: UUID + user_id: UUID + metadata: dict[str, Any] +``` + +**Output Schema**: + +```python +class PreSearchOutput: + query: str # Modified query + metadata: dict[str, Any] # Additional context + skip_search: bool = False # If True, skip core search +``` + +### Stage 2: Post-Search Agents + +**Purpose**: Process, filter, or augment retrieved documents before answer generation. + +**Execution**: Sequential by priority (documents flow through each agent). + +| Agent Type | Description | Use Case | +|------------|-------------|----------| +| Re-ranker | Cross-encoder re-scoring | Improve precision | +| Deduplicator | Remove near-duplicates | Cleaner results | +| Fact Checker | Validate against trusted sources | Accuracy | +| PII Redactor | Remove sensitive information | Compliance | +| External Enricher | Add real-time data | Currency | +| Relevance Filter | Remove low-quality results | Quality | + +**Input Schema**: + +```python +class PostSearchInput: + documents: list[Document] + query: str + collection_id: UUID + user_id: UUID + metadata: dict[str, Any] +``` + +**Output Schema**: + +```python +class PostSearchOutput: + documents: list[Document] # Modified/filtered documents + metadata: dict[str, Any] # Enrichment data +``` + +### Stage 3: Response Agents + +**Purpose**: Generate artifacts or transformations from the final answer. + +**Execution**: Parallel (independent transformations). 
+ +| Agent Type | Description | Output Format | +|------------|-------------|---------------| +| PowerPoint Generator | Create presentation slides | `.pptx` | +| PDF Report Generator | Formatted document with citations | `.pdf` | +| Chart Generator | Visualize numerical data | `.png`, `.svg` | +| Audio Summary | Text-to-speech narration | `.mp3` | +| Email Draft | Format for email sharing | `.html` | +| Executive Summary | Condensed key findings | `.txt` | + +**Input Schema**: + +```python +class ResponseAgentInput: + answer: str + sources: list[Source] + query: str + documents: list[Document] + collection_id: UUID + user_id: UUID + cot_steps: list[CotStep] | None +``` + +**Output Schema**: + +```python +class AgentArtifact: + agent_id: str + type: str # "pptx", "pdf", "png", "mp3", "html" + data: str # base64 encoded + filename: str + metadata: dict[str, Any] +``` + +## Agent Priority and Chaining + +Agents at each stage execute in priority order (lower number = higher priority): + +``` +Pre-search stage (priority order): + 1. Language Detector (priority: 0) → detects "es" (Spanish) + 2. Translator (priority: 10) → uses detection, translates to EN + 3. 
Query Expander (priority: 20) → expands the translated query + +Each agent receives: + - AgentContext with query, collection_id, user_id + - previous_agent_results: List of results from earlier agents in this stage +``` + +## AgentContext + +Context object passed to all agents: + +```python +from __future__ import annotations + +from dataclasses import dataclass +from typing import Any +from uuid import UUID + + +@dataclass +class AgentContext: + # Collection context + collection_id: UUID + user_id: UUID + + # Pipeline context (required field, so it must precede fields with defaults) + pipeline_stage: str # 'pre_search', 'post_search', 'response' + + # Conversation context + conversation_id: UUID | None = None + conversation_history: list[dict[str, str]] | None = None + + # Search context (populated as pipeline progresses) + query: str | None = None + retrieved_documents: list[dict[str, Any]] | None = None + search_metadata: dict[str, Any] | None = None + + # Agent chaining (AgentResult is defined below; the __future__ import defers evaluation) + previous_agent_results: list[AgentResult] | None = None +``` + +## AgentResult + +Result object returned by all agents: + +```python +@dataclass +class AgentResult: + agent_id: str + success: bool + data: dict[str, Any] + metadata: dict[str, Any] + errors: list[str] | None = None + + # For chaining agents + next_agent_id: str | None = None +``` + +## Collection-Agent Association + +Agents are configured per collection: + +``` +Collection Settings → Agents & Tools +┌─────────────────────────────────────────────────────────────────────────┐ +│ ☑ PowerPoint Generator Stage: Response Priority: 1 │ +│ Creates slides from search results [Configure] │ +├─────────────────────────────────────────────────────────────────────────┤ +│ ☑ Query Expander Stage: Pre-search Priority: 0 │ +│ Adds synonyms and related terms [Configure] │ +├─────────────────────────────────────────────────────────────────────────┤ +│ ☐ External Knowledge Enricher Stage: Post-search Priority: 5 │ +│ Augments with real-time market data [Configure] │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +## Database Schema + +### AgentConfig Table + +```sql +CREATE TABLE
agent_configs ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE, + agent_id VARCHAR(100) NOT NULL, -- From agent registry + name VARCHAR(255) NOT NULL, + description TEXT, + config JSONB NOT NULL DEFAULT '{}', -- Agent-specific settings + enabled BOOLEAN NOT NULL DEFAULT true, + trigger_stage VARCHAR(50) NOT NULL, -- 'pre_search', 'post_search', 'response' + priority INTEGER NOT NULL DEFAULT 0, + created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP +); + +-- Many-to-many: Collections ↔ AgentConfigs +CREATE TABLE collection_agents ( + collection_id UUID NOT NULL REFERENCES collections(id) ON DELETE CASCADE, + agent_config_id UUID NOT NULL REFERENCES agent_configs(id) ON DELETE CASCADE, + PRIMARY KEY (collection_id, agent_config_id) +); + +-- Indexes +CREATE INDEX idx_agent_configs_user_id ON agent_configs(user_id); +CREATE INDEX idx_agent_configs_trigger_stage ON agent_configs(trigger_stage); +CREATE INDEX idx_agent_configs_enabled ON agent_configs(enabled); +``` + +### Example AgentConfig + +```json +{ + "id": "abc123...", + "user_id": "user456...", + "agent_id": "ppt_generator", + "name": "PowerPoint Generator", + "config": { + "type": "mcp", + "context_forge_tool_id": "generate_powerpoint", + "argument_mapping": { + "title": "query", + "documents": "documents", + "max_slides": "config.max_slides" + }, + "settings": { + "max_slides": 15, + "template": "corporate" + } + }, + "enabled": true, + "trigger_stage": "response", + "priority": 10 +} +``` + +## Error Handling + +- **Agent Timeout**: Each agent has configurable timeout (default 30s) +- **Agent Failure**: Logged, skipped, pipeline continues +- **Circuit Breaker**: Failing agents disabled after threshold +- **Fallback**: Optional fallback agents for critical stages + +## Performance Considerations + +1. 
**Pre-search agents**: Run sequentially (query transformation order matters) +2. **Post-search agents**: Run sequentially (document filtering order matters) +3. **Response agents**: Run in parallel (independent artifact generation) +4. **Caching**: Agent results cached by (query_hash, agent_id, config_hash) +5. **Timeouts**: Per-agent and per-stage timeouts prevent runaway execution + +## Observability + +- All agent executions logged with structured context +- OpenTelemetry spans for each agent invocation +- Metrics: execution time, success rate, artifact sizes +- Traces flow through Context Forge for end-to-end visibility + +## Related Documents + +- [MCP Integration Architecture](./mcp-integration-architecture.md) +- [RAG Modulo MCP Server Architecture](./rag-modulo-mcp-server-architecture.md) +- [Agent MCP Architecture Design](../design/agent-mcp-architecture.md) diff --git a/docs/architecture/system-architecture.md b/docs/architecture/system-architecture.md new file mode 100644 index 00000000..ddff44b6 --- /dev/null +++ b/docs/architecture/system-architecture.md @@ -0,0 +1,425 @@ +# RAG Modulo System Architecture + +## Repository Overview + +**RAG Modulo** is a production-ready Retrieval-Augmented Generation (RAG) platform that enables +intelligent document processing, semantic search, and AI-powered question answering. The system +combines enterprise-grade document processing with advanced AI reasoning capabilities to provide +accurate, context-aware answers from large document collections. + +### Key Capabilities + +1. **Document Processing**: Supports multiple formats (PDF, DOCX, XLSX, TXT) with advanced + processing via IBM Docling for tables, images, and complex layouts +2. **Intelligent Search**: Vector similarity search with hybrid strategies, reranking, and source attribution +3. **Chain of Thought Reasoning**: Automatic question decomposition with step-by-step reasoning for complex queries +4. 
**Multi-LLM Support**: Seamless integration with WatsonX, OpenAI, and Anthropic +5. **Multi-Vector Database**: Pluggable support for Milvus, Elasticsearch, Pinecone, Weaviate, and ChromaDB +6. **Conversational Interface**: Multi-turn conversations with context preservation +7. **Podcast Generation**: AI-powered podcast creation from document collections +8. **Voice Synthesis**: Text-to-speech capabilities with multiple providers + +## System Architecture Diagram + +```mermaid +graph TB + subgraph "Client Layer" + WEB[React Web Frontend
TypeScript + Tailwind CSS
Carbon Design System] + CLI[CLI Client
rag-cli commands] + API_CLIENT[External API Clients
REST/WebSocket] + end + + subgraph "API Gateway Layer" + FASTAPI[FastAPI Application
main.py
Port 8000] + + subgraph "Middleware Stack" + CORS[LoggingCORSMiddleware
CORS + Request Logging] + SESSION[SessionMiddleware
Session Management] + AUTH_MW[AuthenticationMiddleware
SPIFFE/OIDC Validation] + end + end + + subgraph "Router Layer - REST Endpoints" + AUTH_R["/auth
Authentication"] + SEARCH_R["/api/search
RAG Search"] + COLLECTION_R["/api/collections
Document Management"] + CHAT_R["/api/chat
Conversational Interface"] + CONV_R["/api/conversations
Session Management"] + PODCAST_R["/api/podcast
Podcast Generation"] + VOICE_R["/api/voice
Voice Synthesis"] + AGENT_R["/api/agents
SPIFFE Agent Management"] + USER_R["/api/users
User Management"] + TEAM_R["/api/teams
Team Collaboration"] + DASH_R["/api/dashboard
Analytics"] + HEALTH_R["/api/health
Health Checks"] + WS_R["/ws
WebSocket"] + end + + subgraph "Service Layer - Business Logic" + SEARCH_SVC[SearchService
RAG Orchestration] + CONV_SVC[ConversationService
Multi-turn Context] + MSG_ORCH[MessageProcessingOrchestrator
Message Flow] + COLLECTION_SVC[CollectionService
Collection Management] + FILE_SVC[FileManagementService
File Operations] + PODCAST_SVC[PodcastService
Content Generation] + VOICE_SVC[VoiceService
Audio Synthesis] + AGENT_SVC[AgentService
SPIFFE Identity] + USER_SVC[UserService
User Operations] + TEAM_SVC[TeamService
Team Operations] + DASH_SVC[DashboardService
Analytics] + PIPELINE_SVC[PipelineService
Pipeline Execution] + COT_SVC[ChainOfThoughtService
Reasoning Engine] + ANSWER_SYNTH[AnswerSynthesizer
Answer Generation] + CITATION_SVC[CitationAttributionService
Source Attribution] + end + + subgraph "RAG Pipeline Architecture - 6 Stages" + PIPELINE_EXEC[PipelineExecutor
Orchestrates Stages] + SEARCH_CTX[SearchContext
State Management] + + STAGE1[Stage 1: Pipeline Resolution
Resolve User Pipeline Config] + STAGE2[Stage 2: Query Enhancement
Rewrite/Enhance Query] + STAGE3[Stage 3: Retrieval
Vector Similarity Search] + STAGE4[Stage 4: Reranking
Relevance Scoring] + STAGE5[Stage 5: Reasoning
Chain of Thought] + STAGE6[Stage 6: Generation
LLM Answer Synthesis] + end + + subgraph "Document Ingestion Pipeline" + DOC_STORE[DocumentStore
Ingestion Orchestration] + DOC_PROC[DocumentProcessor
Format Router] + + PDF_PROC[PdfProcessor
PyMuPDF + OCR] + DOCLING_PROC[DoclingProcessor
IBM Docling
Tables/Images] + WORD_PROC[WordProcessor
DOCX Support] + EXCEL_PROC[ExcelProcessor
XLSX Support] + TXT_PROC[TxtProcessor
Plain Text] + + CHUNKING[Chunking Strategies
Sentence/Semantic/Hierarchical] + EMBEDDING[Embedding Generation
Vector Creation] + end + + subgraph "Retrieval Layer" + RETRIEVER[Retriever
Vector Search] + RERANKER[Reranker
Relevance Scoring] + QUERY_REWRITER[QueryRewriter
Query Optimization] + end + + subgraph "Generation Layer" + LLM_FACTORY[LLMProviderFactory
Provider Management] + + WATSONX[WatsonX Provider
IBM WatsonX AI] + OPENAI[OpenAI Provider
GPT Models] + ANTHROPIC[Anthropic Provider
Claude Models] + + AUDIO_FACTORY[AudioFactory
Audio Provider Management] + ELEVENLABS[ElevenLabs Audio
Voice Synthesis] + OPENAI_AUDIO[OpenAI Audio
TTS] + OLLAMA_AUDIO[Ollama Audio
Local TTS] + end + + subgraph "Repository Layer - Data Access" + USER_REPO[UserRepository] + COLLECTION_REPO[CollectionRepository] + FILE_REPO[FileRepository] + CONV_REPO[ConversationRepository] + AGENT_REPO[AgentRepository] + PODCAST_REPO[PodcastRepository] + VOICE_REPO[VoiceRepository] + TEAM_REPO[TeamRepository] + PIPELINE_REPO[PipelineRepository] + LLM_REPO[LLMProviderRepository] + end + + subgraph "Data Persistence Layer" + POSTGRES[(PostgreSQL
Port 5432
Metadata & Config)] + + VECTOR_DB[(Vector Database
Abstracted Interface)] + MILVUS[Milvus
Primary Vector DB
Port 19530] + PINECONE[Pinecone
Cloud Vector DB] + WEAVIATE[Weaviate
GraphQL Vector DB] + ELASTICSEARCH[Elasticsearch
Search Engine] + CHROMA[ChromaDB
Lightweight Vector DB] + end + + subgraph "Object Storage" + MINIO[(MinIO
Port 9000
Object Storage
Files & Audio)] + end + + subgraph "External Services" + SPIRE[SPIRE Server
SPIFFE Workload Identity
Agent Authentication] + OIDC[OIDC Provider
IBM AppID
User Authentication] + MLFLOW[MLFlow
Port 5001
Model Tracking] + end + + subgraph "Core Infrastructure" + CONFIG[Settings/Config
Pydantic Settings
Environment Variables] + LOGGING[Logging Utils
Structured Logging
Context Tracking] + IDENTITY[Identity Service
User/Agent Identity] + EXCEPTIONS[Custom Exceptions
Domain Errors] + end + + %% Client to API Gateway + WEB -->|HTTP/WebSocket| FASTAPI + CLI -->|HTTP| FASTAPI + API_CLIENT -->|REST API| FASTAPI + + %% Middleware Flow + FASTAPI --> CORS + CORS --> SESSION + SESSION --> AUTH_MW + + %% Router Registration + AUTH_MW --> AUTH_R + AUTH_MW --> SEARCH_R + AUTH_MW --> COLLECTION_R + AUTH_MW --> CHAT_R + AUTH_MW --> CONV_R + AUTH_MW --> PODCAST_R + AUTH_MW --> VOICE_R + AUTH_MW --> AGENT_R + AUTH_MW --> USER_R + AUTH_MW --> TEAM_R + AUTH_MW --> DASH_R + AUTH_MW --> HEALTH_R + AUTH_MW --> WS_R + + %% Router to Service + SEARCH_R --> SEARCH_SVC + CHAT_R --> CONV_SVC + CONV_R --> CONV_SVC + CONV_SVC --> MSG_ORCH + MSG_ORCH --> SEARCH_SVC + COLLECTION_R --> COLLECTION_SVC + COLLECTION_SVC --> FILE_SVC + PODCAST_R --> PODCAST_SVC + VOICE_R --> VOICE_SVC + AGENT_R --> AGENT_SVC + USER_R --> USER_SVC + TEAM_R --> TEAM_SVC + DASH_R --> DASH_SVC + + %% Search Service to Pipeline + SEARCH_SVC --> PIPELINE_EXEC + PIPELINE_EXEC --> STAGE1 + STAGE1 --> STAGE2 + STAGE2 --> STAGE3 + STAGE3 --> STAGE4 + STAGE4 --> STAGE5 + STAGE5 --> STAGE6 + PIPELINE_EXEC --> SEARCH_CTX + + %% Pipeline Stages to Services + STAGE1 --> PIPELINE_SVC + STAGE2 --> QUERY_REWRITER + STAGE3 --> RETRIEVER + STAGE4 --> RERANKER + STAGE5 --> COT_SVC + STAGE6 --> ANSWER_SYNTH + + %% Retrieval to Vector DB + RETRIEVER --> VECTOR_DB + VECTOR_DB --> MILVUS + VECTOR_DB --> PINECONE + VECTOR_DB --> WEAVIATE + VECTOR_DB --> ELASTICSEARCH + VECTOR_DB --> CHROMA + + %% Generation Layer + ANSWER_SYNTH --> LLM_FACTORY + LLM_FACTORY --> WATSONX + LLM_FACTORY --> OPENAI + LLM_FACTORY --> ANTHROPIC + PODCAST_SVC --> LLM_FACTORY + VOICE_SVC --> AUDIO_FACTORY + AUDIO_FACTORY --> ELEVENLABS + AUDIO_FACTORY --> OPENAI_AUDIO + AUDIO_FACTORY --> OLLAMA_AUDIO + + %% Data Ingestion Flow + FILE_SVC --> DOC_STORE + DOC_STORE --> DOC_PROC + DOC_PROC --> PDF_PROC + DOC_PROC --> DOCLING_PROC + DOC_PROC --> WORD_PROC + DOC_PROC --> EXCEL_PROC + DOC_PROC --> TXT_PROC + DOC_PROC --> CHUNKING + 
CHUNKING --> EMBEDDING + DOC_STORE --> VECTOR_DB + DOC_STORE --> MINIO + + %% Service to Repository + USER_SVC --> USER_REPO + COLLECTION_SVC --> COLLECTION_REPO + FILE_SVC --> FILE_REPO + CONV_SVC --> CONV_REPO + AGENT_SVC --> AGENT_REPO + PODCAST_SVC --> PODCAST_REPO + VOICE_SVC --> VOICE_REPO + TEAM_SVC --> TEAM_REPO + PIPELINE_SVC --> PIPELINE_REPO + PIPELINE_SVC --> LLM_REPO + + %% Repository to Database + USER_REPO --> POSTGRES + COLLECTION_REPO --> POSTGRES + FILE_REPO --> POSTGRES + CONV_REPO --> POSTGRES + AGENT_REPO --> POSTGRES + PODCAST_REPO --> POSTGRES + VOICE_REPO --> POSTGRES + TEAM_REPO --> POSTGRES + PIPELINE_REPO --> POSTGRES + LLM_REPO --> POSTGRES + + %% Authentication + AUTH_MW --> SPIRE + AUTH_MW --> OIDC + AGENT_SVC --> SPIRE + + %% Storage + FILE_SVC --> MINIO + PODCAST_SVC --> MINIO + VOICE_SVC --> MINIO + + %% Core Infrastructure + FASTAPI --> CONFIG + FASTAPI --> LOGGING + AUTH_MW --> IDENTITY + SEARCH_SVC --> EXCEPTIONS + CONV_SVC --> EXCEPTIONS + + %% Styling + style FASTAPI fill:#4A90E2,stroke:#2E5C8A,stroke-width:3px + style PIPELINE_EXEC fill:#50C878,stroke:#2D8659,stroke-width:2px + style VECTOR_DB fill:#FF6B6B,stroke:#C92A2A,stroke-width:2px + style POSTGRES fill:#4ECDC4,stroke:#2D7D7D,stroke-width:2px + style LLM_FACTORY fill:#FFD93D,stroke:#CC9900,stroke-width:2px + style DOC_STORE fill:#9B59B6,stroke:#6C3483,stroke-width:2px + style WEB fill:#61DAFB,stroke:#20232A,stroke-width:2px + style MINIO fill:#FFA500,stroke:#CC7700,stroke-width:2px +``` + +## Architecture Layers Explained + +### 1. Client Layer + +- **React Web Frontend**: Modern TypeScript/React application with Carbon Design System +- **CLI Client**: Command-line interface for automation and scripting +- **API Clients**: External integrations via REST/WebSocket + +### 2. API Gateway Layer + +- **FastAPI Application**: Main entry point handling HTTP requests +- **Middleware Stack**: CORS, session management, and authentication + +### 3. 
Router Layer + +RESTful endpoints organized by domain (auth, search, collections, chat, etc.) + +### 4. Service Layer + +Business logic services that orchestrate operations across repositories and external services + +### 5. RAG Pipeline (6 Stages) + +1. **Pipeline Resolution**: Determines user's default pipeline configuration +2. **Query Enhancement**: Rewrites/enhances queries for better retrieval +3. **Retrieval**: Performs vector similarity search +4. **Reranking**: Scores and reranks results for relevance +5. **Reasoning**: Applies Chain of Thought for complex questions +6. **Generation**: Synthesizes final answer using LLM + +### 6. Document Ingestion Pipeline + +- Processes multiple document formats +- Applies chunking strategies +- Generates embeddings +- Stores in vector database and object storage + +### 7. Data Persistence + +- **PostgreSQL**: Metadata, configuration, user data +- **Vector Databases**: Pluggable support for multiple vector DBs +- **MinIO**: Object storage for files and generated content + +### 8. External Services + +- **SPIRE**: SPIFFE workload identity for agent authentication +- **OIDC**: User authentication via IBM AppID +- **MLFlow**: Model tracking and experimentation + +## Key Data Flows + +### Search Request Flow + +1. Client → FastAPI → Search Router +2. Search Router → SearchService +3. SearchService → PipelineExecutor +4. Pipeline executes 6 stages sequentially +5. RetrievalStage queries Vector Database +6. GenerationStage calls LLM Provider +7. Response flows back through layers + +### Document Ingestion Flow + +1. Client → Collection Router → CollectionService → FileManagementService +2. FileManagementService → DocumentStore +3. DocumentStore → DocumentProcessor → Format-specific Processor +4. Processor → Chunking Strategy → Embeddings +5. Embeddings → Vector Database +6. Original files → MinIO Object Storage + +### Conversation Flow + +1. Client → Conversation Router → ConversationService +2. 
ConversationService → MessageProcessingOrchestrator +3. Orchestrator → SearchService (with conversation context) +4. SearchService executes pipeline with context +5. Response saved via ConversationRepository → PostgreSQL + +## Design Patterns + +- **Repository Pattern**: Data access abstraction +- **Factory Pattern**: LLM and Vector DB instantiation +- **Strategy Pattern**: Chunking strategies, LLM providers +- **Pipeline Pattern**: Stage-based RAG processing +- **Dependency Injection**: Services and repositories +- **Middleware Pattern**: Cross-cutting concerns + +## Technology Stack + +### Backend + +- **Framework**: FastAPI (Python 3.12+) +- **Database**: PostgreSQL (SQLAlchemy ORM) +- **Vector DB**: Milvus (primary), Pinecone, Weaviate, Elasticsearch, ChromaDB +- **Object Storage**: MinIO +- **Document Processing**: IBM Docling, PyMuPDF, python-docx, openpyxl + +### Frontend + +- **Framework**: React 18 with TypeScript +- **Styling**: Tailwind CSS + Carbon Design System +- **HTTP Client**: Axios +- **State Management**: React Context API + +### Infrastructure + +- **Containerization**: Docker + Docker Compose +- **CI/CD**: GitHub Actions +- **Container Registry**: GitHub Container Registry (GHCR) +- **Authentication**: SPIFFE/SPIRE (agents), OIDC (users) + +### LLM Providers + +- IBM WatsonX +- OpenAI (GPT models) +- Anthropic (Claude) + +### Audio Providers + +- ElevenLabs +- OpenAI TTS +- Ollama (local)