docs: Add agentic RAG architecture documentation #700

manavgup · 2025-11-27T17:57:22Z

Summary

Add comprehensive architecture documentation for the Agentic RAG Platform. These documents
establish the design foundation for transforming RAG Modulo into a fully agentic system.

Documents Added

| Document | Lines | Description |
| --- | --- | --- |
| `agentic-ui-architecture.md` | ~1,470 | React component hierarchy, state management, API integration |
| `backend-architecture-diagram.md` | ~510 | Backend architecture with Mermaid diagrams |
| `mcp-integration-architecture.md` | ~200 | MCP client/server strategy, PR comparison |
| `rag-modulo-mcp-server-architecture.md` | ~450 | RAG as MCP server (tools, resources, auth) |
| `search-agent-hooks-architecture.md` | ~410 | 3-stage agent pipeline architecture |
| `system-architecture.md` | ~410 | Complete system architecture overview |

Total: ~3,450 lines of documentation

Architecture Highlights

3-Stage Agent Pipeline (search-agent-hooks-architecture.md)

User Query → Pre-Search Agents → RAG Search → Post-Search Agents → Generation → Response Agents → Final Response

Pre-search: Query expansion, translation, intent classification
Post-search: Re-ranking, deduplication, enrichment
Response: Artifact generation (PowerPoint, PDF, charts) in parallel
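
The flow above can be sketched as a small orchestrator. This is a minimal illustration only, not the SearchService implementation: `SearchContext`, `run_pipeline`, and the callable signatures are hypothetical names chosen for the example, and a thread pool stands in for whatever parallel execution the real response stage uses.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SearchContext:
    """State carried through the three stages (hypothetical shape)."""

    query: str
    documents: list[str] = field(default_factory=list)
    answer: str = ""
    artifacts: dict[str, str] = field(default_factory=dict)


def run_pipeline(
    query: str,
    pre_agents: list[Callable[[str], str]],
    search: Callable[[str], list[str]],
    post_agents: list[Callable[[list[str]], list[str]]],
    generate: Callable[[str, list[str]], str],
    response_agents: dict[str, Callable[[str], str]],
) -> SearchContext:
    ctx = SearchContext(query=query)

    # Stage 1: pre-search agents rewrite the query one after another.
    for agent in pre_agents:
        ctx.query = agent(ctx.query)

    # Core RAG search on the rewritten query.
    ctx.documents = search(ctx.query)

    # Stage 2: post-search agents re-rank, deduplicate, or enrich results.
    for agent in post_agents:
        ctx.documents = agent(ctx.documents)

    # Generation, then stage 3: response agents run in parallel on the answer.
    ctx.answer = generate(ctx.query, ctx.documents)
    with ThreadPoolExecutor() as pool:
        futures = {
            name: pool.submit(agent, ctx.answer)
            for name, agent in response_agents.items()
        }
        ctx.artifacts = {name: fut.result() for name, fut in futures.items()}
    return ctx
```

Running the pre- and post-search stages sequentially keeps each agent's input well defined, while the response stage can fan out because artifact generators only read the final answer.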

MCP Integration (mcp-integration-architecture.md)

RAG Modulo as MCP Client: Consume external tools via Context Forge
RAG Modulo as MCP Server: Expose rag_search, rag_ingest, etc. to Claude Desktop
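
To make the server role concrete, here is a rough sketch of dispatching an MCP `tools/call` request to a `rag_search` tool. It uses only the standard library and hand-written JSON-RPC shapes based on my reading of the MCP specification; a real server would most likely use an MCP SDK. The `rag_search` body, the `TOOLS` registry, and `handle_request` are placeholders invented for this example.

```python
import json
from typing import Any, Callable


def rag_search(query: str, top_k: int = 3) -> str:
    """Placeholder tool: a real one would call the RAG search service."""
    return f"top {top_k} results for: {query}"


# Hypothetical registry mapping tool names to callables.
TOOLS: dict[str, Callable[..., str]] = {"rag_search": rag_search}


def handle_request(raw: str) -> dict[str, Any]:
    """Handle one JSON-RPC 2.0 message; only tools/call is supported here."""
    req = json.loads(raw)
    base = {"jsonrpc": "2.0", "id": req.get("id")}

    if req.get("method") != "tools/call":
        return {**base, "error": {"code": -32601, "message": "Method not found"}}

    params = req.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        message = f"Unknown tool: {params.get('name')}"
        return {**base, "error": {"code": -32602, "message": message}}

    # Tool output is wrapped as a list of content items.
    text = tool(**params.get("arguments", {}))
    return {
        **base,
        "result": {"content": [{"type": "text", "text": text}], "isError": False},
    }
```

Authentication, `tools/list`, and resources are left out on purpose; the point is only the name-to-callable dispatch that exposing `rag_search` implies.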

Agentic UI (agentic-ui-architecture.md)

Agent configuration per collection
Artifact display in search results
Real-time pipeline status
Agent marketplace and dashboard

Implementation Roadmap

These documents guide:

PR #695, "feat: SPIFFE/SPIRE Integration Architecture for Agent Identity" (SPIFFE/SPIRE agent identity)
PR #671, "feat(mcp): Add MCP Gateway integration for tool invocation and enrichment" (MCP Gateway client)
Issue #697, "feat: Implement SearchService 3-stage agent execution hooks" (Agent execution hooks)
Issue #698, "feat: Expose RAG Modulo as MCP Server" (MCP Server)
Issue #699, "feat: Agentic UI components for agent configuration and artifacts" (Agentic UI)

Test Plan

All markdown files lint-clean (markdownlint passed)
Cross-references between documents are valid
Mermaid diagrams render correctly in GitHub
Team review for architectural decisions

Closes #696

🤖 Generated with Claude Code

Add environment variables to support SPIFFE workload identity integration for AI agents and services. This enables cryptographic machine identity with configurable migration phases:

- SPIFFE_ENABLED: Toggle SPIFFE integration
- SPIFFE_AUTH_MODE: Migration phases (disabled→optional→preferred→required)
- SPIFFE_ENDPOINT_SOCKET: SPIRE Agent Workload API socket
- SPIFFE_TRUST_DOMAIN: Trust domain for identity hierarchy
- SPIFFE_LEGACY_JWT_WARNING: Track legacy auth usage during migration
- SPIFFE_SVID_TTL_SECONDS: Certificate lifetime configuration
- SPIFFE_JWT_AUDIENCES: Allowed JWT-SVID audiences

Related to: MCP Context Forge integration (PR #684)
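
As a rough illustration of how these variables might be read, the sketch below parses them into a typed settings object. The variable names come from the commit message; everything else is an assumption made for the example, including the `SpiffeSettings` and `load_settings` names, the default values, the comma-separated audience format, and the rule that only the `required` phase blocks legacy JWTs.

```python
import os
from dataclasses import dataclass
from enum import Enum
from typing import Mapping


class AuthMode(str, Enum):
    """Migration phases named in the commit message."""

    DISABLED = "disabled"
    OPTIONAL = "optional"
    PREFERRED = "preferred"
    REQUIRED = "required"


@dataclass(frozen=True)
class SpiffeSettings:
    enabled: bool
    auth_mode: AuthMode
    endpoint_socket: str
    trust_domain: str
    legacy_jwt_warning: bool
    svid_ttl_seconds: int
    jwt_audiences: tuple[str, ...]


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> SpiffeSettings:
    """Build settings from an environment mapping (defaults are assumptions)."""
    audiences = env.get("SPIFFE_JWT_AUDIENCES", "")
    return SpiffeSettings(
        enabled=_as_bool(env.get("SPIFFE_ENABLED", "false")),
        auth_mode=AuthMode(env.get("SPIFFE_AUTH_MODE", "disabled").lower()),
        endpoint_socket=env.get(
            "SPIFFE_ENDPOINT_SOCKET", "unix:///tmp/spire-agent/public/api.sock"
        ),
        trust_domain=env.get("SPIFFE_TRUST_DOMAIN", "rag-modulo.example.com"),
        legacy_jwt_warning=_as_bool(env.get("SPIFFE_LEGACY_JWT_WARNING", "true")),
        svid_ttl_seconds=int(env.get("SPIFFE_SVID_TTL_SECONDS", "3600")),
        # Assumed format: comma-separated list of audiences.
        jwt_audiences=tuple(a.strip() for a in audiences.split(",") if a.strip()),
    )


def legacy_jwt_allowed(settings: SpiffeSettings) -> bool:
    """Assumed policy: legacy JWTs are rejected only in the 'required' phase."""
    return not settings.enabled or settings.auth_mode is not AuthMode.REQUIRED
```

An invalid `SPIFFE_AUTH_MODE` raises `ValueError` at load time, which surfaces a misconfigured migration phase at startup rather than at the first request.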

This architecture document outlines how to integrate SPIRE (SPIFFE Runtime Environment) into RAG Modulo to provide cryptographic workload identities for AI agents. This enables zero-trust agent authentication and secure agent-to-agent (A2A) communication.

Key architectural decisions:

- JWT-SVIDs for stateless verification (vs X.509 for mTLS)
- Trust domain: spiffe://rag-modulo.example.com
- Integration with IBM MCP Context Forge (PR #684)
- Capability-based access control for agents
- 5-phase implementation plan

Agent types defined:

- search-enricher: MCP tool invocation
- cot-reasoning: Chain of Thought orchestration
- question-decomposer: Query decomposition
- source-attribution: Document source tracking
- entity-extraction: Named entity recognition
- answer-synthesis: Answer generation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
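
For illustration, a SPIFFE ID for one of these agent types could be built and parsed as below. The trust domain and agent type names come from the commit message; the `/agent/<type>/<instance>` path layout and the `AgentSpiffeId` and `parse_agent_spiffe_id` names are assumptions for this sketch, not necessarily what the architecture document specifies.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Agent types listed in the commit message.
AGENT_TYPES = frozenset({
    "search-enricher",
    "cot-reasoning",
    "question-decomposer",
    "source-attribution",
    "entity-extraction",
    "answer-synthesis",
})


@dataclass(frozen=True)
class AgentSpiffeId:
    trust_domain: str
    agent_type: str
    instance_id: str

    def __str__(self) -> str:
        # Assumed path layout: /agent/<agent-type>/<instance-id>
        return f"spiffe://{self.trust_domain}/agent/{self.agent_type}/{self.instance_id}"


def parse_agent_spiffe_id(value: str, expected_trust_domain: str) -> AgentSpiffeId:
    """Parse and validate an agent SPIFFE ID against one trust domain."""
    parts = urlsplit(value)
    if parts.scheme != "spiffe":
        raise ValueError("not a SPIFFE ID")
    if parts.netloc != expected_trust_domain:
        raise ValueError(f"unexpected trust domain: {parts.netloc!r}")
    if parts.query or parts.fragment:
        raise ValueError("SPIFFE IDs must not carry a query or fragment")

    segments = parts.path.strip("/").split("/")
    if len(segments) != 3 or segments[0] != "agent":
        raise ValueError("expected path /agent/<agent-type>/<instance-id>")
    _, agent_type, instance_id = segments
    if agent_type not in AGENT_TYPES:
        raise ValueError(f"unknown agent type: {agent_type!r}")
    return AgentSpiffeId(parts.netloc, agent_type, instance_id)
```

Passing the expected trust domain explicitly means an ID from any other domain is rejected during parsing instead of relying on a later check.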

This commit implements the SPIFFE/SPIRE integration for AI agent authentication as designed in docs/architecture/spire-integration-architecture.md.

Key changes:

- Add py-spiffe dependency for SPIFFE JWT-SVID support
- Create core SPIFFE authentication module (spiffe_auth.py) with:
  - SPIFFEConfig for environment-based configuration
  - AgentPrincipal dataclass for authenticated agent identity
  - SPIFFEAuthenticator for JWT-SVID validation
  - AgentType and AgentCapability enums
  - Helper functions for SPIFFE ID parsing and building
- Create Agent data model with SQLAlchemy:
  - Agent model with SPIFFE ID, type, capabilities, status
  - Relationships to User (owner) and Team
  - Status management (active, suspended, revoked)
- Add Agent repository, service, and router layers:
  - Full CRUD operations for agents
  - Agent registration with SPIFFE ID generation
  - Status and capability management
  - JWT-SVID validation endpoint
- Extend AuthenticationMiddleware to detect and validate SPIFFE JWT-SVIDs
- Add SPIRE deployment configuration templates:
  - server.conf, agent.conf for SPIRE configuration
  - docker-compose.spire.yml for local development
  - README.md with deployment instructions
- Add comprehensive unit tests for all SPIFFE components

Reference: PR #695

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Critical fixes:

- Add database migration for agents table (migrations/add_agents_table.sql)
- Fix signature verification security: failed validation now always rejects (prevents fallback bypass attack)
- Fix timezone handling: use UTC consistently for JWT timestamps

Improvements:

- Align env vars with .env.example (SPIFFE_JWT_AUDIENCES, SPIFFE_SVID_TTL_SECONDS)
- Add capability enforcement decorator (require_capabilities)
- Add OpenAPI tags metadata for agents endpoint
- Update and expand unit tests (47 tests passing)

Addresses review comments from PR #695.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

…served word SQLAlchemy's Declarative API reserves the 'metadata' attribute name. Renamed the field to 'agent_metadata' in the model while keeping the database column name as 'metadata' via explicit column name mapping. This also updates the schema to use validation_alias for proper model_validate() from ORM objects. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

The test_validate_jwt_svid_valid test was failing because AgentPrincipal requires a trust_domain field which was not being provided. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Critical fixes:

- Fix timezone-naive datetime to use UTC throughout (agent.py, agent_repository.py)
- Change default agent status from ACTIVE to PENDING for approval workflow
- Add RuntimeError when SPIFFE enabled but py-spiffe library missing
- Restrict trust domain to configured value only (security fix)

High priority security fixes:

- Add capability validation per agent type (ALLOWED_CAPABILITIES_BY_TYPE)
- Add authentication requirement to SPIFFE validation endpoint
- Reject user-specified trust domains that don't match server config

Code quality improvements:

- Add OpenAPI tags metadata for agent router documentation
- Fix require_capabilities decorator type hints (ParamSpec, TypeVar)
- Add composite database indexes (owner+status, type+status, team+status)
- Update migration script with new composite indexes

Test updates:

- Update test_register_agent_with_custom_trust_domain to verify rejection
- Fix test_authenticator_creates_principal_with_fallback to mock spiffe module

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
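
Per-type capability validation could be sketched as a lookup table plus a set difference. `ALLOWED_CAPABILITIES_BY_TYPE` is the name used in the commit message, but the enum members, the capability strings, and the `validate_capabilities` helper below are illustrative assumptions rather than the project's actual values.

```python
from enum import Enum
from typing import Iterable


class AgentType(str, Enum):
    SEARCH_ENRICHER = "search-enricher"
    COT_REASONING = "cot-reasoning"


class AgentCapability(str, Enum):
    # Hypothetical capability names, for illustration only.
    MCP_TOOL_INVOKE = "mcp:tool:invoke"
    SEARCH_READ = "search:read"
    COT_INVOKE = "cot:invoke"


ALLOWED_CAPABILITIES_BY_TYPE: dict[AgentType, frozenset[AgentCapability]] = {
    AgentType.SEARCH_ENRICHER: frozenset(
        {AgentCapability.MCP_TOOL_INVOKE, AgentCapability.SEARCH_READ}
    ),
    AgentType.COT_REASONING: frozenset(
        {AgentCapability.SEARCH_READ, AgentCapability.COT_INVOKE}
    ),
}


def validate_capabilities(
    agent_type: AgentType, requested: Iterable[AgentCapability]
) -> set[AgentCapability]:
    """Return the requested capabilities, or raise if any exceed the allow-list."""
    requested_set = set(requested)
    allowed = ALLOWED_CAPABILITIES_BY_TYPE.get(agent_type, frozenset())
    rejected = requested_set - allowed
    if rejected:
        names = sorted(cap.value for cap in rejected)
        raise ValueError(f"{agent_type.value} may not request: {names}")
    return requested_set
```

Treating an unknown agent type as having an empty allow-list makes the check fail closed: a newly added type gets no capabilities until the table is updated.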

Add comprehensive architecture documentation for the Agentic RAG Platform:

- agentic-ui-architecture.md: React component hierarchy, state management, and API integration for agent features
- backend-architecture-diagram.md: Overall backend architecture with Mermaid diagrams showing service layers and data flow
- mcp-integration-architecture.md: MCP client/server integration strategy, PR comparison (#671 vs #684), and Context Forge integration
- rag-modulo-mcp-server-architecture.md: Exposing RAG capabilities as MCP server with tools (rag_search, rag_ingest, etc.) and resources
- search-agent-hooks-architecture.md: 3-stage agent pipeline (pre-search, post-search, response) with database schema and execution flow
- system-architecture.md: Complete system architecture overview with technology stack and data flows

These documents guide implementation of:

- PR #695 (SPIFFE/SPIRE agent identity)
- PR #671 (MCP Gateway client)
- Issue #697 (Agent execution hooks)
- Issue #698 (MCP Server)
- Issue #699 (Agentic UI)

Closes #696

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

github-actions · 2025-11-27T17:57:33Z

🚀 Development Environment Options

This repository supports Dev Containers for a consistent development environment.

Option 1: GitHub Codespaces (Recommended)

Create a cloud-based development environment:

Click the green Code button above
Select the Codespaces tab
Click Create codespace on docs/agentic-architecture-696
Wait 2-3 minutes for environment setup
Start coding with all tools pre-configured!

Option 2: VS Code Dev Containers (Local)

Use Dev Containers on your local machine:

Install Docker Desktop
Install VS Code
Install the Dev Containers extension
Clone this PR branch locally
Open in VS Code and click "Reopen in Container" when prompted

Option 3: Traditional Local Setup

Set up the development environment manually:

# Clone the repository
git clone https://github.com/manavgup/rag_modulo.git
cd rag_modulo
git checkout docs/agentic-architecture-696

# Initialize development environment
make dev-init
make dev-build
make dev-up
make dev-validate

Available Commands

Once in your development environment:

make help           # Show all available commands
make dev-validate   # Validate environment setup
make test-atomic    # Run atomic tests
make test-unit      # Run unit tests
make lint          # Run linting

Services Available

When running make dev-up:

This automated message helps reviewers quickly set up the development environment.

github-actions · 2025-11-27T18:00:47Z

Pull Request Review: Agentic RAG Architecture Documentation & SPIFFE/SPIRE Integration

Overview

This PR introduces comprehensive architecture documentation and a complete SPIFFE/SPIRE identity integration for agent workload authentication. The implementation is ambitious, adding ~9,673 lines across 29 files including core infrastructure, database models, services, and extensive documentation.

✅ Strengths

1. Comprehensive Architecture Documentation

Excellent documentation suite covering all aspects of the agentic RAG system
Well-structured Mermaid diagrams for visual architecture representation
Clear cross-references between documents
Strong alignment with CLAUDE.md guidelines

2. Security-First Design

Proper SPIFFE/SPIRE integration following CNCF standards
Zero-trust architecture with cryptographic workload identities
Capability-based access control system
Multi-layer defense with JWT-SVID validation
Security-conscious trust domain restrictions in agent_service.py:84-92

3. Production-Ready Code Quality

Comprehensive test coverage (772 lines for spiffe_auth, 470 lines for agent_service)
Proper error handling with custom exceptions
Type hints throughout the codebase
Database migrations with rollback support
Well-documented code with docstrings

4. Database Design Excellence

Proper composite indexes for query optimization (agent.py:70-74)
GIN index for JSONB capabilities for efficient containment queries
Automatic updated_at trigger
Proper foreign key constraints with CASCADE/SET NULL

5. Clean Architecture

Clear separation: Models → Repository → Service → Router
Dependency injection pattern
Repository pattern for data access
Pydantic schemas for validation

🔍 Issues & Recommendations

CRITICAL Issues

1. Missing Dependency: py-spiffe

Location: backend/core/spiffe_auth.py:342, pyproject.toml

Issue: The code imports py-spiffe library but it's not added to pyproject.toml:

from spiffe import JwtSource, WorkloadApiClient  # type: ignore[import-not-found]

Impact: Runtime ImportError when SPIFFE is enabled

Fix Required:

poetry add py-spiffe
poetry lock

Verification: Check that poetry.lock was regenerated after adding dependency

2. Migration Script Missing psycopg2 Dependency Check

Location: migrations/apply_agents_migration.py:14

Issue: Script imports psycopg2 without try/except or dependency declaration

import psycopg2
from dotenv import load_dotenv

Impact: Migration will fail if psycopg2 not installed

Recommendation: Add error handling:

import sys

try:
    import psycopg2
except ImportError:
    # Fail fast with an actionable message instead of a raw traceback.
    print("ERROR: psycopg2 is required. Install with: pip install psycopg2-binary", file=sys.stderr)
    sys.exit(1)

3. Security: Signature Validation Fallback

Location: backend/core/spiffe_auth.py:477-487

Issue: The fallback mode accepts tokens without signature validation in development:

if self.config.fallback_to_jwt:
    logger.warning(
        "SPIRE unavailable, accepting token without signature validation. "
        "This is ONLY safe in development environments."
    )

Concern: While the security note is present, this could be dangerous if accidentally enabled in production.

Recommendation: Add environment check:

if self.config.fallback_to_jwt:
    if os.getenv("ENVIRONMENT", "development") == "production":
        logger.error("SPIRE unavailable in production. Fallback disabled for security.")
        return None
    logger.warning("...")

HIGH Priority Issues

4. Authentication Middleware: Agent vs User Confusion

Location: backend/core/authentication_middleware.py:242-244

Issue: Agent authentication sets request.state.user for backward compatibility:

agent_data = {...}
request.state.user = agent_data  # For backward compatibility

Concern: This violates the principle of least surprise. Downstream code checking request.state.user might not expect an agent object.

Recommendation:

Add request.state.principal for both users and agents
Keep request.state.user only for actual users
Update downstream code to check principal first, then user
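
One way to picture this recommendation is a small principal type that both users and agents map onto. This is only a sketch: `Principal`, `PrincipalKind`, `attach_principal`, and the shape of the user dict are hypothetical, and `SimpleNamespace` stands in for the framework's `request.state`.

```python
from dataclasses import dataclass
from enum import Enum
from types import SimpleNamespace


class PrincipalKind(str, Enum):
    USER = "user"
    AGENT = "agent"


@dataclass(frozen=True)
class Principal:
    """Common identity for users and agents (hypothetical shape)."""

    kind: PrincipalKind
    identifier: str  # user UUID, or SPIFFE ID for agents


def attach_principal(state: SimpleNamespace, principal: Principal) -> None:
    """Set state.principal for everyone, but state.user only for real users."""
    state.principal = principal
    if principal.kind is PrincipalKind.USER:
        state.user = {"uuid": principal.identifier}
    else:
        # Agents never masquerade as users, so user-only code sees None.
        state.user = None


# Usage: downstream code checks the principal first, then the user.
state = SimpleNamespace()
attach_principal(
    state,
    Principal(PrincipalKind.AGENT, "spiffe://rag-modulo.example.com/agent/a"),
)
is_agent = state.principal.kind is PrincipalKind.AGENT
```

With this split, code that only understands users fails on `None` right away instead of silently treating an agent payload as a user record.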

5. Race Condition in Agent Registration

Location: backend/rag_solution/services/agent_service.py:80-81

Issue: Agent instance ID uses UUID prefix without checking uniqueness:

agent_instance_id = str(uuid.uuid4())[:8]

Concern: While collision probability is low, there's no database uniqueness check.

Recommendation: Either:

Use full UUID for agent_instance_id
Add a uniqueness retry loop (max 3 attempts)
Add unique constraint in SPIFFE ID generation
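
To put numbers on the concern: the first 8 hex characters of a UUIDv4 carry 32 random bits, so the birthday approximation gives the chance of at least one collision among n agents. The sketch below computes that estimate and shows a bounded retry loop; `collision_probability`, `new_instance_id`, and the `exists` callback are hypothetical names for this example.

```python
import math
import uuid
from typing import Callable


def collision_probability(n: int, bits: int = 32) -> float:
    """Birthday approximation: P(collision) ~= 1 - exp(-n(n-1) / (2 * 2**bits))."""
    return -math.expm1(-n * (n - 1) / (2 * 2**bits))


def new_instance_id(exists: Callable[[str], bool], max_attempts: int = 3) -> str:
    """Generate an 8-character ID, retrying if the lookup says it is taken."""
    for _ in range(max_attempts):
        candidate = uuid.uuid4().hex[:8]
        if not exists(candidate):
            return candidate
    raise RuntimeError(f"no unique instance ID after {max_attempts} attempts")
```

With 10,000 registered agents the estimate is a little over 1%, which is small but not negligible. The retry loop alone is still a check-then-insert race between concurrent registrations, so a database unique constraint (or the full UUID) remains the actual guarantee.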

6. Missing Index on last_seen_at

Location: migrations/add_agents_table.sql

Issue: last_seen_at is used for activity tracking but lacks an index.

Use Case: Queries like "find inactive agents" will do full table scans.

Recommendation: Add:

CREATE INDEX IF NOT EXISTS idx_agents_last_seen_at ON agents(last_seen_at DESC) 
WHERE last_seen_at IS NOT NULL;

7. SPIRE Docker Compose Not Integrated

Location: deployment/spire/docker-compose.spire.yml

Issue: This is a standalone compose file, not integrated with main docker-compose.yml.

Impact: Developers won't know how to run SPIRE with local dev

Recommendation:

Add docker-compose.override.yml example
Document integration in CLAUDE.md
Add make local-dev-spire command

MEDIUM Priority Issues

8. Inconsistent Enum Definitions

Location: Multiple files

Issue: AgentType and AgentCapability are defined in two places, and tests import from both:

backend/core/spiffe_auth.py (core)
backend/rag_solution/schemas/agent_schema.py (API layer)

Concern: Potential inconsistency and maintenance burden

Recommendation:

Keep single source of truth in core/spiffe_auth.py
Import from there in schemas
OR create core/agent_types.py for shared types

9. Missing Error Handling for JWT Decode

Location: backend/core/spiffe_auth.py:433-437

Issue: JWT decode in is_spiffe_jwt_svid() uses a broad except Exception:

try:
    unverified = jwt.decode(token, options={"verify_signature": False})
    ...
except Exception:
    return False

Concern: Masks all errors, including programming errors

Recommendation: Be specific:

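except jwt.InvalidTokenError:
    # Malformed or invalid token (jwt.DecodeError is a subclass): not a JWT-SVID.
    return False
except Exception:
    # Unexpected failure: log with traceback so programming errors stay visible.
    logger.exception("Unexpected error checking SPIFFE JWT-SVID")
    return False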
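
For context, a detection step like this only needs to peek at the unverified payload and check whether the `sub` claim is a SPIFFE ID. The standard-library sketch below shows the idea with narrowly scoped exceptions. It is not the project's `is_spiffe_jwt_svid()`, the function name is hypothetical, and the result must only be used to choose a validation path, never to authenticate.

```python
import base64
import binascii
import json


def looks_like_spiffe_jwt_svid(token: str) -> bool:
    """Return True if the UNVERIFIED payload has a spiffe:// subject.

    Routing hint only: the signature is not checked here.
    """
    parts = token.split(".")
    if len(parts) != 3:
        return False
    try:
        payload_b64 = parts[1]
        # JWTs strip base64url padding, so restore it before decoding.
        padded = payload_b64 + "=" * (-len(payload_b64) % 4)
        claims = json.loads(base64.urlsafe_b64decode(padded))
    except (binascii.Error, ValueError):
        # Bad base64, bad UTF-8, or bad JSON: not a token we recognise.
        return False
    subject = claims.get("sub") if isinstance(claims, dict) else None
    return isinstance(subject, str) and subject.startswith("spiffe://")
```

Catching only decoding errors means anything else, such as a bug in surrounding code, propagates instead of being reported as "not a SPIFFE token".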

10. Repository Error Handling - Lost Context

Location: backend/rag_solution/repository/agent_repository.py:82-85

Issue: The generic catch-all reports every failure, including non-database bugs, as a repository error (the original exception is still chained via from e):

except Exception as e:
    self.db.rollback()
    logger.error(f"Error creating agent: {e!s}")
    raise RepositoryError(f"Failed to create agent: {e!s}") from e

Recommendation: Handle specific exceptions:

except (IntegrityError, SQLAlchemyError) as e:
    self.db.rollback()
    raise RepositoryError(f"Database error creating agent: {e!s}") from e

11. Missing API Documentation

Location: backend/rag_solution/router/agent_router.py

Issue: Endpoints lack OpenAPI examples in docstrings

Impact: API documentation will be less helpful

Recommendation: Add OpenAPI examples:

@router.post(
    "/register",
    response_model=AgentRegistrationResponse,
    responses={
        201: {"description": "Agent registered successfully"},
        400: {"description": "Invalid request", "model": ErrorResponse},
        409: {"description": "SPIFFE ID already exists"},
    }
)

12. Type Annotation: Self vs AgentRepository

Location: backend/rag_solution/repository/agent_repository.py:29

Issue: Using Any type hint for self:

def __init__(self: Any, db: Session) -> None:

Recommendation: Remove type hint (not needed) or use proper type:

def __init__(self, db: Session) -> None:

LOW Priority / Style Issues

13. Magic String for Trust Domain

Location: Multiple files

Issue: "rag-modulo.example.com" appears in multiple places

Recommendation: Use constant or config default

14. Logging: f-string vs Extra Context

Location: backend/rag_solution/services/agent_service.py:86

Issue: Uses f-string instead of structured logging:

logger.warning(f"Rejected trust domain '{request.trust_domain}'...")

Per CLAUDE.md Enhanced Logging guidelines:

logger.warning(
    "Rejected trust domain from user",
    extra={
        "requested_domain": request.trust_domain,
        "configured_domain": self._config.trust_domain,
        "user_id": str(owner_user_id)
    }
)

15. Test Fixture Naming

Location: tests/unit/services/test_agent_service.py:35-44

Issue: Mock fixtures lack docstrings

Recommendation: Add brief descriptions for clarity

📋 Testing Checklist

Required Before Merge:

Add py-spiffe to pyproject.toml and regenerate poetry.lock
Run full test suite (currently 947+ tests) with new changes
Verify migration script works on clean database
Test SPIFFE integration end-to-end with SPIRE containers
Security review of fallback authentication logic
Load testing for agent authentication at scale
Documentation review by team (per PR description test plan)

🎯 Summary Assessment

Overall Rating: 8.5/10

Verdict: Approve with Required Changes

This is an exceptionally well-designed and implemented feature. The architecture is sound, the code quality is high, and the documentation is outstanding. However, there are critical dependency issues that must be resolved before merge.

Must Fix Before Merge:

✅ Add py-spiffe dependency to pyproject.toml
✅ Add psycopg2 import error handling to migration script
✅ Add production environment check to signature validation fallback
⚠️ Consider the agent/user authentication backward compatibility approach

Recommended for Follow-Up PR:

Refactor enum definitions to single source
Improve structured logging throughout
Add missing indexes (last_seen_at)
Integrate SPIRE docker-compose with main setup

📚 Alignment with Repository Standards

✅ Follows CLAUDE.md guidelines:

Production-ready emphasis ✅
Comprehensive testing ✅
Clean architecture ✅
Type hints throughout ✅
Documentation excellence ✅

⚠️ Minor deviations:

Enhanced logging not consistently applied (see review item 14, Logging: f-string vs Extra Context)
Line length appears compliant (120 chars)

🚀 Impact Assessment

Positive Impact:

Enables secure agent-to-agent authentication (critical for agentic RAG)
Production-ready identity infrastructure
Clear migration path for existing systems
Excellent foundation for MCP integration (PR #684, "feat(mcp): Implement MCP Gateway integration for extensibility"; PR #695, "feat: SPIFFE/SPIRE Integration Architecture for Agent Identity")

Risk Assessment:

Medium Risk: Missing py-spiffe dependency will cause runtime failures
Low Risk: Migration script is well-tested and has rollback
Low Risk: Fallback authentication has clear security warnings

Migration Complexity: Medium

Requires new database table
Requires SPIRE infrastructure setup
Backward compatible with existing JWT auth

🤝 Recommendations

For Immediate Action:

Add missing dependencies to pyproject.toml
Test with make test-all to ensure no regressions
Verify SPIRE Docker setup works with make local-dev-spire

For Documentation:

Add SPIRE setup guide to CLAUDE.md under "Common Development Commands"
Document agent registration workflow in getting-started docs

For Future Enhancement:

Consider implementing automatic SPIRE registration via SPIRE Server API
Add agent activity monitoring dashboard
Implement agent credential rotation automation

Great work on this comprehensive feature! The architecture is solid and the implementation quality is high. Address the critical dependency issues and this will be ready to merge.

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

manavgup · 2025-11-27T18:03:53Z

Closing - branch included unrelated files from PR #695. Will recreate from clean main branch.

claude and others added 8 commits November 26, 2025 20:27

manavgup added the documentation, priority:high, and architecture labels Nov 27, 2025

manavgup closed this Nov 27, 2025
