feat: SPIFFE/SPIRE Integration Architecture for Agent Identity #695
Add environment variables to support SPIFFE workload identity integration for AI agents and services. This enables cryptographic machine identity with configurable migration phases:
- SPIFFE_ENABLED: Toggle SPIFFE integration
- SPIFFE_AUTH_MODE: Migration phases (disabled→optional→preferred→required)
- SPIFFE_ENDPOINT_SOCKET: SPIRE Agent Workload API socket
- SPIFFE_TRUST_DOMAIN: Trust domain for identity hierarchy
- SPIFFE_LEGACY_JWT_WARNING: Track legacy auth usage during migration
- SPIFFE_SVID_TTL_SECONDS: Certificate lifetime configuration
- SPIFFE_JWT_AUDIENCES: Allowed JWT-SVID audiences
Related to: MCP Context Forge integration (PR #684)
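As a rough illustration of how the variables listed in this commit might be read at startup, here is a minimal sketch. The `SpiffeSettings` class, the `load_settings` and `legacy_jwt_allowed` helpers, every default value, the comma-separated audience format, and the rule that only `required` refuses legacy tokens are assumptions made for illustration; they are not taken from the project's `SPIFFEConfig`.

```python
import os
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Optional, Tuple


class AuthMode(str, Enum):
    """Migration phases named in the commit message."""
    DISABLED = "disabled"
    OPTIONAL = "optional"
    PREFERRED = "preferred"
    REQUIRED = "required"


@dataclass(frozen=True)
class SpiffeSettings:
    """Hypothetical settings holder (not the project's SPIFFEConfig)."""
    enabled: bool
    auth_mode: AuthMode
    endpoint_socket: str
    trust_domain: str
    legacy_jwt_warning: bool
    svid_ttl_seconds: int
    jwt_audiences: Tuple[str, ...]


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> SpiffeSettings:
    """Read the SPIFFE_* variables; all defaults here are assumed."""
    env = os.environ if env is None else env
    raw_audiences = env.get("SPIFFE_JWT_AUDIENCES", "")
    audiences = tuple(a.strip() for a in raw_audiences.split(",") if a.strip())
    return SpiffeSettings(
        enabled=_as_bool(env.get("SPIFFE_ENABLED", "false")),
        auth_mode=AuthMode(env.get("SPIFFE_AUTH_MODE", "disabled").strip().lower()),
        endpoint_socket=env.get("SPIFFE_ENDPOINT_SOCKET", "unix:///tmp/spire-agent.sock"),
        trust_domain=env.get("SPIFFE_TRUST_DOMAIN", "rag-modulo.example.com"),
        legacy_jwt_warning=_as_bool(env.get("SPIFFE_LEGACY_JWT_WARNING", "true")),
        svid_ttl_seconds=int(env.get("SPIFFE_SVID_TTL_SECONDS", "3600")),
        jwt_audiences=audiences,
    )


def legacy_jwt_allowed(settings: SpiffeSettings) -> bool:
    """Assumed semantics: only the 'required' phase refuses legacy tokens."""
    return not (settings.enabled and settings.auth_mode is AuthMode.REQUIRED)
```

An unknown `SPIFFE_AUTH_MODE` value raises `ValueError` from the enum lookup, so a typo in the phase name surfaces at startup rather than silently disabling enforcement.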
This architecture document outlines how to integrate SPIRE (SPIFFE Runtime Environment) into RAG Modulo to provide cryptographic workload identities for AI agents. This enables zero-trust agent authentication and secure agent-to-agent (A2A) communication.
Key architectural decisions:
- JWT-SVIDs for stateless verification (vs X.509 for mTLS)
- Trust domain: spiffe://rag-modulo.example.com
- Integration with IBM MCP Context Forge (PR #684)
- Capability-based access control for agents
- 5-phase implementation plan
Agent types defined:
- search-enricher: MCP tool invocation
- cot-reasoning: Chain of Thought orchestration
- question-decomposer: Query decomposition
- source-attribution: Document source tracking
- entity-extraction: Named entity recognition
- answer-synthesis: Answer generation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
🚀 Development Environment Options
This repository supports Dev Containers for a consistent development environment.
Option 1: GitHub Codespaces (Recommended)
Create a cloud-based development environment:
Option 2: VS Code Dev Containers (Local)
Use Dev Containers on your local machine:
Option 3: Traditional Local Setup
Set up the development environment manually:
# Clone the repository
git clone https://github.com/manavgup/rag_modulo.git
cd rag_modulo
git checkout claude/integrate-spire-attestation-01V2xmqcX6H8q8veyW32RBSF
# Initialize development environment
make dev-init
make dev-build
make dev-up
make dev-validate
Available Commands
Once in your development environment:
make help # Show all available commands
make dev-validate # Validate environment setup
make test-atomic # Run atomic tests
make test-unit # Run unit tests
make lint # Run linting
Services Available
When running
This automated message helps reviewers quickly set up the development environment.
Code Review: SPIFFE/SPIRE Integration Architecture
This PR introduces a comprehensive 900-line architecture document for SPIFFE/SPIRE integration. Overall, this is an excellent architecture proposal with strong technical foundation and clear implementation roadmap.
✅ Major Strengths
🔍 Key Areas for Improvement
1. Trust Domain Naming Inconsistency
This commit implements the SPIFFE/SPIRE integration for AI agent authentication as designed in docs/architecture/spire-integration-architecture.md.
Key changes:
- Add py-spiffe dependency for SPIFFE JWT-SVID support
- Create core SPIFFE authentication module (spiffe_auth.py) with:
  - SPIFFEConfig for environment-based configuration
  - AgentPrincipal dataclass for authenticated agent identity
  - SPIFFEAuthenticator for JWT-SVID validation
  - AgentType and AgentCapability enums
  - Helper functions for SPIFFE ID parsing and building
- Create Agent data model with SQLAlchemy:
  - Agent model with SPIFFE ID, type, capabilities, status
  - Relationships to User (owner) and Team
  - Status management (active, suspended, revoked)
- Add Agent repository, service, and router layers:
  - Full CRUD operations for agents
  - Agent registration with SPIFFE ID generation
  - Status and capability management
  - JWT-SVID validation endpoint
- Extend AuthenticationMiddleware to detect and validate SPIFFE JWT-SVIDs
- Add SPIRE deployment configuration templates:
  - server.conf, agent.conf for SPIRE configuration
  - docker-compose.spire.yml for local development
  - README.md with deployment instructions
- Add comprehensive unit tests for all SPIFFE components
Reference: PR #695
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
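The commit mentions helper functions for building and parsing SPIFFE IDs without showing them. A minimal sketch of what such helpers could look like follows, assuming the `/agent/{type}/{id}` path layout and the `rag-modulo.example.com` trust domain described elsewhere in this PR. The names `build_spiffe_id`, `parse_spiffe_id`, and `ParsedSpiffeId` are hypothetical and are not the functions in `spiffe_auth.py`.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Trust domain taken from the PR description; the constant name is illustrative.
DEFAULT_TRUST_DOMAIN = "rag-modulo.example.com"


@dataclass(frozen=True)
class ParsedSpiffeId:
    trust_domain: str
    agent_type: str
    agent_id: str


def build_spiffe_id(agent_type: str, agent_id: str,
                    trust_domain: str = DEFAULT_TRUST_DOMAIN) -> str:
    """Build spiffe://<trust-domain>/agent/<type>/<id>."""
    if not agent_type or not agent_id or "/" in agent_type or "/" in agent_id:
        raise ValueError("agent_type and agent_id must be non-empty and contain no '/'")
    return f"spiffe://{trust_domain}/agent/{agent_type}/{agent_id}"


def parse_spiffe_id(spiffe_id: str) -> ParsedSpiffeId:
    """Split an agent SPIFFE ID into its parts, rejecting anything malformed."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    if parsed.query or parsed.fragment:
        raise ValueError("SPIFFE IDs must not carry a query or fragment")
    parts = parsed.path.split("/")  # expected: ['', 'agent', '<type>', '<id>']
    if len(parts) != 4 or parts[0] != "" or parts[1] != "agent" or not parts[2] or not parts[3]:
        raise ValueError(f"unexpected agent path: {parsed.path!r}")
    return ParsedSpiffeId(parsed.netloc, parts[2], parts[3])
```

Parsing is deliberately strict: anything that does not match the three-segment agent path is rejected rather than partially interpreted.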
Code Review: SPIFFE/SPIRE Integration Architecture for Agent Identity
Thank you for this comprehensive PR! This is an excellent architectural foundation for bringing cryptographic workload identity to RAG Modulo. I've reviewed the 5,000+ lines of changes across 21 files, and here's my detailed feedback:
✅ Strengths
1. Excellent Architecture & Documentation
2. Solid Code Quality
3. Security-Conscious Design
4. Production-Ready Features
🔴 Critical Issues
1. Missing Database Migration 🚨
Severity: High - Will cause runtime errors
The
Evidence:
Impact:
Resolution Required:
# Generate Alembic migration
cd backend
alembic revision --autogenerate -m "Add agents table for SPIFFE workload identity"
alembic upgrade head
Expected Migration Schema:
CREATE TABLE agents (
id UUID PRIMARY KEY,
spiffe_id VARCHAR(512) UNIQUE NOT NULL,
agent_type VARCHAR(100) NOT NULL,
name VARCHAR(255) NOT NULL,
description TEXT,
owner_user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
team_id UUID REFERENCES teams(id) ON DELETE SET NULL,
capabilities JSONB NOT NULL DEFAULT '[]',
metadata JSONB NOT NULL DEFAULT '{}',
status VARCHAR(50) NOT NULL DEFAULT 'pending',
created_at TIMESTAMP NOT NULL,
updated_at TIMESTAMP NOT NULL,
last_seen_at TIMESTAMP
);
CREATE INDEX idx_agents_spiffe_id ON agents(spiffe_id);
CREATE INDEX idx_agents_agent_type ON agents(agent_type);
CREATE INDEX idx_agents_owner_user_id ON agents(owner_user_id);
CREATE INDEX idx_agents_status ON agents(status);
2. Signature Verification Fallback is Unsafe
Critical fixes:
- Add database migration for agents table (migrations/add_agents_table.sql)
- Fix signature verification security: failed validation now always rejects (prevents fallback bypass attack)
- Fix timezone handling: use UTC consistently for JWT timestamps
Improvements:
- Align env vars with .env.example (SPIFFE_JWT_AUDIENCES, SPIFFE_SVID_TTL_SECONDS)
- Add capability enforcement decorator (require_capabilities)
- Add OpenAPI tags metadata for agents endpoint
- Update and expand unit tests (47 tests passing)
Addresses review comments from PR #695.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
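The "failed validation now always rejects" fix in this commit can be pictured with a small fail-closed sketch. It does not reproduce the project's `SPIFFEAuthenticator`: the `authenticate` function, the `Principal` class, and the injected `verify` callable (standing in for whatever performs real JWT-SVID signature verification) are hypothetical. The one fact relied on is that a JWT-SVID carries the SPIFFE ID in its `sub` claim.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class Principal:
    """Hypothetical authenticated identity; not the project's AgentPrincipal."""
    spiffe_id: str


def authenticate(token: str,
                 verify: Callable[[str], Mapping[str, object]]) -> Optional[Principal]:
    """Return a Principal only when `verify` succeeds.

    `verify` is an assumed callable that returns the verified claims or raises
    on any problem (bad signature, wrong audience, expiry, unreachable bundle).
    """
    try:
        claims = verify(token)
    except Exception:
        # Fail closed: never fall back to decoding the token without
        # verification, since unverified claims can be forged by the caller.
        return None
    subject = claims.get("sub")
    if not isinstance(subject, str) or not subject.startswith("spiffe://"):
        return None
    return Principal(spiffe_id=subject)
```

Catching every exception from `verify` is intentional here: the bypass the commit describes arises when an error path ends in acceptance, so every error path in this sketch ends in rejection.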
🔍 PR Review: SPIFFE/SPIRE Integration Architecture
Overall Assessment: This is a high-quality, production-ready implementation.
Score: 8.8/10 ⭐⭐⭐⭐⭐
Recommendation: APPROVE WITH CHANGES ✅
✅ Strengths
1. Excellent Documentation (900 lines)
2. Production-Ready Implementation
3. Strong Security
🚨 MUST FIX Before Merge (5 Critical Issues)
1. Missing Database Relationship Back-References ❌
Files:
Issue: Agent model defines relationships but User/Team models are missing back-references.
Fix Required:
# user.py
class User(Base):
    agents: Mapped[list["Agent"]] = relationship("Agent", back_populates="owner")
# team.py
class Team(Base):
    agents: Mapped[list["Agent"]] = relationship("Agent", back_populates="team")
Impact: Without these, SQLAlchemy raises
2. Datetime Timezone Inconsistency 🕐
File:
Issue: Uses naive
Fix Required:
from datetime import UTC, datetime
created_at: Mapped[datetime] = mapped_column(DateTime, default=lambda: datetime.now(UTC))
updated_at: Mapped[datetime] = mapped_column(DateTime, default=lambda: datetime.now(UTC),
    onupdate=lambda: datetime.now(UTC))
def update_last_seen(self) -> None:
    self.last_seen_at = datetime.now(UTC)
Impact: Mixing naive and aware datetimes causes
3. Agent Status Not Checked in Middleware 🔴
File:
Issue: Validates JWT-SVID signature but never checks if agent is suspended/revoked.
Current Flow: Middleware → validate_jwt_svid() → ✅ Signature valid → Allow
Security Risk: Suspended agents can still authenticate if JWT-SVID hasn't expired!
4. SQL Migration Rollback Missing 🔄
File:
Issue: No rollback script provided.
Required: Add
BEGIN;
DROP INDEX IF EXISTS ix_agents_spiffe_id;
DROP INDEX IF EXISTS ix_agents_agent_type;
DROP INDEX IF EXISTS ix_agents_owner_user_id;
DROP INDEX IF EXISTS ix_agents_team_id;
DROP INDEX IF EXISTS ix_agents_status;
DROP TABLE IF EXISTS agents;
COMMIT;
5. Verify py-spiffe Installation in CI 📦
File:
Issue: PR adds
Recommendation:
[tool.poetry.dependencies]
spiffe = { version = "^0.2.2", optional = true }
[tool.poetry.extras]
spiffe = ["spiffe"]
| Category | Score | Notes |
|---|---|---|
| Code Quality | 9/10 | Clean, typed, well-structured |
| Security | 8/10 | Strong validation, minor fallback concern |
| Testing | 8/10 | 1226+ unit tests, missing integration |
| Documentation | 9/10 | Exceptional 900-line architecture doc |
| Performance | 9/10 | Indexes, eager loading, pagination |
| Architecture | 10/10 | Perfect alignment with RAG Modulo patterns |
Overall: 8.8/10
🎯 Action Items
Before Merge (Critical)
- Add agents relationship to User/Team models
- Fix datetime timezone to UTC in Agent model
- Check agent status in authentication middleware
- Add migration rollback script
- Verify py-spiffe CI installation
Follow-Up PRs
- Environment-gated fallback mode
- Agent status caching (60s TTL)
- Integration tests for SPIFFE flow
- Explicit issuer validation in middleware
- Refactor long validate_jwt_svid function
Nice to Have
- Rate limiting on agent endpoints
- Mermaid diagrams in docs
- FAQ and troubleshooting sections
- Capability naming convention docs
- Trust domain configuration guide
📚 References
💬 Conclusion
Excellent work on this comprehensive SPIFFE/SPIRE integration! The architecture is sound, implementation is clean, and documentation is thorough. Once the 5 critical issues are addressed, this will provide a solid foundation for zero-trust agent authentication and seamless MCP Context Forge integration.
The 5-phase implementation plan is well thought out. Looking forward to Phase 1 deployment and seeing the multi-agent capabilities in action! 🚀
Review by Claude Code | 2025-01-26
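Critical issue 3 in this review (agent status not checked after signature validation) could be closed with a check along the following lines. This is a sketch under stated assumptions, not the middleware's actual code: `authorize_agent` and the injected `lookup_status` callable (standing in for a repository query by SPIFFE ID) are hypothetical, and the status values are the ones mentioned in this PR (pending, active, suspended, revoked).

```python
from enum import Enum
from typing import Callable, Optional


class AgentStatus(str, Enum):
    PENDING = "pending"
    ACTIVE = "active"
    SUSPENDED = "suspended"
    REVOKED = "revoked"


def authorize_agent(spiffe_id: str,
                    lookup_status: Callable[[str], Optional[AgentStatus]]) -> bool:
    """Allow a request only for a registered agent whose status is ACTIVE.

    Intended to run after the JWT-SVID signature has been validated.
    `lookup_status` is an assumed callable that returns None when no agent
    is registered under the given SPIFFE ID.
    """
    status = lookup_status(spiffe_id)
    # Unknown, pending, suspended, and revoked agents are all rejected,
    # even while their JWT-SVID is still within its validity window.
    return status is AgentStatus.ACTIVE
```

With this step the flow becomes: middleware → validate signature → look up status → allow only if active. The cost is one lookup per request, which is presumably what the review's follow-up item on status caching is meant to reduce.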
…served word
SQLAlchemy's Declarative API reserves the 'metadata' attribute name. Renamed the field to 'agent_metadata' in the model while keeping the database column name as 'metadata' via explicit column name mapping. This also updates the schema to use validation_alias for proper model_validate() from ORM objects.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Code Review: SPIFFE/SPIRE Integration Architecture
Thank you for this comprehensive PR introducing SPIFFE/SPIRE integration for agent identity! This is a well-architected solution that addresses a critical need for AI agent authentication. Below is my detailed review:
✅ Strengths
1. Excellent Architecture Documentation
2. Production-Ready Security Design
3. Comprehensive Test Coverage
4. Clean Code Architecture
5. Database Design
🔍 Issues & Recommendations
1. CRITICAL: Timezone-Naive datetime Usage
| Category | Impact | Notes |
|---|---|---|
| Code Size | +5,771 lines | Substantial but well-organized |
| Test Coverage | Excellent | 1,226 lines of tests (21% of additions) |
| Breaking Changes | None | Disabled by default, opt-in |
| Security | High Positive | Zero-trust agent authentication |
| Performance | Minimal | Cached authenticator, lazy init |
| Dependencies | +1 (py-spiffe) | Production-ready, HPE-maintained |
🎉 Conclusion
This is a high-quality PR that introduces a critical security capability for RAG Modulo's AI agent ecosystem. The architecture is sound, the implementation is thorough, and the test coverage is excellent.
Primary concerns:
- Fix the timezone-naive datetime bug (critical)
- Align default agent status between SQL and model
- Add missing .env.example entry
Recommendation: ✅ Approve with minor changes
Once the critical datetime fix and status alignment are addressed, this PR is ready to merge. Excellent work on documentation and security considerations!
Review completed by: Claude Code (Sonnet 4.5)
Reference: CLAUDE.md - RAG Modulo development guidelines
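The timezone-naive datetime concern raised in this review can be reproduced with the standard library alone. The sketch below shows why mixing naive and aware values fails and how an expiry check might be kept entirely in UTC. `utc_now` and `is_expired` are illustrative names, not functions from the PR.

```python
from datetime import datetime, timezone
from typing import Optional


def utc_now() -> datetime:
    """Timezone-aware current time in UTC."""
    return datetime.now(timezone.utc)


def is_expired(exp: float, now: Optional[datetime] = None) -> bool:
    """Compare a JWT `exp` claim (seconds since the epoch) against an aware clock."""
    current = utc_now() if now is None else now
    if current.tzinfo is None:
        # Refuse naive input instead of guessing which zone it was meant to be in.
        raise ValueError("now must be timezone-aware")
    return current >= datetime.fromtimestamp(exp, tz=timezone.utc)


def naive_vs_aware_raises() -> bool:
    """Ordering a naive datetime against an aware one raises TypeError."""
    naive = datetime(2025, 1, 1)
    aware = datetime(2025, 1, 1, tzinfo=timezone.utc)
    try:
        naive < aware
    except TypeError:
        return True
    return False
```

`timezone.utc` is used instead of the `datetime.UTC` alias so that the sketch also runs on Python versions older than 3.11; the two refer to the same object.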
The test_validate_jwt_svid_valid test was failing because AgentPrincipal requires a trust_domain field which was not being provided. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Comprehensive Code Review: SPIFFE/SPIRE Integration Architecture
This is an excellent and well-architected PR that introduces SPIFFE/SPIRE workload identity for AI agents. The implementation is production-ready, comprehensive, and follows security best practices. Below is my detailed analysis:
✅ Strengths
1. Outstanding Documentation (900 lines)
2. Clean Architecture & Separation of Concerns
3. Database Schema Design (
Critical fixes:
- Fix timezone-naive datetime to use UTC throughout (agent.py, agent_repository.py)
- Change default agent status from ACTIVE to PENDING for approval workflow
- Add RuntimeError when SPIFFE enabled but py-spiffe library missing
- Restrict trust domain to configured value only (security fix)
High priority security fixes:
- Add capability validation per agent type (ALLOWED_CAPABILITIES_BY_TYPE)
- Add authentication requirement to SPIFFE validation endpoint
- Reject user-specified trust domains that don't match server config
Code quality improvements:
- Add OpenAPI tags metadata for agent router documentation
- Fix require_capabilities decorator type hints (ParamSpec, TypeVar)
- Add composite database indexes (owner+status, type+status, team+status)
- Update migration script with new composite indexes
Test updates:
- Update test_register_agent_with_custom_trust_domain to verify rejection
- Fix test_authenticator_creates_principal_with_fallback to mock spiffe module
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
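To make the per-type capability validation in this commit concrete, here is a minimal sketch. The mapping name matches the one in the commit message, but its contents are copied from the agent table in the PR description and may differ from what the code actually ships. `validate_capabilities` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, Iterable, List

# Values taken from the PR description's agent table (assumed to match the code).
ALLOWED_CAPABILITIES_BY_TYPE: Dict[str, FrozenSet[str]] = {
    "search-enricher": frozenset({"mcp:tool:invoke", "search:read"}),
    "cot-reasoning": frozenset({"search:read", "llm:invoke", "pipeline:execute"}),
    "question-decomposer": frozenset({"search:read", "llm:invoke"}),
    "source-attribution": frozenset({"document:read", "search:read"}),
    "entity-extraction": frozenset({"document:read", "llm:invoke"}),
    "answer-synthesis": frozenset({"search:read", "llm:invoke", "cot:invoke"}),
}


def validate_capabilities(agent_type: str, requested: Iterable[str]) -> List[str]:
    """Return the requested capabilities (sorted, de-duplicated) if all are allowed.

    Raises ValueError for an unknown agent type or for any capability outside
    the allow-list for that type.
    """
    allowed = ALLOWED_CAPABILITIES_BY_TYPE.get(agent_type)
    if allowed is None:
        raise ValueError(f"unknown agent type: {agent_type!r}")
    wanted = set(requested)
    excess = sorted(wanted - allowed)
    if excess:
        raise ValueError(f"capabilities not allowed for {agent_type}: {excess}")
    return sorted(wanted)
```

Checking at registration time means an agent cannot be created with more capabilities than its type permits, so later enforcement only needs to compare against the stored list.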
Code Review: SPIFFE/SPIRE Integration Architecture
Overall Assessment: ✅ APPROVED with Minor Recommendations
This PR demonstrates excellent architectural thinking and implementation quality. The SPIFFE/SPIRE integration is well-designed, thoroughly documented, and follows security best practices.
Key Strengths
Code Quality Assessment
Architecture & Design: Excellent
Security Implementation: Excellent
Type Safety: Excellent
Error Handling: Excellent
Documentation: Excellent
Test Coverage
Unit Tests (
Service Tests (
Test Quality: Excellent use of pytest fixtures, mocks, and parametrized tests
Database Design
Strengths:
Minor Suggestion: Add partial index for active agents:
CREATE INDEX idx_agents_active ON agents(agent_type, owner_user_id)
WHERE status = 'active';
Deployment Configuration
Docker Compose (
Recommendations:
Security Analysis
Strengths:
Potential Concerns & Mitigations:
Integration with Existing Code
Authentication Middleware: PR adds SPIFFE support without breaking existing flows
MCP Gateway (PR #684) Compatibility: Excellent alignment
Performance Considerations
Bottlenecks:
Optimizations:
Scalability: Design supports horizontal scaling
Documentation Quality
Architecture Doc (
Minor Improvements:
Issues Found
Critical: None ✅
Minor:
Recommendations Summary
Must-Do (Before Merge): None - PR is production-ready
Should-Do (This PR or Follow-up):
Nice-to-Have (Future PRs):
Conclusion
This is excellent work that significantly enhances RAG Modulo's security posture. The implementation demonstrates deep understanding of:
Status: ✅ APPROVED - Ready for production deployment after CI/CD passes
Minor recommendations are non-blocking and can be addressed in follow-up PRs. Great job! 🚀
Reviewed by: Claude Code
Add comprehensive architecture documentation for the Agentic RAG Platform:
- agentic-ui-architecture.md: React component hierarchy, state management, and API integration for agent features
- backend-architecture-diagram.md: Overall backend architecture with Mermaid diagrams showing service layers and data flow
- mcp-integration-architecture.md: MCP client/server integration strategy, PR comparison (#671 vs #684), and Context Forge integration
- rag-modulo-mcp-server-architecture.md: Exposing RAG capabilities as MCP server with tools (rag_search, rag_ingest, etc.) and resources
- search-agent-hooks-architecture.md: 3-stage agent pipeline (pre-search, post-search, response) with database schema and execution flow
- system-architecture.md: Complete system architecture overview with technology stack and data flows
These documents guide implementation of:
- PR #695 (SPIFFE/SPIRE agent identity)
- PR #671 (MCP Gateway client)
- Issue #697 (Agent execution hooks)
- Issue #698 (MCP Server)
- Issue #699 (Agentic UI)
Closes #696
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Summary
This PR introduces a comprehensive architecture document for integrating SPIFFE/SPIRE into RAG Modulo to provide cryptographic workload identities for AI agents. This enables zero-trust agent authentication and secure agent-to-agent (A2A) communication.
Why This Matters
As RAG Modulo integrates IBM MCP Context Forge (PR #684) to support AI agents, we need a robust identity mechanism for workloads/agents that goes beyond traditional user authentication:
Key Architectural Decisions
spiffe://rag-modulo.example.com
py-spiffe
Agent Identity Model
The architecture defines a new Agent data model with SPIFFE ID integration:
| Agent Type | SPIFFE ID Path | Capabilities |
|---|---|---|
| search-enricher | /agent/search-enricher/{id} | mcp:tool:invoke, search:read |
| cot-reasoning | /agent/cot-reasoning/{id} | search:read, llm:invoke, pipeline:execute |
| question-decomposer | /agent/question-decomposer/{id} | search:read, llm:invoke |
| source-attribution | /agent/source-attribution/{id} | document:read, search:read |
| entity-extraction | /agent/entity-extraction/{id} | document:read, llm:invoke |
| answer-synthesis | /agent/answer-synthesis/{id} | search:read, llm:invoke, cot:invoke |
Integration with MCP Context Forge (PR #684)
This architecture complements the MCP Gateway integration by:
mcp_jwt_token security gap identified in PR #684 review (feat(mcp): Implement MCP Gateway integration for extensibility)
Implementation Phases
py-spiffe integration, extended AuthenticationMiddleware
Architecture Diagram
Changes
docs/architecture/spire-integration-architecture.md (900 lines)
Related Issues/PRs
Test Plan
Questions for Reviewers
Trust Domain Naming: Is spiffe://rag-modulo.example.com an appropriate naming convention, or should we use something more specific?
JWT-SVID vs X.509-SVID: The document recommends JWT-SVIDs for easier integration. Should we also support X.509-SVIDs for mTLS scenarios?
Implementation Priority: Given PR #684 (feat(mcp): Implement MCP Gateway integration for extensibility) is in progress, should Phase 3 (MCP Gateway Integration) be prioritized over Phase 2 (Backend Integration)?
Agent Capability Model: Are the proposed agent types and capabilities comprehensive enough for the planned use cases?
References
🤖 Generated with Claude Code