🔐 mTLS (Mutual TLS) Support for MCP Gateway
🎯 Executive Summary
Implement certificate-based mutual TLS authentication for MCP Gateway to enable zero-trust security, enterprise PKI integration, and secure plugin-to-gateway communication.
🔍 Problem Statement
Organizations require certificate-based authentication for:
- Zero-trust network architectures
- Enterprise PKI integration
- Compliance with security standards (SOC2, ISO 27001, HIPAA)
- Service-to-service authentication without shared secrets
- External plugin security
Currently, MCP Gateway relies on JWT/Bearer tokens or proxy-based authentication, so deploying mTLS today requires additional infrastructure, such as a reverse proxy that terminates TLS in front of the gateway.
🙋‍♂️ User Stories
Story 1: Enterprise PKI Integration
As a: Enterprise Security Administrator
I want: To enforce certificate-based authentication for all MCP clients
So that: I can integrate with our existing PKI infrastructure and meet compliance requirements
✅ Acceptance Criteria
Scenario: Enterprise PKI Integration
Given I manage certificates for 500+ employees via our corporate PKI
When a user connects to MCP Gateway with their issued certificate
Then the gateway validates against our CA and extracts user identity from CN
And the user gains access without managing separate credentials
Story 2: Secure Plugin Communication
As a: DevOps Engineer
I want: To configure mTLS between MCP Gateway and external plugins
So that: Plugin communication is encrypted and mutually authenticated
✅ Acceptance Criteria
Scenario: Secure Plugin Communication
Given I have sensitive AI safety plugins that validate prompts
When MCP Gateway invokes these external plugins
Then the gateway presents its client certificate for authentication
And the plugin verifies the gateway's identity before processing
And all communication is encrypted with mutual TLS
Story 3: Service Mesh Deployment
As a: Platform Architect
I want: To deploy MCP Gateway with mTLS in Kubernetes using service mesh
So that: All inter-service communication is automatically secured
✅ Acceptance Criteria
Scenario: Kubernetes Service Mesh Deployment
Given I deploy MCP Gateway in an Istio-enabled Kubernetes cluster
When services communicate within the mesh
Then Istio automatically handles mTLS between all pods
And I can enforce strict peer authentication policies
And certificate rotation happens automatically via cert-manager
Story 4: Developer Certificate Authentication
As a: Developer
I want: To authenticate to MCP Gateway using my personal certificate
So that: I don't need to manage JWT tokens or passwords
✅ Acceptance Criteria
Scenario: Developer Certificate Authentication
Given I have a personal X.509 certificate from our IT department
When I connect to MCP Gateway via CLI or API
Then I authenticate using my certificate instead of username/password
And my identity is extracted from the certificate subject DN
And I don't need to manage or rotate JWT tokens
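The identity extraction described in Stories 1 and 4 can be sketched as follows. This is a minimal illustration, assuming the certificate arrives in the dictionary shape returned by Python's `ssl.SSLSocket.getpeercert()`. The function name `extract_identity` and the returned keys are hypothetical and not part of the gateway.

```python
from typing import Any, Dict, List, Optional


def extract_identity(peer_cert: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the common name and SAN values out of a getpeercert()-style dict.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    common_name: Optional[str] = None
    # 'subject' is a tuple of RDNs; each RDN is a tuple of (key, value) pairs.
    for rdn in peer_cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                common_name = value
    # 'subjectAltName' is a tuple of (kind, value) pairs, e.g. ("DNS", "host").
    sans: List[str] = [value for _kind, value in peer_cert.get("subjectAltName", ())]
    return {"common_name": common_name, "subject_alt_names": sans}
```

Whether the CN or a SAN entry should be treated as the user identity is a policy decision this sketch leaves open.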
Story 5: Certificate Lifecycle Management
As a: Operations Engineer
I want: Automated monitoring and renewal of certificates
So that: Services don't experience outages due to expired certificates
✅ Acceptance Criteria
Scenario: Certificate Expiration Monitoring
Given certificates have limited validity periods
When a certificate approaches expiration (< 7 days)
Then the monitoring system alerts the operations team
And cert-manager can automatically request renewal
And services continue operating without interruption
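The 7-day expiration check in this scenario could look roughly like the sketch below. It assumes the `notAfter` value is available in the text format Python's `ssl` module reports (for example from `getpeercert()`). The function names and the default threshold parameter are illustrative.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the days remaining before a certificate's notAfter time.

    `not_after` uses the format reported by the ssl module,
    e.g. "Jan  5 09:34:43 2018 GMT". A negative result means it has expired.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def needs_renewal_alert(
    not_after: str, threshold_days: float = 7.0, now: Optional[float] = None
) -> bool:
    """True when the certificate expires in fewer than `threshold_days` days."""
    return days_until_expiry(not_after, now) < threshold_days
```

A monitoring job could call `needs_renewal_alert` for each loaded certificate and raise an alert on `True`. How alerts are delivered is outside this sketch.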
📐 Solution Architecture
graph TB
subgraph "Current State (Proxy-based)"
C1[Client] -->|HTTPS+Cert| P1[Reverse Proxy]
P1 -->|HTTP+Headers| G1[MCP Gateway]
G1 -->|HTTP| PS1[Plugin Server]
end
subgraph "Future State (Native mTLS)"
C2[Client] -->|mTLS| G2[MCP Gateway]
G2 -->|mTLS| PS2[Plugin Server]
G2 -->|mTLS| MS[MCP Server]
PKI[Enterprise PKI] -.->|Issues Certs| C2
PKI -.->|Issues Certs| G2
PKI -.->|Issues Certs| PS2
end
style G2 fill:#4CAF50,stroke:#2E7D32,stroke-width:2px
style PKI fill:#FF9800,stroke:#F57C00,stroke-width:2px
🔄 Technical Architecture
sequenceDiagram
participant Client
participant Gateway as MCP Gateway<br/>(Native mTLS)
participant CertStore as Certificate Store
participant Plugin as External Plugin
participant MCP as MCP Server
Note over Client,MCP: Initial TLS Handshake
Client->>Gateway: ClientHello
Gateway->>Client: ServerHello + Certificate + CertificateRequest
Client->>Client: Verify Server Certificate
Client->>Gateway: Certificate + CertificateVerify
Gateway->>CertStore: Validate Certificate
CertStore-->>Gateway: Certificate Valid + User DN
Note over Client,MCP: Authenticated Request Flow
Client->>Gateway: Request (mTLS established)
Gateway->>Gateway: Extract User from Certificate
Gateway->>Gateway: Apply RBAC Policies
alt Plugin Invocation Required
Gateway->>Plugin: mTLS Request
Plugin->>Plugin: Validate Gateway Cert
Plugin-->>Gateway: Response
end
Gateway->>MCP: Forward Request
MCP-->>Gateway: Response
Gateway-->>Client: Response
✅ Acceptance Criteria
🔒 Core mTLS Support
- Gateway can terminate TLS connections directly without reverse proxy
- Support for X.509 certificate validation against configured CA(s)
- Certificate CN/SAN extraction for user identification
- Certificate revocation checking (CRL/OCSP)
- Configurable certificate validation depth
- Support for multiple CA certificates
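As a rough sketch of the server side of these criteria, the following builds a Python `ssl.SSLContext` that requests or requires a client certificate. It is a sketch under stated assumptions, not the gateway's implementation: the function name and parameters are hypothetical, and it does not show how the context would be wired into the web server.

```python
import ssl
from typing import Optional


def build_server_context(
    mode: str = "optional",
    ca_bundle: Optional[str] = None,
    server_cert: Optional[str] = None,
    server_key: Optional[str] = None,
    crl_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a TLS server context that asks for (or demands) a client cert.

    All parameter names are illustrative. File arguments are optional here so
    the sketch can be exercised without real certificates.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if server_cert:
        context.load_cert_chain(certfile=server_cert, keyfile=server_key)
    if ca_bundle:
        # A bundle file may hold several CA certificates.
        context.load_verify_locations(cafile=ca_bundle)
    if crl_file:
        # CRLs are loaded like CA files; the flag enables leaf revocation checks.
        context.load_verify_locations(cafile=crl_file)
        context.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF
    if mode == "required":
        context.verify_mode = ssl.CERT_REQUIRED
    elif mode == "optional":
        context.verify_mode = ssl.CERT_OPTIONAL
    else:
        raise ValueError(f"unsupported mTLS mode: {mode!r}")
    return context
```

The standard library covers CRL checking through `verify_flags`. OCSP and a configurable verification depth would need additional tooling that this sketch does not cover.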
🔌 Plugin mTLS
- Gateway presents client certificate to external plugins
- Per-plugin certificate configuration
- Global default certificates with per-plugin overrides
- Support for password-protected private keys
- Certificate verification bypass for development (insecure mode)
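The plugin-side criteria above describe the gateway acting as a TLS client. A minimal sketch of such a client context, using only Python's standard `ssl` module, might look like this. The function name and arguments are assumptions made for illustration.

```python
import ssl
from typing import Optional


def build_plugin_client_context(
    ca_bundle: Optional[str] = None,
    client_cert: Optional[str] = None,
    client_key: Optional[str] = None,
    key_password: Optional[str] = None,
    verify: bool = True,
    check_hostname: bool = True,
) -> ssl.SSLContext:
    """Build the context the gateway would use when calling an external plugin.

    Hypothetical helper. With ca_bundle=None the system trust store is used.
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_bundle)
    if client_cert:
        # `password` decrypts a password-protected private key, if one is used.
        context.load_cert_chain(
            certfile=client_cert, keyfile=client_key, password=key_password
        )
    if not verify:
        # Insecure development mode: hostname checking must be switched off
        # before verify_mode can be set to CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    else:
        context.check_hostname = check_hostname
    return context
```

The resulting context could be handed to any HTTP client that accepts an `ssl.SSLContext`. The insecure branch disables server verification entirely and is only appropriate for local development.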
⚙️ Configuration
- Environment variables for mTLS settings
- File-based certificate loading
- Secret/ConfigMap mounting in Kubernetes
- Hot-reload of certificates without restart
- Backward compatibility with existing auth methods
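One possible approach to the hot-reload item is to rebuild TLS state whenever a certificate file's modification time changes. The sketch below shows that idea with a generic, hypothetical `ReloadOnChange` wrapper. It assumes polling on access is acceptable and leaves open how the rebuilt object reaches a running server.

```python
import os
from typing import Callable, Generic, Sequence, Tuple, TypeVar

T = TypeVar("T")


class ReloadOnChange(Generic[T]):
    """Cache the result of `factory` and rebuild it when watched files change.

    Hypothetical helper: `factory` might build an ssl.SSLContext from the
    watched certificate and key files.
    """

    def __init__(self, paths: Sequence[str], factory: Callable[[], T]) -> None:
        self._paths = list(paths)
        self._factory = factory
        self._stamp = self._current_stamp()
        self._value = factory()

    def _current_stamp(self) -> Tuple[int, ...]:
        # Nanosecond mtimes of every watched file, in order.
        return tuple(os.stat(path).st_mtime_ns for path in self._paths)

    def get(self) -> T:
        """Return the cached value, rebuilding it first if any file changed."""
        stamp = self._current_stamp()
        if stamp != self._stamp:
            self._value = self._factory()
            self._stamp = stamp
        return self._value
```

Comparing modification times is a simple heuristic. Deployments that swap Kubernetes Secret mounts or need atomic cert/key pairs may need a more careful trigger.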
🛡️ Security Features
- Certificate pinning support
- Mutual authentication enforcement modes (optional/required)
- Certificate-based RBAC integration
- Audit logging of certificate details
- Metrics for certificate validation failures
- Rate limiting by certificate fingerprint
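Certificate pinning and fingerprint-based rate limiting both depend on a stable fingerprint. A minimal sketch, assuming SHA-256 over the DER-encoded certificate (the bytes `getpeercert(binary_form=True)` would return), is shown below. The function names are illustrative.

```python
import hashlib
import hmac
from typing import Iterable


def fingerprint_sha256(der_bytes: bytes) -> str:
    """Return the lowercase hex SHA-256 fingerprint of a DER-encoded cert."""
    return hashlib.sha256(der_bytes).hexdigest()


def matches_pin(der_bytes: bytes, pinned: Iterable[str]) -> bool:
    """Check the certificate against a set of pinned fingerprints.

    Pins may be written with or without colons and in either letter case.
    """
    actual = fingerprint_sha256(der_bytes)
    normalized = (pin.replace(":", "").lower() for pin in pinned)
    return any(hmac.compare_digest(actual, pin) for pin in normalized)
```

The same fingerprint string could serve as the key for a per-certificate rate limiter or as a field in audit log entries.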
🔗 Integration
- Kubernetes Service Mesh compatibility (Istio/Linkerd)
- Cert-manager integration for automatic rotation
- HashiCorp Vault PKI backend support
- AWS Private CA integration
- OpenSSL command-line compatibility
📚 Documentation
- mTLS configuration guide
- Certificate generation examples
- Kubernetes deployment with mTLS
- Plugin mTLS setup
- Troubleshooting guide
- Migration from JWT to mTLS
🛠️ Implementation Details
Configuration Schema
# Environment Variables
MTLS_ENABLED: "true"
MTLS_MODE: "optional" # optional | required
MTLS_CA_BUNDLE: "/app/certs/ca-bundle.crt"
MTLS_CRL_FILE: "/app/certs/crl.pem"
MTLS_VERIFY_DEPTH: "2"
MTLS_CHECK_HOSTNAME: "true"
# Client Certificate for Plugins
MTLS_CLIENT_CERT: "/app/certs/gateway.crt"
MTLS_CLIENT_KEY: "/app/certs/gateway.key"
MTLS_CLIENT_KEY_PASSWORD: "${SECRET_KEY_PASSWORD}"
# Plugin-specific overrides
PLUGINS_MTLS_CA_BUNDLE: "/app/certs/plugins-ca.crt"
PLUGINS_MTLS_CLIENT_CERT: "/app/certs/plugin-client.crt"
PLUGINS_MTLS_CLIENT_KEY: "/app/certs/plugin-client.key"
PLUGINS_MTLS_VERIFY: "true"
PLUGINS_MTLS_CHECK_HOSTNAME: "true"
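To illustrate how the gateway-side variables above might be consumed, here is a small loader sketch. The variable names come from the schema. The `MTLSSettings` class, its defaults (disabled, `optional` mode, depth 2), and the accepted boolean spellings are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUE_VALUES = {"1", "true", "yes", "on"}


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; everything else is False."""
    return value.strip().lower() in _TRUE_VALUES


@dataclass(frozen=True)
class MTLSSettings:
    """Hypothetical typed view of the MTLS_* environment variables."""

    enabled: bool
    mode: str
    ca_bundle: Optional[str]
    crl_file: Optional[str]
    verify_depth: int
    check_hostname: bool

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "MTLSSettings":
        env = os.environ if env is None else env
        mode = env.get("MTLS_MODE", "optional")  # assumed default
        if mode not in ("optional", "required"):
            raise ValueError(f"MTLS_MODE must be 'optional' or 'required', got {mode!r}")
        return cls(
            enabled=_as_bool(env.get("MTLS_ENABLED", "false")),  # assumed default
            mode=mode,
            ca_bundle=env.get("MTLS_CA_BUNDLE"),
            crl_file=env.get("MTLS_CRL_FILE"),
            verify_depth=int(env.get("MTLS_VERIFY_DEPTH", "2")),  # assumed default
            check_hostname=_as_bool(env.get("MTLS_CHECK_HOSTNAME", "true")),
        )
```

Validating `MTLS_MODE` at load time surfaces configuration mistakes at startup instead of during the first handshake.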
Certificate Validation Flow
flowchart TD
A[TLS Handshake] --> B{Client Cert Presented?}
B -->|No| C{mTLS Required?}
C -->|Yes| D[Reject: 401]
C -->|No| E[Continue with Other Auth]
B -->|Yes| F[Validate Against CA]
F -->|Invalid| D
F -->|Valid| G{Check CRL/OCSP?}
G -->|Yes| H[Check Revocation]
H -->|Revoked| D
H -->|Valid| I[Extract Subject DN]
G -->|No| I
I --> J{Certificate Pinning?}
J -->|Yes| K[Verify Pin]
K -->|Failed| D
K -->|Success| L[Set User Context]
J -->|No| L
L --> M[Process Request]
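The decision logic in the flowchart above can be expressed as a small pure function, which also makes it easy to unit test. This is a sketch: the `CertCheck` structure, the `decide` function, and the outcome strings are hypothetical, and the checks themselves (chain validation, revocation lookup, pin comparison) are assumed to have been performed elsewhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CertCheck:
    """Results of the individual checks for one connection (illustrative)."""

    presented: bool
    chain_valid: bool = False
    revoked: bool = False
    pin_ok: bool = True
    subject_dn: Optional[str] = None


def decide(
    check: CertCheck,
    mtls_required: bool,
    check_revocation: bool,
    pinning_enabled: bool,
) -> str:
    """Return 'reject', 'fallback_auth', or 'authenticated'."""
    if not check.presented:
        # No client certificate: reject only when mTLS is mandatory.
        return "reject" if mtls_required else "fallback_auth"
    if not check.chain_valid:
        return "reject"
    if check_revocation and check.revoked:
        return "reject"
    if pinning_enabled and not check.pin_ok:
        return "reject"
    # All enabled checks passed; the caller sets user context from subject_dn.
    return "authenticated"
```

Keeping the decision separate from the I/O lets each branch of the flowchart be covered by a plain unit test.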
🧪 Testing Requirements
Unit Tests
- Certificate parsing and validation
- CN/SAN extraction
- Revocation checking logic
- Plugin mTLS client initialization
Integration Tests
- End-to-end mTLS handshake
- Certificate rotation during runtime
- Plugin communication with mTLS
- Mixed authentication modes
Security Tests
- Invalid certificate rejection
- Expired certificate handling
- Revoked certificate detection
- Certificate pinning validation
- Man-in-the-middle prevention
⚡ Performance Considerations
- TLS session resumption support
- Certificate caching for validation
- Connection pooling for plugin clients
- Async certificate validation
- Hardware acceleration support (AES-NI)
📂 Scope
✅ In Scope - Proxy-based mTLS Documentation
Proxy-based mTLS for Gateway
- Complete documentation for Nginx/Caddy/HAProxy reverse proxy mTLS
- Certificate generation and management scripts
- Docker Compose and Kubernetes deployment examples
- Header-based authentication (`TRUST_PROXY_AUTH`, `PROXY_USER_HEADER`)
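For the proxy-based pattern, the gateway trusts a user header set by the reverse proxy after it has verified the client certificate. A minimal sketch of that lookup follows. The function name and the default header name `X-Authenticated-User` are assumptions, and the two parameters mirror `TRUST_PROXY_AUTH` and `PROXY_USER_HEADER`.

```python
from typing import Mapping, Optional


def user_from_proxy(
    headers: Mapping[str, str],
    trust_proxy_auth: bool,
    user_header: str = "X-Authenticated-User",  # assumed default, not confirmed
) -> Optional[str]:
    """Return the proxy-asserted user, or None if the header is not trusted.

    Header names are compared case-insensitively, as HTTP requires.
    """
    if not trust_proxy_auth:
        # Never honour the header unless proxy auth is explicitly trusted.
        return None
    wanted = user_header.lower()
    for name, value in headers.items():
        if name.lower() == wanted and value.strip():
            return value.strip()
    return None
```

This pattern is only safe when clients cannot reach the gateway directly, since anyone who can bypass the proxy could set the header themselves.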
Plugin mTLS Support
- External plugin mTLS configuration (`PLUGINS_MTLS_*` environment variables)
- Per-plugin certificate configuration in `plugins/config.yaml`
- Certificate validation and hostname checking options
- Gateway as mTLS client to secure plugin endpoints
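The combination of global `PLUGINS_MTLS_*` defaults with per-plugin settings could be resolved with a simple merge, sketched below. The dictionary keys and the function name are hypothetical, and the actual schema of `plugins/config.yaml` is not shown in this issue.

```python
from typing import Any, Dict, Mapping, Optional


def resolve_plugin_tls(
    global_defaults: Mapping[str, Any],
    plugin_override: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge per-plugin TLS settings over the global defaults.

    Keys set to None in the override are treated as "not specified" and
    fall back to the global value. Key names here are illustrative.
    """
    merged = dict(global_defaults)
    for key, value in (plugin_override or {}).items():
        if value is not None:
            merged[key] = value
    return merged
```

For example, a single plugin could supply its own CA bundle while inheriting the gateway's client certificate and hostname-checking setting.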
Enterprise Integration
- Kubernetes deployment with Istio service mesh
- Helm chart configuration with TLS ingress
- Cert-manager integration for automatic rotation
- Production security best practices and monitoring
Documentation
- Comprehensive mTLS guide (`docs/docs/manage/mtls.md`)
- Certificate generation examples
- Troubleshooting and migration guides
- Multiple deployment patterns (Docker, Kubernetes, Helm)
❌ Out of Scope - Native mTLS (Future Enhancement)
Core Gateway TLS Termination
- Native TLS handling in FastAPI/Uvicorn (remove proxy requirement)
- SSL context configuration in `main.py`
- Direct certificate extraction from TLS handshake
- TLS session resumption for performance
Certificate Validation
- X.509 certificate parsing and validation
- CA chain verification
- CRL/OCSP revocation checking
- Certificate storage and caching layer
Configuration & Runtime
- `MTLS_ENABLED` and `MTLS_MODE` environment variables
- Hot reload of certificates without restart
- Certificate-based RBAC integration
- Audit logging of certificate details
Testing
- Unit tests for certificate validation
- Integration tests for native TLS handshake
- Performance benchmarks for TLS overhead
- Security tests for certificate pinning
⚠️ Risks & Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| Certificate expiration causes outages | High | Automated rotation, monitoring, alerts |
| Performance degradation from TLS | Medium | Session resumption, caching, connection pooling |
| Complex configuration | Medium | Sensible defaults, validation tooling, examples |
| Breaking changes for existing users | High | Backward compatibility, gradual rollout, feature flags |
🔧 Dependencies
- OpenSSL or BoringSSL library
- Certificate management tooling
- Update to transport layer
- Documentation updates
- Testing infrastructure
✔️ Definition of Done
- All acceptance criteria met
- Unit test coverage >90%
- Integration tests passing
- Security review completed
- Performance benchmarks met (<100ms overhead)
- Documentation published
- Helm charts updated
- Migration guide available
- Announced in release notes