Infrastructure: Kubernetes/OpenShift Deployment Strategy
Overview
Implement a production-ready Kubernetes/OpenShift deployment strategy for RAG Modulo, following best practices from IBM's MCP Context Forge project while addressing current gaps in our Docker Compose-based infrastructure.
Current State Analysis
Existing Infrastructure
Services: Backend (FastAPI), Frontend (React), PostgreSQL, Milvus, MLFlow, MinIO, etcd
Gaps for Production:
- ❌ No Kubernetes manifests
- ❌ No horizontal scaling
- ❌ No resource limits defined
- ❌ No secrets management
- ❌ No ingress/load balancing
- ❌ No monitoring/observability
- ❌ No auto-scaling policies
- ❌ No Helm charts
IBM MCP Context Forge Learnings
Good Patterns:
- ✅ Separate K8s manifests per resource type
- ✅ ConfigMaps for configuration
- ✅ PersistentVolumes for stateful services
- ✅ Multi-deployment approach (Ansible, Terraform, K8s)
Improvements Needed:
- Add resource limits
- Add health probes
- Use Secrets (not ConfigMaps) for sensitive data (see the sketch after this list)
- Implement HA with multiple replicas
- Add HPA (Horizontal Pod Autoscaler)
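A minimal sketch of the Secrets point above; the secret name and keys are hypothetical examples, not taken from the existing Compose setup:

```yaml
# Sensitive values live in a Secret, not a ConfigMap.
# Name and keys below are hypothetical.
apiVersion: v1
kind: Secret
metadata:
  name: rag-modulo-secrets
  namespace: rag-modulo
type: Opaque
stringData:
  POSTGRES_PASSWORD: change-me
  API_KEY: change-me
---
# Containers then reference it, e.g.:
# envFrom:
#   - secretRef:
#       name: rag-modulo-secrets
```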
Recommendation: Production-Ready K8s
Implementation Timeline: 10 weeks
Phase 1 (Weeks 1-2): Core K8s manifests
Phase 2 (Weeks 3-4): Helm charts
Phase 3 (Weeks 5-6): Auto-scaling & monitoring
Phase 4 (Weeks 7-8): CI/CD integration
Phase 5 (Weeks 9-10): Testing & migration
Directory Structure
```
rag_modulo/
├── deployment/
│   ├── k8s/
│   │   ├── base/
│   │   │   ├── namespace.yaml
│   │   │   ├── configmaps/
│   │   │   ├── secrets/
│   │   │   ├── deployments/
│   │   │   ├── statefulsets/
│   │   │   ├── services/
│   │   │   ├── ingress/
│   │   │   ├── storage/
│   │   │   └── jobs/
│   │   └── overlays/        # dev/staging/prod
│   ├── helm/
│   │   └── rag-modulo/
│   │       ├── Chart.yaml
│   │       ├── values.yaml
│   │       └── templates/
│   └── scripts/
└── .github/workflows/
    └── k8s-deploy-*.yml
```
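The base/overlays split is the standard Kustomize pattern: base/ holds complete manifests, and each overlay patches only what differs per environment. A minimal sketch of a production overlay (the patch file name is a hypothetical example):

```yaml
# deployment/k8s/overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: rag-modulo
resources:
  - ../../base
patches:
  - path: backend-replicas.yaml   # hypothetical patch raising replicas for prod
```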
Key Implementations
1. Backend Deployment with HA
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-modulo-backend
  namespace: rag-modulo
spec:
  replicas: 3  # High Availability
  strategy:
    type: RollingUpdate
  selector:    # required by apps/v1
    matchLabels:
      app: rag-modulo-backend
  template:
    metadata:
      labels:
        app: rag-modulo-backend
    spec:
      containers:
        - name: backend
          image: ghcr.io/manavgup/rag_modulo/backend:latest
          resources:
            requests:
              memory: "2Gi"
              cpu: "1000m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 60
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8000
            initialDelaySeconds: 30
```

2. PostgreSQL StatefulSet
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: rag-modulo
spec:
  serviceName: postgres-service
  replicas: 1
  # selector and pod template omitted in this excerpt
  volumeClaimTemplates:
    - metadata:
        name: postgres-storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi
```

3. Horizontal Pod Autoscaler
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rag-modulo-backend
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

4. Ingress with TLS
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rag-modulo-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
    - hosts:
        - rag-modulo.example.com
      secretName: rag-modulo-tls
  rules:
    - host: rag-modulo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 8080
```

5. Helm Chart
```yaml
# values.yaml
backend:
  enabled: true
  replicaCount: 3
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
  resources:
    requests:
      memory: "2Gi"
      cpu: "1000m"
postgresql:
  enabled: true
  persistence:
    enabled: true
    size: 50Gi
milvus:
  enabled: true
  persistence:
    enabled: true
    size: 100Gi
```

6. CI/CD Integration
```yaml
# .github/workflows/k8s-deploy-production.yml
name: Deploy to K8s
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Build and push images
        # ... docker build/push
      - name: Deploy with Helm
        run: |
          helm upgrade --install rag-modulo ./deployment/helm/rag-modulo \
            --namespace rag-modulo \
            --values values-production.yaml
      - name: Verify deployment
        run: |
          kubectl rollout status deployment/rag-modulo-backend
```

Files to Create (~5000 lines)
K8s Manifests (~2500 lines)
- Namespace, ConfigMaps, Secrets
- Deployments (Backend, Frontend, MLFlow)
- StatefulSets (PostgreSQL, Milvus, MinIO)
- Services (6 services)
- Ingress, Storage (4 PVCs)
- HPA, Monitoring
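Each workload above gets a matching Service; as one sketch for the backend (the Service name and selector labels are assumptions matching the Deployment in section 1):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend-service   # hypothetical name
  namespace: rag-modulo
spec:
  selector:
    app: rag-modulo-backend
  ports:
    - port: 8000
      targetPort: 8000
```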
Helm Charts (~1500 lines)
- Chart.yaml, values.yaml (dev/staging/prod)
- Templates (all K8s resources)
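To show how values.yaml drives these templates, a hypothetical excerpt (the fullname helper is assumed, as generated by helm create):

```yaml
# templates/backend-deployment.yaml (hypothetical excerpt)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "rag-modulo.fullname" . }}-backend
spec:
  {{- if not .Values.backend.autoscaling.enabled }}
  # let the HPA own the replica count when autoscaling is enabled
  replicas: {{ .Values.backend.replicaCount }}
  {{- end }}
```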
CI/CD (~500 lines)
- GitHub Actions workflows (3 environments)
- Deployment scripts
Documentation (~500 lines)
- K8s deployment guide
- Helm usage guide
- Troubleshooting guide
Success Criteria
Functional:
- All services deploy to K8s
- Application accessible via ingress
- Database persistence working
- Auto-scaling responding to load
- Zero-downtime rolling updates
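Zero-downtime rolling updates (last item above) need explicit surge settings on top of the readiness probe from section 1; a minimal sketch for the backend Deployment:

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # start one new pod before stopping an old one
      maxUnavailable: 0  # never drop below the desired replica count
```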
Non-Functional:
- Deployment time < 10 minutes
- Resource utilization optimized
- Monitoring dashboards operational
- CI/CD pipeline working
Performance:
- Backend scales 2-10 pods
- Response times < 500ms p95
- No memory leaks over 24h
OpenShift Considerations
- Security Context Constraints (SCCs): the default restricted SCC runs containers with an arbitrary non-root UID, so images must not assume root or a fixed UID
- Routes instead of Ingress (see the sketch below)
- Use standard Deployments (not the deprecated DeploymentConfig)
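A Route equivalent of the Ingress in section 4 might look like this (a sketch assuming the same frontend-service and edge TLS termination):

```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: rag-modulo
  namespace: rag-modulo
spec:
  host: rag-modulo.example.com
  to:
    kind: Service
    name: frontend-service
  port:
    targetPort: 8080
  tls:
    termination: edge                      # TLS terminated at the router
    insecureEdgeTerminationPolicy: Redirect  # force HTTP -> HTTPS
```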
IBM Cloud Considerations
- Storage: Use the `ibmc-block-gold` storage class
- Ingress: IBM Cloud ALB annotations
- Monitoring: IBM Cloud Monitoring (Sysdig)
- Logging: IBM Log Analysis (LogDNA)
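On IBM Cloud the storage class is set per claim; a sketch reusing the PostgreSQL claim from section 2:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-storage
  namespace: rag-modulo
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ibmc-block-gold  # IBM Cloud block storage, gold tier
  resources:
    requests:
      storage: 50Gi
```

For the StatefulSet above, the same storageClassName field goes inside the volumeClaimTemplates spec.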
Migration Plan
Phase 1: Parallel run (10% traffic to K8s; see the canary sketch below)
Phase 2: Gradual shift (50% traffic)
Phase 3: Full migration (100% traffic)
Phase 4: Decommission Docker Compose
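Splitting traffic between the Compose stack and the cluster is typically done at the DNS or external load-balancer level. Inside the cluster, the same weighted split can be expressed with NGINX canary annotations (a sketch with hypothetical names, assuming the NGINX ingress controller from section 4):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rag-modulo-canary   # hypothetical
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"  # Phase 1: 10% of traffic
spec:
  rules:
    - host: rag-modulo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 8080
```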
Related Issues
- Related: Epic: RAG Modulo Evolution - Naive → Advanced → Modular RAG Architecture #256 - RAG Evolution Epic
- Related: Enhancement: Integrate IBM Docling for Advanced Document Processing #255 - Docling Integration
Effort Estimate
Total: 10 weeks (1-2 FTE)
- Phase 1: 2 weeks
- Phase 2: 2 weeks
- Phase 3: 2 weeks
- Phase 4: 2 weeks
- Phase 5: 2 weeks
Priority: High - Required for production deployment