
Infrastructure: Kubernetes/OpenShift Deployment Strategy #260

@manavgup

Overview

Implement a production-ready Kubernetes/OpenShift deployment strategy for RAG Modulo, following best practices from IBM's MCP Context Forge project while addressing current gaps in our Docker Compose-based infrastructure.

Current State Analysis

Existing Infrastructure

Services: Backend (FastAPI), Frontend (React), PostgreSQL, Milvus, MLflow, MinIO, etcd

Gaps for Production:

  • ❌ No Kubernetes manifests
  • ❌ No horizontal scaling
  • ❌ No resource limits defined
  • ❌ No secrets management
  • ❌ No ingress/load balancing
  • ❌ No monitoring/observability
  • ❌ No auto-scaling policies
  • ❌ No Helm charts

IBM MCP Context Forge Learnings

Good Patterns:

  • ✅ Separate K8s manifests per resource type
  • ✅ ConfigMaps for configuration
  • ✅ PersistentVolumes for stateful services
  • ✅ Multi-deployment approach (Ansible, Terraform, K8s)

Improvements Needed:

  • Add resource limits
  • Add health probes
  • Use Secrets (not ConfigMaps) for sensitive data
  • Implement HA with multiple replicas
  • Add HPA (Horizontal Pod Autoscaler)
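These improvements are easy to regress as the planned ~2500 lines of manifests grow, so a check in CI can enforce them. Below is a minimal sketch, assuming manifests have already been parsed into dicts (for example with PyYAML's `yaml.safe_load_all`). `lint_manifest` and the sensitive-key list are hypothetical heuristics, not an existing tool.

```python
from typing import Any

# Heuristic substrings suggesting a value belongs in a Secret (illustrative list).
SENSITIVE_HINTS = ("PASSWORD", "SECRET", "TOKEN", "API_KEY", "ACCESS_KEY")
WORKLOAD_KINDS = ("Deployment", "StatefulSet")


def lint_manifest(manifest: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in one parsed Kubernetes manifest."""
    problems: list[str] = []
    kind = manifest.get("kind", "")
    name = manifest.get("metadata", {}).get("name", "<unnamed>")

    if kind == "ConfigMap":
        # ConfigMaps are not meant for confidential data.
        for key in manifest.get("data", {}):
            if any(hint in key.upper() for hint in SENSITIVE_HINTS):
                problems.append(f"ConfigMap/{name}: key {key!r} looks sensitive; use a Secret")
        return problems

    if kind in WORKLOAD_KINDS:
        pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
        for container in pod_spec.get("containers", []):
            where = f"{kind}/{name}/{container.get('name', '<unnamed>')}"
            resources = container.get("resources", {})
            # Requests drive scheduling and HPA utilization; limits cap runaway pods.
            for section in ("requests", "limits"):
                if not resources.get(section):
                    problems.append(f"{where}: missing resources.{section}")
            for probe in ("livenessProbe", "readinessProbe"):
                if probe not in container:
                    problems.append(f"{where}: missing {probe}")
    return problems
```

Running it over every document under deployment/k8s/ and failing the job on any returned message would make requests, limits, and both probes mandatory for Deployments and StatefulSets.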

Recommendation: Production-Ready K8s

Implementation Timeline: 10 weeks

Phase 1 (Weeks 1-2): Core K8s manifests
Phase 2 (Weeks 3-4): Helm charts
Phase 3 (Weeks 5-6): Auto-scaling & monitoring
Phase 4 (Weeks 7-8): CI/CD integration
Phase 5 (Weeks 9-10): Testing & migration

Directory Structure

rag_modulo/
├── deployment/
│   ├── k8s/
│   │   ├── base/
│   │   │   ├── namespace.yaml
│   │   │   ├── configmaps/
│   │   │   ├── secrets/
│   │   │   ├── deployments/
│   │   │   ├── statefulsets/
│   │   │   ├── services/
│   │   │   ├── ingress/
│   │   │   ├── storage/
│   │   │   └── jobs/
│   │   └── overlays/ (dev/staging/prod)
│   ├── helm/
│   │   └── rag-modulo/
│   │       ├── Chart.yaml
│   │       ├── values.yaml
│   │       └── templates/
│   └── scripts/
└── .github/workflows/
    └── k8s-deploy-*.yml

Key Implementations

1. Backend Deployment with HA

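apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-modulo-backend
  namespace: rag-modulo
  labels:
    app: rag-modulo-backend
spec:
  replicas: 3  # High Availability; consider omitting once the HPA manages this Deployment
  selector:
    matchLabels:
      app: rag-modulo-backend  # required in apps/v1; must match the pod template labels
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0        # keep every replica serving during a rollout
  template:
    metadata:
      labels:
        app: rag-modulo-backend
    spec:
      containers:
      - name: backend
        image: ghcr.io/manavgup/rag_modulo/backend:latest  # pin an immutable tag or digest in production
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "2Gi"
            cpu: "1000m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 60
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8000
          initialDelaySeconds: 30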
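When `type: RollingUpdate` has no `rollingUpdate` block, Kubernetes uses its defaults of maxSurge 25% and maxUnavailable 25%. Percentages are converted to pod counts by rounding maxSurge up and maxUnavailable down, and if both come out as 0 the controller allows one unavailable pod so the rollout can progress. The hypothetical helper below is a sketch of that arithmetic, not a Kubernetes API; it returns the minimum available and maximum total pods during a rollout.

```python
from typing import Tuple, Union


def resolve(value: Union[int, str], replicas: int, round_up: bool) -> int:
    """Convert an absolute count or a percentage string such as '25%' to pods."""
    if isinstance(value, int):
        return value
    scaled = int(value.rstrip("%")) * replicas
    # Integer ceiling/floor of scaled / 100, avoiding float rounding.
    return -(-scaled // 100) if round_up else scaled // 100


def rollout_bounds(
    replicas: int,
    max_surge: Union[int, str] = "25%",
    max_unavailable: Union[int, str] = "25%",
) -> Tuple[int, int]:
    """Return (minimum available pods, maximum total pods) during a rolling update."""
    surge = resolve(max_surge, replicas, round_up=True)               # maxSurge rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)  # maxUnavailable rounds down
    if surge == 0 and unavailable == 0:
        # Both resolved to zero: allow one unavailable pod so the rollout can proceed.
        unavailable = 1
    return replicas - unavailable, replicas + surge
```

For 3 replicas the defaults already keep 3 pods available (0.75 rounds down to 0), but at 10 replicas they allow 2 to be unavailable. Setting maxUnavailable to 0 explicitly keeps the zero-downtime goal independent of replica count, provided the readiness probe only passes once a pod can actually serve traffic.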

2. PostgreSQL StatefulSet

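apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: rag-modulo
spec:
  serviceName: postgres-service  # headless Service (clusterIP: None) giving pods stable DNS names
  replicas: 1
  selector:
    matchLabels: {app: postgres}
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:16  # version is an assumption; match the image used in Docker Compose
        envFrom:
        - secretRef:
            name: postgres-credentials  # hypothetical Secret with POSTGRES_USER/POSTGRES_PASSWORD/POSTGRES_DB
        volumeMounts:
        - name: postgres-storage
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: postgres-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 50Gi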
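Because this is a StatefulSet, pod and volume names are predictable: pods are named `<statefulset>-<ordinal>`, each pod gets a PVC named `<claim template>-<pod>`, and with a headless governing Service each pod resolves at `<pod>.<service>.<namespace>.svc.<cluster domain>`. The hypothetical helper below just spells out those conventions (assuming the default `cluster.local` domain), which is handy when writing backup jobs or connection strings.

```python
from typing import List, Tuple


def statefulset_identities(
    name: str,
    service_name: str,
    namespace: str,
    replicas: int,
    claim_templates: List[str],
    cluster_domain: str = "cluster.local",
) -> List[Tuple[str, str, List[str]]]:
    """Return (pod name, per-pod DNS name, PVC names) for each ordinal."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{name}-{ordinal}"  # pods are numbered from 0
        dns = f"{pod}.{service_name}.{namespace}.svc.{cluster_domain}"
        pvcs = [f"{claim}-{pod}" for claim in claim_templates]  # one PVC per template per pod
        identities.append((pod, dns, pvcs))
    return identities
```

For the manifest above this yields the single PVC postgres-storage-postgres-0. By default that PVC is retained when the pod or even the StatefulSet is deleted, so data survives restarts and must be removed deliberately when tearing down an environment.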

3. Horizontal Pod Autoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
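metadata:
  name: backend-hpa
  namespace: rag-modulo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rag-modulo-backend
  minReplicas: 2   # once active, the HPA owns the replica count and can go below the Deployment's replicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # percent of each pod's CPU request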
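The HPA's decision follows the formula documented for the controller: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance of 1.0 (0.1 by default) and clamped to minReplicas/maxReplicas. For a Utilization target the metric is CPU usage as a percentage of the CPU request, so with the backend's 1000m request a 70% target means roughly 700m per pod. A simplified sketch (hypothetical function; it ignores unready pods, missing metrics, and the default 5-minute scale-down stabilization window):

```python
import math


def desired_replicas(
    current_replicas: int,
    avg_cpu_millicores: float,
    request_millicores: float,
    target_utilization: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA calculation for a CPU Utilization target."""
    # Utilization is measured against the container's CPU request, not its limit.
    current_utilization = 100 * avg_cpu_millicores / request_millicores
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 3 pods averaging 1400m (140% of the request) scale to 6, while an average of 720m (72%) falls inside the tolerance and leaves the count unchanged.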

4. Ingress with TLS

apiVersion: networking.k8s.io/v1
kind: Ingress
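metadata:
  name: rag-modulo-ingress
  namespace: rag-modulo
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx  # the nginx.* annotation assumes the ingress-nginx controller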
  tls:
  - hosts:
    - rag-modulo.example.com
    secretName: rag-modulo-tls
  rules:
  - host: rag-modulo.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-service
            port:
              number: 8080
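`pathType: Prefix` matches on `/`-separated path elements rather than raw string prefixes: `/` matches every request, `/aaa/bbb` matches `/aaa/bbb/ccc`, `/aaa/bb` does not match `/aaa/bbb`, and a trailing slash is ignored. When several paths match, the longest one wins. The sketch below models those rules as described in the Kubernetes Ingress documentation; `route` is a hypothetical helper, and real controllers implement matching in their own configuration.

```python
from typing import List, Optional, Tuple


def _segments(path: str) -> List[str]:
    # Split on "/" and drop empty elements, so "/a/b/" and "/a/b" compare equal.
    return [seg for seg in path.split("/") if seg]


def prefix_matches(rule_path: str, request_path: str) -> bool:
    """Element-wise, case-sensitive match used by pathType: Prefix."""
    rule = _segments(rule_path)
    return _segments(request_path)[: len(rule)] == rule


def route(paths: List[Tuple[str, str]], request_path: str) -> Optional[str]:
    """Return the service for the longest matching Prefix path, or None."""
    matches = [(p, svc) for p, svc in paths if prefix_matches(p, request_path)]
    if not matches:
        return None
    return max(matches, key=lambda m: len(_segments(m[0])))[1]
```

With the single `/` rule above, every request goes to frontend-service. Adding a rule such as a hypothetical `/api` path for a backend Service would take precedence only for paths whose first element is exactly `api`, so `/apis` would still reach the frontend.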

5. Helm Chart

# values.yaml
backend:
  enabled: true
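  replicaCount: 3        # the helm create scaffold skips replicas when autoscaling.enabled is true
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
  resources:
    requests:
      memory: "2Gi"
      cpu: "1000m"
    limits:
      memory: "4Gi"
      cpu: "2000m"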

postgresql:
  enabled: true
  persistence:
    enabled: true
    size: 50Gi

milvus:
  enabled: true
  persistence:
    enabled: true
    size: 100Gi
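The CI workflow layers `--values values-production.yaml` over the chart's values.yaml. Helm coalesces values so that later files win: maps are merged key by key, lists and scalars are replaced wholesale, and setting a key to null removes the default. The function below is a rough model of those rules (a hypothetical helper, not Helm's implementation), useful for reasoning about what a per-environment override actually changes.

```python
import copy
from typing import Any, Dict


def merge_values(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Rough model of Helm values coalescing; returns a new dict, inputs untouched."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if value is None:
            result.pop(key, None)  # null in an override removes the default
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_values(result[key], value)  # maps merge recursively
        else:
            result[key] = copy.deepcopy(value)  # lists and scalars replace wholesale
    return result
```

So an override that sets only backend.autoscaling.maxReplicas keeps `enabled: true` and `minReplicas: 2` from the defaults, whereas an override of any list must repeat every entry it wants to keep.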

6. CI/CD Integration

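# .github/workflows/k8s-deploy-production.yml
name: Deploy to K8s

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
    - name: Check out repository
      uses: actions/checkout@v4

    - name: Build and push images
      run: |
        # ... docker build/push

    - name: Configure cluster access
      run: |
        # KUBECONFIG is a hypothetical repository secret holding the cluster kubeconfig
        mkdir -p ~/.kube
        echo "${{ secrets.KUBECONFIG }}" > ~/.kube/config

    - name: Deploy with Helm
      run: |
        helm upgrade --install rag-modulo ./deployment/helm/rag-modulo \
          --namespace rag-modulo \
          --values values-production.yaml \
          --wait --timeout 10m

    - name: Verify deployment
      run: |
        kubectl rollout status deployment/rag-modulo-backend --namespace rag-modulo --timeout=5m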

Files to Create (~5000 lines)

K8s Manifests (~2500 lines)

  1. Namespace, ConfigMaps, Secrets
  2. Deployments (Backend, Frontend, MLflow)
  3. StatefulSets (PostgreSQL, Milvus, MinIO)
  4. Services (6 services)
  5. Ingress, Storage (4 PVCs)
  6. HPA, Monitoring

Helm Charts (~1500 lines)

  1. Chart.yaml, values.yaml (dev/staging/prod)
  2. Templates (all K8s resources)

CI/CD (~500 lines)

  1. GitHub Actions workflows (3 environments)
  2. Deployment scripts

Documentation (~500 lines)

  1. K8s deployment guide
  2. Helm usage guide
  3. Troubleshooting guide

Success Criteria

Functional:

  • All services deploy to K8s
  • Application accessible via ingress
  • Database persistence working
  • Auto-scaling responding to load
  • Zero-downtime rolling updates

Non-Functional:

  • Deployment time < 10 minutes
  • Resource requests and limits tuned against observed usage
  • Monitoring dashboards operational
  • CI/CD pipeline working
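To make the resource targets checkable, it helps to turn the backend's per-pod requests (1000m CPU, 2Gi memory) and limits (2000m, 4Gi) into totals across the HPA range of 2 to 10 pods. The scheduler reserves node capacity by requests, while limits bound how far pods can burst. The sketch below parses only a subset of the Kubernetes quantity format (no exponent forms such as `1e3`); `parse_quantity` and `footprint` are hypothetical helpers.

```python
from decimal import Decimal
from typing import Dict, Tuple, Union

# Subset of Kubernetes quantity suffixes: binary (Ki..Ti), decimal (k..T), milli (m).
_SUFFIXES: Dict[str, Union[int, Decimal]] = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "m": Decimal("0.001"),
}


def parse_quantity(quantity: str) -> Decimal:
    """Parse strings like '1000m', '2Gi', or '500Mi' into a plain number."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * factor
    return Decimal(quantity)  # no suffix: already in base units


def footprint(replicas: int, cpu: str, memory: str) -> Tuple[Decimal, Decimal]:
    """Total CPU cores and GiB of memory for `replicas` identical pods."""
    cores = parse_quantity(cpu) * replicas
    gib = parse_quantity(memory) * replicas / 1024**3
    return cores, gib
```

At 2 pods the backend reserves 2 cores and 4 GiB; at 10 pods it reserves 10 cores and 20 GiB and may burst to 20 cores and 40 GiB, before counting PostgreSQL, Milvus, MinIO, MLflow, etcd, and the frontend.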

Performance:

  • Backend scales 2-10 pods
  • p95 response time < 500 ms
  • No memory leaks over 24h
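The latency target does not say how p95 is computed. One simple, reproducible definition for load-test results is the nearest-rank percentile: the smallest sample such that at least 95% of samples are at or below it. Histogram-based queries such as Prometheus `histogram_quantile` interpolate within buckets and can report different values. A minimal sketch with hypothetical function names:

```python
from typing import Sequence


def percentile_nearest_rank(samples_ms: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct/100 * n
    return ordered[max(rank, 1) - 1]


def meets_p95_target(samples_ms: Sequence[float], limit_ms: float = 500) -> bool:
    """Performance criterion from this issue: p95 response time below 500 ms."""
    return percentile_nearest_rank(samples_ms, 95) < limit_ms
```

For example, 19 requests at 120 ms plus one at 900 ms give a nearest-rank p95 of 120 ms, so a single outlier in 20 requests does not fail the target.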

OpenShift Considerations

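  1. Security Context Constraints: the default restricted SCC assigns each pod a UID from the namespace's range and rejects privileged containers, so images must run as an arbitrary non-root user
  2. Routes instead of Ingress (OpenShift also generates Routes from standard Ingress objects)
  3. Use standard Deployments (not DeploymentConfig, which is deprecated)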
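Under OpenShift's default restricted SCC (restricted-v2 in recent OpenShift 4 releases), each pod runs with a UID taken from the namespace's `openshift.io/sa.scc.uid-range` annotation, and privileged containers are rejected, so images that expect root or a fixed UID fail admission or fail at startup. The check below is a hypothetical pre-flight sketch covering only those two rules (real SCC admission checks many more fields); the `start/size` range format follows that annotation.

```python
from typing import Any, Dict, List


def scc_violations(pod_spec: Dict[str, Any], uid_range: str) -> List[str]:
    """Flag privileged containers and runAsUser values outside the namespace range.

    uid_range uses the "start/size" format of the namespace annotation
    openshift.io/sa.scc.uid-range, e.g. "1000680000/10000".
    """
    start, size = (int(part) for part in uid_range.split("/"))
    last = start + size - 1
    pod_ctx = pod_spec.get("securityContext", {})
    problems: List[str] = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        # Container-level securityContext fields override pod-level ones.
        ctx = {**pod_ctx, **container.get("securityContext", {})}
        if ctx.get("privileged"):
            problems.append(f"{name}: privileged containers are rejected")
        uid = ctx.get("runAsUser")
        if uid is not None and not start <= uid <= last:
            problems.append(f"{name}: runAsUser {uid} is outside {start}-{last}")
    return problems
```

Images usually satisfy these rules by not pinning a UID and by making writable paths group-owned by GID 0 with group write permission, since OpenShift runs the arbitrary UID as a member of the root group.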

IBM Cloud Considerations

  1. Storage: Use ibmc-block-gold
  2. Ingress: IBM Cloud ALB annotations
  3. Monitoring: IBM Cloud Monitoring (Sysdig)
  4. Logging: IBM Log Analysis (LogDNA)

Migration Plan

Stage 1: Parallel run (10% of traffic to K8s)
Stage 2: Gradual shift (50% of traffic)
Stage 3: Full migration (100% of traffic)
Stage 4: Decommission Docker Compose
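The plan does not say where the 10% / 50% / 100% split is applied (DNS weighting, a load balancer, or a proxy in front of both stacks). Whatever the mechanism, sticky per-user assignment keeps users from bouncing between the Docker Compose and Kubernetes stacks from one request to the next. A minimal sketch of one way to do that, assuming a stable user identifier is available at the routing layer; `routes_to_k8s` is hypothetical.

```python
import hashlib


def routes_to_k8s(user_id: str, k8s_percent: int) -> bool:
    """Return True if this user should be served by the Kubernetes stack."""
    # Hash the stable user ID into one of 100 buckets; the bucket never changes.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    # Buckets below the current percentage go to Kubernetes, so raising the
    # percentage only adds users and never moves anyone back.
    return bucket < k8s_percent
```

Because a user's bucket is fixed, everyone routed to Kubernetes at 10% is still routed there at 50% and 100%, so each stage only adds users.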

Related Issues

Effort Estimate

Total: 10 weeks (1-2 FTE)

  • Phase 1: 2 weeks
  • Phase 2: 2 weeks
  • Phase 3: 2 weeks
  • Phase 4: 2 weeks
  • Phase 5: 2 weeks

Priority: High - Required for production deployment

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request), infrastructure (Infrastructure and deployment)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests