Infrastructure: Kubernetes/OpenShift Deployment Strategy
Overview
Implement a production-ready Kubernetes/OpenShift deployment strategy for RAG Modulo, following best practices from IBM's MCP Context Forge project while addressing current gaps in our Docker Compose-based infrastructure.
Current State Analysis
Existing Infrastructure
Services: Backend (FastAPI), Frontend (React), PostgreSQL, Milvus, MLFlow, MinIO, etcd
Gaps for Production:
- ❌ No Kubernetes manifests
- ❌ No horizontal scaling
- ❌ No resource limits defined
- ❌ No secrets management
- ❌ No ingress/load balancing
- ❌ No monitoring/observability
- ❌ No auto-scaling policies
- ❌ No Helm charts
IBM MCP Context Forge Learnings
Good Patterns:
- ✅ Separate K8s manifests per resource type
- ✅ ConfigMaps for configuration
- ✅ PersistentVolumes for stateful services
- ✅ Multi-deployment approach (Ansible, Terraform, K8s)
Improvements Needed:
- Add resource limits
- Add health probes
- Use Secrets (not ConfigMaps) for sensitive data (see the sketch after this list)
- Implement HA with multiple replicas
- Add HPA (Horizontal Pod Autoscaler)
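A minimal sketch of the Secrets point above; the secret name and keys are hypothetical examples, not taken from the existing Compose setup:

```yaml
# Sensitive values live in a Secret, not a ConfigMap.
# Name and keys below are hypothetical.
apiVersion: v1
kind: Secret
metadata:
  name: rag-modulo-secrets
  namespace: rag-modulo
type: Opaque
stringData:
  POSTGRES_PASSWORD: change-me
  API_KEY: change-me
---
# Containers then reference it, e.g.:
# envFrom:
#   - secretRef:
#       name: rag-modulo-secrets
```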
Recommendation: Production-Ready K8s
Implementation Timeline: 10 weeks
Phase 1 (Weeks 1-2): Core K8s manifests
Phase 2 (Weeks 3-4): Helm charts
Phase 3 (Weeks 5-6): Auto-scaling & monitoring
Phase 4 (Weeks 7-8): CI/CD integration
Phase 5 (Weeks 9-10): Testing & migration
Directory Structure
```
rag_modulo/
├── deployment/
│   ├── k8s/
│   │   ├── base/
│   │   │   ├── namespace.yaml
│   │   │   ├── configmaps/
│   │   │   ├── secrets/
│   │   │   ├── deployments/
│   │   │   ├── statefulsets/
│   │   │   ├── services/
│   │   │   ├── ingress/
│   │   │   ├── storage/
│   │   │   └── jobs/
│   │   └── overlays/        # dev/staging/prod
│   ├── helm/
│   │   └── rag-modulo/
│   │       ├── Chart.yaml
│   │       ├── values.yaml
│   │       └── templates/
│   └── scripts/
└── .github/workflows/
    └── k8s-deploy-*.yml
```
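The base/overlays split is the standard Kustomize pattern: base/ holds complete manifests, and each overlay patches only what differs per environment. A minimal sketch of a production overlay (the patch file name is a hypothetical example):

```yaml
# deployment/k8s/overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: rag-modulo
resources:
  - ../../base
patches:
  - path: backend-replicas.yaml   # hypothetical patch raising replicas for prod
```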
Key Implementations
1. Backend Deployment with HA
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-modulo-backend
  namespace: rag-modulo
spec:
  replicas: 3  # High Availability
  strategy:
    type: RollingUpdate
  selector:    # required by apps/v1
    matchLabels:
      app: rag-modulo-backend
  template:
    metadata:
      labels:
        app: rag-modulo-backend
    spec:
      containers:
        - name: backend
          image: ghcr.io/manavgup/rag_modulo/backend:latest
          resources:
            requests:
              memory: "2Gi"
              cpu: "1000m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 60
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8000
            initialDelaySeconds: 30
```

2. PostgreSQL StatefulSet
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: rag-modulo
spec:
  serviceName: postgres-service
  replicas: 1
  # selector and pod template omitted in this excerpt
  volumeClaimTemplates:
    - metadata:
        name: postgres-storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi
```

3. Horizontal Pod Autoscaler
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rag-modulo-backend
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

4. Ingress with TLS
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rag-modulo-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
    - hosts:
        - rag-modulo.example.com
      secretName: rag-modulo-tls
  rules:
    - host: rag-modulo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 8080
```

5. Helm Chart
```yaml
# values.yaml
backend:
  enabled: true
  replicaCount: 3
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
  resources:
    requests:
      memory: "2Gi"
      cpu: "1000m"
postgresql:
  enabled: true
  persistence:
    enabled: true
    size: 50Gi
milvus:
  enabled: true
  persistence:
    enabled: true
    size: 100Gi
```

6. CI/CD Integration
```yaml
# .github/workflows/k8s-deploy-production.yml
name: Deploy to K8s
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Build and push images
        # ... docker build/push
      - name: Deploy with Helm
        run: |
          helm upgrade --install rag-modulo ./deployment/helm/rag-modulo \
            --namespace rag-modulo \
            --values values-production.yaml
      - name: Verify deployment
        run: |
          kubectl rollout status deployment/rag-modulo-backend
```

Files to Create (~5000 lines)
K8s Manifests (~2500 lines)
- Namespace, ConfigMaps, Secrets
- Deployments (Backend, Frontend, MLFlow)
- StatefulSets (PostgreSQL, Milvus, MinIO)
- Services (6 services)
- Ingress, Storage (4 PVCs)
- HPA, Monitoring
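Each workload above gets a matching Service; as one sketch for the backend (the Service name and selector labels are assumptions matching the Deployment in section 1):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend-service   # hypothetical name
  namespace: rag-modulo
spec:
  selector:
    app: rag-modulo-backend
  ports:
    - port: 8000
      targetPort: 8000
```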
Helm Charts (~1500 lines)
- Chart.yaml, values.yaml (dev/staging/prod)
- Templates (all K8s resources)
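To show how values.yaml drives these templates, a hypothetical excerpt (the fullname helper is assumed, as generated by helm create):

```yaml
# templates/backend-deployment.yaml (hypothetical excerpt)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "rag-modulo.fullname" . }}-backend
spec:
  {{- if not .Values.backend.autoscaling.enabled }}
  # let the HPA own the replica count when autoscaling is enabled
  replicas: {{ .Values.backend.replicaCount }}
  {{- end }}
```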
CI/CD (~500 lines)
- GitHub Actions workflows (3 environments)
- Deployment scripts
Documentation (~500 lines)
- K8s deployment guide
- Helm usage guide
- Troubleshooting guide
Success Criteria
Functional:
- All services deploy to K8s
- Application accessible via ingress
- Database persistence working
- Auto-scaling responding to load
- Zero-downtime rolling updates
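Zero-downtime rolling updates (last item above) need explicit surge settings on top of the readiness probe from section 1; a minimal sketch for the backend Deployment:

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # start one new pod before stopping an old one
      maxUnavailable: 0  # never drop below the desired replica count
```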
Non-Functional:
- Deployment time < 10 minutes
- Resource utilization optimized
- Monitoring dashboards operational
- CI/CD pipeline working
Performance:
- Backend scales 2-10 pods
- Response times < 500ms p95
- No memory leaks over 24h
OpenShift Considerations
- Security Context Constraints (SCCs): the default restricted SCC runs containers with an arbitrary non-root UID, so images must not assume root or a fixed UID
- Routes instead of Ingress (see the sketch below)
- Use standard Deployments (not the deprecated DeploymentConfig)
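A Route equivalent of the Ingress in section 4 might look like this (a sketch assuming the same frontend-service and edge TLS termination):

```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: rag-modulo
  namespace: rag-modulo
spec:
  host: rag-modulo.example.com
  to:
    kind: Service
    name: frontend-service
  port:
    targetPort: 8080
  tls:
    termination: edge                      # TLS terminated at the router
    insecureEdgeTerminationPolicy: Redirect  # force HTTP -> HTTPS
```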
IBM Cloud Considerations
- Storage: Use the `ibmc-block-gold` storage class
- Ingress: IBM Cloud ALB annotations
- Monitoring: IBM Cloud Monitoring (Sysdig)
- Logging: IBM Log Analysis (LogDNA)
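On IBM Cloud the storage class is set per claim; a sketch reusing the PostgreSQL claim from section 2:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-storage
  namespace: rag-modulo
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ibmc-block-gold  # IBM Cloud block storage, gold tier
  resources:
    requests:
      storage: 50Gi
```

For the StatefulSet above, the same storageClassName field goes inside the volumeClaimTemplates spec.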
Migration Plan
Phase 1: Parallel run (10% traffic to K8s; see the canary sketch below)
Phase 2: Gradual shift (50% traffic)
Phase 3: Full migration (100% traffic)
Phase 4: Decommission Docker Compose
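Splitting traffic between the Compose stack and the cluster is typically done at the DNS or external load-balancer level. Inside the cluster, the same weighted split can be expressed with NGINX canary annotations (a sketch with hypothetical names, assuming the NGINX ingress controller from section 4):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rag-modulo-canary   # hypothetical
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"  # Phase 1: 10% of traffic
spec:
  rules:
    - host: rag-modulo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 8080
```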
Related Issues
- Related: Epic: RAG Modulo Evolution - Naive → Advanced → Modular RAG Architecture #256 - RAG Evolution Epic
- Related: Enhancement: Integrate IBM Docling for Advanced Document Processing #255 - Docling Integration
Effort Estimate
Total: 10 weeks (1-2 FTE)
- Phase 1: 2 weeks
- Phase 2: 2 weeks
- Phase 3: 2 weeks
- Phase 4: 2 weeks
- Phase 5: 2 weeks
Priority: High - Required for production deployment