[Feature] Ray restricted podsecuritystandards for enterprise security…

… and Kubeflow integration (ray-project#750) Kubernetes defines three Pod Security Standards (privileged, baseline, and restricted) that broadly cover the security spectrum. The privileged standard allows known privilege escalations, so it is not safe enough for security-critical applications. This PR describes how to configure a RayCluster YAML file to apply the restricted Pod Security Standard. Signed-off-by: Kai-Hsun Chen <kaihsun@apache.org>
lowang-bh · Dec 8, 2022 · 1e93ed1 · 1e93ed1
1 parent 8683f59
commit 1e93ed1
Showing 6 changed files with 345 additions and 2 deletions.
diff --git a/docs/guidance/pod-security.md b/docs/guidance/pod-security.md
@@ -0,0 +1,121 @@
+# Pod Security
+
+Kubernetes defines three Pod Security Standards (`privileged`, `baseline`, and `restricted`) that broadly
+cover the security spectrum. The `privileged` standard allows known privilege escalations, so it is not
+safe enough for security-critical applications.
+
+This document describes how to configure a RayCluster YAML file to apply the `restricted` Pod Security Standard. The following
+references can help you understand this document better:
+
+* [Kubernetes - Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted)
+* [Kubernetes - Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/)
+* [Kubernetes - Auditing](https://kubernetes.io/docs/tasks/debug/debug-cluster/audit/)
+* [KinD - Auditing](https://kind.sigs.k8s.io/docs/user/auditing/)
+
+# Step 1: Create a KinD cluster
+```bash
+# Path: ray-operator/config/security
+kind create cluster --config kind-config.yaml --image=kindest/node:v1.24.0
+```
+The `kind-config.yaml` file enables audit logging with the audit policy defined in `audit-policy.yaml`. That policy
+records Pod events in the `pod-security` namespace. With it, we can check
+whether our Pods violate the policies of the `restricted` standard.
+
+The [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/) feature was first
+introduced in Kubernetes v1.22 (alpha) and became stable in Kubernetes v1.25. In addition, KubeRay currently supports
+Kubernetes v1.19 through v1.24 (at the time of writing, KubeRay has not been tested with Kubernetes v1.25). Hence, this step uses **Kubernetes v1.24**, where the feature is beta and enabled by default.
+
+# Step 2: Check the audit logs
+```bash
+docker exec kind-control-plane cat /var/log/kubernetes/kube-apiserver-audit.log
+```
+The log should be empty because the namespace `pod-security` does not exist.
+
+# Step 3: Create the `pod-security` namespace
+```bash
+kubectl create ns pod-security
+kubectl label --overwrite ns pod-security \
+  pod-security.kubernetes.io/warn=restricted \
+  pod-security.kubernetes.io/warn-version=latest \
+  pod-security.kubernetes.io/audit=restricted \
+  pod-security.kubernetes.io/audit-version=latest \
+  pod-security.kubernetes.io/enforce=restricted \
+  pod-security.kubernetes.io/enforce-version=latest
+```
+With the `pod-security.kubernetes.io` labels, the built-in Kubernetes Pod Security admission controller applies the
+`restricted` Pod Security Standard to all Pods in the `pod-security` namespace. The label
+`pod-security.kubernetes.io/enforce=restricted` means that a Pod is rejected if it violates the policies defined in the
+`restricted` standard. See [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/) for more details about the labels.
+
+# Step 4: Install the KubeRay operator
+```bash
+# Set the `securityContext` field in helm-chart/kuberay-operator/values.yaml to the following YAML (edit the file; this is not a shell command)
+securityContext:
+  allowPrivilegeEscalation: false
+  capabilities:
+    drop: ["ALL"]
+  runAsNonRoot: true
+  seccompProfile:
+    type: RuntimeDefault
+
+# Path: helm-chart/kuberay-operator
+helm install -n pod-security kuberay-operator .
+```
+
+# Step 5: Create a RayCluster (Choose either Step 5.1 or Step 5.2)
+* If you choose Step 5.1, no Pod will be created in the namespace `pod-security`.
+* If you choose Step 5.2, Pods can be created successfully.
+
+## Step 5.1: Create a RayCluster without proper `securityContext` configurations
+```bash
+# Path: ray-operator/config/samples
+kubectl apply -n pod-security -f ray-cluster.complete.yaml
+
+# Wait 20 seconds and check audit logs for the error messages.
+docker exec kind-control-plane cat /var/log/kubernetes/kube-apiserver-audit.log
+
+# Example error message
+# "pods \"raycluster-complete-head-fkbf5\" is forbidden: violates PodSecurity \"restricted:latest\": allowPrivilegeEscalation != false (container \"ray-head\" must set securityContext.allowPrivilegeEscalation=false) ...
+
+kubectl get pod -n pod-security
+# NAME                               READY   STATUS    RESTARTS   AGE
+# kuberay-operator-8b6d55dbb-t8msf   1/1     Running   0          62s
+
+# Clean up the RayCluster
+kubectl delete rayclusters.ray.io -n pod-security raycluster-complete
+# raycluster.ray.io "raycluster-complete" deleted
+```
+No Pod is created in the `pod-security` namespace, and the audit logs contain the corresponding error messages.
+
+## Step 5.2: Create a RayCluster with proper `securityContext` configurations
+```bash
+# Path: ray-operator/config/security
+kubectl apply -n pod-security -f ray-cluster.pod-security.yaml
+
+# Wait for the RayCluster to converge, then check the audit logs.
+docker exec kind-control-plane cat /var/log/kubernetes/kube-apiserver-audit.log
+
+# Forward the dashboard port
+kubectl port-forward --address 0.0.0.0 svc/raycluster-pod-security-head-svc -n pod-security 8265:8265
+
+# Log in to the head Pod
+kubectl exec -it -n pod-security ${YOUR_HEAD_POD} -- bash
+
+# (Head Pod) Run a sample job in the Pod
+python3 samples/xgboost_example.py
+
+# Check the job status in the dashboard on your browser.
+# http://127.0.0.1:8265/#/job => The job status should be "SUCCEEDED".
+
+# (Head Pod) Make sure Python dependencies can be installed under the `restricted` security standard
+pip3 install jsonpatch
+echo $? # Check the exit code of `pip3 install jsonpatch`. It should be 0.
+
+# Clean up the RayCluster
+kubectl delete -n pod-security -f ray-cluster.pod-security.yaml
+# raycluster.ray.io "raycluster-pod-security" deleted
+# configmap "ray-example" deleted
+```
+One head Pod and one worker Pod are created, as specified in `ray-cluster.pod-security.yaml`.
+First, we log in to the head Pod, run an XGBoost example script, and check the job
+status in the dashboard. Next, we use `pip` to install a Python dependency (`jsonpatch`); the exit code of the `pip` command should be 0.
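
Note on Step 3 of the guide above: the six namespace labels follow a regular pattern, one `<mode>` label and one `<mode>-version` label for each of `warn`, `audit`, and `enforce`. The following is a minimal sketch of that pattern, not part of this commit. The helper names `pod_security_labels` and `kubectl_label_args` are hypothetical and exist only for illustration.

```python
# Hypothetical helpers (not part of KubeRay or this commit): build the
# Pod Security Admission namespace labels used in Step 3.
PREFIX = "pod-security.kubernetes.io"
MODES = ("warn", "audit", "enforce")


def pod_security_labels(level="restricted", version="latest", modes=MODES):
    """Return the label dict for the given standard, version, and modes."""
    labels = {}
    for mode in modes:
        labels[f"{PREFIX}/{mode}"] = level
        labels[f"{PREFIX}/{mode}-version"] = version
    return labels


def kubectl_label_args(namespace, labels):
    """Return the argument list for an equivalent `kubectl label` call."""
    args = ["kubectl", "label", "--overwrite", "ns", namespace]
    args += [f"{key}={value}" for key, value in labels.items()]
    return args


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(kubectl_label_args("pod-security", pod_security_labels())))
```

Passing `modes=("warn", "audit")` would produce labels that only report violations without rejecting Pods, which can be useful while migrating an existing namespace.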
diff --git a/helm-chart/kuberay-operator/values.yaml b/helm-chart/kuberay-operator/values.yaml
@@ -58,3 +58,7 @@ rbacEnable: true
 
 batchScheduler:
   enabled: false
+
+# Set up `securityContext` to improve Pod security.
+# See https://github.com/ray-project/kuberay/blob/master/docs/guidance/pod-security.md for further guidance.
+securityContext: {}
diff --git a/ray-operator/config/samples/ray-cluster.complete.yaml b/ray-operator/config/samples/ray-cluster.complete.yaml
@@ -11,8 +11,7 @@ metadata:
   name: raycluster-complete
 spec:
   rayVersion: '2.1.0'
-  ######################headGroupSpec#################################
-  # Ray head pod template and specs
+  # Ray head pod configuration
   headGroupSpec:
     # Kubernetes Service Type, valid values are 'ClusterIP', 'NodePort' and 'LoadBalancer'
     serviceType: ClusterIP

diff --git a/ray-operator/config/security/audit-policy.yaml b/ray-operator/config/security/audit-policy.yaml
@@ -0,0 +1,15 @@
+apiVersion: audit.k8s.io/v1 # This is required.
+kind: Policy
+# Don't generate audit events for all requests in RequestReceived stage.
+omitStages:
+  - "RequestReceived"
+rules:
+  # Log pod changes at Metadata level
+  - level: Metadata
+    resources:
+    - group: ""
+      # Resource "pods" doesn't match requests to any subresource of pods,
+      # which is consistent with the RBAC policy.
+      resources: ["pods"]
+    # This rule only applies to resources in the "pod-security" namespace.
+    namespaces: ["pod-security"]
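
Note on reading the audit log produced by this policy: the log file is written as one JSON event per line, so the rejected Pod requests from Step 5.1 can be picked out with a small filter instead of reading the raw output. The sketch below is not part of this commit. It assumes the `audit.k8s.io/v1` event fields `objectRef.resource`, `objectRef.namespace`, `objectRef.name`, `responseStatus.code`, and `responseStatus.message`, and the function name `rejected_pod_events` is hypothetical.

```python
import json


def rejected_pod_events(lines, namespace="pod-security"):
    """Yield (pod name, status code, message) for failed Pod requests.

    Assumes each line is one JSON-encoded audit.k8s.io/v1 event.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        ref = event.get("objectRef") or {}
        status = event.get("responseStatus") or {}
        if ref.get("resource") != "pods" or ref.get("namespace") != namespace:
            continue
        code = status.get("code", 0)
        if code >= 400:
            # The name can be absent on create requests, so default to "".
            yield ref.get("name", ""), code, status.get("message", "")


if __name__ == "__main__":
    # Made-up sample events for illustration only.
    sample = [
        json.dumps({
            "objectRef": {"resource": "pods", "namespace": "pod-security"},
            "responseStatus": {"code": 403, "message": "violates PodSecurity"},
        }),
        json.dumps({
            "objectRef": {"resource": "pods", "namespace": "pod-security"},
            "responseStatus": {"code": 201},
        }),
    ]
    for name, code, message in rejected_pod_events(sample):
        print(name, code, message)
```

To use it against the KinD cluster, the lines could come from the output of the `docker exec kind-control-plane cat /var/log/kubernetes/kube-apiserver-audit.log` command shown in the guide.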
diff --git a/ray-operator/config/security/kind-config.yaml b/ray-operator/config/security/kind-config.yaml
@@ -0,0 +1,29 @@
+kind: Cluster
+apiVersion: kind.x-k8s.io/v1alpha4
+nodes:
+- role: control-plane
+  kubeadmConfigPatches:
+  - |
+    kind: ClusterConfiguration
+    apiServer:
+        # enable auditing flags on the API server
+        extraArgs:
+          audit-log-path: /var/log/kubernetes/kube-apiserver-audit.log
+          audit-policy-file: /etc/kubernetes/policies/audit-policy.yaml
+        # mount new files / directories on the control plane
+        extraVolumes:
+          - name: audit-policies
+            hostPath: /etc/kubernetes/policies
+            mountPath: /etc/kubernetes/policies
+            readOnly: true
+            pathType: "DirectoryOrCreate"
+          - name: "audit-logs"
+            hostPath: "/var/log/kubernetes"
+            mountPath: "/var/log/kubernetes"
+            readOnly: false
+            pathType: DirectoryOrCreate
+  # mount the local file on the control plane
+  extraMounts:
+  - hostPath: ./audit-policy.yaml
+    containerPath: /etc/kubernetes/policies/audit-policy.yaml
+    readOnly: true
diff --git a/ray-operator/config/security/ray-cluster.pod-security.yaml b/ray-operator/config/security/ray-cluster.pod-security.yaml
@@ -0,0 +1,175 @@
+# The resource requests and limits in this config are too small for production!
+# For examples with more realistic resource configuration, see
+# ray-cluster.complete.large.yaml and
+# ray-cluster.autoscaler.large.yaml.
+apiVersion: ray.io/v1alpha1
+kind: RayCluster
+metadata:
+  labels:
+    controller-tools.k8s.io: "1.0"
+  # A unique identifier for the head node and workers of this cluster.
+  name: raycluster-pod-security
+spec:
+  rayVersion: '2.1.0'
+  # Ray head pod configuration
+  headGroupSpec:
+    # Kubernetes Service Type, valid values are 'ClusterIP', 'NodePort' and 'LoadBalancer'
+    serviceType: ClusterIP
+    # for the head group, replicas should always be 1.
+    # headGroupSpec.replicas is deprecated in KubeRay >= 0.3.0.
+    replicas: 1
+    # the following params are used to complete the ray start command: ray start --head --block --dashboard-host=0.0.0.0 ...
+    rayStartParams:
+      dashboard-host: '0.0.0.0'
+      block: 'true'
+    #pod template
+    template:
+      spec:
+        containers:
+        - name: ray-head
+          image: rayproject/ray-ml:2.1.0
+          ports:
+          - containerPort: 6379
+            name: gcs
+          - containerPort: 8265
+            name: dashboard
+          - containerPort: 10001
+            name: client
+          lifecycle:
+            preStop:
+              exec:
+                command: ["/bin/sh","-c","ray stop"]
+          volumeMounts:
+            - mountPath: /tmp/ray
+              name: ray-logs
+            - mountPath: /home/ray/samples
+              name: ray-example-configmap
+          resources:
+            limits:
+              cpu: 1
+              memory: 2Gi
+            requests:
+              cpu: 1
+              memory: 2Gi
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop: ["ALL"]
+            runAsNonRoot: true
+            seccompProfile:
+              type: RuntimeDefault
+        volumes:
+          - name: ray-logs
+            emptyDir: {}
+          - name: ray-example-configmap
+            configMap:
+              name: ray-example
+              # An array of keys from the ConfigMap to create as files
+              items:
+                - key: xgboost_example.py
+                  path: xgboost_example.py
+  workerGroupSpecs:
+  # the number of worker pod replicas in this group
+  - replicas: 1
+    minReplicas: 1
+    maxReplicas: 10
+    # logical group name; here it is large-group, but it can also describe the group's function
+    groupName: large-group
+    # if worker pods need to be added, we can simply increment the replicas
+    # if worker pods need to be removed, we decrement the replicas, and populate the podsToDelete list
+    # the operator will remove pods from the list until the number of replicas is satisfied
+    # when a pod is confirmed to be deleted, its name will be removed from the list below
+    #scaleStrategy:
+    #  workersToDelete:
+    #  - raycluster-complete-worker-large-group-bdtwh
+    #  - raycluster-complete-worker-large-group-hv457
+    #  - raycluster-complete-worker-large-group-k8tj7 
+    # the following params are used to complete the ray start: ray start --block
+    rayStartParams:
+      block: 'true'
+    #pod template
+    template:
+      spec:
+        containers:
+        - name: ray-worker
+          image: rayproject/ray-ml:2.1.0
+          # environment variables to set in the container. Optional.
+          # Refer to https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/
+          lifecycle:
+            preStop:
+              exec:
+                command: ["/bin/sh","-c","ray stop"]
+          # use volumeMounts. Optional.
+          # Refer to https://kubernetes.io/docs/concepts/storage/volumes/
+          volumeMounts:
+            - mountPath: /tmp/ray
+              name: ray-logs
+          resources:
+            limits:
+              cpu: 4
+              memory: 2Gi
+            requests:
+              cpu: 1
+              memory: 2Gi
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop: ["ALL"]
+            runAsNonRoot: true
+            seccompProfile:
+              type: RuntimeDefault
+        initContainers:
+        # the env var $RAY_IP is set by the operator if missing, with the value of the head service name
+        - name: init-myservice
+          image: busybox:1.28
+          # Change the cluster domain suffix (svc.cluster.local) if your cluster does not use the default
+          command: ['sh', '-c', "until nslookup $RAY_IP.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
+          securityContext:
+            runAsUser: 1000
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop: ["ALL"]
+            runAsNonRoot: true
+            seccompProfile:
+              type: RuntimeDefault
+        # use volumes
+        # Refer to https://kubernetes.io/docs/concepts/storage/volumes/
+        volumes:
+          - name: ray-logs
+            emptyDir: {}
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: ray-example
+data:
+  xgboost_example.py: |
+    import ray
+    from ray.train.xgboost import XGBoostTrainer
+    from ray.air.config import ScalingConfig
+
+    # Load data.
+    dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")
+
+    # Split data into train and validation.
+    train_dataset, valid_dataset = dataset.train_test_split(test_size=0.3)
+
+    trainer = XGBoostTrainer(
+        scaling_config=ScalingConfig(
+            # Number of workers to use for data parallelism.
+            num_workers=1,
+            # Whether to use GPU acceleration.
+            use_gpu=False,
+        ),
+        label_column="target",
+        num_boost_round=20,
+        params={
+            # XGBoost specific params
+            "objective": "binary:logistic",
+            # "tree_method": "gpu_hist",  # uncomment this to use GPU for training
+            "eval_metric": ["logloss", "error"],
+        },
+        datasets={"train": train_dataset, "valid": valid_dataset},
+    )
+    result = trainer.fit()
+    print(result.metrics)
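
Note on the `securityContext` blocks in `ray-cluster.pod-security.yaml`: every container and init container repeats the same four settings. The sketch below, which is not part of this commit, checks a container spec (as a plain dict) for exactly those four container-level fields. The function name `restricted_violations` is hypothetical, and the check is deliberately narrower than the real admission controller, which also accepts Pod-level `runAsNonRoot` and `seccompProfile` settings and enforces further rules such as allowed volume types.

```python
def restricted_violations(container):
    """Return a list of problems for one container spec given as a dict.

    Only the four container-level fields used in this sample are checked.
    """
    sc = container.get("securityContext") or {}
    problems = []
    if sc.get("allowPrivilegeEscalation") is not False:
        problems.append("allowPrivilegeEscalation must be false")
    dropped = (sc.get("capabilities") or {}).get("drop") or []
    if "ALL" not in dropped:
        problems.append('capabilities.drop must include "ALL"')
    if sc.get("runAsNonRoot") is not True:
        problems.append("runAsNonRoot must be true")
    seccomp_type = (sc.get("seccompProfile") or {}).get("type")
    if seccomp_type not in ("RuntimeDefault", "Localhost"):
        problems.append("seccompProfile.type must be RuntimeDefault or Localhost")
    return problems


if __name__ == "__main__":
    # A container without any securityContext, as in ray-cluster.complete.yaml.
    for problem in restricted_violations({"name": "ray-head"}):
        print(problem)
```

A container carrying the same `securityContext` as `ray-head` in this file yields an empty list, while a container with no `securityContext` reports all four problems.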