diff --git a/docs/guidance/pod-security.md b/docs/guidance/pod-security.md
new file mode 100644
index 00000000000..a4bcd271227
--- /dev/null
+++ b/docs/guidance/pod-security.md
@@ -0,0 +1,121 @@
+# Pod Security
+
+Kubernetes defines three Pod Security Standards, `privileged`, `baseline`, and `restricted`, to broadly cover the
+security spectrum. The `privileged` standard allows known privilege escalations, so it is not safe enough for
+security-critical applications.
+
+This document describes how to configure a RayCluster YAML file to apply the `restricted` Pod Security Standard. The
+following references can help you understand this document better:
+
+* [Kubernetes - Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted)
+* [Kubernetes - Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/)
+* [Kubernetes - Auditing](https://kubernetes.io/docs/tasks/debug/debug-cluster/audit/)
+* [KinD - Auditing](https://kind.sigs.k8s.io/docs/user/auditing/)
+
+# Step 1: Create a KinD cluster
+```bash
+# Path: ray-operator/config/security
+kind create cluster --config kind-config.yaml --image=kindest/node:v1.24.0
+```
+The `kind-config.yaml` file enables audit logging with the audit policy defined in `audit-policy.yaml`, which records
+Pod events in the namespace `pod-security`. With this policy, we can check whether our Pods violate the policies in the
+`restricted` standard.
+
+[Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/) was first introduced in
+Kubernetes v1.22 (alpha) and became stable in Kubernetes v1.25. KubeRay currently supports Kubernetes from v1.19 to
+v1.24, and at the time of writing we have not tested KubeRay with Kubernetes v1.25. Hence, we use **Kubernetes v1.24** in this step.
+
+# Step 2: Check the audit logs
+```bash
+docker exec kind-control-plane cat /var/log/kubernetes/kube-apiserver-audit.log
+```
+The log should be empty because the namespace `pod-security` does not exist yet.
+
+# Step 3: Create the `pod-security` namespace
+```bash
+kubectl create ns pod-security
+kubectl label --overwrite ns pod-security \
+  pod-security.kubernetes.io/warn=restricted \
+  pod-security.kubernetes.io/warn-version=latest \
+  pod-security.kubernetes.io/audit=restricted \
+  pod-security.kubernetes.io/audit-version=latest \
+  pod-security.kubernetes.io/enforce=restricted \
+  pod-security.kubernetes.io/enforce-version=latest
+```
+With the `pod-security.kubernetes.io` labels, the built-in Kubernetes Pod security admission controller applies the
+`restricted` Pod Security Standard to all Pods in the namespace `pod-security`. The label
+`pod-security.kubernetes.io/enforce=restricted` means that a Pod will be rejected if it violates the policies defined in
+the `restricted` security standard. See [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/) for more details about the labels.
+
+# Step 4: Install the KubeRay operator
+```bash
+# Update the field securityContext in helm-chart/kuberay-operator/values.yaml
+securityContext:
+  allowPrivilegeEscalation: false
+  capabilities:
+    drop: ["ALL"]
+  runAsNonRoot: true
+  seccompProfile:
+    type: RuntimeDefault
+
+# Path: helm-chart/kuberay-operator
+helm install -n pod-security kuberay-operator .
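+
+# (Optional) Before moving on to Step 5, you may want to verify that the operator Pod
+# itself was admitted under the `restricted` standard and is running:
+kubectl get pod -n pod-security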
+```
+
+# Step 5: Create a RayCluster (Choose either Step 5.1 or Step 5.2)
+* If you choose Step 5.1, the Ray Pods will be rejected, and no new Pod will be created in the namespace `pod-security`.
+* If you choose Step 5.2, the Ray Pods will be created successfully.
+
+## Step 5.1: Create a RayCluster without proper `securityContext` configurations
+```bash
+# Path: ray-operator/config/samples
+kubectl apply -n pod-security -f ray-cluster.complete.yaml
+
+# Wait 20 seconds and check the audit logs for the error messages.
+docker exec kind-control-plane cat /var/log/kubernetes/kube-apiserver-audit.log
+
+# Example error messages
+# "pods \"raycluster-complete-head-fkbf5\" is forbidden: violates PodSecurity \"restricted:latest\": allowPrivilegeEscalation != false (container \"ray-head\" must set securityContext.allowPrivilegeEscalation=false) ...
+
+kubectl get pod -n pod-security
+# NAME                               READY   STATUS    RESTARTS   AGE
+# kuberay-operator-8b6d55dbb-t8msf   1/1     Running   0          62s
+
+# Clean up the RayCluster
+kubectl delete rayclusters.ray.io -n pod-security raycluster-complete
+# raycluster.ray.io "raycluster-complete" deleted
+```
+No Ray Pod is created in the namespace `pod-security` because the Pods violate the `restricted` standard; the audit logs
+contain the corresponding error messages.
+
+## Step 5.2: Create a RayCluster with proper `securityContext` configurations
+```bash
+# Path: ray-operator/config/security
+kubectl apply -n pod-security -f ray-cluster.pod-security.yaml
+
+# Wait for the RayCluster to converge and check the audit logs for the messages.
+docker exec kind-control-plane cat /var/log/kubernetes/kube-apiserver-audit.log
+
+# Forward the dashboard port
+kubectl port-forward --address 0.0.0.0 svc/raycluster-pod-security-head-svc -n pod-security 8265:8265
+
+# Log in to the head Pod
+kubectl exec -it -n pod-security ${YOUR_HEAD_POD} -- bash
+
+# (Head Pod) Run a sample job in the Pod
+python3 samples/xgboost_example.py
+
+# Check the job status in the dashboard in your browser.
+# http://127.0.0.1:8265/#/job => The job status should be "SUCCEEDED".
+
+# (Head Pod) Make sure Python dependencies can be installed under the `restricted` security standard
+pip3 install jsonpatch
+echo $?  # Check the exit code of `pip3 install jsonpatch`. It should be 0.
+
+# Clean up the RayCluster
+kubectl delete -n pod-security -f ray-cluster.pod-security.yaml
+# raycluster.ray.io "raycluster-pod-security" deleted
+# configmap "ray-example" deleted
+```
+One head Pod and one worker Pod will be created as specified in `ray-cluster.pod-security.yaml`.
+First, we log in to the head Pod, run an XGBoost example script, and check the job
+status in the dashboard. Next, we use `pip` to install a Python dependency (`jsonpatch`) and confirm that the exit code of the `pip` command is 0.
diff --git a/helm-chart/kuberay-operator/values.yaml b/helm-chart/kuberay-operator/values.yaml
index 36e74d344da..f19843527a9 100644
--- a/helm-chart/kuberay-operator/values.yaml
+++ b/helm-chart/kuberay-operator/values.yaml
@@ -58,3 +58,7 @@ rbacEnable: true
 batchScheduler:
   enabled: false
+
+# Set up `securityContext` to improve Pod security.
+# See https://github.com/ray-project/kuberay/blob/master/docs/guidance/pod-security.md for further guidance.
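+#
+# For example, the settings used in that guide to satisfy the `restricted` standard look
+# roughly like the following commented-out sketch; adjust them to your own requirements:
+#
+# securityContext:
+#   allowPrivilegeEscalation: false
+#   capabilities:
+#     drop: ["ALL"]
+#   runAsNonRoot: true
+#   seccompProfile:
+#     type: RuntimeDefault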
+securityContext: {}
diff --git a/ray-operator/config/samples/ray-cluster.complete.yaml b/ray-operator/config/samples/ray-cluster.complete.yaml
index 85b709353d2..14ba4b1f44f 100644
--- a/ray-operator/config/samples/ray-cluster.complete.yaml
+++ b/ray-operator/config/samples/ray-cluster.complete.yaml
@@ -11,8 +11,7 @@ metadata:
   name: raycluster-complete
 spec:
   rayVersion: '2.1.0'
-  ######################headGroupSpec#################################
-  # Ray head pod template and specs
+  # Ray head pod configuration
   headGroupSpec:
     # Kubernetes Service Type, valid values are 'ClusterIP', 'NodePort' and 'LoadBalancer'
     serviceType: ClusterIP
diff --git a/ray-operator/config/security/audit-policy.yaml b/ray-operator/config/security/audit-policy.yaml
new file mode 100644
index 00000000000..13ee2ee0f07
--- /dev/null
+++ b/ray-operator/config/security/audit-policy.yaml
@@ -0,0 +1,15 @@
+apiVersion: audit.k8s.io/v1 # This is required.
+kind: Policy
+# Don't generate audit events for any request in the RequestReceived stage.
+omitStages:
+  - "RequestReceived"
+rules:
+  # Log Pod changes at the Metadata level.
+  - level: Metadata
+    resources:
+    - group: ""
+      # Resource "pods" doesn't match requests to any subresource of pods,
+      # which is consistent with the RBAC policy.
+      resources: ["pods"]
+    # This rule only applies to resources in the "pod-security" namespace.
+    namespaces: ["pod-security"]
diff --git a/ray-operator/config/security/kind-config.yaml b/ray-operator/config/security/kind-config.yaml
new file mode 100644
index 00000000000..05426fcc358
--- /dev/null
+++ b/ray-operator/config/security/kind-config.yaml
@@ -0,0 +1,29 @@
+kind: Cluster
+apiVersion: kind.x-k8s.io/v1alpha4
+nodes:
+- role: control-plane
+  kubeadmConfigPatches:
+  - |
+    kind: ClusterConfiguration
+    apiServer:
+      # Enable auditing flags on the API server.
+      extraArgs:
+        audit-log-path: /var/log/kubernetes/kube-apiserver-audit.log
+        audit-policy-file: /etc/kubernetes/policies/audit-policy.yaml
+      # Mount new files / directories on the control plane.
+      extraVolumes:
+      - name: audit-policies
+        hostPath: /etc/kubernetes/policies
+        mountPath: /etc/kubernetes/policies
+        readOnly: true
+        pathType: DirectoryOrCreate
+      - name: audit-logs
+        hostPath: /var/log/kubernetes
+        mountPath: /var/log/kubernetes
+        readOnly: false
+        pathType: DirectoryOrCreate
+  # Mount the local audit-policy.yaml file onto the control-plane node.
+  extraMounts:
+  - hostPath: ./audit-policy.yaml
+    containerPath: /etc/kubernetes/policies/audit-policy.yaml
+    readOnly: true
diff --git a/ray-operator/config/security/ray-cluster.pod-security.yaml b/ray-operator/config/security/ray-cluster.pod-security.yaml
new file mode 100644
index 00000000000..60dec71c645
--- /dev/null
+++ b/ray-operator/config/security/ray-cluster.pod-security.yaml
@@ -0,0 +1,175 @@
+# The resource requests and limits in this config are too small for production!
+# For examples with more realistic resource configuration, see
+# ray-cluster.complete.large.yaml and
+# ray-cluster.autoscaler.large.yaml.
+apiVersion: ray.io/v1alpha1
+kind: RayCluster
+metadata:
+  labels:
+    controller-tools.k8s.io: "1.0"
+  # A unique identifier for the head node and workers of this cluster.
+  name: raycluster-pod-security
+spec:
+  rayVersion: '2.1.0'
+  # Ray head pod configuration
+  headGroupSpec:
+    # Kubernetes Service Type, valid values are 'ClusterIP', 'NodePort' and 'LoadBalancer'
+    serviceType: ClusterIP
+    # For the head group, replicas should always be 1.
+    # headGroupSpec.replicas is deprecated in KubeRay >= 0.3.0.
+    replicas: 1
+    # The following params complete the ray start command: ray start --head --block --dashboard-host='0.0.0.0' ...
+    rayStartParams:
+      dashboard-host: '0.0.0.0'
+      block: 'true'
+    # Pod template
+    template:
+      spec:
+        containers:
+        - name: ray-head
+          image: rayproject/ray-ml:2.1.0
+          ports:
+          - containerPort: 6379
+            name: gcs
+          - containerPort: 8265
+            name: dashboard
+          - containerPort: 10001
+            name: client
+          lifecycle:
+            preStop:
+              exec:
+                command: ["/bin/sh","-c","ray stop"]
+          volumeMounts:
+            - mountPath: /tmp/ray
+              name: ray-logs
+            - mountPath: /home/ray/samples
+              name: ray-example-configmap
+          resources:
+            limits:
+              cpu: 1
+              memory: 2Gi
+            requests:
+              cpu: 1
+              memory: 2Gi
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop: ["ALL"]
+            runAsNonRoot: true
+            seccompProfile:
+              type: RuntimeDefault
+        volumes:
+          - name: ray-logs
+            emptyDir: {}
+          - name: ray-example-configmap
+            configMap:
+              name: ray-example
+              # An array of keys from the ConfigMap to create as files
+              items:
+                - key: xgboost_example.py
+                  path: xgboost_example.py
+  workerGroupSpecs:
+  # The Pod replicas in this worker group
+  - replicas: 1
+    minReplicas: 1
+    maxReplicas: 10
+    # Logical group name; in this example it is called large-group. The name can also be functional.
+    groupName: large-group
+    # If worker Pods need to be added, increment `replicas`.
+    # If worker Pods need to be removed, decrement `replicas` and populate the workersToDelete list.
+    # The operator removes Pods from the list until the number of replicas is satisfied.
+    # When a Pod is confirmed to be deleted, its name is removed from the list below.
+    #scaleStrategy:
+    #  workersToDelete:
+    #  - raycluster-complete-worker-large-group-bdtwh
+    #  - raycluster-complete-worker-large-group-hv457
+    #  - raycluster-complete-worker-large-group-k8tj7
+    # The following params complete the ray start command: ray start --block ...
+    rayStartParams:
+      block: 'true'
+    # Pod template
+    template:
+      spec:
+        containers:
+        - name: ray-worker
+          image: rayproject/ray-ml:2.1.0
+          # Environment variables to set in the container. Optional.
+          # Refer to https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/
+          lifecycle:
+            preStop:
+              exec:
+                command: ["/bin/sh","-c","ray stop"]
+          # Use volumeMounts. Optional.
+          # Refer to https://kubernetes.io/docs/concepts/storage/volumes/
+          volumeMounts:
+            - mountPath: /tmp/ray
+              name: ray-logs
+          resources:
+            limits:
+              cpu: 4
+              memory: 2Gi
+            requests:
+              cpu: 1
+              memory: 2Gi
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop: ["ALL"]
+            runAsNonRoot: true
+            seccompProfile:
+              type: RuntimeDefault
+        initContainers:
+        # If the env var $RAY_IP is missing, the operator sets it to the name of the head service.
+        - name: init-myservice
+          image: busybox:1.28
+          # Change the cluster domain suffix (cluster.local) if your cluster does not use the default setting.
+          command: ['sh', '-c', "until nslookup $RAY_IP.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
+          securityContext:
+            runAsUser: 1000
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop: ["ALL"]
+            runAsNonRoot: true
+            seccompProfile:
+              type: RuntimeDefault
+        # Use volumes. Refer to https://kubernetes.io/docs/concepts/storage/volumes/
+        volumes:
+        - name: ray-logs
+          emptyDir: {}
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: ray-example
+data:
+  xgboost_example.py: |
+    import ray
+    from ray.train.xgboost import XGBoostTrainer
+    from ray.air.config import ScalingConfig
+
+    # Load data.
+    dataset = ray.data.read_csv("s3://anonymous@air-example-data/breast_cancer.csv")
+
+    # Split data into train and validation.
+    train_dataset, valid_dataset = dataset.train_test_split(test_size=0.3)
+
+    trainer = XGBoostTrainer(
+        scaling_config=ScalingConfig(
+            # Number of workers to use for data parallelism.
+            num_workers=1,
+            # Whether to use GPU acceleration.
+            use_gpu=False,
+        ),
+        label_column="target",
+        num_boost_round=20,
+        params={
+            # XGBoost-specific params
+            "objective": "binary:logistic",
+            # "tree_method": "gpu_hist",  # uncomment this to use GPU for training
+            "eval_metric": ["logloss", "error"],
+        },
+        datasets={"train": train_dataset, "valid": valid_dataset},
+    )
+    result = trainer.fit()
+    print(result.metrics)