Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Support Istio #1005

Closed
2 tasks done
kevin85421 opened this issue Apr 3, 2023 · 14 comments
Closed
2 tasks done

[Feature] Support Istio #1005

kevin85421 opened this issue Apr 3, 2023 · 14 comments
Assignees
Labels
enhancement New feature or request

Comments

@kevin85421
Copy link
Member

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

Support Istio with KubeRay

Use case

kubeflow/manifests#2383

Related issues

#948

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@kevin85421 kevin85421 added the enhancement New feature or request label Apr 3, 2023
@kevin85421 kevin85421 self-assigned this Apr 3, 2023
@juliusvonkohout
Copy link

@kevin85421 are there any internal plans to support this?

@kevin85421
Copy link
Member Author

cc @Yicheng-Lu-llll

@sip-aravind-g
Copy link

@kevin85421 if this supports then could it be possible to submit the jobs using external DNS ? I have tried but no luck.

I have updated my query here :
https://discuss.ray.io/t/need-assistance-to-submit-job-using-external-dns/13354

@rueian
Copy link
Contributor

rueian commented Apr 7, 2024

Hi @kevin85421,

After digging into this for a couple of days, I have managed to make Ray work with Istio on three different levels with some modifications in KubeRay.

Level 1: Plain TCP

This is the basic level of Istio integration that doesn't do any traffic encryption and should work out of the box with the current KubeRay.

One thing to note is that, with the latest Istio 1.21.0, users MUST apply the following Istio config to let the Istio sidecar proxy the plain grpc traffic correctly, or the Ray worker node won't be able to start.

  meshConfig:
    defaultConfig:
      runtimeValues:
        envoy.reloadable_features.sanitize_te: "false"

There is an issue opened for this istio/istio#49685

Level 2: mTLS

This is the second level integration that Istio must learn L4 information about Ray to do traffic encryption. That is when a TCP packet goes in or out of an Istio sidecar, Istio must be able to know where it does come from or go from its source or destination ip+port tuple. And, In Ray, a pod will connect to another pod directly without the help of a virtual IP like a normal k8s service. So, we must apply a headless service to all Ray pods to let Istio know these L4 information correctly, for example:

apiVersion: v1
kind: Service
metadata:
  labels:
    ray.io/headless-worker-svc: raycluster-mini
  name: raycluster-mini-headless-svc
  namespace: default
spec:
  clusterIP: None
  selector:
    ray.io/cluster: raycluster-mini
  ports:
  - name: node-manager-port
    port: 6380
  - name: object-manager-port
    port: 6381
  - name: runtime-env-agent-port
    port: 6382
  - name: dashboard-agent-grpc-port
    port: 6383
  - name: dashboard-agent-listen-port
    port: 52365
  - name: metrics-export-port
    port: 8080
  - name: p10002
    port: 10002
  - name: p10003
    port: 10003
  - name: p10004
    port: 10004
  - name: p10005
    port: 10005
    ...

All the ports listed in https://docs.ray.io/en/latest/ray-core/configure.html#all-nodes, including worker ports, MUST be explicitly configured in the rayStartParams and carried to the headless service.

Furthermore, if the Istio is operated in the STRICT mode, there are two more changes should be done on the KubeRay side

  1. Disable the wait-gcs-ready init container by setting the env ENABLE_INIT_CONTAINER_INJECTION=false. This init container will not be able to connect to the Ray GCS in Istio STRICT mode and will never be finished because it starts before the Istio sidecar.
  2. KubeRay controller talks to the underlying RayCluster of a RayService and RayJob directly. In the case of Istio STRICT mode, the controller should also have an Istio sidecar deployed or it should talk to the RayCluster through the kubeapi proxy.

Level 3: L7 Visibilities

This is the strictest level of integration that Istio knows L7 details about Ray's internal traffic. We not only need to set appProtocol correctly in the above headless service but also let Ray use FQDN hostname to communicate with each other internally because Istio uses the Host/Authority header for L7 routing. That is, setting the --node-ip-address to every Ray node is needed.

Here is a quick and dirty hack to the KubeRay to enable mTLS+L7 Visibilities for a RayCluster named raycluster-mini:

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-mini
spec:
  rayVersion: '2.9.0'
  headGroupSpec:
    rayStartParams:
      num-cpus: '1'
      node-manager-port: '6380'
      object-manager-port: '6381'
      runtime-env-agent-port: '6382'
      dashboard-agent-grpc-port: '6383'
      dashboard-grpc-port: '6384'
    #pod template
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.9.0
          ports:
          - containerPort: 6379
            name: redis
          - containerPort: 8265 # Ray dashboard
            name: dashboard
          - containerPort: 10001
            name: client
  workerGroupSpecs:
    - replicas: 1
      minReplicas: 1
      maxReplicas: 1
      groupName: small-group
      rayStartParams:
        num-cpus: '1'
        node-manager-port: '6380'
        object-manager-port: '6381'
        runtime-env-agent-port: '6382'
        dashboard-agent-grpc-port: '6383'
      #pod template
      template:
        spec:
          containers:
          - name: ray-worker
            image: rayproject/ray:2.9.0

The hack:

diff --git a/ray-operator/controllers/ray/common/pod.go b/ray-operator/controllers/ray/common/pod.go
index 0182ead..b79f7e5 100644
--- a/ray-operator/controllers/ray/common/pod.go
+++ b/ray-operator/controllers/ray/common/pod.go
@@ -339,6 +339,7 @@ func BuildPod(ctx context.Context, podTemplateSpec corev1.PodTemplateSpec, rayNo
 	ulimitCmd := "ulimit -n 65536"
 	// Generate the `ray start` command.
 	rayStartCmd := generateRayStartCommand(ctx, rayNodeType, rayStartParams, pod.Spec.Containers[utils.RayContainerIndex].Resources)
+	rayStartCmd += ` --node-ip-address $(echo "$POD_IP" | sed 's/\./-/g').raycluster-mini-headless-svc.default.svc.cluster.local`
 
 	// Check if overwrites the generated container command or not.
 	isOverwriteRayContainerCmd := false
@@ -580,7 +581,14 @@ func setContainerEnvVars(pod *corev1.Pod, rayNodeType rayv1.RayNodeType, rayStar
 			},
 		},
 	}
-	container.Env = append(container.Env, rayCloudInstanceID)
+	container.Env = append(container.Env, rayCloudInstanceID, corev1.EnvVar{
+		Name: "POD_IP",
+		ValueFrom: &corev1.EnvVarSource{
+			FieldRef: &corev1.ObjectFieldSelector{
+				FieldPath: "status.podIP",
+			},
+		},
+	})
 
 	// RAY_NODE_TYPE_NAME is used by Ray Autoscaler V2 (alpha). See https://github.com/ray-project/kuberay/issues/1965 for more details.
 	nodeGroupNameEnv := corev1.EnvVar{
diff --git a/ray-operator/controllers/ray/common/service.go b/ray-operator/controllers/ray/common/service.go
index 7ffb8f5..becc958 100644
--- a/ray-operator/controllers/ray/common/service.go
+++ b/ray-operator/controllers/ray/common/service.go
@@ -3,7 +3,9 @@ package common
 import (
 	"context"
 	"fmt"
+	"k8s.io/utils/pointer"
 	"sort"
+	"strings"
 
 	corev1 "k8s.io/api/core/v1"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
@@ -68,6 +70,15 @@ func BuildServiceForHeadPod(ctx context.Context, cluster rayv1.RayCluster, label
 	ports := []corev1.ServicePort{}
 	for name, port := range ports_int {
 		svcPort := corev1.ServicePort{Name: name, Port: port, AppProtocol: &defaultAppProtocol}
+		switch port {
+		case 6379, 10001:
+			svcPort.AppProtocol = pointer.String("grpc")
+		case 8265:
+			svcPort.AppProtocol = pointer.String("http")
+		}
+		if strings.Contains(name, "grpc") {
+			svcPort.AppProtocol = pointer.String("grpc")
+		}
 		ports = append(ports, svcPort)
 	}
 	if cluster.Spec.HeadGroupSpec.HeadService != nil {
@@ -285,8 +296,7 @@ func BuildHeadlessServiceForRayCluster(rayCluster rayv1.RayCluster) (*corev1.Ser
 		utils.RayClusterHeadlessServiceLabelKey: rayCluster.Name,
 	}
 	selectorLabels := map[string]string{
-		utils.RayClusterLabelKey:  rayCluster.Name,
-		utils.RayNodeTypeLabelKey: string(rayv1.WorkerNode),
+		utils.RayClusterLabelKey: rayCluster.Name, // Apply headless service to all Ray nodes.
 	}
 
 	headlessService := &corev1.Service{
@@ -296,11 +306,51 @@ func BuildHeadlessServiceForRayCluster(rayCluster rayv1.RayCluster) (*corev1.Ser
 			Labels:    labels,
 		},
 		Spec: corev1.ServiceSpec{
-			ClusterIP: "None",
-			Selector:  selectorLabels,
-			Type:      corev1.ServiceTypeClusterIP,
+			ClusterIP:                "None",
+			Selector:                 selectorLabels,
+			Type:                     corev1.ServiceTypeClusterIP,
+			PublishNotReadyAddresses: true, // Ray head will not be able to start without this.
+			Ports: []corev1.ServicePort{
+				{ // All the following port definitions, including worker ports, should be parsed from the rayStartParams
+					Name:        "node-manager-port",
+					AppProtocol: pointer.String("grpc"),
+					Port:        6380,
+				},
+				{
+					Name:        "object-manager-port",
+					AppProtocol: pointer.String("grpc"),
+					Port:        6381,
+				},
+				{
+					Name:        "runtime-env-agent-port",
+					AppProtocol: pointer.String("grpc"),
+					Port:        6382,
+				},
+				{
+					Name:        "dashboard-agent-grpc-port",
+					AppProtocol: pointer.String("grpc"),
+					Port:        6383,
+				},
+				{
+					Name:        "dashboard-agent-listen-port",
+					AppProtocol: pointer.String("http"),
+					Port:        52365,
+				},
+				{
+					Name:        "metrics-export-port",
+					AppProtocol: pointer.String("http"),
+					Port:        8080,
+				},
+			},
 		},
 	}
+	for i := int32(10002); i < 10100; i++ {
+		headlessService.Spec.Ports = append(headlessService.Spec.Ports, corev1.ServicePort{
+			Name:        fmt.Sprintf("p%d", i),
+			AppProtocol: pointer.String("grpc"),
+			Port:        i,
+		})
+	}
 
 	return headlessService, nil
 }
diff --git a/ray-operator/controllers/ray/raycluster_controller.go b/ray-operator/controllers/ray/raycluster_controller.go
index 63b584c..3d51cec 100644
--- a/ray-operator/controllers/ray/raycluster_controller.go
+++ b/ray-operator/controllers/ray/raycluster_controller.go
@@ -593,6 +593,7 @@ func (r *RayClusterReconciler) reconcileHeadlessService(ctx context.Context, ins
 		}
 	}
 
+	isMultiHost = true // Always create the headless service
 	if isMultiHost {
 		services := corev1.ServiceList{}
 		options := common.RayClusterHeadlessServiceListOptions(instance)
diff --git a/ray-operator/controllers/ray/utils/constant.go b/ray-operator/controllers/ray/utils/constant.go
index 9bad224..0302324 100644
--- a/ray-operator/controllers/ray/utils/constant.go
+++ b/ray-operator/controllers/ray/utils/constant.go
@@ -85,8 +85,8 @@ const (
 	ComponentName = "kuberay-operator"
 
 	// The default suffix for Headless Service for multi-host worker groups.
-	// The full name will be of the form "${RayCluster_Name}-headless-worker-svc".
-	HeadlessServiceSuffix = "headless-worker-svc"
+	// The full name will be of the form "${RayCluster_Name}-headless-svc".
+	HeadlessServiceSuffix = "headless-svc"
 
 	// Use as container env variable
 	RAY_CLUSTER_NAME                        = "RAY_CLUSTER_NAME"

Summary

The above hacky patch made a couple of changes to KubeRay to enable mTLS+L7 visibilities with Istio:

  1. Extend the current headless service for TPU Multi Host to all Ray nodes. This generates FQDN for each Ray node.
  2. Hard code all port definitions to the headless service for this demo purpose. This fulfills the criteria of auto mTLS.
  3. Specify --node-ip-address FQDN to all Ray nodes. This, along with the correct appProtocol, fulfills the criteria of L7 visibilities.

Related issues:
#948
#1025
#1946

@rueian
Copy link
Contributor

rueian commented Apr 7, 2024

Discussions

Although the above patch doesn't change too many lines, it brings some critical issues:

  1. All the ports must be explicitly stated in the rayStartParams but the default worker port range is too large. I have tried to state all the default 10002~19999 ports in my local Kind cluster and they caused all pods to be killed by OOM. We probably should document this memory issue and recommend users use a smaller worker ports range if they want to enable auto mTLS.
  2. The --node-ip-address FQDN option may bring a compatibility issue: The old KubeDNS uses <pod-ip>.<headless-svc>.<ns>.cluster.local, but the new spec uses <pod-hostname>.<headless-svc>.<ns>.cluster.local. The CoreDNS keeps the former format for backward compatibility, but other implementations may only accept the latter one.

@juliusvonkohout
Copy link

Thank you @rueian i hope that istio/istio#49685 gets fixed soon then.

@andrewsykim
Copy link
Collaborator

Relevant KEP in Kubernetes: kubernetes/enhancements#2611

There were some previous discussions around support port ranges as well, but I can't seem to find that discussion

@kevin85421
Copy link
Member Author

ray-project/ray#44801 Ruei-An has already added a doc for Istio. Close this issue.

@t-indumathy
Copy link

@rueian Do these changes need a specific version of kube-dns ?

@andrewsykim
Copy link
Collaborator

andrewsykim commented Sep 12, 2024

From my previous testing of kuberay and Istio, it's not supported with kube-dns because kube-dns does not support Pod FQDN resolution for headless services. This is however supported with CoreDNS and most other DNS providers. @rueian can correct me if I'm wrong

@t-indumathy
Copy link

@andrewsykim Is it because the format in kube-dns supports only hostname and not pod-ip in FQDN?

@andrewsykim
Copy link
Collaborator

Yeah, it doesn't support the <POD IP ADDRESS>.raycluster-istio-headless-svc.default.svc.cluster.local that is referenced in https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/istio.html.

I haven't tested if there are other viable workarounds

@rueian
Copy link
Contributor

rueian commented Sep 13, 2024

Hi @andrewsykim and @t-indumathy,

I find a note regarding hostname A records for Pod here https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-hostname-and-subdomain-fields:

image

So the current workaround, if your kube-dns doesn't support the old Pod IP FQDN, is setting the hostname and subdomain fields on Pods manually to generate hostname A records.

This workaround is quite awkward since we can have only ONE replica in a worker group in order to set Pod hostnames manually. Here is a modified example from the KubeRay Istio tutorial that applies the workaround in the headGroupSpec and the workerGroupSpecs.

kubectl apply -f - <<EOF
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-istio
spec:
  rayVersion: '2.35.0'
  headGroupSpec:
    rayStartParams:
      num-cpus: '1'
      node-manager-port: '6380'
      object-manager-port: '6381'
      runtime-env-agent-port: '6382'
      dashboard-agent-grpc-port: '6383'
      dashboard-agent-listen-port: '52365'
      metrics-export-port: '8080'
      max-worker-port: '10012'
      node-ip-address: $(hostname --fqdn)
    template:
      spec:
        hostname: head
        subdomain: raycluster-istio-headless-svc
        containers:
        - name: ray-head
          image: rayproject/ray:2.35.0
  workerGroupSpecs:
    - replicas: 1
      minReplicas: 1
      maxReplicas: 1
      groupName: small-group
      rayStartParams:
        num-cpus: '1'
        node-manager-port: '6380'
        object-manager-port: '6381'
        runtime-env-agent-port: '6382'
        dashboard-agent-grpc-port: '6383'
        dashboard-agent-listen-port: '52365'
        metrics-export-port: '8080'
        max-worker-port: '10012'
        node-ip-address: $(hostname --fqdn)
      template:
        spec:
          hostname: worker-1
          subdomain: raycluster-istio-headless-svc
          containers:
          - name: ray-worker
            image: rayproject/ray:2.35.0
EOF

@t-indumathy
Copy link

Thanks @rueian ! Will test this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants