@@ -19,7 +19,7 @@ Note that the YAML file in this example uses `serveConfigV2`. You need KubeRay v

```sh
# Create a RayService
kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.4.2/ray-operator/config/samples/ray-service.mobilenet.yaml
kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-service.mobilenet.yaml
```

* The [mobilenet.py](https://github.com/ray-project/serve_config_examples/blob/master/mobilenet/mobilenet.py) file needs `tensorflow` as a dependency. Hence, the YAML file uses the `rayproject/ray-ml` image instead of the `rayproject/ray` image.
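
For reference, a minimal sketch of where the image is set in the RayService's embedded cluster config; the image tag and the surrounding structure shown here are assumptions for illustration, not the values pinned by the sample YAML:

```yaml
# Sketch only: the tag is assumed; the sample YAML pins its own Ray version.
spec:
  rayClusterConfig:
    headGroupSpec:
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray-ml:2.46.0  # ray-ml bundles TensorFlow; rayproject/ray does not
```
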
@@ -37,12 +37,12 @@ The KubeRay operator Pod must be on the CPU node if you have set up the taint fo

## Step 2: Submit the RayJob

Create the RayJob custom resource with [ray-job.batch-inference.yaml](https://github.com/ray-project/kuberay/blob/v1.4.2/ray-operator/config/samples/ray-job.batch-inference.yaml).
Create the RayJob custom resource with [ray-job.batch-inference.yaml](https://github.com/ray-project/kuberay/blob/v1.5.0/ray-operator/config/samples/ray-job.batch-inference.yaml).

Download the file with `curl`:

```bash
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.4.2/ray-operator/config/samples/ray-job.batch-inference.yaml
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-job.batch-inference.yaml
```

Note that the `RayJob` spec contains a spec for the `RayCluster`. This tutorial uses a single-node cluster with 4 GPUs. For production use cases, use a multi-node cluster where the head node has no GPUs, so that Ray schedules GPU workloads on worker nodes and they don't interfere with critical Ray processes on the head node.
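
For the production layout described above, the `rayClusterSpec` inside the RayJob might look roughly like the following sketch; the group name, replica count, and resource sizes are assumptions for illustration, not the sample's actual values:

```yaml
# Hedged sketch of a multi-node layout: CPU-only head, GPU workers.
rayClusterSpec:
  headGroupSpec:
    template:
      spec:
        containers:
          - name: ray-head
            resources:
              limits:
                cpu: "4"          # no nvidia.com/gpu on the head node
                memory: 16Gi
  workerGroupSpecs:
    - groupName: gpu-workers       # assumed group name
      replicas: 1
      template:
        spec:
          containers:
            - name: ray-worker
              resources:
                limits:
                  nvidia.com/gpu: 4   # GPU workloads land here, away from the head node
```
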
@@ -17,15 +17,15 @@ kind create cluster --image=kindest/node:v1.26.0
```sh
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
# Install both CRDs and KubeRay operator v1.4.2.
helm install kuberay-operator kuberay/kuberay-operator --version 1.4.2
# Install both CRDs and KubeRay operator v1.5.0.
helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0
```

### Method 2: Kustomize

```sh
# Install CRD and KubeRay operator.
kubectl create -k "github.com/ray-project/kuberay/ray-operator/config/default?ref=v1.4.2"
kubectl create -k "github.com/ray-project/kuberay/ray-operator/config/default?ref=v1.5.0"
```

## Step 3: Validate Installation
@@ -28,7 +28,7 @@ Once the KubeRay operator is running, you're ready to deploy a RayCluster. Creat

```sh
# Deploy a sample RayCluster CR from the KubeRay Helm chart repo:
helm install raycluster kuberay/ray-cluster --version 1.4.2
helm install raycluster kuberay/ray-cluster --version 1.5.0
```


@@ -96,7 +96,7 @@ Follow the [KubeRay Operator Installation](kuberay-operator-deploy) to install t
## Step 3: Install a RayJob

```sh
kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.4.2/ray-operator/config/samples/ray-job.sample.yaml
kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-job.sample.yaml
```

## Step 4: Verify the Kubernetes cluster status
@@ -163,13 +163,13 @@ The Python script `sample_code.py` used by `entrypoint` is a simple Ray script t
## Step 6: Delete the RayJob

```sh
kubectl delete -f https://raw.githubusercontent.com/ray-project/kuberay/v1.4.2/ray-operator/config/samples/ray-job.sample.yaml
kubectl delete -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-job.sample.yaml
```

## Step 7: Create a RayJob with `shutdownAfterJobFinishes` set to true

```sh
kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.4.2/ray-operator/config/samples/ray-job.shutdown.yaml
kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-job.shutdown.yaml
```

The `ray-job.shutdown.yaml` defines a RayJob custom resource with `shutdownAfterJobFinishes: true` and `ttlSecondsAfterFinished: 10`.
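
A condensed sketch of the relevant portion of that RayJob spec, showing only the two fields discussed here; the metadata name is assumed for illustration:

```yaml
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-sample-shutdown      # assumed name
spec:
  shutdownAfterJobFinishes: true    # delete the RayCluster once the job finishes
  ttlSecondsAfterFinished: 10       # wait 10 seconds after completion before cleanup
```
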
@@ -197,7 +197,7 @@ kubectl get raycluster

```sh
# Step 10.1: Delete the RayJob
kubectl delete -f https://raw.githubusercontent.com/ray-project/kuberay/v1.4.2/ray-operator/config/samples/ray-job.shutdown.yaml
kubectl delete -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-job.shutdown.yaml

# Step 10.2: Delete the KubeRay operator
helm uninstall kuberay-operator
@@ -3,7 +3,7 @@

## Prerequisites

This guide mainly focuses on the behavior of KubeRay v1.4.2 and Ray 2.46.0.
This guide mainly focuses on the behavior of KubeRay v1.5.0 and Ray 2.46.0.

## What's a RayService?

@@ -35,7 +35,7 @@ Note that the YAML file in this example uses `serveConfigV2` to specify a multi-
## Step 3: Install a RayService

```sh
kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.4.2/ray-operator/config/samples/ray-service.sample.yaml
kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-service.sample.yaml
```

## Step 4: Verify the Kubernetes cluster status
@@ -129,7 +129,7 @@ curl -X POST -H 'Content-Type: application/json' rayservice-sample-serve-svc:800

```sh
# Delete the RayService.
kubectl delete -f https://raw.githubusercontent.com/ray-project/kuberay/v1.4.2/ray-operator/config/samples/ray-service.sample.yaml
kubectl delete -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-service.sample.yaml

# Uninstall the KubeRay operator.
helm uninstall kuberay-operator
16 changes: 8 additions & 8 deletions doc/source/cluster/kubernetes/k8s-ecosystem/ingress.md
@@ -33,10 +33,10 @@ Four examples show how to use ingress to access your Ray cluster:
# Step 1: Install KubeRay operator and CRD
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --version 1.4.2
helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0

# Step 2: Install a RayCluster
helm install raycluster kuberay/ray-cluster --version 1.4.2
helm install raycluster kuberay/ray-cluster --version 1.5.0

# Step 3: Edit the `ray-operator/config/samples/ray-cluster-alb-ingress.yaml`
#
@@ -123,10 +123,10 @@ Now run the following commands:
# Step 1: Install KubeRay operator and CRD
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --version 1.4.2
helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0

# Step 2: Install a RayCluster
helm install raycluster kuberay/ray-cluster --version 1.4.2
helm install raycluster kuberay/ray-cluster --version 1.5.0

# Step 3: Edit ray-cluster-gclb-ingress.yaml to replace the service name with the name of the head service from the RayCluster. (Output of `kubectl get svc`)

@@ -186,12 +186,12 @@ kubectl wait --namespace ingress-nginx \
# Step 3: Install KubeRay operator and CRD
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --version 1.4.2
helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0

# Step 4: Install RayCluster and create an ingress separately.
# More information about this setting change is documented in https://github.com/ray-project/kuberay/pull/699
# and `ray-operator/config/samples/ray-cluster.separate-ingress.yaml`
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.4.2/ray-operator/config/samples/ray-cluster.separate-ingress.yaml
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-cluster.separate-ingress.yaml
kubectl apply -f ray-cluster.separate-ingress.yaml

# Step 5: Check the ingress created in Step 4.
@@ -230,10 +230,10 @@ kubectl describe ingress raycluster-ingress-head-ingress
# Step 1: Install KubeRay operator and CRD
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator --version 1.4.2
helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0

# Step 2: Install a RayCluster
helm install raycluster kuberay/ray-cluster --version 1.4.2
helm install raycluster kuberay/ray-cluster --version 1.5.0

# Step 3: Edit the `ray-operator/config/samples/ray-cluster-agc-gatewayapi.yaml`
#
2 changes: 1 addition & 1 deletion doc/source/cluster/kubernetes/k8s-ecosystem/istio.md
@@ -66,7 +66,7 @@ In this mode, you _must_ disable the KubeRay init container injection by setting

```bash
# Set ENABLE_INIT_CONTAINER_INJECTION=false on the KubeRay operator.
helm upgrade kuberay-operator kuberay/kuberay-operator --version 1.4.2 \
helm upgrade kuberay-operator kuberay/kuberay-operator --version 1.5.0 \
--set env\[0\].name=ENABLE_INIT_CONTAINER_INJECTION \
--set-string env\[0\].value=false

@@ -56,7 +56,7 @@ kubectl get all -n prometheus-system
* Set `metrics.serviceMonitor.enabled=true` when installing the KubeRay operator with Helm to create a ServiceMonitor that scrapes metrics exposed by the KubeRay operator's service.
```sh
# Enable the ServiceMonitor and set the label `release: prometheus` to the ServiceMonitor so that Prometheus can discover it
helm install kuberay-operator kuberay/kuberay-operator --version 1.4.2 \
helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0 \
--set metrics.serviceMonitor.enabled=true \
--set metrics.serviceMonitor.selector.release=prometheus
```
@@ -104,7 +104,7 @@ curl localhost:8080
* `# HELP`: Describes the meaning of this metric.
* `# TYPE`: See [this document](https://prometheus.io/docs/concepts/metric_types/) for more details.

* Three required environment variables are defined in [ray-cluster.embed-grafana.yaml](https://github.com/ray-project/kuberay/blob/v1.4.2/ray-operator/config/samples/ray-cluster.embed-grafana.yaml). See [Configuring and Managing Ray Dashboard](https://docs.ray.io/en/latest/cluster/configure-manage-dashboard.html) for more details about these environment variables.
* Three required environment variables are defined in [ray-cluster.embed-grafana.yaml](https://github.com/ray-project/kuberay/blob/v1.5.0/ray-operator/config/samples/ray-cluster.embed-grafana.yaml). See [Configuring and Managing Ray Dashboard](https://docs.ray.io/en/latest/cluster/configure-manage-dashboard.html) for more details about these environment variables.
```yaml
env:
- name: RAY_GRAFANA_IFRAME_HOST
@@ -29,7 +29,7 @@ You need to have the access to configure Kubernetes control plane to replace the
KubeRay v1.4.0 and later versions support scheduler plugins.

```sh
helm install kuberay-operator kuberay/kuberay-operator --version 1.4.2 --set batchScheduler.name=scheduler-plugins
helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0 --set batchScheduler.name=scheduler-plugins
```

## Step 4: Deploy a RayCluster with gang scheduling
8 changes: 4 additions & 4 deletions doc/source/cluster/kubernetes/k8s-ecosystem/volcano.md
@@ -35,7 +35,7 @@ batchScheduler:
* Pass the `--set batchScheduler.name=volcano` flag when running on the command line:
```shell
# Install the Helm chart with --set batchScheduler.name=volcano
helm install kuberay-operator kuberay/kuberay-operator --version 1.4.2 --set batchScheduler.name=volcano
helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0 --set batchScheduler.name=volcano
```

### Step 4: Install a RayCluster with the Volcano scheduler
@@ -45,7 +45,7 @@ The RayCluster custom resource must include the `ray.io/scheduler-name: volcano`
```shell
# Path: kuberay/ray-operator/config/samples
# Includes label `ray.io/scheduler-name: volcano` in the metadata.labels
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.4.2/ray-operator/config/samples/ray-cluster.volcano-scheduler.yaml
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-cluster.volcano-scheduler.yaml
kubectl apply -f ray-cluster.volcano-scheduler.yaml

# Check the RayCluster
Expand Down Expand Up @@ -113,7 +113,7 @@ Next, create a RayCluster with a head node (1 CPU + 2Gi of RAM) and two workers
```shell
# Path: kuberay/ray-operator/config/samples
# Includes the `ray.io/scheduler-name: volcano` and `volcano.sh/queue-name: kuberay-test-queue` labels in the metadata.labels
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.4.2/ray-operator/config/samples/ray-cluster.volcano-scheduler-queue.yaml
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-cluster.volcano-scheduler-queue.yaml
kubectl apply -f ray-cluster.volcano-scheduler-queue.yaml
```
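
The two labels called out in the comment above sit in the RayCluster metadata roughly as follows; the cluster name is assumed for illustration:

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: test-cluster-0                        # assumed name
  labels:
    ray.io/scheduler-name: volcano            # hand Pod scheduling to Volcano
    volcano.sh/queue-name: kuberay-test-queue # place the cluster's Pods in this queue
```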

@@ -332,7 +332,7 @@ Starting with KubeRay 1.5.0, KubeRay supports gang scheduling for RayJob custom
First, create a queue with a capacity of 4 CPUs and 6Gi of RAM, and a RayJob with a head node (1 CPU + 2Gi of RAM), two workers (1 CPU + 1Gi of RAM each), and a submitter pod (0.5 CPU + 200Mi of RAM), for a total of 3500m CPU and 4296Mi of RAM.

```shell
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/refs/tags/v1.5.0-rc.0/ray-operator/config/samples/ray-job.volcano-scheduler-queue.yaml
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-job.volcano-scheduler-queue.yaml
kubectl apply -f ray-job.volcano-scheduler-queue.yaml
```
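
A hedged sketch of a Volcano queue matching the capacity described above; consult the sample YAML for the authoritative definition, as the weight shown here is an assumption:

```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: kuberay-test-queue
spec:
  weight: 1          # assumed weight
  capability:
    cpu: 4           # covers the 3500m CPU the RayJob requests
    memory: 6Gi      # covers the 4296Mi of RAM the RayJob requests
```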

2 changes: 1 addition & 1 deletion doc/source/cluster/kubernetes/k8s-ecosystem/yunikorn.md
@@ -29,7 +29,7 @@ See [Get Started](https://yunikorn.apache.org/docs/) for Apache YuniKorn install
When installing KubeRay operator using Helm, pass the `--set batchScheduler.name=yunikorn` flag at the command line:

```shell
helm install kuberay-operator kuberay/kuberay-operator --version 1.4.2 --set batchScheduler.name=yunikorn
helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0 --set batchScheduler.name=yunikorn
```

## Step 4: Use Apache YuniKorn for gang scheduling
@@ -64,7 +64,7 @@ Follow [this document](kuberay-operator-deploy) to install the latest stable Kub
### Step 3: Create a RayCluster custom resource with autoscaling enabled

```bash
kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.4.2/ray-operator/config/samples/ray-cluster.autoscaler.yaml
kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-cluster.autoscaler.yaml
```

### Step 4: Verify the Kubernetes cluster status
@@ -87,7 +87,7 @@ kubectl get configmaps
```

The RayCluster has one head Pod and zero worker Pods. The head Pod has two containers: a Ray head container and a Ray Autoscaler sidecar container.
Additionally, the [ray-cluster.autoscaler.yaml](https://github.com/ray-project/kuberay/blob/v1.4.2/ray-operator/config/samples/ray-cluster.autoscaler.yaml) includes a ConfigMap named `ray-example` that contains two Python scripts: `detached_actor.py` and `terminate_detached_actor.py`.
Additionally, the [ray-cluster.autoscaler.yaml](https://github.com/ray-project/kuberay/blob/v1.5.0/ray-operator/config/samples/ray-cluster.autoscaler.yaml) includes a ConfigMap named `ray-example` that contains two Python scripts: `detached_actor.py` and `terminate_detached_actor.py`.

* `detached_actor.py` is a Python script that creates a detached actor which requires 1 CPU.
```py
@@ -247,7 +247,7 @@ kubectl logs $HEAD_POD -c autoscaler | tail -n 20

```bash
# Delete RayCluster and ConfigMap
kubectl delete -f https://raw.githubusercontent.com/ray-project/kuberay/v1.4.2/ray-operator/config/samples/ray-cluster.autoscaler.yaml
kubectl delete -f https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-cluster.autoscaler.yaml

# Uninstall the KubeRay operator
helm uninstall kuberay-operator
@@ -259,7 +259,7 @@ kind delete cluster
(kuberay-autoscaling-config)=
## KubeRay Autoscaling Configurations

The [ray-cluster.autoscaler.yaml](https://github.com/ray-project/kuberay/blob/v1.4.2/ray-operator/config/samples/ray-cluster.autoscaler.yaml) used in the quickstart example contains detailed comments about the configuration options.
The [ray-cluster.autoscaler.yaml](https://github.com/ray-project/kuberay/blob/v1.5.0/ray-operator/config/samples/ray-cluster.autoscaler.yaml) used in the quickstart example contains detailed comments about the configuration options.
***It's recommended to read this section in conjunction with the YAML file.***

### 1. Enabling autoscaling
@@ -333,7 +333,7 @@ If you omit `rayStartParams` and want to use autoscaling, the autoscaling image
The Ray Autoscaler reads the `rayStartParams` field or the Ray container's resource limits in the RayCluster custom resource specification to determine the Ray Pod's resource requirements.
The CPU count is essential for the Ray Autoscaler to scale the cluster; without this information, the Ray Autoscaler reports an error and fails to start.
Take [ray-cluster.autoscaler.yaml](https://github.com/ray-project/kuberay/blob/v1.4.2/ray-operator/config/samples/ray-cluster.autoscaler.yaml) as an example below:
Take [ray-cluster.autoscaler.yaml](https://github.com/ray-project/kuberay/blob/v1.5.0/ray-operator/config/samples/ray-cluster.autoscaler.yaml) as an example below:

* If users set `num-cpus` in `rayStartParams`, the Ray Autoscaler works regardless of the resource limits on the container.
* If users don't set `rayStartParams`, the Ray container must have a specified CPU resource limit. Both options are sketched below.
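
A condensed, hedged sketch of how each option looks in a head group spec; the values are placeholders, not the sample file's contents:

```yaml
# Either give the Autoscaler the CPU count directly, or let it read the container limit.
headGroupSpec:
  rayStartParams:
    num-cpus: "1"         # option 1: declare the CPU count explicitly
  template:
    spec:
      containers:
        - name: ray-head
          resources:
            limits:
              cpu: "1"    # option 2: omit num-cpus and let the Autoscaler read this limit
```
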
@@ -458,7 +458,7 @@ Node: 2d5fd3d4337ba5b5a8c3106c572492abb9a8de2dee9da7f6c24c1346
2. **Stability**
Autoscaler V2 makes significant improvements to idle node handling. The V1 autoscaler could stop nodes that became active during termination processing, potentially failing tasks or actors. V2 uses Ray's graceful draining mechanism, which safely stops idle nodes without disrupting ongoing work.

[ray-cluster.autoscaler-v2.yaml](https://github.com/ray-project/kuberay/blob/v1.4.2/ray-operator/config/samples/ray-cluster.autoscaler-v2.yaml) is an example YAML file of a RayCluster with Autoscaler V2 enabled that works with the latest KubeRay version.
[ray-cluster.autoscaler-v2.yaml](https://github.com/ray-project/kuberay/blob/v1.5.0/ray-operator/config/samples/ray-cluster.autoscaler-v2.yaml) is an example YAML file of a RayCluster with Autoscaler V2 enabled that works with the latest KubeRay version.

If you're using KubeRay >= 1.4.0, enable V2 by setting `RayCluster.spec.autoscalerOptions.version: v2`.
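
A minimal sketch of a RayCluster with Autoscaler V2 enabled; the metadata name is assumed, and all other fields are omitted:

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-autoscaler-v2   # assumed name
spec:
  enableInTreeAutoscaling: true    # run the Ray Autoscaler sidecar
  autoscalerOptions:
    version: v2                    # opt in to Autoscaler V2 (KubeRay >= 1.4.0)
```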

@@ -83,8 +83,8 @@ In a cluster without the Ray Operator Addon enabled, KubeRay can be manually ins
```sh
helm repo add kuberay https://ray-project.github.io/kuberay-helm/

# Install both CRDs and KubeRay operator v1.4.2.
helm install kuberay-operator kuberay/kuberay-operator --version 1.4.2
# Install both CRDs and KubeRay operator v1.5.0.
helm install kuberay-operator kuberay/kuberay-operator --version 1.5.0
```

GKE provides a [validating and mutating webhook](https://github.com/ai-on-gke/kuberay-tpu-webhook) to handle TPU Pod scheduling and bootstrap certain environment variables used for [JAX](https://github.com/google/jax) initialization. The Ray TPU webhook requires a KubeRay operator version of at least v1.1.0. GKE automatically installs the Ray TPU webhook through the [Ray Operator Addon](https://cloud.google.com/kubernetes-engine/docs/add-on/ray-on-gke/how-to/enable-ray-on-gke) with GKE versions 1.30.0-gke.1747000 or later.
@@ -72,7 +72,7 @@ gsutil iam ch serviceAccount:my-iam-sa@my-project-id.iam.gserviceaccount.com:rol
You can download the RayCluster YAML manifest for this tutorial with `curl` as follows:

```bash
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.4.2/ray-operator/config/samples/ray-cluster.gke-bucket.yaml
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.0/ray-operator/config/samples/ray-cluster.gke-bucket.yaml
```

The key parts are the following lines: