This repository contains a Terraform template for running Ray on Google Kubernetes Engine. We've also included some example notebooks, including one that serves a GPT-J-6B model with Ray AIR (see here for the original notebook).
The solution is split into platform and user resources.
Platform resources (deployed once):
- GKE Cluster
- Nvidia GPU drivers
- Kuberay operator and CRDs
User resources (deployed once per user):
- User namespace
- Kubernetes service accounts
- Kuberay cluster
- Prometheus monitoring
- Logging container
- Jupyter notebook
Note: Terraform keeps state metadata in a local file called terraform.tfstate. If you need to reinstall any resources, make sure to delete this file as well.
- cd platform
- Edit variables.tf with your GCP settings.
- Run terraform init
- Run terraform apply
- cd user
- Edit variables.tf with your GCP settings.
- Run terraform init
- Run terraform apply
- Run kubectl get services -n <namespace>
- Copy the external IP for the notebook.
- Open the external IP in a browser and log in. The default user names and passwords can be found in the Jupyter settings file.
- The Ray cluster is available at ray://example-cluster-kuberay-head-svc:10001. To access the cluster, open one of the sample notebooks under example_notebooks (via File -> Open from URL in the Jupyter notebook window, using the raw file URL from GitHub) and run through the example. Example URL: https://raw.githubusercontent.com/richardsliu/ray-on-gke/main/example_notebooks/gpt-j-online.ipynb
- To use the Ray dashboard, run the following command to port-forward:
  kubectl port-forward -n ray service/example-cluster-kuberay-head-svc 8265:8265
  And then open the dashboard using the following URL:
  http://localhost:8265
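As a minimal sketch of how a notebook cell might connect to the cluster address given above: the helper names `ray_client_address` and `connect_to_cluster` are hypothetical and not part of this repository, and the sketch assumes Ray is already installed in the Jupyter environment.

```python
def ray_client_address(
    head_service: str = "example-cluster-kuberay-head-svc",
    port: int = 10001,
) -> str:
    """Build the Ray Client URL for the head service (hypothetical helper)."""
    return f"ray://{head_service}:{port}"


def connect_to_cluster(address: str):
    """Connect to the cluster through Ray Client (hypothetical helper)."""
    # Imported lazily so the address helper works even where Ray is absent.
    import ray

    return ray.init(address=address)


# Example use from a notebook cell running inside the cluster:
#   import ray
#   connect_to_cluster(ray_client_address())
#   print(ray.cluster_resources())
```

The sample notebooks may already contain their own connection cell; this only illustrates how the address maps onto `ray.init`.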
For demo purposes, this repo creates a public IP for the Jupyter notebook with basic dummy authentication. To secure your cluster, it is strongly recommended to replace this with your own secure endpoints.
For more information, please take a look at the following links:
- https://cloud.google.com/iap/docs/enabling-kubernetes-howto
- https://cloud.google.com/endpoints/docs/openapi/get-started-kubernetes-engine
- https://jupyterhub.readthedocs.io/en/stable/tutorial/getting-started/authenticators-users-basics.html
This example is adapted from Ray AIR's examples here.
- Open the gpt-j-online.ipynb notebook.
- Open a terminal in the Jupyter session and install Ray AIR:
  pip install ray[air]
- Run through the notebook cells. You can change the prompt in the last cell:
prompt = (
## Input your own prompt here
)
- This should output a generated text response.
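For illustration, a filled-in prompt cell could look like the sketch below. The prompt text is made up, and it is an assumption that the notebook expects a single plain string here. Adjacent string literals inside the parentheses are joined by Python into one string.

```python
# Hypothetical prompt text; replace it with your own.
prompt = (
    "Once upon a time, there was a Ray cluster "
    "running on Google Kubernetes Engine. "
)
print(prompt)
```

Leaving only the comment between the parentheses would make `prompt` an empty tuple rather than a string, so at least one string literal is needed.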
This repository comes with out-of-the-box integrations with Google Cloud Logging and Managed Prometheus for monitoring. To see your Ray cluster logs:
- Open Cloud Console and open Logging
- Use the following query parameters:
resource.type="k8s_container"
resource.labels.cluster_name=%CLUSTER_NAME%
resource.labels.pod_name=%RAY_HEAD_POD_NAME%
resource.labels.container_name="fluentbit"
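The placeholders in the query above can be filled in by hand or, as a rough sketch, with a small helper. The function name `ray_log_filter` and the sample cluster and pod names are hypothetical. The sketch quotes every value and relies on separate lines in a Cloud Logging query being combined with an implicit AND.

```python
def ray_log_filter(cluster_name: str, head_pod_name: str) -> str:
    """Build the Cloud Logging query for the Ray head pod's fluentbit container."""
    clauses = {
        "resource.type": "k8s_container",
        "resource.labels.cluster_name": cluster_name,
        "resource.labels.pod_name": head_pod_name,
        "resource.labels.container_name": "fluentbit",
    }
    # One comparison per line, each value wrapped in double quotes.
    return "\n".join(f'{field}="{value}"' for field, value in clauses.items())


# Hypothetical names; substitute your own cluster and head pod.
print(ray_log_filter("my-gke-cluster", "my-ray-head-pod"))
```

The printed text can be pasted into the query box in Logging.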
To see monitoring metrics:
- Open Cloud Console and open Metrics Explorer
- In "Target", select "Prometheus Target" and then "Ray".
- Select the metric you want to view, and then click "Apply".