Merged

Changes from all commits (44 commits)
76bf522
[Feat] Added kuberay installation script via helm. Initial commit.
insukim1994 May 8, 2025
3a0a1ce
Added initial helm chart template file for ray cluster creation.
insukim1994 May 10, 2025
2b2cdfa
[Feat] Fixed typo at ray cluster template file. Added example values …
insukim1994 May 10, 2025
34aa920
[Feat] Removed unused fields at the moment. Bugfixed conflicting reso…
insukim1994 May 10, 2025
720cebf
[Feat] Added startup probe to check if all ray cluster nodes are up.
insukim1994 May 11, 2025
c1fa817
[Feat] Added vllm command composing and execute logic in the backgrou…
insukim1994 May 11, 2025
e9c23a4
[Feat] Added pod relevant settings from servingEngineSpec for both he…
insukim1994 May 11, 2025
b88fa17
[Feat] Added env templates for head and worker spec.
insukim1994 May 11, 2025
d35cdb4
[Feat] Added volumemounts template for head and worker spec.
insukim1994 May 11, 2025
12ccab7
[Feat] Added templates for resource, probe, port, etc.
insukim1994 May 11, 2025
90f3f1c
[Feat] Initial working example.
insukim1994 May 11, 2025
96dedc7
[Doc] Added documentation to run vllm with kuberay for pipeline paral…
insukim1994 May 11, 2025
0f878ec
[Doc] Elaborated tutorial documentation.
insukim1994 May 11, 2025
d1c62fa
[Chore] Fixed typo in kuberay operator installation tutorial document.
insukim1994 May 11, 2025
fa8a722
[Chore] Fixed a wording in kuberay operator installation tutorial doc…
insukim1994 May 11, 2025
e3507e1
[Chore] Fixed typo in kuberay operator installation tutorial document.
insukim1994 May 11, 2025
6148ff7
[Chore] Removed unused value from helm chart default value.
insukim1994 May 11, 2025
17cee8a
[Chore] Elaborated expression on tutorial document.
insukim1994 May 11, 2025
6058820
[Chore] Elaborated expression on tutorial document.
insukim1994 May 11, 2025
2967dd2
[Feat] Set readiness httpGet probe for ray head node. Removed unused …
insukim1994 May 12, 2025
6779ebf
[Feat] Added VLLM_HOST_IP based on official vllm docs. Added ray inst…
insukim1994 May 12, 2025
38979b9
[Feat] Added missing dashboard related setting and a step for reinsta…
insukim1994 May 12, 2025
428738b
[Feat] Removed initContainer section that will be overwritten by kube…
insukim1994 May 12, 2025
c27154b
[Feat] KubeRay operator version update needed.
insukim1994 May 12, 2025
62ec649
[Doc] Minor fix in tutorial.
insukim1994 May 12, 2025
a833fea
[Doc] Added sample gpu usage example for each ray head and worker node.
insukim1994 May 12, 2025
618c50c
[Chore] Fixed typo in basic pipeline parallel tutorial doc.
insukim1994 May 12, 2025
57bab87
[Chore] Reverted unnecessary change.
insukim1994 May 12, 2025
f41ba79
[Chore] Fixed typo in kuberay install util script.
insukim1994 May 12, 2025
b4168ac
[Doc] Added utility script to install kubeadm.
insukim1994 May 15, 2025
9100a42
[Doc] Added cri-o container runtime installation script & a script to…
insukim1994 May 15, 2025
3d8b58a
[Doc] Added script to join worker nodes. Elaborated control plane ini…
insukim1994 May 15, 2025
52ec887
[Doc] Added nvidia gpu setup script for each node.
insukim1994 May 15, 2025
9c1bdee
[Doc] Script modification during testing.
insukim1994 May 16, 2025
f4cf1dd
[Doc] Elaborated k8s controlplane initialization and worker node join…
insukim1994 May 16, 2025
29fde46
[Doc] Elaborated basic pipeline parallelism tutorial document.
insukim1994 May 17, 2025
4516b63
[Doc] Added guide for setting up kubernetes cluster with 2 nodes (cont…
insukim1994 May 17, 2025
6d0a8ec
[Doc] Elaborated K8s cluster initialization guide and applied a revie…
insukim1994 May 18, 2025
57d3aad
[Chore] Strict total number of ray node checking. Tested helm chart w…
insukim1994 May 18, 2025
85fc5ce
[Doc] Elaborated important note when applying pipeline parallelism (w…
insukim1994 May 18, 2025
9c5d2f8
[Doc] Elaborated basic pipeline parallelism tutorial example.
insukim1994 May 19, 2025
a657a12
[Doc] Review updates (prevent duplicated line appends & added warning…
insukim1994 May 20, 2025
3c7810f
[Doc] Review updates (elaborated prerequisites for kuberay operator i…
insukim1994 May 23, 2025
90f3298
[Bugfix] Fixed version typo of lmcache from toml file.
insukim1994 May 23, 2025
2 changes: 1 addition & 1 deletion helm/templates/deployment-vllm-multi.yaml
@@ -1,4 +1,4 @@
-{{- if .Values.servingEngineSpec.enableEngine -}}
+{{- if and .Values.servingEngineSpec.enableEngine (not (hasKey .Values.servingEngineSpec "raySpec")) -}}
 {{- range $modelSpec := .Values.servingEngineSpec.modelSpec }}
 {{- $kv_role := "kv_both" }}
 {{- $kv_rank := 0 }}
620 changes: 620 additions & 0 deletions helm/templates/ray-cluster.yaml

Large diffs are not rendered by default.

411 changes: 411 additions & 0 deletions tutorials/00-a-install-multinode-kubernetes-env.md

Large diffs are not rendered by default.

85 changes: 85 additions & 0 deletions tutorials/00-b-install-kuberay-operator.md
@@ -0,0 +1,85 @@
# Tutorial: Setting Up the KubeRay Operator in Your Kubernetes Environment

## Introduction

This tutorial provides a step-by-step guide to installing and configuring the KubeRay operator in a Kubernetes environment. We will use the official Helm chart to set up KubeRay, enabling distributed inference with vLLM. By the end of this tutorial, you will have a fully operational KubeRay operator ready to support the deployment of the vLLM Production Stack.

## Table of Contents

- [Introduction](#introduction)
- [Table of Contents](#table-of-contents)
- [Prerequisites](#prerequisites)
- [Steps](#steps)
- [Step 1: Install the KubeRay Operator Using Helm](#step-1-install-the-kuberay-operator-using-helm)
- [Step 2: Verify the KubeRay Configuration](#step-2-verify-the-kuberay-configuration)

## Prerequisites

Before you begin, ensure the following:

1. **GPU Server Requirements:**
- A server with a GPU and drivers properly installed (e.g., NVIDIA drivers).
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) installed for GPU workloads.

2. **Access and Permissions:**
- Root or administrative access to the server.
- Internet connectivity to download required packages and tools.

3. **Environment Setup:**
- A Linux-based operating system (e.g., Ubuntu 20.04 or later).
- Basic understanding of Linux shell commands.

4. **Kubernetes Installation:**
- To quickly and easily set up a single-node Kubernetes environment, you may install Minikube by following the instructions provided in [`00-install-kubernetes-env.md`](00-install-kubernetes-env.md).
- For setting up a multi-node cluster or a more generalized Kubernetes environment, you may install Kubernetes from scratch using kubeadm. This involves configuring the container runtime and container network interface (CNI), as outlined in [`00-a-install-multinode-kubernetes-env.md`](00-a-install-multinode-kubernetes-env.md).
- If you already have a running Kubernetes cluster, you may skip this step.

5. **KubeRay Concept Review:**
- Review the [official KubeRay documentation](https://docs.ray.io/en/latest/cluster/kubernetes/index.html) for additional context and best practices.

## Steps

### Step 1: Install the KubeRay Operator Using Helm

1. Add the KubeRay Helm repository:

```bash
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
```

2. Install the Custom Resource Definitions (CRDs) and the KubeRay operator (version 1.2.0) in the default namespace:

```bash
helm install kuberay-operator kuberay/kuberay-operator --version 1.2.0
```

3. **Explanation:**
This step deploys the stable KubeRay operator in your Kubernetes cluster. The operator is essential for managing Ray clusters and enables you to scale multiple vLLM instances for distributed inference workloads.
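
To make the operator's role concrete, below is a minimal RayCluster sketch (the name, image, and sizes are hypothetical; the Helm chart in this PR renders its own RayCluster resource for vLLM):

```yaml
# Minimal illustrative RayCluster; values here are assumptions, not the chart's output.
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: example-raycluster
spec:
  headGroupSpec:
    rayStartParams:
      dashboard-host: "0.0.0.0"
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
  workerGroupSpecs:
    - groupName: worker
      replicas: 1
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0
```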

### Step 2: Verify the KubeRay Configuration

1. **Check the Operator Pod Status:**
- Ensure that the KubeRay operator pod is running in the default namespace:

```bash
kubectl get pods
```

2. **Expected Output:**
Example output:

```plaintext
NAME READY STATUS RESTARTS AGE
kuberay-operator-975995b7d-75jqd 1/1 Running 0 25h
```
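
3. **Non-Interactive Check (optional):**
If you want to block until the operator is ready, one option (assuming the Deployment is named `kuberay-operator` in the `default` namespace, as above) is:

```bash
# Wait up to two minutes for the operator Deployment to become Available.
kubectl wait --for=condition=Available deployment/kuberay-operator --timeout=120s
```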

## Conclusion

You have now successfully installed and verified the KubeRay operator in your Kubernetes environment. This setup lays the foundation for deploying and managing the vLLM Production Stack for distributed inference or training workloads.

For advanced configurations and workload-specific tuning, refer to the official documentation for KubeRay, kubectl, Helm, and Minikube.

What's next:

- [15-basic-pipeline-parallel](https://github.com/vllm-project/production-stack/blob/main/tutorials/15-basic-pipeline-parallel.md)
309 changes: 309 additions & 0 deletions tutorials/15-basic-pipeline-parallel.md

Large diffs are not rendered by default.

24 changes: 24 additions & 0 deletions tutorials/assets/values-15-minimal-pipeline-parallel-example.yaml
@@ -0,0 +1,24 @@
servingEngineSpec:
runtimeClassName: ""
raySpec:
headNode:
requestCPU: 2
requestMemory: "20Gi"
requestGPU: 2
modelSpec:
- name: "distilgpt2"
repository: "vllm/vllm-openai"
tag: "latest"
modelURL: "distilbert/distilgpt2"

replicaCount: 1

requestCPU: 2
requestMemory: "20Gi"
requestGPU: 2

vllmConfig:
tensorParallelSize: 2
pipelineParallelSize: 2

shmSize: "20Gi"
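
As a usage sketch, assuming you install from this repository's local chart directory (the release name `vllm` is an assumption), the values file above can be applied like this:

```bash
# Sketch: deploy the stack with the minimal pipeline-parallel example values.
helm install vllm ./helm -f tutorials/assets/values-15-minimal-pipeline-parallel-example.yaml
```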
63 changes: 63 additions & 0 deletions utils/init-nvidia-gpu-setup-k8s.sh
@@ -0,0 +1,63 @@
#!/bin/bash
set -e

# Allow users to override the paths for the NVIDIA tools.
: "${NVIDIA_SMI_PATH:=nvidia-smi}"
: "${NVIDIA_CTK_PATH:=nvidia-ctk}"

# --- Debug and Environment Setup ---
echo "Current PATH: $PATH"
echo "Operating System: $(uname -a)"

# Get the script directory to reference local scripts reliably.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# --- Install Prerequisites ---
echo "Installing kubectl and helm..."
bash "$SCRIPT_DIR/install-kubectl.sh"
bash "$SCRIPT_DIR/install-helm.sh"

# --- Configure BPF (if available) ---
if [ -f /proc/sys/net/core/bpf_jit_harden ]; then
echo "Configuring BPF: Setting net.core.bpf_jit_harden=0"
if ! grep -q "net.core.bpf_jit_harden=0" /etc/sysctl.conf; then
echo "net.core.bpf_jit_harden=0" | sudo tee -a /etc/sysctl.conf
fi
sudo sysctl -p
else
echo "BPF JIT hardening configuration not available, skipping..."
fi

# --- NVIDIA GPU Setup ---
GPU_AVAILABLE=false
if command -v "$NVIDIA_SMI_PATH" >/dev/null 2>&1; then
echo "NVIDIA GPU detected via nvidia-smi at: $(command -v "$NVIDIA_SMI_PATH")"
if command -v "$NVIDIA_CTK_PATH" >/dev/null 2>&1; then
echo "nvidia-ctk found at: $(command -v "$NVIDIA_CTK_PATH")"
GPU_AVAILABLE=true
else
echo "nvidia-ctk not found. Please install the NVIDIA Container Toolkit to enable GPU support."
fi
fi

if [ "$GPU_AVAILABLE" = true ]; then
# Configure Docker for GPU support.
echo "Configuring Docker runtime for GPU support..."
if sudo "$NVIDIA_CTK_PATH" runtime configure --runtime=docker; then
echo "Restarting Docker to apply changes..."
echo "WARNING: Restarting Docker will stop and restart all containers."
sudo systemctl restart docker
echo "Docker runtime configured successfully."
else
echo "Error: Failed to configure Docker runtime using the NVIDIA Container Toolkit."
exit 1
fi

# Install the GPU Operator via Helm.
echo "Adding NVIDIA helm repo and updating..."
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
echo "Installing GPU Operator..."
helm install --wait gpu-operator -n gpu-operator --create-namespace nvidia/gpu-operator --version=v24.9.1
fi

echo "NVIDIA GPU Setup complete."
10 changes: 10 additions & 0 deletions utils/install-calico.sh
@@ -0,0 +1,10 @@
#!/bin/bash

# Refer to https://docs.tigera.io/calico/latest/getting-started/kubernetes/quickstart
# for more information.

# Install the Tigera operator and custom resource definitions:
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.0/manifests/tigera-operator.yaml

# Install Calico by creating the necessary custom resources:
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.0/manifests/custom-resources.yaml
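
To verify the rollout, one option (the `calico-system` namespace is created by the quickstart manifests above) is:

```bash
# Watch until every Calico pod reports STATUS=Running.
watch kubectl get pods -n calico-system
```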
46 changes: 46 additions & 0 deletions utils/install-cri-o.sh
@@ -0,0 +1,46 @@
#!/bin/bash

# Refer to https://github.com/cri-o/packaging/blob/main/README.md#distributions-using-deb-packages
# and
# https://github.com/cri-o/cri-o/blob/main/contrib/cni/README.md#configuration-directory
# for more information.

# Install the dependencies for adding repositories
sudo apt-get update
sudo apt-get install -y software-properties-common curl

export CRIO_VERSION=v1.32

# Add the CRI-O repository (create the keyrings directory first; it may not exist on Debian 11)
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://download.opensuse.org/repositories/isv:/cri-o:/stable:/$CRIO_VERSION/deb/Release.key |
    sudo gpg --dearmor -o /etc/apt/keyrings/cri-o-apt-keyring.gpg

echo "deb [signed-by=/etc/apt/keyrings/cri-o-apt-keyring.gpg] https://download.opensuse.org/repositories/isv:/cri-o:/stable:/$CRIO_VERSION/deb/ /" |
sudo tee /etc/apt/sources.list.d/cri-o.list

# Install the packages
sudo apt-get update
sudo apt-get install -y cri-o

# Update crio config by creating (or editing) /etc/crio/crio.conf
sudo tee /etc/crio/crio.conf > /dev/null <<EOF
[crio.image]
pause_image="registry.k8s.io/pause:3.10"

[crio.runtime]
conmon_cgroup = "pod"
cgroup_manager = "systemd"
EOF

# Start CRI-O
sudo systemctl start crio.service

sudo swapoff -a
sudo modprobe br_netfilter
sudo sysctl -w net.ipv4.ip_forward=1

# Apply sysctl params without reboot
sudo sysctl --system

# Verify that net.ipv4.ip_forward is set to 1 with:
sudo sysctl net.ipv4.ip_forward
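
As an optional follow-up, assuming a systemd-based distribution (consistent with the apt instructions above):

```bash
# Persist CRI-O across reboots and confirm it is running.
sudo systemctl enable --now crio.service
systemctl is-active crio
```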
25 changes: 25 additions & 0 deletions utils/install-kubeadm.sh
@@ -0,0 +1,25 @@
#!/bin/bash

# Refer to https://v1-32.docs.kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
# for more detailed explanation of kubeadm installation.
# The following instructions are for Linux distributions such as Ubuntu and Debian.
# This script comes from the official documentation above, modified to work with Debian 11 (bullseye).

sudo apt-get update
# apt-transport-https may be a dummy package; if so, you can skip that package
sudo apt-get install -y apt-transport-https ca-certificates curl gpg

# If the directory `/etc/apt/keyrings` does not exist, it should be created before the curl command, read the note below.
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.32/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

# This overwrites any existing configuration in /etc/apt/sources.list.d/kubernetes.list
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

# Update the apt package index, install kubelet, kubeadm and kubectl, and pin their version:
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

# (Optional) Enable the kubelet service before running kubeadm:
sudo systemctl enable --now kubelet
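
A possible next step on the control-plane node; the pod CIDR below is an assumption chosen to match Calico's default custom resources (see `utils/install-calico.sh`):

```bash
# Run on the control-plane node only; worker nodes join later via `kubeadm join`.
sudo kubeadm init --pod-network-cidr=192.168.0.0/16
```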
18 changes: 18 additions & 0 deletions utils/install-kuberay.sh
@@ -0,0 +1,18 @@
#!/bin/bash

# Original KubeRay installation reference: https://github.com/ray-project/kuberay?tab=readme-ov-file#helm-charts

# Add the Helm repo
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update

# Confirm the repo exists
helm search repo kuberay --devel

# Install both CRDs and KubeRay operator v1.2.0.
helm install kuberay-operator kuberay/kuberay-operator --version 1.2.0

# Check the KubeRay operator Pod in `default` namespace
kubectl get pods
# NAME READY STATUS RESTARTS AGE
# kuberay-operator-f89ddb644-psts7 1/1 Running 0 33m
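
Optionally, confirm the CRDs that the chart registered (resource names as of KubeRay v1.2.0):

```bash
# Expect rayclusters.ray.io, rayjobs.ray.io, and rayservices.ray.io.
kubectl get crds | grep ray.io
```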