vllm-project · YuhanLiu11 · May 26, 2025 · May 8, 2025 · May 10, 2025 · May 10, 2025
diff --git a/helm/templates/deployment-vllm-multi.yaml b/helm/templates/deployment-vllm-multi.yaml
@@ -1,4 +1,4 @@
-{{- if .Values.servingEngineSpec.enableEngine -}}
+{{- if and .Values.servingEngineSpec.enableEngine (not (hasKey .Values.servingEngineSpec "raySpec")) -}}
 {{- range $modelSpec := .Values.servingEngineSpec.modelSpec }}
 {{- $kv_role := "kv_both" }}
 {{- $kv_rank := 0 }}
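
With this guard, `deployment-vllm-multi.yaml` renders nothing whenever `servingEngineSpec` defines a `raySpec` key. One way to see which resource kinds the chart emits for a given values file is to render it locally with `helm template` and list the top-level `kind:` fields. The helper name `rendered_kinds`, the release name `vllm`, and the chart path `./helm` are illustrative assumptions, and the actual output depends on the chart version:

```bash
# Hypothetical helper: print the distinct top-level resource kinds found in a
# rendered multi-document manifest read from stdin, one per line, sorted.
rendered_kinds() {
  awk '/^kind:/ { print $2 }' | sort -u
}

# Example (release name and chart path are assumptions):
#   helm template vllm ./helm \
#     -f tutorials/assets/values-15-minimal-pipeline-parallel-example.yaml | rendered_kinds
```

Running it once with and once without `raySpec` in the values file shows which templates the guard switches on and off.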

diff --git a/helm/templates/ray-cluster.yaml b/helm/templates/ray-cluster.yaml
diff --git a/tutorials/00-a-install-multinode-kubernetes-env.md b/tutorials/00-a-install-multinode-kubernetes-env.md
diff --git a/tutorials/00-b-install-kuberay-operator.md b/tutorials/00-b-install-kuberay-operator.md
@@ -0,0 +1,85 @@
+# Tutorial: Setting Up the KubeRay Operator in Your Kubernetes Environment
+
+## Introduction
+
+This tutorial provides a step-by-step guide to installing and configuring the KubeRay operator within a Kubernetes environment. We will use the KubeRay Helm chart to install the operator, which enables distributed inference with vLLM. By the end of this tutorial, you will have a fully operational KubeRay operator ready to support the deployment of the vLLM Production Stack.
+
+## Table of Contents
+
+- [Introduction](#introduction)
+- [Table of Contents](#table-of-contents)
+- [Prerequisites](#prerequisites)
+- [Steps](#steps)
+  - [Step 1: Install the KubeRay Operator Using Helm](#step-1-install-the-kuberay-operator-using-helm)
+  - [Step 2: Verify the KubeRay Configuration](#step-2-verify-the-kuberay-configuration)
+- [Conclusion](#conclusion)
+
+## Prerequisites
+
+Before you begin, ensure the following:
+
+1. **GPU Server Requirements:**
+   - A server with a GPU and drivers properly installed (e.g., NVIDIA drivers).
+   - [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) installed for GPU workloads.
+
+2. **Access and Permissions:**
+   - Root or administrative access to the server.
+   - Internet connectivity to download required packages and tools.
+
+3. **Environment Setup:**
+   - A Linux-based operating system (e.g., Ubuntu 20.04 or later).
+   - Basic understanding of Linux shell commands.
+
+4. **Kubernetes Installation:**
+   - To quickly set up a single-node Kubernetes environment, you may install Minikube by following the instructions provided in [`00-install-kubernetes-env.md`](00-install-kubernetes-env.md).
+   - For a multi-node cluster or a more general Kubernetes environment, you may install Kubernetes from scratch using kubeadm. This involves configuring the container runtime and the container network interface (CNI), as outlined in [`00-a-install-multinode-kubernetes-env.md`](00-a-install-multinode-kubernetes-env.md).
+   - If you already have a running Kubernetes cluster, you may skip this step.
+
+5. **KubeRay Concept Review:**
+   - Review the [official KubeRay documentation](https://docs.ray.io/en/latest/cluster/kubernetes/index.html) for additional context and best practices.
+
+## Steps
+
+### Step 1: Install the KubeRay Operator Using Helm
+
+1. Add the KubeRay Helm repository:
+
+   ```bash
+   helm repo add kuberay https://ray-project.github.io/kuberay-helm/
+   helm repo update
+   ```
+
+2. Install the Custom Resource Definitions (CRDs) and the KubeRay operator (version 1.2.0) in the default namespace:
+
+   ```bash
+   helm install kuberay-operator kuberay/kuberay-operator --version 1.2.0
+   ```
+
+3. **Explanation:**
+   This step deploys KubeRay operator v1.2.0 in your Kubernetes cluster. The operator manages the lifecycle of Ray clusters, which vLLM can use to run a single model across multiple GPU nodes for distributed inference (for example, with pipeline parallelism).
+
+### Step 2: Verify the KubeRay Configuration
+
+1. **Check the Operator Pod Status:**
+   - Ensure that the KubeRay operator pod is running in the default namespace:
+
+     ```bash
+     kubectl get pods
+     ```
+
+2. **Expected Output:**
+   The pod name suffix will differ in your cluster:
+
+   ```plaintext
+   NAME                                          READY   STATUS    RESTARTS   AGE
+   kuberay-operator-975995b7d-75jqd              1/1     Running   0          25h
+   ```
+
+## Conclusion
+
+You have now successfully installed and verified the KubeRay operator in your Kubernetes environment. This setup lays the foundation for deploying and managing the vLLM Production Stack for distributed inference workloads.
+
+For advanced configurations and workload-specific tuning, refer to the official documentation for KubeRay, kubectl, Helm, and Minikube.
+
+What's next:
+
+- [15-basic-pipeline-parallel](https://github.com/vllm-project/production-stack/blob/main/tutorials/15-basic-pipeline-parallel.md)
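
Step 2 of the tutorial relies on reading the `kubectl get pods` output by eye. For scripted setups, a small check like the one below can gate later steps on the operator being ready. It is a sketch that assumes the default `kubectl get pods` column layout (NAME, READY, STATUS, ...) and a pod name beginning with `kuberay-operator-`, as in the tutorial's example output; the function name is hypothetical:

```bash
# Hypothetical helper: exit 0 if the `kubectl get pods` output on stdin lists a
# kuberay-operator pod that is Running with all of its containers ready.
kuberay_operator_ready() {
  awk '
    NR > 1 && $1 ~ /^kuberay-operator-/ {
      split($2, ready, "/")   # READY column, e.g. "1/1"
      if ($3 == "Running" && ready[1] == ready[2]) found = 1
    }
    END { if (found) exit 0; exit 1 }
  '
}

# Usage against a live cluster:
#   kubectl get pods | kuberay_operator_ready && echo "KubeRay operator is ready"
```

Alternatively, `kubectl wait --for=condition=Available deployment/kuberay-operator --timeout=120s` blocks until the Deployment is available, assuming the Deployment is named after the Helm release, as the pod name suggests.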
diff --git a/tutorials/15-basic-pipeline-parallel.md b/tutorials/15-basic-pipeline-parallel.md
diff --git a/tutorials/assets/values-15-minimal-pipeline-parallel-example.yaml b/tutorials/assets/values-15-minimal-pipeline-parallel-example.yaml
@@ -0,0 +1,24 @@
+servingEngineSpec:
+  runtimeClassName: ""
+  raySpec:
+    headNode:
+      requestCPU: 2
+      requestMemory: "20Gi"
+      requestGPU: 2
+  modelSpec:
+  - name: "distilgpt2"
+    repository: "vllm/vllm-openai"
+    tag: "latest"
+    modelURL: "distilbert/distilgpt2"
+
+    replicaCount: 1
+
+    requestCPU: 2
+    requestMemory: "20Gi"
+    requestGPU: 2
+
+    vllmConfig:
+      tensorParallelSize: 2
+      pipelineParallelSize: 2
+
+    shmSize: "20Gi"
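
vLLM places one model replica on `tensorParallelSize × pipelineParallelSize` GPUs, so this example needs 2 × 2 = 4 GPUs in the Ray cluster for its single replica. The pre-flight check below is a sketch with hypothetical function names. How the chart spreads those GPUs between the head node (`raySpec.headNode.requestGPU`) and the per-replica request (`requestGPU`) is not shown in this diff, so the available GPU total is something you supply yourself:

```bash
# Hypothetical helpers for a pre-flight GPU count check.
# vLLM's world size for one replica is tensor_parallel_size * pipeline_parallel_size.
required_gpus() {
  local tp="$1" pp="$2"
  echo $(( tp * pp ))
}

# Succeeds if the available GPU total covers one replica at the given TP/PP sizes.
gpus_sufficient() {
  local available="$1" tp="$2" pp="$3"
  [ "$available" -ge "$(required_gpus "$tp" "$pp")" ]
}

# Values from values-15-minimal-pipeline-parallel-example.yaml (TP=2, PP=2):
echo "GPUs needed per replica: $(required_gpus 2 2)"
```

For example, `gpus_sufficient 4 2 2` succeeds, while `gpus_sufficient 2 2 2` fails because two GPUs cannot host a TP=2, PP=2 replica.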
diff --git a/utils/init-nvidia-gpu-setup-k8s.sh b/utils/init-nvidia-gpu-setup-k8s.sh
@@ -0,0 +1,63 @@
+#!/bin/bash
+set -e
+
+# Allow users to override the paths for the NVIDIA tools.
+: "${NVIDIA_SMI_PATH:=nvidia-smi}"
+: "${NVIDIA_CTK_PATH:=nvidia-ctk}"
+
+# --- Debug and Environment Setup ---
+echo "Current PATH: $PATH"
+echo "Operating System: $(uname -a)"
+
+# Get the script directory to reference local scripts reliably.
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+
+# --- Install Prerequisites ---
+echo "Installing kubectl and helm..."
+bash "$SCRIPT_DIR/install-kubectl.sh"
+bash "$SCRIPT_DIR/install-helm.sh"
+
+# --- Configure BPF (if available) ---
+if [ -f /proc/sys/net/core/bpf_jit_harden ]; then
+    echo "Configuring BPF: Setting net.core.bpf_jit_harden=0"
+    if ! grep -q "net.core.bpf_jit_harden=0" /etc/sysctl.conf; then
+        echo "net.core.bpf_jit_harden=0" | sudo tee -a /etc/sysctl.conf
+    fi
+    sudo sysctl -p
+else
+    echo "BPF JIT hardening configuration not available, skipping..."
+fi
+
+# --- NVIDIA GPU Setup ---
+GPU_AVAILABLE=false
+if command -v "$NVIDIA_SMI_PATH" >/dev/null 2>&1; then
+    echo "nvidia-smi found at: $(command -v "$NVIDIA_SMI_PATH")"
+    if command -v "$NVIDIA_CTK_PATH" >/dev/null 2>&1; then
+        echo "nvidia-ctk found at: $(command -v "$NVIDIA_CTK_PATH")"
+        GPU_AVAILABLE=true
+    else
+        echo "nvidia-ctk not found. Please install the NVIDIA Container Toolkit to enable GPU support."
+    fi
+fi
+
+if [ "$GPU_AVAILABLE" = true ]; then
+    # Configure Docker for GPU support.
+    echo "Configuring Docker runtime for GPU support..."
+    if sudo "$NVIDIA_CTK_PATH" runtime configure --runtime=docker; then
+        echo "Restarting Docker to apply changes..."
+        echo "WARNING: Restarting Docker will stop and restart all containers."
+        sudo systemctl restart docker
+        echo "Docker runtime configured successfully."
+    else
+        echo "Error: Failed to configure Docker runtime using the NVIDIA Container Toolkit."
+        exit 1
+    fi
+
+    # Install the GPU Operator via Helm.
+    echo "Adding NVIDIA helm repo and updating..."
+    helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
+    echo "Installing GPU Operator..."
+    helm install --wait gpu-operator -n gpu-operator --create-namespace nvidia/gpu-operator --version=v24.9.1
+fi
+
+echo "NVIDIA GPU Setup complete."
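
Once the GPU Operator's device plugin is running, each GPU node should advertise an `nvidia.com/gpu` allocatable resource. The sketch below totals that resource across nodes. It assumes the `custom-columns` output shown in the usage comment, where nodes without GPUs print `<none>`; the function name is hypothetical:

```bash
# Hypothetical helper: sum the GPU column (second field) of a
# `kubectl get nodes --no-headers -o custom-columns=...` listing read from stdin.
# Non-numeric values such as <none> are ignored; prints 0 if nothing matches.
sum_allocatable_gpus() {
  awk '$2 ~ /^[0-9]+$/ { total += $2 } END { print total + 0 }'
}

# Usage:
#   kubectl get nodes --no-headers \
#     -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
#     | sum_allocatable_gpus
```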
diff --git a/utils/install-calico.sh b/utils/install-calico.sh
@@ -0,0 +1,10 @@
+#!/bin/bash
+
+# Refer to https://docs.tigera.io/calico/latest/getting-started/kubernetes/quickstart
+# for more information.
+
+# Install the Tigera operator and custom resource definitions:
+kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.0/manifests/tigera-operator.yaml
+
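+# Install Calico by creating the necessary custom resources. The default IP pool in
+# custom-resources.yaml is 192.168.0.0/16; it should match the --pod-network-cidr
+# passed to `kubeadm init`, otherwise download and edit the file first.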
+kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.0/manifests/custom-resources.yaml
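
Nodes created with kubeadm report `NotReady` until a pod network add-on such as Calico is running, so node status is a convenient signal that this script took effect. The following is a minimal check over `kubectl get nodes` output, assuming its default column layout (NAME, STATUS, ...); the function name is hypothetical:

```bash
# Hypothetical helper: exit 0 only if the `kubectl get nodes` output on stdin
# lists at least one node and every node's STATUS begins with "Ready"
# (cordoned nodes show e.g. "Ready,SchedulingDisabled").
all_nodes_ready() {
  awk '
    NR > 1 {
      nodes++
      split($2, status, ",")
      if (status[1] != "Ready") not_ready++
    }
    END { if (nodes > 0 && not_ready == 0) exit 0; exit 1 }
  '
}

# Usage:
#   kubectl get nodes | all_nodes_ready && echo "All nodes are Ready"
# Alternatively, block until the nodes are Ready:
#   kubectl wait --for=condition=Ready nodes --all --timeout=300s
```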
diff --git a/utils/install-cri-o.sh b/utils/install-cri-o.sh
@@ -0,0 +1,46 @@
+#!/bin/bash
+
+# Refer to https://github.com/cri-o/packaging/blob/main/README.md#distributions-using-deb-packages
+# and
+# https://github.com/cri-o/cri-o/blob/main/contrib/cni/README.md#configuration-directory
+# for more information.
+
+# Install the dependencies for adding repositories
+sudo apt-get update
+sudo apt-get install -y software-properties-common curl gpg
+
+export CRIO_VERSION=v1.32
+
+# Add the CRI-O repository (create the keyring directory first if it does not exist)
+sudo mkdir -p -m 755 /etc/apt/keyrings
+curl -fsSL https://download.opensuse.org/repositories/isv:/cri-o:/stable:/$CRIO_VERSION/deb/Release.key |
+    sudo gpg --dearmor -o /etc/apt/keyrings/cri-o-apt-keyring.gpg
+
+echo "deb [signed-by=/etc/apt/keyrings/cri-o-apt-keyring.gpg] https://download.opensuse.org/repositories/isv:/cri-o:/stable:/$CRIO_VERSION/deb/ /" |
+    sudo tee /etc/apt/sources.list.d/cri-o.list
+
+# Install the packages
+sudo apt-get update
+sudo apt-get install -y cri-o
+
+# Write a minimal CRI-O config to /etc/crio/crio.conf (this overwrites any existing file)
+sudo tee /etc/crio/crio.conf > /dev/null <<EOF
+[crio.image]
+pause_image="registry.k8s.io/pause:3.10"
+
+[crio.runtime]
+conmon_cgroup = "pod"
+cgroup_manager = "systemd"
+EOF
+
+# Start CRI-O now and enable it at boot
+sudo systemctl enable --now crio.service
+
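+# Disable swap; by default the kubelet refuses to start with swap enabled.
+# Note: this does not persist across reboots.
+sudo swapoff -a
+
+# Load br_netfilter now and on every boot
+sudo modprobe br_netfilter
+echo br_netfilter | sudo tee /etc/modules-load.d/k8s.conf
+
+# Enable IPv4 packet forwarding persistently, then apply sysctl params without reboot
+echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/k8s.conf
+sudo sysctl --system
+
+# Verify that net.ipv4.ip_forward is set to 1:
+sudo sysctl net.ipv4.ip_forward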
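
The kernel settings this script applies can also be checked by reading `/proc` directly. The sketch below is a hypothetical helper. It takes an optional root directory (default `/proc`) only so the check can be exercised against a fake tree, and it uses the fact that the `bridge-nf-call-iptables` entry under `/proc/sys/net/bridge/` exists only once the `br_netfilter` module is loaded:

```bash
# Hypothetical helper: verify the kernel networking prerequisites configured
# by install-cri-o.sh. The optional argument is a proc root (default /proc),
# parameterised only so the check can be run against a fake directory tree.
check_net_prereqs() {
  local proc_root="${1:-/proc}"
  local fwd_file="$proc_root/sys/net/ipv4/ip_forward"
  local brnf_file="$proc_root/sys/net/bridge/bridge-nf-call-iptables"

  if [ "$(cat "$fwd_file" 2>/dev/null)" != "1" ]; then
    echo "net.ipv4.ip_forward is not set to 1" >&2
    return 1
  fi
  # This entry only exists once the br_netfilter module is loaded.
  if [ ! -e "$brnf_file" ]; then
    echo "br_netfilter does not appear to be loaded" >&2
    return 1
  fi
  echo "ok"
}

# Usage on a node after running install-cri-o.sh:
#   check_net_prereqs
```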
diff --git a/utils/install-kubeadm.sh b/utils/install-kubeadm.sh
@@ -0,0 +1,25 @@
+#!/bin/bash
+
+# Refer to https://v1-32.docs.kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
+# for a more detailed explanation of the kubeadm installation.
+# The following instructions are for Debian-based Linux distributions such as Ubuntu and Debian.
+# This script follows the official documentation above, modified to work with Debian 11 (bullseye).
+
+sudo apt-get update
+# apt-transport-https may be a dummy package; if so, you can skip that package
+sudo apt-get install -y apt-transport-https ca-certificates curl gpg
+
+# The directory /etc/apt/keyrings must exist before the curl command below. It is missing
+# by default on releases older than Debian 12 and Ubuntu 22.04, so create it if needed.
+sudo mkdir -p -m 755 /etc/apt/keyrings
+curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.32/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
+
+# This overwrites any existing configuration in /etc/apt/sources.list.d/kubernetes.list
+echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
+
+# Update the apt package index, install kubelet, kubeadm and kubectl, and pin their version:
+sudo apt-get update
+sudo apt-get install -y kubelet kubeadm kubectl
+sudo apt-mark hold kubelet kubeadm kubectl
+
+# (Optional) Enable the kubelet service before running kubeadm:
+sudo systemctl enable --now kubelet
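
Because the script pins kubelet, kubeadm, and kubectl with `apt-mark hold`, routine `apt-get upgrade` runs leave them at their installed versions. The following hypothetical helper is a small check of that pin against `apt-mark showhold` output:

```bash
# Hypothetical helper: succeed only if kubelet, kubeadm, and kubectl all appear
# as exact lines in the `apt-mark showhold` output read from stdin.
kube_pkgs_held() {
  local held pkg
  held="$(cat)"
  for pkg in kubelet kubeadm kubectl; do
    if ! grep -qx "$pkg" <<<"$held"; then
      echo "$pkg is not held" >&2
      return 1
    fi
  done
  return 0
}

# Usage:
#   apt-mark showhold | kube_pkgs_held && echo "Kubernetes packages are pinned"
```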
diff --git a/utils/install-kuberay.sh b/utils/install-kuberay.sh
@@ -0,0 +1,18 @@
+#!/bin/bash
+
+# Original KubeRay installation reference: https://github.com/ray-project/kuberay?tab=readme-ov-file#helm-charts
+
+# Add the Helm repo
+helm repo add kuberay https://ray-project.github.io/kuberay-helm/
+helm repo update
+
+# Confirm the repo exists
+helm search repo kuberay --devel
+
+# Install both CRDs and KubeRay operator v1.2.0.
+helm install kuberay-operator kuberay/kuberay-operator --version 1.2.0
+
+# Check the KubeRay operator Pod in the `default` namespace
+kubectl get pods
+# NAME                                READY   STATUS    RESTARTS   AGE
+# kuberay-operator-f89ddb644-psts7    1/1     Running   0          33m
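
The script checks the operator Pod but not the CRDs that the chart installs. The hypothetical helper below fails if any of the core KubeRay CRDs (`rayclusters.ray.io`, `rayjobs.ray.io`, `rayservices.ray.io`) is missing:

```bash
# Hypothetical helper: report every missing KubeRay CRD on stderr and
# return non-zero if at least one is missing.
require_kuberay_crds() {
  local crd missing=0
  for crd in rayclusters.ray.io rayjobs.ray.io rayservices.ray.io; do
    if ! kubectl get crd "$crd" >/dev/null 2>&1; then
      echo "missing CRD: $crd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (after the helm install above):
#   require_kuberay_crds && echo "KubeRay CRDs are installed"
```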