Feat/basic pipeline parallelism #422
Merged: YuhanLiu11 merged 44 commits into `vllm-project:main` from `insukim1994:feat/basic-pipeline-parallelism` on May 26, 2025.
Commits (44, all by insukim1994):

- `76bf522` [Feat] Added kuberay installation script via helm. Initial commit.
- `3a0a1ce` Added initial helm chart template file for ray cluster creation.
- `2b2cdfa` [Feat] Fixed typo at ray cluster template file. Added example values …
- `34aa920` [Feat] Removed unused fields at the moment. Bugfixed conflicting reso…
- `720cebf` [Feat] Added startup probe to check if all ray cluster nodes are up.
- `c1fa817` [Feat] Added vllm command composing and execute logic in the backgrou…
- `e9c23a4` [Feat] Added pod relevant settings from servingEngineSpec for both he…
- `b88fa17` [Feat] Added env templates for head and worker spec.
- `d35cdb4` [Feat] Added volumemounts template for head and worker spec.
- `12ccab7` [Feat] Adeed templates for resource, probe, port and etc.
- `90f3f1c` [Feat] Initial working example.
- `96dedc7` [Doc] Added documentation to run vllm with kuberay for pipeline paral…
- `0f878ec` [Doc] Elaborated tutorial documentation.
- `d1c62fa` [Chore] Fixed typo in kuberay operator installation tutorial document.
- `fa8a722` [Chore] Fixed a wording in kuberay operator installation tutorial doc…
- `e3507e1` [Chore] Fixed typo in kuberay operator installation tutorial document.
- `6148ff7` [Chore] Removed unused value from helm chart default value.
- `17cee8a` [Chore] Elaborated expression on tutorial document.
- `6058820` [Chore] Elaborated expression on tutorial document.
- `2967dd2` [Feat] Set readiness httpGet probe for ray head node. Removed unused …
- `6779ebf` [Feat] Added VLLM_HOST_IP based on official vllm docs. Added ray inst…
- `38979b9` [Feat] Added missing dashboard related setting and a step for reinsta…
- `428738b` [Feat] Removed initContainer section that will be overwritted by kube…
- `c27154b` [Feat] Kuberay operator version updated needed.
- `62ec649` [Doc] Minor fix in tutorial.
- `a833fea` [Doc] Added sample gpu usage example for each ray head and worker node.
- `618c50c` [Chore] Fixed typo in basic pipeline parallel tutorial doc.
- `57bab87` [Chore] Reverted unnecessary change.
- `f41ba79` [Chore] Fixed typo in kuberay install util script.
- `b4168ac` [Doc] Added utility script to install kubeadm.
- `9100a42` [Doc] Added cri-o container runtime installation script & a script to…
- `3d8b58a` [Doc] Added script to join worker nodes. Elaborated control plane ini…
- `52ec887` [Doc] Added nvidia gpu setup script for each node.
- `9c1bdee` [Doc] Script modification during testing.
- `f4cf1dd` [Doc] Elaborated k8s controlplane initialization and worker node join…
- `29fde46` [Doc] Elaborated basic pipeline parallelism tutorial document.
- `4516b63` [Doc] Added guide for settig up kubernetes cluster with 2 nodes (cont…
- `6d0a8ec` [Doc] Elaborated K8s cluster initialization guide and applied a revie…
- `57d3aad` [Chore] Strict total number of ray node checking. Tested helm chart w…
- `85fc5ce` [Doc] Elaborated important note when applying pipeline parallelism (w…
- `9c5d2f8` [Doc] Elaborated basic pipeline parallelism tutorial example.
- `a657a12` [Doc] Review updates (prevent duplicated line appends & added warning…
- `3c7810f` [Doc] Review updates (elaborated prerequisites for kuberay operator i…
- `90f3298` [Bugfix] Fixed version typo of lmcache from toml file.
`@@ -0,0 +1,85 @@` (new file)

````markdown
# Tutorial: Setting Up the KubeRay Operator in Your Kubernetes Environment

## Introduction

This tutorial provides a step-by-step guide to installing and configuring the KubeRay operator in a Kubernetes environment. We use the KubeRay Helm chart to set up the operator, which enables distributed inference with vLLM. By the end of this tutorial, you will have a fully operational KubeRay operator ready to support deployments of the vLLM Production Stack.

## Table of Contents

- [Introduction](#introduction)
- [Table of Contents](#table-of-contents)
- [Prerequisites](#prerequisites)
- [Steps](#steps)
- [Step 1: Install the KubeRay Operator Using Helm](#step-1-install-the-kuberay-operator-using-helm)
- [Step 2: Verify the KubeRay Configuration](#step-2-verify-the-kuberay-configuration)
- [Conclusion](#conclusion)

## Prerequisites

Before you begin, ensure the following:

1. **GPU Server Requirements:**
   - A server with a GPU and drivers properly installed (e.g., NVIDIA drivers).
   - [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) installed for GPU workloads.

2. **Access and Permissions:**
   - Root or administrative access to the server.
   - Internet connectivity to download required packages and tools.

3. **Environment Setup:**
   - A Linux-based operating system (e.g., Ubuntu 20.04 or later).
   - Basic understanding of Linux shell commands.

4. **Kubernetes Installation:**
   - To quickly set up a single-node Kubernetes environment, install Minikube by following the instructions in [`00-install-kubernetes-env.md`](00-install-kubernetes-env.md).
   - For a multi-node cluster or a more general Kubernetes environment, install Kubernetes from scratch with kubeadm. This involves configuring the container runtime and the container network interface (CNI), as outlined in [`00-a-install-multinode-kubernetes-env.md`](00-a-install-multinode-kubernetes-env.md).
   - If you already have a running Kubernetes cluster, you may skip this step.

5. **KubeRay Concept Review:**
   - Review the [official KubeRay documentation](https://docs.ray.io/en/latest/cluster/kubernetes/index.html) for additional context and best practices.

## Steps

### Step 1: Install the KubeRay Operator Using Helm

1. Add the KubeRay Helm repository:

   ```bash
   helm repo add kuberay https://ray-project.github.io/kuberay-helm/
   helm repo update
   ```

2. Install the Custom Resource Definitions (CRDs) and the KubeRay operator (version 1.2.0) into the current namespace (`default` unless you pass `--namespace`):

   ```bash
   helm install kuberay-operator kuberay/kuberay-operator --version 1.2.0
   ```

3. **Explanation:**
   This step deploys the KubeRay operator (v1.2.0) in your Kubernetes cluster. The operator manages the lifecycle of Ray clusters (`RayCluster` custom resources), which lets a vLLM deployment span multiple nodes for distributed inference workloads such as pipeline parallelism.

### Step 2: Verify the KubeRay Configuration

1. **Check the Operator Pod Status:**
   - Ensure that the KubeRay operator pod is running in the `default` namespace:

     ```bash
     kubectl get pods
     ```

2. **Expected Output:**
   Example output:

   ```plaintext
   NAME                               READY   STATUS    RESTARTS   AGE
   kuberay-operator-975995b7d-75jqd   1/1     Running   0          25h
   ```

## Conclusion

You have now installed and verified the KubeRay operator in your Kubernetes environment. This setup lays the foundation for deploying and managing the vLLM Production Stack for distributed inference or training workloads.

For advanced configurations and workload-specific tuning, refer to the official documentation for KubeRay, kubectl, Helm, and Minikube.

What's next:

- [15-basic-pipeline-parallel](https://github.com/vllm-project/production-stack/blob/main/tutorials/15-basic-pipeline-parallel.md)
````
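The tutorial's Step 2 relies on reading `kubectl get pods` by eye. For scripted setups, a minimal sketch like the following could check both the RayCluster CRD and the operator pod. It assumes the operator pod carries the label `app.kubernetes.io/name=kuberay-operator` (confirm with `kubectl get pods --show-labels`); the function name `verify_kuberay_operator` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: verify a KubeRay operator install without manual inspection.
# Assumption: the operator pod is labeled app.kubernetes.io/name=kuberay-operator.
verify_kuberay_operator() {
  local namespace="${1:-default}"   # namespace the Helm release was installed into
  local timeout="${2:-120s}"        # how long to wait for the pod to become Ready

  # The RayCluster CRD is installed together with the operator chart.
  if ! kubectl get crd rayclusters.ray.io; then
    echo "RayCluster CRD not found; is the chart installed?" >&2
    return 1
  fi

  # Block until the operator pod reports the Ready condition (or time out).
  kubectl wait --namespace "$namespace" \
    --for=condition=Ready pod \
    --selector=app.kubernetes.io/name=kuberay-operator \
    --timeout="$timeout"
}
```

Usage: `verify_kuberay_operator default 120s` returns non-zero if either the CRD is missing or the pod does not become Ready in time, which makes it easy to chain in setup scripts.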
`tutorials/assets/values-15-minimal-pipeline-parallel-example.yaml` (24 additions, 0 deletions), hunk `@@ -0,0 +1,24 @@`:

```yaml
servingEngineSpec:
  runtimeClassName: ""
  # Resources for the Ray head node pod.
  raySpec:
    headNode:
      requestCPU: 2
      requestMemory: "20Gi"
      requestGPU: 2
  modelSpec:
  - name: "distilgpt2"
    repository: "vllm/vllm-openai"
    tag: "latest"
    modelURL: "distilbert/distilgpt2"

    replicaCount: 1

    requestCPU: 2
    requestMemory: "20Gi"
    requestGPU: 2

    vllmConfig:
      # tensorParallelSize x pipelineParallelSize = 4 GPUs in total.
      tensorParallelSize: 2
      pipelineParallelSize: 2

    shmSize: "20Gi"
```
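vLLM runs one worker per GPU, so a deployment needs `tensorParallelSize × pipelineParallelSize` GPUs in total; with the values above that is 2 × 2 = 4. A minimal pre-flight sketch, assuming the Ray cluster consists of one head pod plus `replicaCount` worker pods that each request `requestGPU` GPUs (an assumption about how the chart maps these values, not something this diff confirms); `check_parallel_gpu_budget` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical pre-flight check: do the requested Ray pods provide enough GPUs
# for vLLM's tensor x pipeline parallel layout?
# Assumption: GPUs available = head GPUs + (worker pods x GPUs per worker pod).
check_parallel_gpu_budget() {
  local tp="$1" pp="$2" head_gpus="$3" workers="$4" gpus_per_worker="$5"
  local needed=$(( tp * pp ))
  local available=$(( head_gpus + workers * gpus_per_worker ))
  echo "needed=${needed} available=${available}"
  # Succeed only if the pods cover the full parallel world size.
  [ "$available" -ge "$needed" ]
}
```

Usage: `check_parallel_gpu_budget 2 2 2 1 2` (TP, PP, head GPUs, worker pods, GPUs per worker) prints `needed=4 available=4` and returns success; raising `pipelineParallelSize` to 4 without adding workers would make it fail.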
`@@ -0,0 +1,63 @@` (new file)

```bash
#!/bin/bash
set -e

# Allow users to override the paths for the NVIDIA tools.
: "${NVIDIA_SMI_PATH:=nvidia-smi}"
: "${NVIDIA_CTK_PATH:=nvidia-ctk}"

# --- Debug and Environment Setup ---
echo "Current PATH: $PATH"
echo "Operating System: $(uname -a)"

# Get the script directory to reference local scripts reliably.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# --- Install Prerequisites ---
echo "Installing kubectl and helm..."
bash "$SCRIPT_DIR/install-kubectl.sh"
bash "$SCRIPT_DIR/install-helm.sh"

# --- Configure BPF (if available) ---
if [ -f /proc/sys/net/core/bpf_jit_harden ]; then
    echo "Configuring BPF: Setting net.core.bpf_jit_harden=0"
    if ! grep -q "net.core.bpf_jit_harden=0" /etc/sysctl.conf; then
        echo "net.core.bpf_jit_harden=0" | sudo tee -a /etc/sysctl.conf
    fi
    sudo sysctl -p
else
    echo "BPF JIT hardening configuration not available, skipping..."
fi

# --- NVIDIA GPU Setup ---
GPU_AVAILABLE=false
if command -v "$NVIDIA_SMI_PATH" >/dev/null 2>&1; then
    echo "NVIDIA GPU detected via nvidia-smi at: $(command -v "$NVIDIA_SMI_PATH")"
    if command -v "$NVIDIA_CTK_PATH" >/dev/null 2>&1; then
        echo "nvidia-ctk found at: $(command -v "$NVIDIA_CTK_PATH")"
        GPU_AVAILABLE=true
    else
        echo "nvidia-ctk not found. Please install the NVIDIA Container Toolkit to enable GPU support."
    fi
fi

if [ "$GPU_AVAILABLE" = true ]; then
    # Configure Docker for GPU support.
    echo "Configuring Docker runtime for GPU support..."
    if sudo "$NVIDIA_CTK_PATH" runtime configure --runtime=docker; then
        echo "Restarting Docker to apply changes..."
        echo "WARNING: Restarting Docker will stop and restart all containers."
        sudo systemctl restart docker
        echo "Docker runtime configured successfully."
    else
        echo "Error: Failed to configure Docker runtime using the NVIDIA Container Toolkit."
        exit 1
    fi

    # Install the GPU Operator via Helm.
    echo "Adding NVIDIA helm repo and updating..."
    helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
    echo "Installing GPU Operator..."
    helm install --wait gpu-operator -n gpu-operator --create-namespace nvidia/gpu-operator --version=v24.9.1
fi

echo "NVIDIA GPU Setup complete."
```
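Once the GPU Operator is running, its device plugin advertises GPUs to Kubernetes as the extended resource `nvidia.com/gpu`. A sketch for confirming that the cluster sees enough GPUs (for the pipeline-parallel example, at least tensor × pipeline parallel size) before deploying; `count_allocatable_gpus` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: sum the allocatable nvidia.com/gpu resource across all nodes.
# Nodes without GPUs print an empty value, which is counted as 0.
count_allocatable_gpus() {
  kubectl get nodes -o \
    jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' |
    awk -F'\t' '{ total += ($2 == "" ? 0 : $2) } END { print total + 0 }'
}
```

Usage: `count_allocatable_gpus` prints a single number; if it stays at `0` after the operator install, the device plugin pods in the `gpu-operator` namespace are the first place to look.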
`@@ -0,0 +1,10 @@` (new file)

```bash
#!/bin/bash

# Refer to https://docs.tigera.io/calico/latest/getting-started/kubernetes/quickstart
# for more information.

# Install the Tigera operator and custom resource definitions:
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.0/manifests/tigera-operator.yaml

# Install Calico by creating the necessary custom resources:
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.0/manifests/custom-resources.yaml
```
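`kubectl create` returns as soon as the manifests are submitted, before Calico is actually running, and nodes stay `NotReady` until the CNI is up. A polling sketch, assuming the operator places the Calico pods in the `calico-system` namespace (as the Calico quickstart describes); `wait_for_calico` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: poll until every pod in calico-system is Ready.
# Retrying matters because the pods may not exist yet right after `kubectl create`,
# and `kubectl wait` may fail immediately when no pods match.
wait_for_calico() {
  local attempts="${1:-30}"   # number of polling rounds
  local delay="${2:-10}"      # seconds to sleep between rounds
  local i
  for (( i = 1; i <= attempts; i++ )); do
    if kubectl wait --namespace calico-system \
        --for=condition=Ready pods --all --timeout=10s >/dev/null 2>&1; then
      echo "Calico pods are Ready (attempt ${i})"
      return 0
    fi
    sleep "$delay"
  done
  echo "Timed out waiting for Calico pods" >&2
  return 1
}
```

Usage: `wait_for_calico 30 10` waits up to roughly 10 minutes; afterwards `kubectl get nodes` should show the nodes as `Ready`.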
`@@ -0,0 +1,46 @@` (new file)

```bash
#!/bin/bash

# Refer to https://github.com/cri-o/packaging/blob/main/README.md#distributions-using-deb-packages
# and
# https://github.com/cri-o/cri-o/blob/main/contrib/cni/README.md#configuration-directory
# for more information.

# Install the dependencies for adding repositories
sudo apt-get update
sudo apt-get install -y software-properties-common curl

export CRIO_VERSION=v1.32

# Add the CRI-O repository.
# /etc/apt/keyrings does not exist by default on releases older than Debian 12 and Ubuntu 22.04.
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://download.opensuse.org/repositories/isv:/cri-o:/stable:/$CRIO_VERSION/deb/Release.key |
    sudo gpg --dearmor -o /etc/apt/keyrings/cri-o-apt-keyring.gpg

echo "deb [signed-by=/etc/apt/keyrings/cri-o-apt-keyring.gpg] https://download.opensuse.org/repositories/isv:/cri-o:/stable:/$CRIO_VERSION/deb/ /" |
    sudo tee /etc/apt/sources.list.d/cri-o.list

# Install the packages
sudo apt-get update
sudo apt-get install -y cri-o

# Update the CRI-O config by creating (or overwriting) /etc/crio/crio.conf
sudo tee /etc/crio/crio.conf > /dev/null <<EOF
[crio.image]
pause_image="registry.k8s.io/pause:3.10"

[crio.runtime]
conmon_cgroup = "pod"
cgroup_manager = "systemd"
EOF

# Start CRI-O
sudo systemctl start crio.service

# By default, the kubelet refuses to start while swap is enabled.
sudo swapoff -a

# Load br_netfilter now and on every boot.
echo br_netfilter | sudo tee /etc/modules-load.d/k8s.conf
sudo modprobe br_netfilter

# Enable IP forwarding persistently (a plain `sysctl -w` is lost on reboot).
echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/k8s.conf

# Apply sysctl params without reboot
sudo sysctl --system

# Verify that net.ipv4.ip_forward is set to 1 with:
sudo sysctl net.ipv4.ip_forward
```
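A small sketch for verifying the kernel prerequisites the CRI-O script sets up: IP forwarding enabled, `br_netfilter` loaded, and no active swap. It reads the live values from `/proc` and `/sys`; the optional root-directory argument is an illustrative addition that only exists so the check can be exercised against a fake directory tree, and `check_node_prereqs` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: confirm node-level prerequisites before running kubeadm.
# The optional first argument is a root prefix (default: the real filesystem),
# which lets the function be tested against a fake /proc and /sys tree.
check_node_prereqs() {
  local root="${1:-}"
  local status=0

  # IP forwarding must be enabled (value 1).
  if [ "$(cat "${root}/proc/sys/net/ipv4/ip_forward" 2>/dev/null)" != "1" ]; then
    echo "net.ipv4.ip_forward is not 1" >&2
    status=1
  fi

  # A loaded kernel module appears as a directory under /sys/module.
  if [ ! -d "${root}/sys/module/br_netfilter" ]; then
    echo "br_netfilter module is not loaded" >&2
    status=1
  fi

  # /proc/swaps has a header line; any further line is an active swap area.
  if [ -n "$(tail -n +2 "${root}/proc/swaps" 2>/dev/null)" ]; then
    echo "swap is still enabled" >&2
    status=1
  fi

  return "$status"
}
```

Usage: run `check_node_prereqs` on each node before `kubeadm init` or `kubeadm join`; it reports every failing check on stderr rather than stopping at the first one.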
`@@ -0,0 +1,25 @@` (new file)

```bash
#!/bin/bash

# Refer to https://v1-32.docs.kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
# for a more detailed explanation of the kubeadm installation.
# The following instructions are for Debian-based distributions such as Ubuntu and Debian.
# This script follows the official documentation above, modified to work with Debian 11 (bullseye).

sudo apt-get update
# apt-transport-https may be a dummy package; if so, you can skip that package
sudo apt-get install -y apt-transport-https ca-certificates curl gpg

# On releases older than Debian 12 and Ubuntu 22.04, /etc/apt/keyrings does not exist
# by default, so create it before running the curl command.
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.32/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

# This overwrites any existing configuration in /etc/apt/sources.list.d/kubernetes.list
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

# Update the apt package index, install kubelet, kubeadm and kubectl, and pin their version:
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

# (Optional) Enable the kubelet service before running kubeadm:
sudo systemctl enable --now kubelet
```
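After the packages are installed, the control plane is typically initialized with `kubeadm init`. A sketch, assuming CRI-O's default socket `unix:///var/run/crio/crio.sock` and Calico's default pod CIDR `192.168.0.0/16` (the value the Calico quickstart uses with its default `custom-resources.yaml`; change both together if you customize one). `init_control_plane` and `POD_CIDR` are hypothetical names; the kubeconfig steps mirror the instructions `kubeadm init` prints on success.

```shell
#!/bin/bash
# Hypothetical helper: initialize a kubeadm control plane on a CRI-O node.
# Assumptions: CRI-O listens on its default socket, and the pod CIDR matches
# the Calico IP pool (192.168.0.0/16 in Calico's default custom resources).
init_control_plane() {
  local pod_cidr="${POD_CIDR:-192.168.0.0/16}"

  sudo kubeadm init \
    --pod-network-cidr="$pod_cidr" \
    --cri-socket=unix:///var/run/crio/crio.sock || return 1

  # Give the current user a kubeconfig so kubectl works without sudo.
  mkdir -p "$HOME/.kube"
  sudo cp /etc/kubernetes/admin.conf "$HOME/.kube/config"
  sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"
}
```

Usage: run `init_control_plane` once on the control-plane node, then apply the Calico manifests; `kubeadm init` also prints the `kubeadm join` command that worker nodes need.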
`@@ -0,0 +1,18 @@` (new file)

```bash
#!/bin/bash

# Original KubeRay installation reference: https://github.com/ray-project/kuberay?tab=readme-ov-file#helm-charts

# Add the Helm repo
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update

# Confirm the repo exists
helm search repo kuberay --devel

# Install both CRDs and KubeRay operator v1.2.0.
helm install kuberay-operator kuberay/kuberay-operator --version 1.2.0

# Check the KubeRay operator Pod in the `default` namespace
kubectl get pods
# NAME                               READY   STATUS    RESTARTS   AGE
# kuberay-operator-f89ddb644-psts7   1/1     Running   0          33m
```
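`helm install` fails if a release named `kuberay-operator` already exists, and `kubectl get pods` right after it may still show the pod as `ContainerCreating`. A re-runnable variant, sketched with the standard Helm flags `upgrade --install` and `--wait`; the function name and its namespace parameter are hypothetical additions, not part of this PR.

```shell
#!/bin/bash
# Hypothetical re-runnable variant of the KubeRay install script.
install_kuberay_operator() {
  local version="${1:-1.2.0}"       # chart version pinned in this PR
  local namespace="${2:-default}"   # target namespace for the operator

  helm repo add kuberay https://ray-project.github.io/kuberay-helm/ --force-update || return 1
  helm repo update || return 1

  # upgrade --install creates the release if missing and upgrades it otherwise;
  # --wait returns only after the operator's resources are ready (or it times out).
  helm upgrade --install kuberay-operator kuberay/kuberay-operator \
    --version "$version" \
    --namespace "$namespace" --create-namespace \
    --wait --timeout 5m
}
```

One caveat on this design: Helm installs CRDs from a chart's `crds/` directory only on first install, so if the chart ships its CRDs that way, re-running this after a version bump will not update the RayCluster CRD.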