
Commit 6d11ed5 (parent d79c847)

[Doc] Elaborated important note when applying pipeline parallelism (with ray).

Signed-off-by: insukim1994 <insu.kim@moreh.io>


tutorials/15-basic-pipeline-parallel.md

Lines changed: 9 additions & 6 deletions
@@ -54,13 +54,16 @@ This tutorial provides a step-by-step guide for configuring and deploying the vL
 - **`requestMemory`**: Memory allocation for each Kuberay worker pod. Sufficient memory is required to load the model.
 - **`requestGPU`**: Specifies the number of GPUs to allocate for each Kuberay worker pod.
 - **`vllmConfig`**: Contains model-specific configurations:
-  - `tensorParallelSize`: Defines the number of GPUs allocated to each worker pod.
-  - `pipelineParallelSize`: Specifies the degree of pipeline parallelism.
+  - `tensorParallelSize`: Specifies the number of GPUs assigned to each worker pod. This value must be identical to both `requestGPU` and `raySpec.headNode.requestGPU`.
+  - `pipelineParallelSize`: Indicates the level of pipeline parallelism. This value must be equal to `replicaCount + 1`, representing the total number of Ray cluster nodes, including both head and worker nodes.
 - **Important Note:**
-  - The total number of GPUs required is calculated as: `pipelineParallelSize` × `tensorParallelSize`
-  - This value must exactly match the sum of:
-    - `replicaCount` × `requestGPU` (i.e., the total number of GPUs allocated to Ray worker nodes)
-    - `raySpec.headNode.requestGPU` (i.e., the number of GPUs allocated to the Ray head node).
+  - The total number of GPUs required is computed as `pipelineParallelSize × tensorParallelSize`.
+  - This total must exactly match the sum of:
+    - `replicaCount × requestGPU` (the total number of GPUs allocated to Ray worker nodes), and
+    - `raySpec.headNode.requestGPU` (the number of GPUs allocated to the Ray head node).
+  - The `requestGPU` value for the Ray head node must be identical to that of each worker node.
+  - `tensorParallelSize` defines the number of GPUs allocated per Ray node (including both head and worker nodes), and must be consistent across all nodes.
+  - `pipelineParallelSize` represents the total number of Ray nodes, and must therefore be set to `replicaCount + 1` (i.e., the number of worker nodes plus the head node).
 - **`shmSize`**: Configures the shared memory size to ensure adequate memory is available for inter-process communication during tensor and pipeline parallelism execution.
 - **`hf_token`**: The Hugging Face token for authenticating with the Hugging Face model hub.
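The GPU-count constraints that the added note spells out can be captured as a small sanity check. The sketch below is illustrative only: the function name and flat parameter list are assumptions for this example, not part of the Helm chart or vLLM itself; the parameters mirror the values discussed above (`tensorParallelSize`, `pipelineParallelSize`, `replicaCount`, `requestGPU`, `raySpec.headNode.requestGPU`).

```python
def validate_ray_gpu_config(tensor_parallel_size: int,
                            pipeline_parallel_size: int,
                            replica_count: int,
                            request_gpu: int,
                            head_node_request_gpu: int) -> None:
    """Raise ValueError if the pipeline-parallel settings are inconsistent.

    Hypothetical helper; parameter names mirror the tutorial's Helm values.
    """
    # Head and worker nodes must request the same number of GPUs, and
    # tensorParallelSize must equal that per-node GPU count.
    if not (request_gpu == head_node_request_gpu == tensor_parallel_size):
        raise ValueError(
            "tensorParallelSize must equal requestGPU on head and worker nodes")
    # pipelineParallelSize counts all Ray nodes: worker replicas plus the head.
    if pipeline_parallel_size != replica_count + 1:
        raise ValueError("pipelineParallelSize must equal replicaCount + 1")
    # Total GPUs required must exactly match total GPUs provided.
    required = pipeline_parallel_size * tensor_parallel_size
    provided = replica_count * request_gpu + head_node_request_gpu
    if required != provided:
        raise ValueError(f"requires {required} GPUs, config provides {provided}")

# Example: 1 worker replica plus the head node, 2 GPUs each, so
# pipelineParallelSize = 2 and tensorParallelSize = 2 (4 GPUs total).
validate_ray_gpu_config(tensor_parallel_size=2, pipeline_parallel_size=2,
                        replica_count=1, request_gpu=2, head_node_request_gpu=2)
```

A mismatched configuration, such as `pipelineParallelSize=3` with a single worker replica, would raise a `ValueError` under these rules.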
