- - `tensorParallelSize`: Defines the number of GPUs allocated to each worker pod.
- - `pipelineParallelSize`: Specifies the degree of pipeline parallelism.
+ - `tensorParallelSize`: Specifies the number of GPUs assigned to each worker pod. This value must be identical to both `requestGPU` and `raySpec.headNode.requestGPU`.
+ - `pipelineParallelSize`: Indicates the level of pipeline parallelism. This value must be equal to `replicaCount + 1`, representing the total number of Ray cluster nodes, including both head and worker nodes.
  - **Important Note:**
- - The total number of GPUs required is calculated as: `pipelineParallelSize` × `tensorParallelSize`
- - This value must exactly match the sum of:
- - `replicaCount` × `requestGPU` (i.e., the total number of GPUs allocated to Ray worker nodes)
- - `raySpec.headNode.requestGPU` (i.e., the number of GPUs allocated to the Ray head node).
+ - The total number of GPUs required is computed as `pipelineParallelSize × tensorParallelSize`.
+ - This total must exactly match the sum of:
+ - `replicaCount × requestGPU` (the total number of GPUs allocated to Ray worker nodes), and
+ - `raySpec.headNode.requestGPU` (the number of GPUs allocated to the Ray head node).
+ - The `requestGPU` value for the Ray head node must be identical to that of each worker node.
+ - `tensorParallelSize` defines the number of GPUs allocated per Ray node (including both head and worker nodes), and must be consistent across all nodes.
+ - `pipelineParallelSize` represents the total number of Ray nodes, and must therefore be set to replicaCount + 1 (i.e., the number of worker nodes plus the head node).
  - **`shmSize`**: Configures the shared memory size to ensure adequate memory is available for inter-process communication during tensor and pipeline parallelism execution.
  - **`hf_token`**: The Hugging Face token for authenticating with the Hugging Face model hub.
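
The GPU constraints added in this change can be checked mechanically before deploying. The following is a minimal sketch, assuming the five values have already been read from the values file into plain integers. The function name `check_parallelism` and its parameter names are hypothetical and not part of the chart; only the rules themselves come from the text above.

```python
def check_parallelism(replica_count, request_gpu, head_request_gpu,
                      tensor_parallel_size, pipeline_parallel_size):
    """Return a list of constraint violations; an empty list means consistent.

    Hypothetical helper: the parameters mirror replicaCount, requestGPU,
    raySpec.headNode.requestGPU, tensorParallelSize and pipelineParallelSize.
    """
    problems = []

    # The head node must request the same number of GPUs as each worker node.
    if head_request_gpu != request_gpu:
        problems.append("raySpec.headNode.requestGPU must equal requestGPU")

    # tensorParallelSize is the per-node GPU count.
    if tensor_parallel_size != request_gpu:
        problems.append("tensorParallelSize must equal requestGPU")

    # pipelineParallelSize is the number of Ray nodes: workers plus one head.
    if pipeline_parallel_size != replica_count + 1:
        problems.append("pipelineParallelSize must equal replicaCount + 1")

    # Total GPUs required must match total GPUs allocated.
    required = pipeline_parallel_size * tensor_parallel_size
    allocated = replica_count * request_gpu + head_request_gpu
    if required != allocated:
        problems.append(
            f"GPU total mismatch: required {required}, allocated {allocated}"
        )

    return problems


# One worker plus the head node, two GPUs per node: 2 x 2 = 1 x 2 + 2.
print(check_parallelism(replica_count=1, request_gpu=2, head_request_gpu=2,
                        tensor_parallel_size=2, pipeline_parallel_size=2))
```

With the example values the list is empty. Raising `pipeline_parallel_size` to 3 without adding a worker would report both the node-count rule and the GPU total mismatch.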