Commit e643309

[doc][misc] clarify VLLM_HOST_IP for multi-node inference (#12667)

As more and more people try DeepSeek models with multi-node inference, #7815 comes up more and more often. Let's give users a clear message.

Signed-off-by: youkaichao <youkaichao@gmail.com>

1 parent e489ad7 commit e643309

File tree

2 files changed: +13 −4 lines changed


docs/source/serving/distributed_serving.md

Lines changed: 9 additions & 3 deletions
````diff
@@ -60,7 +60,8 @@ bash run_cluster.sh \
   vllm/vllm-openai \
   ip_of_head_node \
   --head \
-  /path/to/the/huggingface/home/in/this/node
+  /path/to/the/huggingface/home/in/this/node \
+  -e VLLM_HOST_IP=ip_of_this_node
 ```
 
 On the rest of the worker nodes, run the following command:
@@ -70,10 +71,11 @@ bash run_cluster.sh \
   vllm/vllm-openai \
   ip_of_head_node \
   --worker \
-  /path/to/the/huggingface/home/in/this/node
+  /path/to/the/huggingface/home/in/this/node \
+  -e VLLM_HOST_IP=ip_of_this_node
 ```
 
-Then you get a ray cluster of containers. Note that you need to keep the shells running these commands alive to hold the cluster. Any shell disconnect will terminate the cluster. In addition, please note that the argument `ip_of_head_node` should be the IP address of the head node, which is accessible by all the worker nodes. A common misunderstanding is to use the IP address of the worker node, which is not correct.
+Then you get a ray cluster of containers. Note that you need to keep the shells running these commands alive to hold the cluster. Any shell disconnect will terminate the cluster. In addition, please note that the argument `ip_of_head_node` should be the IP address of the head node, which is accessible by all the worker nodes. The IP addresses of each worker node should be specified in the `VLLM_HOST_IP` environment variable, and should be different for each worker node. Please check the network configuration of your cluster to make sure the nodes can communicate with each other through the specified IP addresses.
 
 Then, on any node, use `docker exec -it node /bin/bash` to enter the container, execute `ray status` to check the status of the Ray cluster. You should see the right number of nodes and GPUs.
 
@@ -103,3 +105,7 @@ Please make sure you downloaded the model to all the nodes (with the same path),
 
 When you use huggingface repo id to refer to the model, you should append your huggingface token to the `run_cluster.sh` script, e.g. `-e HF_TOKEN=`. The recommended way is to download the model first, and then use the path to refer to the model.
 :::
+
+:::{warning}
+If you keep receiving the error message `Error: No available node types can fulfill resource request` but you have enough GPUs in the cluster, chances are your nodes have multiple IP addresses and vLLM cannot find the right one, especially when you are using multi-node inference. Please make sure vLLM and ray use the same IP address. You can set the `VLLM_HOST_IP` environment variable to the right IP address in the `run_cluster.sh` script (different for each node!), and check `ray status` to see the IP address used by Ray. See <gh-issue:7815> for more information.
+:::
````
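The docs change above asks users to set a per-node `VLLM_HOST_IP`. On multi-homed nodes, one way to find the address that actually routes to the head node is to "connect" a UDP socket toward it and read back the local address the kernel chose (no packet is sent by a UDP connect). This is only an illustrative sketch under that assumption; `pick_host_ip` is a hypothetical helper, not part of vLLM:

```python
import socket

def pick_host_ip(head_node_ip: str, port: int = 6379) -> str:
    """Return the local IP this node would use to reach the head node.

    A UDP connect() does not send any traffic; it only asks the kernel
    which local interface/address routes to the given destination.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((head_node_ip, port))
        return s.getsockname()[0]
```

The returned address could then be exported as `VLLM_HOST_IP` before invoking `run_cluster.sh` on each node.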

vllm/executor/ray_utils.py

Lines changed: 4 additions & 1 deletion
```diff
@@ -214,7 +214,10 @@ def _wait_until_pg_ready(current_placement_group: "PlacementGroup"):
     logger.info(
         "Waiting for creating a placement group of specs for "
         "%d seconds. specs=%s. Check "
-        "`ray status` to see if you have enough resources.",
+        "`ray status` to see if you have enough resources,"
+        " and make sure the IP addresses used by ray cluster"
+        " are the same as VLLM_HOST_IP environment variable"
+        " specified in each node if you are running on a multi-node.",
         int(time.time() - s), placement_group_specs)
 
     try:
```
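The new log message asks users to verify that the address Ray registered for a node matches that node's `VLLM_HOST_IP`. A minimal sketch of such a consistency check; `check_host_ip` is a hypothetical helper (in practice, `ray_node_ip` would come from `ray.util.get_node_ip_address()` on the node in question):

```python
import os

def check_host_ip(ray_node_ip: str, env=os.environ) -> bool:
    """Return True iff VLLM_HOST_IP is set and matches Ray's node IP."""
    host_ip = env.get("VLLM_HOST_IP")
    if host_ip is None:
        print("VLLM_HOST_IP is not set; vLLM will guess an interface.")
        return False
    if host_ip != ray_node_ip:
        print(f"Mismatch: VLLM_HOST_IP={host_ip}, but Ray uses {ray_node_ip}")
        return False
    return True
```

Running such a check on every node before starting inference would surface the multi-NIC mismatch behind <gh-issue:7815> earlier than a placement-group timeout does.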
