Description
When I use the serve config below to launch vLLM:
```yaml
serveConfigV2: |
  applications:
    - name: qianwen2
      import_path: ray.llm._internal.serve.deployments.prefill_decode_disagg.prefill_decode_disagg:build_app
      route_prefix: /
      args:
        prefill_config:
          model_loading_config:
            model_id: qianwen2
            model_source: /home/ray/models
          engine_kwargs:
            dtype: auto
            device: auto
            max_model_len: 8192
            pipeline_parallel_size: 2
            tensor_parallel_size: 1
            max_num_seqs: 40
            gpu_memory_utilization: 0.85
            trust_remote_code: true
            enforce_eager: true
          deployment_config:
            autoscaling_config:
              min_replicas: 1
              max_replicas: 3
          runtime_env:
            env_vars:
              VLLM_USE_V1: "1"
        decode_config:
          model_loading_config:
            model_id: qianwen2
            model_source: /home/ray/models
          engine_kwargs:
            dtype: auto
            device: auto
            max_model_len: 8192
            pipeline_parallel_size: 2
            tensor_parallel_size: 1
            max_num_seqs: 40
            gpu_memory_utilization: 0.85
            trust_remote_code: true
            enforce_eager: true
          deployment_config:
            autoscaling_config:
              min_replicas: 1
              max_replicas: 3
          runtime_env:
            env_vars:
              VLLM_USE_V1: "1"
```
I got the error below in both the prefill and decode deployments:
```
(RayWorkerWrapper pid=908, ip=10.244.121.4)   File "/home/ray/anaconda3/lib/python3.11/threading.py", line 982, in run
(RayWorkerWrapper pid=908, ip=10.244.121.4)     self._target(*self._args, **self._kwargs)
(RayWorkerWrapper pid=908, ip=10.244.121.4)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 469, in _nixl_handshake_listener
(RayWorkerWrapper pid=908, ip=10.244.121.4)     with zmq_ctx(zmq.ROUTER, path) as sock:
(RayWorkerWrapper pid=908, ip=10.244.121.4)   File "/home/ray/anaconda3/lib/python3.11/contextlib.py", line 137, in __enter__
(RayWorkerWrapper pid=908, ip=10.244.121.4)     return next(self.gen)
(RayWorkerWrapper pid=908, ip=10.244.121.4)            ^^^^^^^^^^^^^^
(RayWorkerWrapper pid=908, ip=10.244.121.4)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 1097, in zmq_ctx
(RayWorkerWrapper pid=908, ip=10.244.121.4)     yield make_zmq_socket(ctx=ctx,
(RayWorkerWrapper pid=908, ip=10.244.121.4)           ^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=908, ip=10.244.121.4)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/utils/__init__.py", line 2601, in make_zmq_socket
(RayWorkerWrapper pid=908, ip=10.244.121.4)     socket.bind(path)
(RayWorkerWrapper pid=908, ip=10.244.121.4)   File "/home/ray/anaconda3/lib/python3.11/site-packages/zmq/sugar/socket.py", line 311, in bind
(RayWorkerWrapper pid=908, ip=10.244.121.4)     super().bind(addr)
(RayWorkerWrapper pid=908, ip=10.244.121.4)   File "_zmq.py", line 898, in zmq.backend.cython._zmq.Socket.bind
(RayWorkerWrapper pid=908, ip=10.244.121.4)   File "_zmq.py", line 160, in zmq.backend.cython._zmq._check_rc
(RayWorkerWrapper pid=908, ip=10.244.121.4) zmq.error.ZMQError: Cannot assign requested address (addr='tcp://10.244.124.3:41059')
```
Note that the error happens on `bind`: the worker (ip 10.244.121.4) is trying to bind its NIXL handshake listener to 10.244.124.3:41059, an address that belongs to another Ray worker and is not assigned to any local interface on this pod.
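For what it's worth, the underlying OS-level failure is easy to reproduce with a plain stdlib socket: binding to an IP that is not configured on a local interface raises `EADDRNOTAVAIL`, which ZMQ surfaces as "Cannot assign requested address". The IP below is just an example of an address assumed not to be local on the test machine:

```python
import errno
import socket

# Binding requires the address to belong to a local interface.
# 10.255.255.1 is assumed NOT to be configured on this host, mirroring
# a pod trying to bind another pod's IP (10.244.124.3 in the traceback).
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    sock.bind(("10.255.255.1", 0))  # port 0 = any free port
except OSError as exc:
    # Same errno that zmq.error.ZMQError reports as
    # "Cannot assign requested address".
    print(exc.errno == errno.EADDRNOTAVAIL)
finally:
    sock.close()
```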
From my understanding, when Ray starts the vLLM engine it has to initialize the NIXL connector, and the NIXL worker agent then needs to talk to the remote NIXL agent. But because the Ray cluster doesn't know the NIXL port ahead of time, it hasn't exposed that port, which blocks the NIXL connector initialization.
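As a possible workaround (an assumption on my side, not verified against this Ray/vLLM combination): recent vLLM versions read `VLLM_NIXL_SIDE_CHANNEL_HOST` / `VLLM_NIXL_SIDE_CHANNEL_PORT` for the NIXL handshake listener, so pinning the port per deployment via `runtime_env` would at least make it a known value that can be opened in the pod spec / network policy. Sketch only; the port value is arbitrary and the host would need to be each pod's own reachable IP:

```yaml
runtime_env:
  env_vars:
    VLLM_USE_V1: "1"
    # Hypothetical: pin the NIXL side channel to a fixed, known port so it
    # can be exposed on the pod; env var names assume a vLLM version that
    # supports them.
    VLLM_NIXL_SIDE_CHANNEL_PORT: "5600"
```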