Description
When I use the serve config below to launch vLLM:
```yaml
serveConfigV2: |
  applications:
    - name: qianwen2
      import_path: ray.llm._internal.serve.deployments.prefill_decode_disagg.prefill_decode_disagg:build_app
      route_prefix: /
      args:
        prefill_config:
          model_loading_config:
            model_id: qianwen2
            model_source: /home/ray/models
          engine_kwargs:
            dtype: auto
            device: auto
            max_model_len: 8192
            pipeline_parallel_size: 2
            tensor_parallel_size: 1
            max_num_seqs: 40
            gpu_memory_utilization: 0.85
            trust_remote_code: true
            enforce_eager: true
          deployment_config:
            autoscaling_config:
              min_replicas: 1
              max_replicas: 3
          runtime_env:
            env_vars:
              VLLM_USE_V1: "1"
        decode_config:
          model_loading_config:
            model_id: qianwen2
            model_source: /home/ray/models
          engine_kwargs:
            dtype: auto
            device: auto
            max_model_len: 8192
            pipeline_parallel_size: 2
            tensor_parallel_size: 1
            max_num_seqs: 40
            gpu_memory_utilization: 0.85
            trust_remote_code: true
            enforce_eager: true
          deployment_config:
            autoscaling_config:
              min_replicas: 1
              max_replicas: 3
          runtime_env:
            env_vars:
              VLLM_USE_V1: "1"
```
I got the error below in both the prefill and decode deployments:
```
(RayWorkerWrapper pid=908, ip=10.244.121.4)   File "/home/ray/anaconda3/lib/python3.11/threading.py", line 982, in run
(RayWorkerWrapper pid=908, ip=10.244.121.4)     self._target(*self._args, **self._kwargs)
(RayWorkerWrapper pid=908, ip=10.244.121.4)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 469, in _nixl_handshake_listener
(RayWorkerWrapper pid=908, ip=10.244.121.4)     with zmq_ctx(zmq.ROUTER, path) as sock:
(RayWorkerWrapper pid=908, ip=10.244.121.4)   File "/home/ray/anaconda3/lib/python3.11/contextlib.py", line 137, in __enter__
(RayWorkerWrapper pid=908, ip=10.244.121.4)     return next(self.gen)
(RayWorkerWrapper pid=908, ip=10.244.121.4)            ^^^^^^^^^^^^^^
(RayWorkerWrapper pid=908, ip=10.244.121.4)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py", line 1097, in zmq_ctx
(RayWorkerWrapper pid=908, ip=10.244.121.4)     yield make_zmq_socket(ctx=ctx,
(RayWorkerWrapper pid=908, ip=10.244.121.4)           ^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=908, ip=10.244.121.4)   File "/home/ray/anaconda3/lib/python3.11/site-packages/vllm/utils/__init__.py", line 2601, in make_zmq_socket
(RayWorkerWrapper pid=908, ip=10.244.121.4)     socket.bind(path)
(RayWorkerWrapper pid=908, ip=10.244.121.4)   File "/home/ray/anaconda3/lib/python3.11/site-packages/zmq/sugar/socket.py", line 311, in bind
(RayWorkerWrapper pid=908, ip=10.244.121.4)     super().bind(addr)
(RayWorkerWrapper pid=908, ip=10.244.121.4)   File "_zmq.py", line 898, in zmq.backend.cython._zmq.Socket.bind
(RayWorkerWrapper pid=908, ip=10.244.121.4)   File "_zmq.py", line 160, in zmq.backend.cython._zmq._check_rc
(RayWorkerWrapper pid=908, ip=10.244.121.4) zmq.error.ZMQError: Cannot assign requested address (addr='tcp://10.244.124.3:41059')
```
Note that the error happens on `bind`: the worker (ip 10.244.121.4) is trying to bind its NIXL handshake listener to 10.244.124.3:41059, an address that belongs to another Ray worker and is not assigned to any local interface on this pod.
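For what it's worth, the underlying OS-level failure is easy to reproduce with a plain stdlib socket: binding to an IP that is not configured on a local interface raises `EADDRNOTAVAIL`, which ZMQ surfaces as "Cannot assign requested address". The IP below is just an example of an address assumed not to be local on the test machine:

```python
import errno
import socket

# Binding requires the address to belong to a local interface.
# 10.255.255.1 is assumed NOT to be configured on this host, mirroring
# a pod trying to bind another pod's IP (10.244.124.3 in the traceback).
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    sock.bind(("10.255.255.1", 0))  # port 0 = any free port
except OSError as exc:
    # Same errno that zmq.error.ZMQError reports as
    # "Cannot assign requested address".
    print(exc.errno == errno.EADDRNOTAVAIL)
finally:
    sock.close()
```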
From my understanding, when Ray starts the vLLM engine it has to initialize the NIXL connector, and the NIXL worker agent then needs to talk to the remote NIXL agent. But because the Ray cluster doesn't know the NIXL port ahead of time, it hasn't exposed that port, which blocks the NIXL connector initialization.
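As a possible workaround (an assumption on my side, not verified against this Ray/vLLM combination): recent vLLM versions read `VLLM_NIXL_SIDE_CHANNEL_HOST` / `VLLM_NIXL_SIDE_CHANNEL_PORT` for the NIXL handshake listener, so pinning the port per deployment via `runtime_env` would at least make it a known value that can be opened in the pod spec / network policy. Sketch only; the port value is arbitrary and the host would need to be each pod's own reachable IP:

```yaml
runtime_env:
  env_vars:
    VLLM_USE_V1: "1"
    # Hypothetical: pin the NIXL side channel to a fixed, known port so it
    # can be exposed on the pod; env var names assume a vLLM version that
    # supports them.
    VLLM_NIXL_SIDE_CHANNEL_PORT: "5600"
```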