vLLM 0.5.4 failure to start the TP+ PP mode on 8 ARC #12081

Open
oldmikeyang opened this issue Sep 14, 2024 · 2 comments

@oldmikeyang

The vLLM Docker image is:

intelanalytics/ipex-llm-serving-xpu-vllm-0.5.4-experimental:2.2.0b1
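
The report does not show how the container itself was launched; a minimal sketch of a typical launch for this image follows (the mount path, container name, and shm size here are assumptions, not taken from this issue):

# Sketch only: mount path, container name, and shm size are assumptions.
docker run -itd \
  --net=host \
  --device=/dev/dri \
  --shm-size=16g \
  -v /path/to/models:/llm/models \
  --name=ipex-vllm-serving \
  intelanalytics/ipex-llm-serving-xpu-vllm-0.5.4-experimental:2.2.0b1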

The vLLM start command is:

model="/llm/models/Qwen2-72B-Instruct/"
served_model_name="Qwen2-72B-Instruct"

source /opt/intel/1ccl-wks/setvars.sh

export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=2

python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
  --served-model-name $served_model_name \
  --port 8000 \
  --model $model \
  --trust-remote-code \
  --gpu-memory-utilization 0.85 \
  --device xpu \
  --dtype float16 \
  --enforce-eager \
  --load-in-low-bit fp8 \
  --max-model-len 2048 \
  --max-num-batched-tokens 2048 \
  --max-num-seqs 24 \
  -tp 4 -pp 2 --disable-log-requests
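
For reference, once the server does come up, a quick sanity check against the standard vLLM OpenAI-compatible route (the port and served model name are taken from the command above) looks like:

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen2-72B-Instruct", "prompt": "Hello", "max_tokens": 32}'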

The error output is:

(WrapperWithLoadBit pid=35347) 2024:09:13-11:21:50:(35347) |CCL_ERROR| exchange_utils.cpp:202 sendmsg_fd: condition !check_msg_retval("sendmsg", send_bytes, iov, msg, sizeof(u.cntr_buf), sock, fd) failed
(WrapperWithLoadBit pid=35347) errno: Broken pipe
2024:09:13-11:21:50:(31157) |CCL_ERROR| exchange_utils.cpp:202 sendmsg_fd: condition !check_msg_retval("sendmsg", send_bytes, iov, msg, sizeof(u.cntr_buf), sock, fd) failed
errno: Broken pipe
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] Error executing method init_device. This might cause deadlock in distributed execution.
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] Traceback (most recent call last):
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/worker_base.py", line 378, in execute_method
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] return executor(*args, **kwargs)
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] ^^^^^^^^^^^^^^^^^^^^^^^^^
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py", line 105, in init_device
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] self.init_worker_distributed_environment()
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py", line 205, in init_worker_distributed_environment
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] get_pp_group().all_reduce(torch.zeros(1).xpu())
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/distributed/parallel_state.py", line 293, in all_reduce
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] torch.distributed.all_reduce(input_, group=self.device_group)
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/torch/distributed/c10d_logger.py", line 47, in wrapper
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] return func(*args, **kwargs)
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] ^^^^^^^^^^^^^^^^^^^^^
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/torch/distributed/distributed_c10d.py", line 2055, in all_reduce
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] work.wait()
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] RuntimeError: oneCCL: exchange_utils.cpp:202 sendmsg_fd: EXCEPTION: errno: Broken pipe
ERROR 09-13 11:21:51 worker_base.py:386] Error executing method init_device. This might cause deadlock in distributed execution.
ERROR 09-13 11:21:51 worker_base.py:386] Traceback (most recent call last):
ERROR 09-13 11:21:51 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/worker_base.py", line 378, in execute_method
ERROR 09-13 11:21:51 worker_base.py:386] return executor(*args, **kwargs)
ERROR 09-13 11:21:51 worker_base.py:386] ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-13 11:21:51 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py", line 105, in init_device
ERROR 09-13 11:21:51 worker_base.py:386] self.init_worker_distributed_environment()
ERROR 09-13 11:21:51 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py", line 205, in init_worker_distributed_environment
ERROR 09-13 11:21:51 worker_base.py:386] get_pp_group().all_reduce(torch.zeros(1).xpu())
ERROR 09-13 11:21:51 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/distributed/parallel_state.py", line 293, in all_reduce
ERROR 09-13 11:21:51 worker_base.py:386] torch.distributed.all_reduce(input_, group=self.device_group)
ERROR 09-13 11:21:51 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/torch/distributed/c10d_logger.py", line 47, in wrapper
ERROR 09-13 11:21:51 worker_base.py:386] return func(*args, **kwargs)
ERROR 09-13 11:21:51 worker_base.py:386] ^^^^^^^^^^^^^^^^^^^^^
ERROR 09-13 11:21:51 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/torch/distributed/distributed_c10d.py", line 2055, in all_reduce
ERROR 09-13 11:21:51 worker_base.py:386] work.wait()
ERROR 09-13 11:21:51 worker_base.py:386] RuntimeError: oneCCL: exchange_utils.cpp:202 sendmsg_fd: EXCEPTION: errno: Broken pipe
Process Process-65:
Traceback (most recent call last):
File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/entrypoints/openai/rpc/server.py", line 220, in run_rpc_server
server = AsyncEngineRPCServer(async_engine_args, usage_context, port, load_in_low_bit)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/entrypoints/openai/rpc/server.py", line 27, in init
self.engine = AsyncLLMEngine.from_engine_args(async_engine_args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/engine/engine.py", line 43, in from_engine_args
return super().from_engine_args(engine_args, start_engine_loop, usage_context, stat_loggers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 476, in from_engine_args
engine = cls(
^^^^
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/engine/engine.py", line 29, in init
super().init(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 381, in init
self.engine = self._init_engine(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 557, in _init_engine
return engine_class(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/engine/llm_engine.py", line 255, in init
self.model_executor = executor_class(
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/executor/ray_xpu_executor.py", line 35, in init
super().init(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/executor/ray_gpu_executor.py", line 555, in init
super().init(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/executor/distributed_gpu_executor.py", line 25, in init
super().init(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/executor/xpu_executor.py", line 53, in init
self._init_executor()
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/executor/ray_gpu_executor.py", line 61, in _init_executor
self._init_workers_ray(placement_group)
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/executor/ray_gpu_executor.py", line 230, in _init_workers_ray
self._run_workers("init_device")
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/executor/ray_gpu_executor.py", line 468, in run_workers
self.driver_worker.execute_method(method, *driver_args,
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/worker_base.py", line 387, in execute_method
raise e
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/worker_base.py", line 378, in execute_method
return executor(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py", line 105, in init_device
self.init_worker_distributed_environment()
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py", line 205, in init_worker_distributed_environment
get_pp_group().all_reduce(torch.zeros(1).xpu())
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/distributed/parallel_state.py", line 293, in all_reduce
torch.distributed.all_reduce(input_, group=self.device_group)
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/c10d_logger.py", line 47, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/distributed/distributed_c10d.py", line 2055, in all_reduce
work.wait()

The workaround is to replace the all_reduce warm-up with all_gather in xpu_worker.py:

vi /usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py
            get_pp_group().all_gather(torch.zeros(1).xpu())
            #get_pp_group().all_reduce(torch.zeros(1).xpu())
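
For reference, an equivalent swap can be applied non-interactively inside the container; this is an untested convenience sketch, with the file path taken from the traceback above:

sed -i 's/get_pp_group().all_reduce(torch.zeros(1).xpu())/get_pp_group().all_gather(torch.zeros(1).xpu())/' \
  /usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py
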
@oldmikeyang (Author)

After modifying /usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py to use
get_pp_group().all_gather(torch.zeros(1).xpu())

vLLM fails at startup with the following error:

2024:09:14-11:02:35:( 241) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices
2024:09:14-11:02:35:( 241) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices
-----> current rank: 0, world size: 4, byte_count: 33554432
(WrapperWithLoadBit pid=3548) -----> current rank: 1, world size: 4, byte_count: 33554432
(WrapperWithLoadBit pid=4874) INFO 09-14 11:01:33 selector.py:127] Cannot use _Backend.FLASH_ATTN backend on XPU. [repeated 13x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(WrapperWithLoadBit pid=4874) INFO 09-14 11:01:33 selector.py:76] Using IPEX attention backend. [repeated 13x across cluster]
(WrapperWithLoadBit pid=4874) 2024:09:14-11:01:32:( 4874) |CCL_WARN| did not find MPI-launcher specific variables, switch to ATL/OFI, to force enable ATL/MPI set CCL_ATL_TRANSPORT=mpi [repeated 6x across cluster]
(WrapperWithLoadBit pid=4874) 2024:09:14-11:01:33:( 5291) |CCL_WARN| no membind support for NUMA node 1, skip thread membind [repeated 6x across cluster]
(WrapperWithLoadBit pid=3548) 2024:09:14-11:02:35:( 3548) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices [repeated 392x across cluster]
(WrapperWithLoadBit pid=4211) -----> current rank: 0, world size: 4, byte_count: 33554432 [repeated 3x across cluster]
(WrapperWithLoadBit pid=4211) 2024:09:14-11:02:42:( 4211) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices [repeated 168x across cluster]
(WrapperWithLoadBit pid=4211) GPU-Xeon4410Y-ARC770:rank4: Assertion failure at psm3/ptl_am/ptl.c:196: nbytes == req->req_data.recv_msglen
(WrapperWithLoadBit pid=4211) 2024-09-14 11:02:34,843 - INFO - Loading model weights took 9.7260 GB [repeated 6x across cluster]
(WrapperWithLoadBit pid=4211) [1726282973.117141429] GPU-Xeon4410Y-ARC770:rank4.perWithLoadBit.execute_method: Reading from remote process' memory failed. Disabling CMA support
(WrapperWithLoadBit pid=4874) -----> current rank: 3, world size: 4, byte_count: 33554432 [repeated 3x across cluster]
(WrapperWithLoadBit pid=4874) 2024:09:14-11:02:42:( 4874) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices [repeated 168x across cluster]
(WrapperWithLoadBit pid=4211) /usr/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
(WrapperWithLoadBit pid=4211) warnings.warn('resource_tracker: There appear to be %d '
(raylet) A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffb4b97d156cb581f7dbb82f2701000000 Worker ID: 3604d35e3933d71bde300919a86052b60242e5e8b0941e93374b84dc Node ID: 5c6f23e2660bea6d8af3fd5fd8ab94aa9233ae8d1bbfa29dfe86f788 Worker IP address: 10.240.108.91 Worker port: 38205 Worker PID: 4211 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
Process Process-65:
Traceback (most recent call last):
File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/entrypoints/openai/rpc/server.py", line 220, in run_rpc_server
server = AsyncEngineRPCServer(async_engine_args, usage_context, port, load_in_low_bit)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/entrypoints/openai/rpc/server.py", line 27, in init
self.engine = AsyncLLMEngine.from_engine_args(async_engine_args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/engine/engine.py", line 43, in from_engine_args
return super().from_engine_args(engine_args, start_engine_loop, usage_context, stat_loggers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 476, in from_engine_args
engine = cls(
^^^^
File "/usr/local/lib/python3.11/dist-packages/ipex_llm/vllm/xpu/engine/engine.py", line 29, in init
super().init(*args, **kwargs)
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 381, in init
self.engine = self._init_engine(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/engine/async_llm_engine.py", line 557, in _init_engine
return engine_class(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/engine/llm_engine.py", line 270, in init
self._initialize_kv_caches()
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/engine/llm_engine.py", line 369, in _initialize_kv_caches
self.model_executor.determine_num_available_blocks())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/executor/distributed_gpu_executor.py", line 38, in determine_num_available_blocks
num_blocks = self._run_workers("determine_num_available_blocks", )
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/executor/ray_gpu_executor.py", line 481, in _run_workers
ray_worker_outputs = ray.get(ray_worker_outputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ray/_private/worker.py", line 2661, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/ray/_private/worker.py", line 873, in get_objects
raise value
ray.exceptions.ActorDiedError: The actor died unexpectedly before finishing this task.
class_name: get_ipex_llm_wrapper.<locals>.WrapperWithLoadBit
actor_id: b4b97d156cb581f7dbb82f2701000000
pid: 4211
namespace: ac18545b-bc5c-4a95-a517-cfa1e1af06de
ip: 10.240.108.91
The actor is dead because its worker process has died. Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. There are some potential root causes. (1) The process is killed by SIGKILL by OOM killer due to high memory usage. (2) ray stop --force is called. (3) The worker is crashed unexpectedly due to SIGSEGV or other unexpected errors.
/usr/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
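
The raylet message above lists the OOM killer as one possible cause of the worker death; a generic way to check for that on the host (not specific to this setup) is:

dmesg -T | grep -i -E 'killed process|out of memory'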

@xiangyuT (Contributor)

2024:09:13-11:21:50:(31157) |CCL_ERROR| exchange_utils.cpp:202 sendmsg_fd: condition !check_msg_retval("sendmsg", send_bytes, iov, msg, sizeof(u.cntr_buf), sock, fd) failed
errno: Broken pipe
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] Error executing method init_device. This might cause deadlock in distributed execution.
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] Traceback (most recent call last):
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/worker_base.py", line 378, in execute_method
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] return executor(*args, **kwargs)
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] ^^^^^^^^^^^^^^^^^^^^^^^^^
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py", line 105, in init_device
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] self.init_worker_distributed_environment()
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] File "/usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py", line 205, in init_worker_distributed_environment
(WrapperWithLoadBit pid=35347) ERROR 09-13 11:21:50 worker_base.py:386] get_pp_group().all_reduce(torch.zeros(1).xpu())

The first issue could be solved by this modification:

vi /usr/local/lib/python3.11/dist-packages/vllm-0.5.4+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py
            get_pp_group().all_gather(torch.zeros(1).xpu())
            #get_pp_group().all_reduce(torch.zeros(1).xpu())

(WrapperWithLoadBit pid=4874) 2024:09:14-11:01:33:( 5291) |CCL_WARN| no membind support for NUMA node 1, skip thread membind [repeated 6x across cluster]
(WrapperWithLoadBit pid=3548) 2024:09:14-11:02:35:( 3548) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices [repeated 392x across cluster]
(WrapperWithLoadBit pid=4211) -----> current rank: 0, world size: 4, byte_count: 33554432 [repeated 3x across cluster]
(WrapperWithLoadBit pid=4211) 2024:09:14-11:02:42:( 4211) |CCL_WARN| topology recognition shows PCIe connection between devices. If this is not correct, you can disable topology recognition, with CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0. This will assume XeLinks across devices [repeated 168x across cluster]
(WrapperWithLoadBit pid=4211) GPU-Xeon4410Y-ARC770:rank4: Assertion failure at psm3/ptl_am/ptl.c:196: nbytes == req->req_data.recv_msglen

We were unable to reproduce the second issue in our environment. It may be related to settings in the startup container script.
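
One startup-script setting the logs themselves point at is oneCCL topology recognition: the repeated CCL_WARN lines report only PCIe links between the ARC cards. If the devices are in fact connected by XeLink, the warning's own suggestion can be exported before launching the server (this only overrides topology recognition; it is not confirmed to resolve the PSM3 assertion):

export CCL_TOPO_FABRIC_VERTEX_CONNECTION_CHECK=0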
