[Bug][Failing Test] distributed tests (4 GPUS) - v1/test_async_llm_dp.py::test_load #18466

@markmc

Description

Your current environment

Still failing on main as of commit 0c15c2e

🐛 Describe the bug

Failing tests: https://buildkite.com/organizations/vllm/analytics/suites/ci-1/tests?branch=main&period=2days&query=test_async_llm_dp&commit=Search

FAILED v1/test_async_llm_dp.py::test_load[RequestOutputKind.DELTA] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED v1/test_async_llm_dp.py::test_load[RequestOutputKind.FINAL_ONLY] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
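For reference, the two failing parametrizations can be run in isolation with something like the following (paths assume the vllm repo's `tests/` directory as the working directory, on a machine with at least 4 GPUs):

```shell
# Select only the two failing test_load parametrizations reported above.
# Quoting is needed because [] are shell glob characters.
pytest -v \
  "v1/test_async_llm_dp.py::test_load[RequestOutputKind.DELTA]" \
  "v1/test_async_llm_dp.py::test_load[RequestOutputKind.FINAL_ONLY]"
```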
Logs
(EngineCore_0 pid=4396) (VllmWorker rank=1 pid=4418) WARNING 05-20 22:23:45 [fused_moe.py:682] Using default MoE config. Performance might be sub-optimal! Config file not found at /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=40,N=128,device_name=NVIDIA_L4.json
(EngineCore_1 pid=4399) (VllmWorker rank=1 pid=4417) WARNING 05-20 22:23:45 [fused_moe.py:682] Using default MoE config. Performance might be sub-optimal! Config file not found at /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=40,N=128,device_name=NVIDIA_L4.json
(EngineCore_1 pid=4399) (VllmWorker rank=0 pid=4415) WARNING 05-20 22:23:45 [fused_moe.py:682] Using default MoE config. Performance might be sub-optimal! Config file not found at /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=40,N=128,device_name=NVIDIA_L4.json
(EngineCore_0 pid=4396) (VllmWorker rank=0 pid=4416) WARNING 05-20 22:23:45 [fused_moe.py:682] Using default MoE config. Performance might be sub-optimal! Config file not found at /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=40,N=128,device_name=NVIDIA_L4.json
(EngineCore_0 pid=4396) INFO 05-20 22:23:46 [kv_cache_utils.py:637] GPU KV cache size: 576,432 tokens
(EngineCore_0 pid=4396) INFO 05-20 22:23:46 [kv_cache_utils.py:640] Maximum concurrency for 4,096 tokens per request: 140.73x
(EngineCore_0 pid=4396) INFO 05-20 22:23:46 [kv_cache_utils.py:637] GPU KV cache size: 576,432 tokens
(EngineCore_0 pid=4396) INFO 05-20 22:23:46 [kv_cache_utils.py:640] Maximum concurrency for 4,096 tokens per request: 140.73x
(EngineCore_1 pid=4399) INFO 05-20 22:23:46 [kv_cache_utils.py:637] GPU KV cache size: 576,432 tokens
(EngineCore_1 pid=4399) INFO 05-20 22:23:46 [kv_cache_utils.py:640] Maximum concurrency for 4,096 tokens per request: 140.73x
(EngineCore_1 pid=4399) INFO 05-20 22:23:46 [kv_cache_utils.py:637] GPU KV cache size: 576,432 tokens
(EngineCore_1 pid=4399) INFO 05-20 22:23:46 [kv_cache_utils.py:640] Maximum concurrency for 4,096 tokens per request: 140.73x
(EngineCore_1 pid=4399) INFO 05-20 22:23:46 [core.py:163] init engine (profile, create kv cache, warmup model) took 1.83 seconds
(EngineCore_0 pid=4396) INFO 05-20 22:23:46 [core.py:163] init engine (profile, create kv cache, warmup model) took 1.83 seconds
[rank1]:[E520 22:23:46.546600506 ProcessGroupNCCL.cpp:1896] [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f57929785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xe0 (0x7f579290d4a2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3c2 (0x7f5792e0b422 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f572268b456 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x70 (0x7f572269b6f0 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x782 (0x7f572269d282 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f572269ee8d in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xdc253 (0x7f57129b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x94ac3 (0x7f58141a2ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7f5814233a04 in /lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::DistBackendError'
[rank1]:[E520 22:23:46.547487691 ProcessGroupNCCL.cpp:1896] [PG ID 4 PG GUID 17 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f57929785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xe0 (0x7f579290d4a2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3c2 (0x7f5792e0b422 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f572268b456 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x70 (0x7f572269b6f0 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x782 (0x7f572269d282 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f572269ee8d in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xdc253 (0x7f57129b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x94ac3 (0x7f58141a2ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7f5814233a04 in /lib/x86_64-linux-gnu/libc.so.6)

terminate called recursively
Fatal Python error: Aborted

Thread 0x00007f57a77d7640 (most recent call first):
  File "/usr/lib/python3.12/threading.py", line 359 in wait
  File "/usr/lib/python3.12/threading.py", line 655 in wait
  File "/usr/local/lib/python3.12/dist-packages/tqdm/_monitor.py", line 60 in run
  File "/usr/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
  File "/usr/lib/python3.12/threading.py", line 1032 in _bootstrap

Thread 0x00007f581410d000 (most recent call first):
  File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 425 in acquire_read
  File "/usr/lib/python3.12/contextlib.py", line 137 in __enter__
  File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 479 in dequeue
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 510 in worker_busy_loop
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 478 in worker_main
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108 in run
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314 in _bootstrap
  File "/usr/lib/python3.12/multiprocessing/popen_fork.py", line 71 in _launch
  File "/usr/lib/python3.12/multiprocessing/popen_fork.py", line 19 in __init__
  File "/usr/lib/python3.12/multiprocessing/context.py", line 282 in _Popen
  File "/usr/lib/python3.12/multiprocessing/process.py", line 121 in start
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 386 in make_worker_process
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 82 in _init_executor
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 52 in __init__
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 67 in __init__
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 379 in __init__
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 679 in __init__
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 478 in run_engine_core
  File "  what():  /usr/lib/python3.12/multiprocessing/process.py[PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f57929785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xe0 (0x7f579290d4a2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x3c2 (0x7f5792e0b422 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f572268b456 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x70 (0x7f572269b6f0 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x782 (0x7f572269d282 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x14d (0x7f572269ee8d in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xdc253 (0x7f57129b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x94ac3 (0x7f58141a2ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #9: clone + 0x44 (0x7f5814233a04 in /lib/x86_64-linux-gnu/libc.so.6)

Exception raised from ncclCommWatchdog at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1902 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x98 (0x7f57929785e8 in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xcc7a4e (0x7f572266da4e in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x9165ed (0x7f57222bc5ed in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #3: <unknown function> + 0xdc253 (0x7f57129b3253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #4: <unknown function> + 0x94ac3 (0x7f58141a2ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #5: clone + 0x44 (0x7f5814233a04 in /lib/x86_64-linux-gnu/libc.so.6)
"
, line 108 in run
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314 in _bootstrap
  File "/usr/lib/python3.12/multiprocessing/popen_fork.py", line 71 in _launch
  File "/usr/lib/python3.12/multiprocessing/popen_fork.py", line 19 in __init__
  File "/usr/lib/python3.12/multiprocessing/context.py", line 282 in _Popen
  File "/usr/lib/python3.12/multiprocessing/process.py", line 121 in start
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/utils.py", line 142 in __init__
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 404 in __init__
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 734 in __init__
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py"DEBUG 05-20 22:23:46 [core_client.py:540] READY from local core engine process 0.
(EngineCore_0 pid=4396) DEBUG 05-20 22:23:46 [core.py:517] EngineCore waiting for work.
DEBUG 05-20 22:23:46 [core_client.py:540] READY from local core engine process 1.
INFO 05-20 22:23:46 [loggers.py:134] vllm cache_config_info with initialization after num_gpu_blocks is: 72054
(EngineCore_1 pid=4399) DEBUG 05-20 22:23:46 [core.py:517] EngineCore waiting for work.
DEBUG 05-20 22:23:46 [core_client.py:992] Sending start DP wave 0.
(EngineCore_0 pid=4396) DEBUG 05-20 22:23:46 [core.py:523] EngineCore loop active.
(EngineCore_1 pid=4399) DEBUG 05-20 22:23:46 [core.py:728] EngineCore starting idle loop for wave 0.
(EngineCore_1 pid=4399) DEBUG 05-20 22:23:46 [core.py:523] EngineCore loop active.
(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [dump_input.py:68] Dumping input data
(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [dump_input.py:70] V1 LLM engine (v0.9.1.dev18+g0c15c2e48) with config: model='ibm-research/PowerMoE-3b', speculative_config=None, tokenizer='ibm-research/PowerMoE-3b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=ibm-research/PowerMoE-3b, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=False, pooler_config=None, compilation_config={"compile_sizes": [], "inductor_compile_config": {"enable_auto_functionalized_v2": false}, "cudagraph_capture_sizes": [], "max_capture_size": 0}, 
(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [dump_input.py:78] Dumping scheduler output for model execution:
(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [dump_input.py:79] SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=request-0,prompt_token_ids_len=7,mm_inputs=[],mm_hashes=[],mm_positions=[],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=True, max_tokens=10, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),block_ids=[[1]],num_computed_tokens=0,lora_request=None)],scheduled_cached_reqs=[],num_scheduled_tokens={request-0: 7},total_num_scheduled_tokens=7,scheduled_spec_decode_tokens={},scheduled_encoder_inputs={},num_common_prefix_blocks=[1],finished_req_ids=[],free_encoder_input_ids=[],structured_output_request_ids={},grammar_bitmask=null,kv_connector_metadata=null)
(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [dump_input.py:81] SchedulerStats(num_running_reqs=1, num_waiting_reqs=0, gpu_cache_usage=5.5513920115490833e-05, prefix_cache_stats=PrefixCacheStats(reset=False, requests=1, queries=7, hits=0), spec_decoding_stats=None)
(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491] EngineCore encountered a fatal error.

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491] Traceback (most recent call last):

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 215, in collective_rpc

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]     result = get_response(w, dequeue_timeout)

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 198, in get_response

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]     status, result = w.worker_response_mq.dequeue(

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 479, in dequeue

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]     with self.acquire_read(timeout, cancel) as buf:

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]   File "/usr/lib/python3.12/contextlib.py", line 137, in __enter__

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]     return next(self.gen)

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]            ^^^^^^^^^^^^^^

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 443, in acquire_read

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]     raise TimeoutError

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491] TimeoutError

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491] 

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491] The above exception was the direct cause of the following exception:

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491] 

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491] Traceback (most recent call last):

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 482, in run_engine_core

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]     engine_core.run_busy_loop()

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 746, in run_busy_loop

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]     self._process_engine_step()

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 534, in _process_engine_step

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]     outputs = self.step_fn()

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]               ^^^^^^^^^^^^^^

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 222, in step

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]     model_output = self.execute_model(scheduler_output)

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 209, in execute_model

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]     raise err

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 203, in execute_model

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]     return self.model_executor.execute_model(scheduler_output)

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 158, in execute_model

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]     (output, ) = self.collective_rpc("execute_model",

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 221, in collective_rpc

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491]     raise TimeoutError(f"RPC call to {method} timed out.") from e

(EngineCore_0 pid=4396) ERROR 05-20 22:24:26 [core.py:491] TimeoutError: RPC call to execute_model timed out.
ERROR 05-20 22:24:26 [async_llm.py:403] AsyncLLM output_handler failed.

ERROR 05-20 22:24:26 [async_llm.py:403] Traceback (most recent call last):

ERROR 05-20 22:24:26 [async_llm.py:403]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 361, in output_handler

ERROR 05-20 22:24:26 [async_llm.py:403]     outputs = await engine_core.get_output_async()

ERROR 05-20 22:24:26 [async_llm.py:403]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

ERROR 05-20 22:24:26 [async_llm.py:403]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 806, in get_output_async

ERROR 05-20 22:24:26 [async_llm.py:403]     raise self._format_exception(outputs) from None

ERROR 05-20 22:24:26 [async_llm.py:403] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(EngineCore_1 pid=4399) DEBUG 05-20 22:24:26 [core.py:485] EngineCore exiting.
/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 3 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
FAILED

=================================== FAILURES ===================================
______________________ test_load[RequestOutputKind.DELTA] ______________________

output_kind = <RequestOutputKind.DELTA: 1>

    @pytest.mark.parametrize(
        "output_kind", [RequestOutputKind.DELTA, RequestOutputKind.FINAL_ONLY])
    @pytest.mark.asyncio
    async def test_load(output_kind: RequestOutputKind):
    
        with ExitStack() as after:
    
            prompt = "This is a test of data parallel"
    
            engine = AsyncLLM.from_engine_args(engine_args)
            after.callback(engine.shutdown)
    
            NUM_REQUESTS = 100
            NUM_EXPECTED_TOKENS = 10
    
            request_ids = [f"request-{i}" for i in range(NUM_REQUESTS)]
    
            # Create concurrent requests.
            tasks = []
            for request_id in request_ids:
                tasks.append(
                    asyncio.create_task(
                        generate(engine, request_id, prompt, output_kind,
                                 NUM_EXPECTED_TOKENS)))
    
            # Confirm that we got all the EXPECTED tokens from the requests.
            done, pending = await asyncio.wait(tasks,
                                               return_when=asyncio.FIRST_EXCEPTION)
            for task in pending:
                task.cancel()
            for task in done:
>               num_generated_tokens, request_id = await task

v1/test_async_llm_dp.py:92: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
v1/test_async_llm_dp.py:46: in generate
    async for out in engine.generate(request_id=request_id,
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py:310: in generate
    out = q.get_nowait() or await q.get()
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/output_processor.py:51: in get
    raise output
[... the six frames above (test_async_llm_dp.py:46 → async_llm.py:310 → output_processor.py:51) repeat 27 more times, once per pending concurrent request whose queued exception is re-raised ...]
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py:361: in output_handler
    outputs = await engine_core.get_output_async()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <vllm.v1.engine.core_client.DPAsyncMPClient object at 0x7f55bebbe8d0>

    async def get_output_async(self) -> EngineCoreOutputs:
        self._ensure_output_queue_task()
        # If an exception arises in process_outputs_socket task,
        # it is forwarded to the outputs_queue so we can raise it
        # from this (run_output_handler) task to shut down the server.
        assert self.outputs_queue is not None
        outputs = await self.outputs_queue.get()
        if isinstance(outputs, Exception):
>           raise self._format_exception(outputs) from None
E           vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.

/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:806: EngineDeadError
___________________ test_load[RequestOutputKind.FINAL_ONLY] ____________________

output_kind = <RequestOutputKind.FINAL_ONLY: 2>

    @pytest.mark.parametrize(
        "output_kind", [RequestOutputKind.DELTA, RequestOutputKind.FINAL_ONLY])
    @pytest.mark.asyncio
    async def test_load(output_kind: RequestOutputKind):
    
        with ExitStack() as after:
    
            prompt = "This is a test of data parallel"
    
            engine = AsyncLLM.from_engine_args(engine_args)
            after.callback(engine.shutdown)
    
            NUM_REQUESTS = 100
            NUM_EXPECTED_TOKENS = 10
    
            request_ids = [f"request-{i}" for i in range(NUM_REQUESTS)]
    
            # Create concurrent requests.
            tasks = []
            for request_id in request_ids:
                tasks.append(
                    asyncio.create_task(
                        generate(engine, request_id, prompt, output_kind,
                                 NUM_EXPECTED_TOKENS)))
    
            # Confirm that we got all the EXPECTED tokens from the requests.
            done, pending = await asyncio.wait(tasks,
                                               return_when=asyncio.FIRST_EXCEPTION)
            for task in pending:
                task.cancel()
            for task in done:
>               num_generated_tokens, request_id = await task

v1/test_async_llm_dp.py:92: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
v1/test_async_llm_dp.py:46: in generate
    async for out in engine.generate(request_id=request_id,
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py:310: in generate
    out = q.get_nowait() or await q.get()
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/output_processor.py:51: in get
    raise output
... (the same three frames repeat for each in-flight request)
/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py:361: in output_handler
    outputs = await engine_core.get_output_async()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <vllm.v1.engine.core_client.DPAsyncMPClient object at 0x7f57f85341d0>

    async def get_output_async(self) -> EngineCoreOutputs:
        self._ensure_output_queue_task()
        # If an exception arises in process_outputs_socket task,
        # it is forwarded to the outputs_queue so we can raise it
        # from this (run_output_handler) task to shut down the server.
        assert self.outputs_queue is not None
        outputs = await self.outputs_queue.get()
        if isinstance(outputs, Exception):
>           raise self._format_exception(outputs) from None
E           vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.

/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py:806: EngineDeadError
---------------------------- Captured log teardown -----------------------------
ERROR    asyncio:base_events.py:1833 Task exception was never retrieved
future: <Task finished name='Task-103' coro=<generate() done, defined at /vllm-workspace/tests/v1/test_async_llm_dp.py:31> exception=EngineDeadError('EngineCore encountered an issue. See stack trace (above) for the root cause.')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/_pytest/runner.py", line 341, in from_call
    result: TResult | None = func()
                             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/_pytest/runner.py", line 242, in <lambda>
    lambda: runtest_hook(item=item, **kwds), when=when, reraise=reraise
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_callers.py", line 182, in _multicall
    return outcome.get_result()
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_result.py", line 100, in get_result
    raise exc.with_traceback(exc.__traceback__)
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_callers.py", line 167, in _multicall
    teardown.throw(outcome._exception)
  File "/usr/local/lib/python3.12/dist-packages/_pytest/threadexception.py", line 92, in pytest_runtest_call
    yield from thread_exception_runtest_hook()
  File "/usr/local/lib/python3.12/dist-packages/_pytest/threadexception.py", line 68, in thread_exception_runtest_hook
    yield
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_callers.py", line 167, in _multicall
    teardown.throw(outcome._exception)
  File "/usr/local/lib/python3.12/dist-packages/_pytest/unraisableexception.py", line 95, in pytest_runtest_call
    yield from unraisable_exception_runtest_hook()
  File "/usr/local/lib/python3.12/dist-packages/_pytest/unraisableexception.py", line 70, in unraisable_exception_runtest_hook
    yield
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_callers.py", line 167, in _multicall
    teardown.throw(outcome._exception)
  File "/usr/local/lib/python3.12/dist-packages/_pytest/logging.py", line 846, in pytest_runtest_call
    yield from self._runtest_for(item, "call")
  File "/usr/local/lib/python3.12/dist-packages/_pytest/logging.py", line 829, in _runtest_for
    yield
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_callers.py", line 167, in _multicall
    teardown.throw(outcome._exception)
  File "/usr/local/lib/python3.12/dist-packages/_pytest/capture.py", line 880, in pytest_runtest_call
    return (yield)
            ^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_callers.py", line 167, in _multicall
    teardown.throw(outcome._exception)
  File "/usr/local/lib/python3.12/dist-packages/_pytest/skipping.py", line 257, in pytest_runtest_call
    return (yield)
            ^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/_pytest/runner.py", line 174, in pytest_runtest_call
    item.runtest()
  File "/usr/local/lib/python3.12/dist-packages/pytest_asyncio/plugin.py", line 457, in runtest
    super().runtest()
  File "/usr/local/lib/python3.12/dist-packages/_pytest/python.py", line 1627, in runtest
    self.ihook.pytest_pyfunc_call(pyfuncitem=self)
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_hooks.py", line 513, in __call__
    return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_manager.py", line 120, in _hookexec
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_callers.py", line 182, in _multicall
    return outcome.get_result()
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_result.py", line 100, in get_result
    raise exc.with_traceback(exc.__traceback__)
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_callers.py", line 167, in _multicall
    teardown.throw(outcome._exception)
  File "/usr/local/lib/python3.12/dist-packages/schemathesis/extra/pytest_plugin.py", line 312, in pytest_pyfunc_call
    yield
  File "/usr/local/lib/python3.12/dist-packages/pluggy/_callers.py", line 103, in _multicall
    res = hook_impl.function(*args)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/_pytest/python.py", line 159, in pytest_pyfunc_call
    result = testfunction(**testargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pytest_asyncio/plugin.py", line 929, in inner
    _loop.run_until_complete(task)
  File "/usr/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/vllm-workspace/tests/v1/test_async_llm_dp.py", line 92, in test_load
    num_generated_tokens, request_id = await task
                                       ^^^^^^^^^^
  File "/vllm-workspace/tests/v1/test_async_llm_dp.py", line 46, in generate
    async for out in engine.generate(request_id=request_id,
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 310, in generate
    out = q.get_nowait() or await q.get()
                            ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/output_processor.py", line 51, in get
    raise output
  ... (the same three frames repeat for each in-flight request)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 361, in output_handler
    outputs = await engine_core.get_output_async()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 806, in get_output_async
    raise self._format_exception(outputs) from None
vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
=============================== warnings summary ===============================
../../usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305
  /usr/local/lib/python3.12/dist-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
    ref_error: type[Exception] = jsonschema.RefResolutionError,

tests/v1/test_async_llm_dp.py::test_load[RequestOutputKind.DELTA]
tests/v1/test_async_llm_dp.py::test_load[RequestOutputKind.DELTA]
tests/v1/test_async_llm_dp.py::test_load[RequestOutputKind.FINAL_ONLY]
tests/v1/test_async_llm_dp.py::test_load[RequestOutputKind.FINAL_ONLY]
  /usr/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=3694) is multi-threaded, use of fork() may lead to deadlocks in the child.
    self.pid = os.fork()

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED v1/test_async_llm_dp.py::test_load[RequestOutputKind.DELTA] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
FAILED v1/test_async_llm_dp.py::test_load[RequestOutputKind.FINAL_ONLY] - vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
================== 2 failed, 5 warnings in 125.53s (0:02:05) ===================
Task exception was never retrieved
future: <Task finished name='Task-206' coro=<generate() done, defined at /vllm-workspace/tests/v1/test_async_llm_dp.py:31> exception=EngineDeadError('EngineCore encountered an issue. See stack trace (above) for the root cause.')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/_pytest/runner.py", line 341, in from_call
    result: TResult | None = func()
                             ^^^^^^
...
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 361, in output_handler
    outputs = await engine_core.get_output_async()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 806, in get_output_async
    raise self._format_exception(outputs) from None
vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
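For context on why the same three frames (`generate` → `async_llm.py:310` → `output_processor.py:51: raise output`) recur so many times: per the `get_output_async` source shown above, a fatal engine-core exception is forwarded to the outputs queue, and each in-flight request then re-raises it from its own `await q.get()`. A minimal sketch of that fan-out, with hypothetical names (not vLLM's actual classes):

```python
import asyncio


class EngineDeadError(RuntimeError):
    pass


async def consume(q: asyncio.Queue) -> str:
    out = await q.get()
    if isinstance(out, Exception):
        raise out              # mirrors output_processor.py:51 "raise output"
    return out


async def main() -> int:
    queues = [asyncio.Queue() for _ in range(4)]   # one queue per request
    err = EngineDeadError("EngineCore encountered an issue.")
    for q in queues:                               # fan the single failure out
        q.put_nowait(err)
    results = await asyncio.gather(*(consume(q) for q in queues),
                                   return_exceptions=True)
    return sum(isinstance(r, EngineDeadError) for r in results)


num_failed = asyncio.run(main())   # every request observes the engine failure
```

So a single `EngineDeadError` in the engine core surfaces once per concurrent `generate()` call, which is exactly the repetition pattern in the tracebacks above.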
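The "Task exception was never retrieved" teardown errors follow from the test's fan-out pattern: `asyncio.wait(..., return_when=FIRST_EXCEPTION)` returns as soon as one task fails, and any other failed tasks that are never awaited trigger that message when the loop shuts down. A minimal sketch of the pattern (hypothetical worker in place of `engine.generate`):

```python
import asyncio


class EngineDeadError(RuntimeError):
    pass


async def generate(request_id: str, fail: bool) -> tuple[int, str]:
    await asyncio.sleep(0)          # yield once, like a streaming generator
    if fail:
        raise EngineDeadError("EngineCore encountered an issue.")
    return 10, request_id           # (num_generated_tokens, request_id)


async def main() -> list[str]:
    tasks = [asyncio.create_task(generate(f"request-{i}", fail=(i == 3)))
             for i in range(5)]
    done, pending = await asyncio.wait(tasks,
                                       return_when=asyncio.FIRST_EXCEPTION)
    for task in pending:
        task.cancel()               # cancelled tasks don't log stray exceptions
    errors = []
    for task in done:
        try:
            await task              # re-raises, as the test's `await task` does
        except EngineDeadError as exc:
            errors.append(str(exc))
    return errors


errors = asyncio.run(main())
```

If more than one task fails in the same scheduling batch, only the first awaited failure propagates out of the test; the rest sit unretrieved, hence the asyncio error in the captured teardown log.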
Metadata

Assignees: no one assigned
Labels: bug (Something isn't working), ci-failure (Issue about an unexpected test failure in CI)
Status: Done
Milestone: no milestone