
[Bug]: The W8A8-quantized Qwen3-235B-A22 model crashes when calling tools. #3009

@LYK918

Description


Your current environment

The v0.10.0rc1 Docker image
Ascend 910B3
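For context, the engine config dumped in the log below (tensor_parallel_size=8, quantization=ascend, served_model_name=Qwen3-32B, max_seq_len=131072) suggests a serve command roughly like the following. This is a reconstruction from the dumped config, not the exact command used; flag names are vLLM's standard CLI options:

```shell
# Reconstructed from the engine config in the crash log; the actual
# launch command may differ (e.g. additional env vars or flags).
vllm serve /data1/Qwen3-235B \
  --tensor-parallel-size 8 \
  --quantization ascend \
  --trust-remote-code \
  --max-model-len 131072 \
  --served-model-name Qwen3-32B \
  --enable-prefix-caching
```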

🐛 Describe the bug

[rank6]:[W918 06:56:43.920069840 compiler_depend.ts:57] Warning: EZ9999: Inner Error!
EZ9999: [PID: 6140] 2025-09-18-06:56:43.535.507 The error from device(chipId:6, dieId:0), serial number is 17, there is an exception of aivec error, core id is 14, error code = 0, dump info: pc start: 0x12c20189b000, current: 0x12c20189b95c, vec error info: 0xdb1dd707a9, mte error info: 0x6203000071, ifu error info: 0x20003fffeda00, ccu error info: 0x81e139000000000, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c240508000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:303]
TraceBack (most recent call last):
The extend info: errcode:(0, 0x8000, 0) errorStr: When the D-cache reads and writes data to the UB, the response value returned by the bus is a non-zero value. fixp_error0 info: 0x3000071, fixp_error1 info: 0x62, fsmId:1, tslot:0, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:322]
Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1539]
AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1183]
[AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1183]
Aicore kernel execute failed, device_id=6, stream_id=2, report_stream_id=2, task_id=35872, flip_num=100, fault kernel_name=ApplyTopKTopPWithSorted_fp32_high_performance_0, fault kernel info ext=none, program id=422, hash=15861197690901397285.[FUNC:GetError][FILE:stream.cc][LINE:1183]
rtStreamSynchronize execute failed, reason=[vector core exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
(function copy_between_host_and_device_opapi)
[rank0]:[W918 06:56:43.920516612 compiler_depend.ts:57] Warning: EZ9999: Inner Error!
EZ9999: [PID: 5131] 2025-09-18-06:56:43.537.956 The error from device(chipId:0, dieId:0), serial number is 15, there is an exception of aivec error, core id is 32, error code = 0, dump info: pc start: 0x12c0c189b000, current: 0x12c0c189b95c, vec error info: 0x770d9d4caf, mte error info: 0x267c7f9fe, ifu error info: 0x20003fffeda00, ccu error info: 0x9ae41e1600000000, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c100508000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:303]
TraceBack (most recent call last):
The extend info: errcode:(0, 0x8000, 0) errorStr: When the D-cache reads and writes data to the UB, the response value returned by the bus is a non-zero value. fixp_error0 info: 0x7c7f9fe, fixp_error1 info: 0x2, fsmId:1, tslot:0, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:322]
Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1539]
AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1183]
[AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1183]
Aicore kernel execute failed, device_id=0, stream_id=1345, report_stream_id=1345, task_id=35872, flip_num=100, fault kernel_name=ApplyTopKTopPWithSorted_fp32_high_performance_0, fault kernel info ext=none, program id=422, hash=15861197690901397285.[FUNC:GetError][FILE:stream.cc][LINE:1183]
rtStreamSynchronize execute failed, reason=[vector core exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
(function copy_between_host_and_device_opapi)
[rank5]:[W918 06:56:43.922575499 compiler_depend.ts:57] Warning: EZ9999: Inner Error!
EZ9999: [PID: 5892] 2025-09-18-06:56:43.538.540 The error from device(chipId:5, dieId:0), serial number is 17, there is an exception of aivec error, core id is 11, error code = 0, dump info: pc start: 0x12c20189b000, current: 0x12c20189b95c, vec error info: 0x6016754824, mte error info: 0x1463007013, ifu error info: 0x20003fffeda00, ccu error info: 0xb7c1136700000000, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c240508400.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:303]
TraceBack (most recent call last):
The extend info: errcode:(0, 0x8000, 0) errorStr: When the D-cache reads and writes data to the UB, the response value returned by the bus is a non-zero value. fixp_error0 info: 0x3007013, fixp_error1 info: 0x14, fsmId:1, tslot:0, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:322]
Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1539]
AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1183]
[AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1183]
Aicore kernel execute failed, device_id=5, stream_id=2, report_stream_id=2, task_id=35873, flip_num=100, fault kernel_name=ApplyTopKTopPWithSorted_fp32_high_performance_0, fault kernel info ext=none, program id=422, hash=15861197690901397285.[FUNC:GetError][FILE:stream.cc][LINE:1183]
rtStreamSynchronize execute failed, reason=[vector core exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
(function copy_between_host_and_device_opapi)
[rank1]:[W918 06:56:43.922759639 compiler_depend.ts:57] Warning: EZ9999: Inner Error!
EZ9999: [PID: 5137] 2025-09-18-06:56:43.535.303 The error from device(chipId:1, dieId:0), serial number is 17, there is an exception of aivec error, core id is 15, error code = 0, dump info: pc start: 0x12c20189b000, current: 0x12c20189b95c, vec error info: 0xf110833e93, mte error info: 0x39ff113f6c, ifu error info: 0x20003fffeda00, ccu error info: 0x7489f8de00000000, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c240508400.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:303]
TraceBack (most recent call last):
The extend info: errcode:(0, 0x8000, 0) errorStr: When the D-cache reads and writes data to the UB, the response value returned by the bus is a non-zero value. fixp_error0 info: 0xf113f6c, fixp_error1 info: 0x39, fsmId:1, tslot:0, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:322]
Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1539]
AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1183]
[AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1183]
Aicore kernel execute failed, device_id=1, stream_id=2, report_stream_id=2, task_id=35873, flip_num=100, fault kernel_name=ApplyTopKTopPWithSorted_fp32_high_performance_0, fault kernel info ext=none, program id=422, hash=15861197690901397285.[FUNC:GetError][FILE:stream.cc][LINE:1183]
rtStreamSynchronize execute failed, reason=[vector core exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
(function copy_between_host_and_device_opapi)
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] WorkerProc hit an exception.
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Traceback (most recent call last):
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 541, in worker_busy_loop
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = func(*args, **kwargs)
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 201, in execute_model
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] return func(*args, **kwargs)
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1725, in execute_model
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] RuntimeError: ACL stream synchronize failed, error code:507035
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Traceback (most recent call last):
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 541, in worker_busy_loop
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = func(*args, **kwargs)
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 201, in execute_model
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] return func(*args, **kwargs)
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1725, in execute_model
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546] RuntimeError: ACL stream synchronize failed, error code:507035
(VllmWorker rank=6 pid=6140) ERROR 09-18 06:56:43 [multiproc_executor.py:546]
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] WorkerProc hit an exception.
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Traceback (most recent call last):
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 541, in worker_busy_loop
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = func(*args, **kwargs)
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 201, in execute_model
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] return func(*args, **kwargs)
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1725, in execute_model
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] RuntimeError: ACL stream synchronize failed, error code:507035
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Traceback (most recent call last):
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 541, in worker_busy_loop
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = func(*args, **kwargs)
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 201, in execute_model
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] return func(*args, **kwargs)
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1725, in execute_model
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546] RuntimeError: ACL stream synchronize failed, error code:507035
(VllmWorker rank=0 pid=5131) ERROR 09-18 06:56:43 [multiproc_executor.py:546]
[rank3]:[W918 06:56:43.925074687 compiler_depend.ts:57] Warning: EZ9999: Inner Error!
EZ9999: [PID: 5396] 2025-09-18-06:56:43.538.104 The error from device(chipId:3, dieId:0), serial number is 17, there is an exception of aivec error, core id is 10, error code = 0, dump info: pc start: 0x12c20189b000, current: 0x12c20189b95c, vec error info: 0xf111542fef, mte error info: 0x9dfd1e0b71, ifu error info: 0x20003fffeda00, ccu error info: 0x8dc1c3b00000000, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c240508000.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:303]
TraceBack (most recent call last):
The extend info: errcode:(0, 0x8000, 0) errorStr: When the D-cache reads and writes data to the UB, the response value returned by the bus is a non-zero value. fixp_error0 info: 0xd1e0b71, fixp_error1 info: 0x9d, fsmId:1, tslot:0, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:322]
Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1539]
AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1183]
[AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1183]
Aicore kernel execute failed, device_id=3, stream_id=2, report_stream_id=2, task_id=35872, flip_num=100, fault kernel_name=ApplyTopKTopPWithSorted_fp32_high_performance_0, fault kernel info ext=none, program id=422, hash=15861197690901397285.[FUNC:GetError][FILE:stream.cc][LINE:1183]
rtStreamSynchronize execute failed, reason=[vector core exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
(function copy_between_host_and_device_opapi)
ERROR 09-18 06:56:43 [dump_input.py:69] Dumping input data for V1 LLM engine (v0.10.0) with config: model='/data1/Qwen3-235B', speculative_config=None, tokenizer='/data1/Qwen3-235B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=8, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=ascend, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=Qwen3-32B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.unified_ascend_attention_with_output","vllm.unified_ascend_attention_with_output"],"use_inductor":false,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":32,"local_cache_dir":null},
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] WorkerProc hit an exception.
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Traceback (most recent call last):
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 541, in worker_busy_loop
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = func(*args, **kwargs)
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 201, in execute_model
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-18 06:56:43 [dump_input.py:76] Dumping scheduler output for model execution: SchedulerOutput(scheduled_new_reqs=[NewRequestData(req_id=chatcmpl-69e8f16a1f244907bbefe13e5c3675fc,prompt_token_ids_len=7742,mm_inputs=[],mm_hashes=[],mm_positions=[],sampling_params=SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, seed=None, stop=[], stop_token_ids=[151643], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=16384, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None),block_ids=([2591, 2592, 2593, 2594, 2595, 2596, 2597, 2598, 2599, 2600, 2601, 2602, 2603, 2604, 2605, 2606, 2607, 2608, 2609, 2610, 2611, 2612, 2613, 2614, 2615, 2616, 2617, 2618, 2619, 2620, 2621, 2622, 2812, 1504, 1503, 1502, 1501, 1500, 1499, 1498, 1497, 1496, 1495, 1494, 1493, 1492, 1491, 1490, 1489, 1488, 1487, 1591, 1486, 1485, 1676, 1675, 1731, 1647, 1646, 1645, 1644],),num_computed_tokens=6912,lora_request=None)], scheduled_cached_reqs=CachedRequestData(req_ids=['chatcmpl-770f41c908324730bdd8328ddec06c6b'], resumed_from_preemption=[false], new_token_ids=[], new_block_ids=[[[]]], num_computed_tokens=[2775]), num_scheduled_tokens={chatcmpl-770f41c908324730bdd8328ddec06c6b: 1, chatcmpl-69e8f16a1f244907bbefe13e5c3675fc: 830}, total_num_scheduled_tokens=831, scheduled_spec_decode_tokens={}, scheduled_encoder_inputs={}, num_common_prefix_blocks=[0], finished_req_ids=[], free_encoder_input_ids=[], structured_output_request_ids={}, grammar_bitmask=null, kv_connector_metadata=null)
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] return func(*args, **kwargs)
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1725, in execute_model
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] RuntimeError: ACL stream synchronize failed, error code:507035
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Traceback (most recent call last):
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 541, in worker_busy_loop
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = func(*args, **kwargs)
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 201, in execute_model
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] return func(*args, **kwargs)
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1725, in execute_model
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546] RuntimeError: ACL stream synchronize failed, error code:507035
(VllmWorker rank=5 pid=5892) ERROR 09-18 06:56:43 [multiproc_executor.py:546]
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] WorkerProc hit an exception.
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Traceback (most recent call last):
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 541, in worker_busy_loop
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = func(*args, **kwargs)
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 201, in execute_model
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] return func(*args, **kwargs)
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1725, in execute_model
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] RuntimeError: ACL stream synchronize failed, error code:507035
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Traceback (most recent call last):
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 541, in worker_busy_loop
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = func(*args, **kwargs)
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 201, in execute_model
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] return func(*args, **kwargs)
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1725, in execute_model
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546] RuntimeError: ACL stream synchronize failed, error code:507035
(VllmWorker rank=1 pid=5137) ERROR 09-18 06:56:43 [multiproc_executor.py:546]
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] WorkerProc hit an exception.
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Traceback (most recent call last):
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 541, in worker_busy_loop
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = func(*args, **kwargs)
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 201, in execute_model
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] return func(*args, **kwargs)
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1725, in execute_model
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] RuntimeError: ACL stream synchronize failed, error code:507035
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Traceback (most recent call last):
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 541, in worker_busy_loop
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = func(*args, **kwargs)
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 201, in execute_model
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] return func(*args, **kwargs)
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1725, in execute_model
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546] RuntimeError: ACL stream synchronize failed, error code:507035
(VllmWorker rank=3 pid=5396) ERROR 09-18 06:56:43 [multiproc_executor.py:546]
[rank2]:[W918 06:56:43.932961133 compiler_depend.ts:57] Warning: EZ9999: Inner Error!
EZ9999: [PID: 5155] 2025-09-18-06:56:43.536.410 The error from device(chipId:2, dieId:0), serial number is 17, there is an exception of aivec error, core id is 38, error code = 0, dump info: pc start: 0x12c20189b000, current: 0x12c20189b95c, vec error info: 0x6216b0a104, mte error info: 0x1edf0882fd, ifu error info: 0x20003fffeda00, ccu error info: 0xfc8dd83000000000, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c240508400.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:303]
TraceBack (most recent call last):
The extend info: errcode:(0, 0x8000, 0) errorStr: When the D-cache reads and writes data to the UB, the response value returned by the bus is a non-zero value. fixp_error0 info: 0xf0882fd, fixp_error1 info: 0x1e, fsmId:1, tslot:0, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:322]
Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1539]
AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1183]
[AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1183]
Aicore kernel execute failed, device_id=2, stream_id=2, report_stream_id=2, task_id=35873, flip_num=100, fault kernel_name=ApplyTopKTopPWithSorted_fp32_high_performance_0, fault kernel info ext=none, program id=422, hash=15861197690901397285.[FUNC:GetError][FILE:stream.cc][LINE:1183]
rtStreamSynchronize execute failed, reason=[vector core exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
(function copy_between_host_and_device_opapi)
[rank4]:[W918 06:56:43.933040153 compiler_depend.ts:57] Warning: EZ9999: Inner Error!
EZ9999: [PID: 5644] 2025-09-18-06:56:43.537.239 The error from device(chipId:4, dieId:0), serial number is 17, there is an exception of aivec error, core id is 7, error code = 0, dump info: pc start: 0x12c20189b000, current: 0x12c20189b95c, vec error info: 0xfd17f650c0, mte error info: 0xfff06d3039, ifu error info: 0x20003fffeda00, ccu error info: 0x9bf9b09900000000, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c240508400.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:303]
TraceBack (most recent call last):
The extend info: errcode:(0, 0x8000, 0) errorStr: When the D-cache reads and writes data to the UB, the response value returned by the bus is a non-zero value. fixp_error0 info: 0x6d3039, fixp_error1 info: 0xff, fsmId:1, tslot:0, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:322]
Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1539]
AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1183]
[AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1183]
Aicore kernel execute failed, device_id=4, stream_id=2, report_stream_id=2, task_id=35873, flip_num=100, fault kernel_name=ApplyTopKTopPWithSorted_fp32_high_performance_0, fault kernel info ext=none, program id=422, hash=15861197690901397285.[FUNC:GetError][FILE:stream.cc][LINE:1183]
rtStreamSynchronize execute failed, reason=[vector core exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
synchronize stream failed, runtime result = 507035[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
(function copy_between_host_and_device_opapi)
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] WorkerProc hit an exception.
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Traceback (most recent call last):
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 541, in worker_busy_loop
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = func(*args, **kwargs)
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 201, in execute_model
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] return func(*args, **kwargs)
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1725, in execute_model
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] RuntimeError: ACL stream synchronize failed, error code:507035
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Traceback (most recent call last):
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 541, in worker_busy_loop
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = func(*args, **kwargs)
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 201, in execute_model
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] return func(*args, **kwargs)
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1725, in execute_model
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546] RuntimeError: ACL stream synchronize failed, error code:507035
(VllmWorker rank=4 pid=5644) ERROR 09-18 06:56:43 [multiproc_executor.py:546]
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] WorkerProc hit an exception.
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Traceback (most recent call last):
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 541, in worker_busy_loop
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = func(*args, **kwargs)
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 201, in execute_model
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] return func(*args, **kwargs)
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1725, in execute_model
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] RuntimeError: ACL stream synchronize failed, error code:507035
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Traceback (most recent call last):
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 541, in worker_busy_loop
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = func(*args, **kwargs)
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 201, in execute_model
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] return func(*args, **kwargs)
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1725, in execute_model
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546] RuntimeError: ACL stream synchronize failed, error code:507035
(VllmWorker rank=2 pid=5155) ERROR 09-18 06:56:43 [multiproc_executor.py:546]
[rank7]:[E918 06:56:43.938424081 compiler_depend.ts:429] operator():build/CMakeFiles/torch_npu.dir/compiler_depend.ts:32 NPU function error: call aclnnInplaceUniform failed, error code is 507035
[ERROR] 2025-09-18-06:56:43 (PID:6388, Device:7, RankID:-1) ERR00100 PTA call acl api failed
[Error]: The vector core execution is abnormal.
Rectify the fault based on the error information in the ascend log.
EZ9903: [PID: 6388] 2025-09-18-06:56:43.611.115 rtGeneralCtrl failed, runtime error code: 507035
Solution: In this scenario, collect the plog when the fault occurs and locate the fault based on the plog.
TraceBack (most recent call last):
The error from device(chipId:7, dieId:0), serial number is 17, there is an exception of aivec error, core id is 22, error code = 0, dump info: pc start: 0x12c20189b000, current: 0x12c20189b95c, vec error info: 0x67041e5e04, mte error info: 0xdacbc2fbf0, ifu error info: 0x20003fffeda00, ccu error info: 0x18a918a000000000, cube error info: 0, biu error info: 0, aic error mask: 0x6500020bd00028c, para base: 0x12c240508400.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:303]
The extend info: errcode:(0, 0x8000, 0) errorStr: When the D-cache reads and writes data to the UB, the response value returned by the bus is a non-zero value. fixp_error0 info: 0xbc2fbf0, fixp_error1 info: 0xda, fsmId:1, tslot:0, thread:0, ctxid:0, blk:0, sublk:0, subErrType:4.[FUNC:ProcessStarsCoreErrorInfo][FILE:device_error_core_proc.cc][LINE:322]
Kernel task happen error, retCode=0x31, [vector core exception].[FUNC:PreCheckTaskErr][FILE:davinci_kernel_task.cc][LINE:1539]
AIV Kernel happen error, retCode=0x31.[FUNC:GetError][FILE:stream.cc][LINE:1183]
[AIC_INFO] after execute:args print end[FUNC:GetError][FILE:stream.cc][LINE:1183]
Aicore kernel execute failed, device_id=7, stream_id=2, report_stream_id=2, task_id=35873, flip_num=100, fault kernel_name=ApplyTopKTopPWithSorted_fp32_high_performance_0, fault kernel info ext=none, program id=422, hash=15861197690901397285.[FUNC:GetError][FILE:stream.cc][LINE:1183]
task submit failed, streamId=2, taskId=0, sqeType=15, retCode=0x715005e[FUNC:StarsLaunch][FILE:context.cc][LINE:3727]
rtStarsTaskLaunchWithFlag execute failed, reason=[vector core exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
rtGeneralCtrl failed, runtime error code: 507035
Distribute failed
DSA task distribute error.
launch failed for DSARandomUniform, errno:561000.

Exception raised from operator() at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:32 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0xd4 (0xffff9c0e3ea4 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0xe4 (0xffff9c083e44 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #2: + 0x1f24d08 (0xffff8e504d08 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #3: + 0x22887d4 (0xffff8e8687d4 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #4: + 0x8fb170 (0xffff8cedb170 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #5: + 0x8fd504 (0xffff8cedd504 in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #6: + 0x8f9e2c (0xffff8ced9e2c in /usr/local/python3.11.13/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #7: + 0xd31fc (0xffff9bef31fc in /lib/aarch64-linux-gnu/libstdc++.so.6)
frame #8: + 0x7d5b8 (0xffffa81bd5b8 in /lib/aarch64-linux-gnu/libc.so.6)
frame #9: + 0xe5edc (0xffffa8225edc in /lib/aarch64-linux-gnu/libc.so.6)

ERROR 09-18 06:56:43 [dump_input.py:79] Dumping scheduler stats: SchedulerStats(num_running_reqs=2, num_waiting_reqs=0, kv_cache_usage=0.02131979695431474, prefix_cache_stats=PrefixCacheStats(reset=False, requests=1, queries=7742, hits=6912), spec_decoding_stats=None, num_corrupted_reqs=0)
ERROR 09-18 06:56:43 [core.py:634] EngineCore encountered a fatal error.
ERROR 09-18 06:56:43 [core.py:634] Traceback (most recent call last):
ERROR 09-18 06:56:43 [core.py:634] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 625, in run_engine_core
ERROR 09-18 06:56:43 [core.py:634] engine_core.run_busy_loop()
ERROR 09-18 06:56:43 [core.py:634] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 652, in run_busy_loop
ERROR 09-18 06:56:43 [core.py:634] self._process_engine_step()
ERROR 09-18 06:56:43 [core.py:634] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 677, in _process_engine_step
ERROR 09-18 06:56:43 [core.py:634] outputs, model_executed = self.step_fn()
ERROR 09-18 06:56:43 [core.py:634] ^^^^^^^^^^^^^^
ERROR 09-18 06:56:43 [core.py:634] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 267, in step
ERROR 09-18 06:56:43 [core.py:634] model_output = self.execute_model_with_error_logging(
ERROR 09-18 06:56:43 [core.py:634] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-18 06:56:43 [core.py:634] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 253, in execute_model_with_error_logging
ERROR 09-18 06:56:43 [core.py:634] raise err
ERROR 09-18 06:56:43 [core.py:634] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 244, in execute_model_with_error_logging
ERROR 09-18 06:56:43 [core.py:634] return model_fn(scheduler_output)
ERROR 09-18 06:56:43 [core.py:634] ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-18 06:56:43 [core.py:634] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 167, in execute_model
ERROR 09-18 06:56:43 [core.py:634] (output, ) = self.collective_rpc(
ERROR 09-18 06:56:43 [core.py:634] ^^^^^^^^^^^^^^^^^^^^
ERROR 09-18 06:56:43 [core.py:634] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 237, in collective_rpc
ERROR 09-18 06:56:43 [core.py:634] result = get_response(w, dequeue_timeout)
ERROR 09-18 06:56:43 [core.py:634] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-18 06:56:43 [core.py:634] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 224, in get_response
ERROR 09-18 06:56:43 [core.py:634] raise RuntimeError(
ERROR 09-18 06:56:43 [core.py:634] RuntimeError: Worker failed with error 'ACL stream synchronize failed, error code:507035', please check the stack trace above for the root cause
ERROR 09-18 06:56:43 [async_llm.py:416] AsyncLLM output_handler failed.
ERROR 09-18 06:56:43 [async_llm.py:416] Traceback (most recent call last):
ERROR 09-18 06:56:43 [async_llm.py:416] File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 375, in output_handler
ERROR 09-18 06:56:43 [async_llm.py:416] outputs = await engine_core.get_output_async()
ERROR 09-18 06:56:43 [async_llm.py:416] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-18 06:56:43 [async_llm.py:416] File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 751, in get_output_async
ERROR 09-18 06:56:43 [async_llm.py:416] raise self._format_exception(outputs) from None
ERROR 09-18 06:56:43 [async_llm.py:416] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
INFO 09-18 06:56:43 [async_llm.py:342] Request chatcmpl-770f41c908324730bdd8328ddec06c6b failed (engine dead).
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] WorkerProc hit an exception.
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Traceback (most recent call last):
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 541, in worker_busy_loop
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = func(*args, **kwargs)
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 201, in execute_model
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = self.model_runner.execute_model(scheduler_output,
INFO 09-18 06:56:43 [async_llm.py:342] Request chatcmpl-69e8f16a1f244907bbefe13e5c3675fc failed (engine dead).
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] return func(*args, **kwargs)
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1725, in execute_model
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is aclnnInplaceUniform.
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, please set the environment variable ASCEND_LAUNCH_BLOCKING=1.
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] [ERROR] 2025-09-18-06:56:43 (PID:6388, Device:7, RankID:-1) ERR00100 PTA call acl api failed.
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546]
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Traceback (most recent call last):
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 541, in worker_busy_loop
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = func(*args, **kwargs)
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 201, in execute_model
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] output = self.model_runner.execute_model(scheduler_output,
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] return func(*args, **kwargs)
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1725, in execute_model
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] valid_sampled_token_ids = sampled_token_ids.tolist()
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is aclnnInplaceUniform.
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, please set the environment variable ASCEND_LAUNCH_BLOCKING=1.
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546] [ERROR] 2025-09-18-06:56:43 (PID:6388, Device:7, RankID:-1) ERR00100 PTA call acl api failed.
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546]
(VllmWorker rank=7 pid=6388) ERROR 09-18 06:56:43 [multiproc_executor.py:546]
ERROR 09-18 06:56:43 [serving_chat.py:932] Error in chat completion stream generator.
ERROR 09-18 06:56:43 [serving_chat.py:932] Traceback (most recent call last):
ERROR 09-18 06:56:43 [serving_chat.py:932] File "/vllm-workspace/vllm/vllm/entrypoints/openai/serving_chat.py", line 497, in chat_completion_stream_generator
ERROR 09-18 06:56:43 [serving_chat.py:932] async for res in result_generator:
ERROR 09-18 06:56:43 [serving_chat.py:932] File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 323, in generate
ERROR 09-18 06:56:43 [serving_chat.py:932] out = q.get_nowait() or await q.get()
ERROR 09-18 06:56:43 [serving_chat.py:932] ^^^^^^^^^^^^^
ERROR 09-18 06:56:43 [serving_chat.py:932] File "/vllm-workspace/vllm/vllm/v1/engine/output_processor.py", line 57, in get
ERROR 09-18 06:56:43 [serving_chat.py:932] raise output
ERROR 09-18 06:56:43 [serving_chat.py:932] File "/vllm-workspace/vllm/vllm/entrypoints/utils.py", line 110, in wrapper
ERROR 09-18 06:56:43 [serving_chat.py:932] return await func(*args, **kwargs)
ERROR 09-18 06:56:43 [serving_chat.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-18 06:56:43 [serving_chat.py:932] File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 644, in create_chat_completion
ERROR 09-18 06:56:43 [serving_chat.py:932] generator = await handler.create_chat_completion(request, raw_request)
ERROR 09-18 06:56:43 [serving_chat.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-18 06:56:43 [serving_chat.py:932] File "/vllm-workspace/vllm/vllm/entrypoints/openai/serving_chat.py", line 282, in create_chat_completion
ERROR 09-18 06:56:43 [serving_chat.py:932] return await self.chat_completion_full_generator(
ERROR 09-18 06:56:43 [serving_chat.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-18 06:56:43 [serving_chat.py:932] File "/vllm-workspace/vllm/vllm/entrypoints/openai/serving_chat.py", line 953, in chat_completion_full_generator
ERROR 09-18 06:56:43 [serving_chat.py:932] async for res in result_generator:
ERROR 09-18 06:56:43 [serving_chat.py:932] File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 323, in generate
ERROR 09-18 06:56:43 [serving_chat.py:932] out = q.get_nowait() or await q.get()
ERROR 09-18 06:56:43 [serving_chat.py:932] ^^^^^^^^^^^^^
ERROR 09-18 06:56:43 [serving_chat.py:932] File "/vllm-workspace/vllm/vllm/v1/engine/output_processor.py", line 57, in get
ERROR 09-18 06:56:43 [serving_chat.py:932] raise output
ERROR 09-18 06:56:43 [serving_chat.py:932] File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 375, in output_handler
ERROR 09-18 06:56:43 [serving_chat.py:932] outputs = await engine_core.get_output_async()
ERROR 09-18 06:56:43 [serving_chat.py:932] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 09-18 06:56:43 [serving_chat.py:932] File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 751, in get_output_async
ERROR 09-18 06:56:43 [serving_chat.py:932] raise self._format_exception(outputs) from None
ERROR 09-18 06:56:43 [serving_chat.py:932] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
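
Since the failing kernel (`ApplyTopKTopPWithSorted_fp32_high_performance_0` / `aclnnInplaceUniform`) is launched asynchronously, the Python traceback above may not point at the real faulting call. Per the hint in the log itself, a sketch of how to reproduce with synchronous launches to get an accurate trace (assumes the server is started with `vllm serve`; adjust to your actual launch command):

```shell
# Force synchronous op launch so the stack trace points at the failing operator.
export ASCEND_LAUNCH_BLOCKING=1
echo "ASCEND_LAUNCH_BLOCKING=$ASCEND_LAUNCH_BLOCKING"

# ... restart vLLM as usual and replay the tool-calling request that crashed ...

# Synchronous mode degrades performance; unset it after collecting the trace:
unset ASCEND_LAUNCH_BLOCKING
```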
