Your current environment
Environment on the 910B:
vllm 0.10.0+empty pypi_0 pypi
vllm-ascend 0.10.0rc1 pypi_0 pypi
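For reference, the versions above come from the local package listing; a minimal way to reproduce the same check (assuming the conda env from the log is active) is:

```bash
# List the vLLM-related packages installed in the active environment
pip list 2>/dev/null | grep -Ei '^vllm'
# Optionally show full metadata for both packages
pip show vllm vllm-ascend
```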
🐛 Describe the bug
Start command:
export VLLM_USE_V1=1
vllm serve /mnt/data/models/Qwen3-32B-W8A8 --port 8000 -tp 2 --no-enable-prefix-caching --quantization ascend --max-num-batched-tokens 8192
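For readability, the same launch written with long-form flags (equivalent to the command above; `-tp` is the short form of `--tensor-parallel-size`):

```bash
export VLLM_USE_V1=1
vllm serve /mnt/data/models/Qwen3-32B-W8A8 \
  --port 8000 \
  --tensor-parallel-size 2 \
  --no-enable-prefix-caching \
  --quantization ascend \
  --max-num-batched-tokens 8192
```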
Error log:
Loading safetensors checkpoint shards: 0% Completed | 0/11 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 9% Completed | 1/11 [00:02<00:21, 2.11s/it]
Loading safetensors checkpoint shards: 18% Completed | 2/11 [00:04<00:19, 2.20s/it]
Loading safetensors checkpoint shards: 27% Completed | 3/11 [00:07<00:19, 2.47s/it]
Loading safetensors checkpoint shards: 36% Completed | 4/11 [00:09<00:16, 2.38s/it]
Loading safetensors checkpoint shards: 45% Completed | 5/11 [00:11<00:14, 2.34s/it]
Loading safetensors checkpoint shards: 55% Completed | 6/11 [00:13<00:11, 2.24s/it]
Loading safetensors checkpoint shards: 64% Completed | 7/11 [00:15<00:08, 2.08s/it]
Loading safetensors checkpoint shards: 73% Completed | 8/11 [00:17<00:06, 2.05s/it]
Loading safetensors checkpoint shards: 82% Completed | 9/11 [00:18<00:03, 1.88s/it]
Loading safetensors checkpoint shards: 91% Completed | 10/11 [00:20<00:01, 1.85s/it]
Loading safetensors checkpoint shards: 100% Completed | 11/11 [00:22<00:00, 1.90s/it]
Loading safetensors checkpoint shards: 100% Completed | 11/11 [00:22<00:00, 2.07s/it]
(VllmWorker rank=0 pid=3762151)
(VllmWorker rank=0 pid=3762151) INFO 08-13 09:32:58 [default_loader.py:262] Loading weights took 22.91 seconds
(VllmWorker rank=1 pid=3762406) INFO 08-13 09:32:58 [default_loader.py:262] Loading weights took 22.94 seconds
(VllmWorker rank=0 pid=3762151) INFO 08-13 09:32:59 [model_runner_v1.py:2128] Loading model weights took 20.1257 GB
(VllmWorker rank=1 pid=3762406) INFO 08-13 09:33:00 [model_runner_v1.py:2128] Loading model weights took 20.1257 GB
(VllmWorker rank=1 pid=3762406) INFO 08-13 09:33:29 [backends.py:530] Using cache directory: /home/hyc/.cache/vllm/torch_compile_cache/f8e29283c6/rank_1_0/backbone for vLLM's torch.compile
(VllmWorker rank=1 pid=3762406) INFO 08-13 09:33:29 [backends.py:541] Dynamo bytecode transform time: 21.16 s
(VllmWorker rank=0 pid=3762151) INFO 08-13 09:33:30 [backends.py:530] Using cache directory: /home/hyc/.cache/vllm/torch_compile_cache/f8e29283c6/rank_0_0/backbone for vLLM's torch.compile
(VllmWorker rank=0 pid=3762151) INFO 08-13 09:33:30 [backends.py:541] Dynamo bytecode transform time: 22.15 s
(VllmWorker rank=1 pid=3762406) INFO 08-13 09:33:36 [backends.py:215] Compiling a graph for dynamic shape takes 5.03 s
(VllmWorker rank=0 pid=3762151) INFO 08-13 09:33:37 [backends.py:215] Compiling a graph for dynamic shape takes 4.96 s
[rank0]:[E813 09:33:47.781492949 compiler_depend.ts:429] operator():build/CMakeFiles/torch_npu.dir/compiler_depend.ts:3785 NPU function error: call aclnnAddRmsNormQuant failed, error code is 561000
[ERROR] 2025-08-13-09:33:47 (PID:3762151, Device:0, RankID:-1) ERR00100 PTA call acl api failed.
EZ9999: Inner Error!
EZ9999: [PID: 3762151] 2025-08-13-09:33:47.796.176 Cannot find bin of op AddRmsNormQuant, integral key 0/1/|float16/ND/float16/ND/float16/ND/float16/ND/int8/ND/int8/ND/float16/ND/.
TraceBack (most recent call last):
Cannot find binary for op AddRmsNormQuant.
Kernel Run failed. opType: 29, AddRmsNormQuant
launch failed for AddRmsNormQuant, errno:561000.
Exception raised from operator() at build/CMakeFiles/torch_npu.dir/compiler_depend.ts:3785 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xd4 (0xffff9a263ea4 in /home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xe4 (0xffff9a203e44 in /home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/lib/libc10.so)
frame #2: + 0x1b737dc (0xffff8beb37dc in /home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #3: + 0x22887d4 (0xffff8c5c87d4 in /home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #4: + 0x8fb170 (0xffff8ac3b170 in /home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #5: + 0x8fd504 (0xffff8ac3d504 in /home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #6: + 0x8f9e2c (0xffff8ac39e2c in /home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch_npu/lib/libtorch_npu.so)
frame #7: + 0xd31fc (0xffff9a0731fc in /lib/aarch64-linux-gnu/libstdc++.so.6)
frame #8: + 0x7d5b8 (0xffffa66cd5b8 in /lib/aarch64-linux-gnu/libc.so.6)
frame #9: + 0xe5edc (0xffffa6735edc in /lib/aarch64-linux-gnu/libc.so.6)
(VllmWorker rank=0 pid=3762151) Traceback (most recent call last):
(VllmWorker rank=0 pid=3762151) File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/fx/graph_module.py", line 393, in __call__
(VllmWorker rank=0 pid=3762151) return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(VllmWorker rank=0 pid=3762151) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(VllmWorker rank=0 pid=3762151) return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(VllmWorker rank=0 pid=3762151) return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) File "<eval_with_key>.3", line 22, in forward
(VllmWorker rank=0 pid=3762151) linear = torch._C._nn.linear(mul, l_self_modules_layers_modules_0_modules_mlp_modules_down_proj_parameters_weight, None); mul = l_self_modules_layers_modules_0_modules_mlp_modules_down_proj_parameters_weight = None
(VllmWorker rank=0 pid=3762151) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is aclnnAddRmsNormQuant.
(VllmWorker rank=0 pid=3762151) Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, please set the environment variable ASCEND_LAUNCH_BLOCKING=1.
(VllmWorker rank=0 pid=3762151) Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
(VllmWorker rank=0 pid=3762151) [ERROR] 2025-08-13-09:33:47 (PID:3762151, Device:0, RankID:-1) ERR00100 PTA call acl api failed.
(VllmWorker rank=0 pid=3762151)
(VllmWorker rank=0 pid=3762151)
(VllmWorker rank=0 pid=3762151) Call using an FX-traced Module, line 22 of the traced Module's generated forward function:
(VllmWorker rank=0 pid=3762151) mul = silu * getitem_4; silu = getitem_4 = None
(VllmWorker rank=0 pid=3762151) linear = torch._C._nn.linear(mul, l_self_modules_layers_modules_0_modules_mlp_modules_down_proj_parameters_weight, None); mul = l_self_modules_layers_modules_0_modules_mlp_modules_down_proj_parameters_weight = None
(VllmWorker rank=0 pid=3762151)
(VllmWorker rank=0 pid=3762151) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
(VllmWorker rank=0 pid=3762151) all_reduce_1 = torch.ops.c10d_functional.all_reduce(linear, 'sum', '3')
(VllmWorker rank=0 pid=3762151)
(VllmWorker rank=0 pid=3762151) wait_tensor_1 = torch.ops.c10d_functional.wait_tensor(all_reduce_1); all_reduce_1 = None
(VllmWorker rank=0 pid=3762151)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] WorkerProc hit an exception.
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] Traceback (most recent call last):
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/workspace3/vllm_813/vllm/v1/executor/multiproc_executor.py", line 541, in worker_busy_loop
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] output = func(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/workspace3/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 157, in determine_available_memory
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] self.model_runner.profile_run()
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/workspace3/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2023, in profile_run
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] hidden_states = self.dummy_run(self.max_num_tokens,
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/utils/contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return func(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/workspace3/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1997, in dummy_run
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] hidden_states = self.generate_dummy_run_hidden_states(
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/workspace3/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1905, in generate_dummy_run_hidden_states
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] hidden_states = self.model(input_ids=input_ids,
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/workspace3/vllm-ascend/vllm_ascend/models/qwen3.py", line 136, in forward
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] hidden_states = self.model(input_ids, positions, intermediate_tensors,
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/workspace3/vllm_813/vllm/compilation/decorators.py", line 272, in __call__
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] output = self.compiled_callable(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/dynamo/eval_frame.py", line 655, in fn
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return fn(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/workspace3/vllm_813/vllm/model_executor/models/qwen2.py", line 336, in forward
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] def forward(
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/dynamo/eval_frame.py", line 838, in fn
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return fn(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/fx/graph_module.py", line 830, in call_wrapped
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/fx/graph_module.py", line 406, in __call__
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] raise e
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/fx/graph_module.py", line 393, in __call__
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "<eval_with_key>.130", line 1297, in forward
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] submod_2 = self.submod_2(getitem_3, s0, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_aclnn_input_scale_reciprocal, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_aclnn_input_offset, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_weight, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_deq_scale, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_quant_bias, getitem_4, l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight, l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_parameters_aclnn_input_scale, l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_aclnn_input_offset, l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_parameters_weight, l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_parameters_deq_scale, l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_parameters_quant_bias, l_self_modules_layers_modules_0_modules_mlp_modules_down_proj_parameters_weight, l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight, l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_parameters_aclnn_input_scale, l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_aclnn_input_offset, l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_parameters_weight, l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_parameters_deq_scale, l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_parameters_quant_bias, l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight, l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight, l_positions, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache); getitem_3 = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_aclnn_input_scale_reciprocal = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_aclnn_input_offset = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_weight = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_deq_scale = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_quant_bias = getitem_4 = l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight = l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_parameters_aclnn_input_scale = l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_aclnn_input_offset = l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_parameters_weight = l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_parameters_deq_scale_ = l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_parameters_quant_bias_ = l_self_modules_layers_modules_0_modules_mlp_modules_down_proj_parameters_weight_ = l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_parameters_aclnn_input_scale_ = l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_aclnn_input_offset = 
l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_parameters_weight_ = l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_parameters_deq_scale_ = l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_parameters_quant_bias_ = l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_ = None
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/workspace3/vllm-ascend/vllm_ascend/compilation/piecewise_backend.py", line 123, in __call__
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return self.compiled_graph_for_general_shape(*args)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/fx/graph_module.py", line 830, in call_wrapped
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/fx/graph_module.py", line 404, in __call__
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] raise e.with_traceback(None) # noqa: B904
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is aclnnAddRmsNormQuant.
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, please set the environment variable ASCEND_LAUNCH_BLOCKING=1.
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] [ERROR] 2025-08-13-09:33:47 (PID:3762151, Device:0, RankID:-1) ERR00100 PTA call acl api failed.
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546]
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] Traceback (most recent call last):
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/workspace3/vllm_813/vllm/v1/executor/multiproc_executor.py", line 541, in worker_busy_loop
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] output = func(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/workspace3/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 157, in determine_available_memory
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] self.model_runner.profile_run()
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/workspace3/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2023, in profile_run
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] hidden_states = self.dummy_run(self.max_num_tokens,
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/utils/contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return func(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/workspace3/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1997, in dummy_run
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] hidden_states = self.generate_dummy_run_hidden_states(
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/workspace3/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1905, in generate_dummy_run_hidden_states
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] hidden_states = self.model(input_ids=input_ids,
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/workspace3/vllm-ascend/vllm_ascend/models/qwen3.py", line 136, in forward
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] hidden_states = self.model(input_ids, positions, intermediate_tensors,
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/workspace3/vllm_813/vllm/compilation/decorators.py", line 272, in __call__
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] output = self.compiled_callable(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/dynamo/eval_frame.py", line 655, in fn
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return fn(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/workspace3/vllm_813/vllm/model_executor/models/qwen2.py", line 336, in forward
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] def forward(
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/dynamo/eval_frame.py", line 838, in fn
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return fn(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/fx/graph_module.py", line 830, in call_wrapped
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/fx/graph_module.py", line 406, in __call__
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] raise e
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/fx/graph_module.py", line 393, in __call__
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "<eval_with_key>.130", line 1297, in forward
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] submod_2 = self.submod_2(getitem_3, s0, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_aclnn_input_scale_reciprocal, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_aclnn_input_offset, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_weight, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_deq_scale, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_quant_bias, getitem_4, l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight, l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_parameters_aclnn_input_scale, l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_aclnn_input_offset, l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_parameters_weight, l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_parameters_deq_scale, l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_parameters_quant_bias, l_self_modules_layers_modules_0_modules_mlp_modules_down_proj_parameters_weight, l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight, l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_parameters_aclnn_input_scale, l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_aclnn_input_offset, l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_parameters_weight, l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_parameters_deq_scale, l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_parameters_quant_bias, l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight, l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight, l_positions, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache); getitem_3 = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_aclnn_input_scale_reciprocal = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_aclnn_input_offset = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_weight = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_deq_scale = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_quant_bias_ = getitem_4 = l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_parameters_aclnn_input_scale_ = l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_aclnn_input_offset = l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_parameters_weight_ = l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_parameters_deq_scale_ = l_self_modules_layers_modules_0_modules_post_attention_layernorm_modules_layer_parameters_quant_bias_ = l_self_modules_layers_modules_0_modules_mlp_modules_down_proj_parameters_weight_ = l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_parameters_aclnn_input_scale_ = l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_aclnn_input_offset = 
l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_parameters_weight_ = l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_parameters_deq_scale_ = l_self_modules_layers_modules_1_modules_input_layernorm_modules_layer_parameters_quant_bias_ = l_self_modules_layers_modules_1_modules_self_attn_modules_q_norm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_k_norm_parameters_weight_ = None
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/workspace3/vllm-ascend/vllm_ascend/compilation/piecewise_backend.py", line 123, in __call__
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return self.compiled_graph_for_general_shape(*args)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/fx/graph_module.py", line 830, in call_wrapped
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/torch/fx/graph_module.py", line 404, in __call__
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] raise e.with_traceback(None) # noqa: B904
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] RuntimeError: The Inner error is reported as above. The process exits for this inner error, and the current working operator name is aclnnAddRmsNormQuant.
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, please set the environment variable ASCEND_LAUNCH_BLOCKING=1.
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546] [ERROR] 2025-08-13-09:33:47 (PID:3762151, Device:0, RankID:-1) ERR00100 PTA call acl api failed.
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546]
(VllmWorker rank=0 pid=3762151) ERROR 08-13 09:33:47 [multiproc_executor.py:546]
ERROR 08-13 09:33:47 [core.py:632] EngineCore failed to start.
ERROR 08-13 09:33:47 [core.py:632] Traceback (most recent call last):
ERROR 08-13 09:33:47 [core.py:632] File "/home/hyc/workspace3/vllm_813/vllm/v1/engine/core.py", line 623, in run_engine_core
ERROR 08-13 09:33:47 [core.py:632] engine_core = EngineCoreProc(*args, **kwargs)
ERROR 08-13 09:33:47 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-13 09:33:47 [core.py:632] File "/home/hyc/workspace3/vllm_813/vllm/v1/engine/core.py", line 441, in __init__
ERROR 08-13 09:33:47 [core.py:632] super().__init__(vllm_config, executor_class, log_stats,
ERROR 08-13 09:33:47 [core.py:632] File "/home/hyc/workspace3/vllm_813/vllm/v1/engine/core.py", line 86, in __init__
ERROR 08-13 09:33:47 [core.py:632] self._initialize_kv_caches(vllm_config)
ERROR 08-13 09:33:47 [core.py:632] File "/home/hyc/workspace3/vllm_813/vllm/v1/engine/core.py", line 158, in _initialize_kv_caches
ERROR 08-13 09:33:47 [core.py:632] self.model_executor.determine_available_memory())
ERROR 08-13 09:33:47 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-13 09:33:47 [core.py:632] File "/home/hyc/workspace3/vllm_813/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
ERROR 08-13 09:33:47 [core.py:632] output = self.collective_rpc("determine_available_memory")
ERROR 08-13 09:33:47 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-13 09:33:47 [core.py:632] File "/home/hyc/workspace3/vllm_813/vllm/v1/executor/multiproc_executor.py", line 237, in collective_rpc
ERROR 08-13 09:33:47 [core.py:632] result = get_response(w, dequeue_timeout)
ERROR 08-13 09:33:47 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-13 09:33:47 [core.py:632] File "/home/hyc/workspace3/vllm_813/vllm/v1/executor/multiproc_executor.py", line 224, in get_response
ERROR 08-13 09:33:47 [core.py:632] raise RuntimeError(
ERROR 08-13 09:33:47 [core.py:632] RuntimeError: Worker failed with error 'The Inner error is reported as above. The process exits for this inner error, and the current working operator name is aclnnAddRmsNormQuant.
ERROR 08-13 09:33:47 [core.py:632] Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, please set the environment variable ASCEND_LAUNCH_BLOCKING=1.
ERROR 08-13 09:33:47 [core.py:632] Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
ERROR 08-13 09:33:47 [core.py:632] [ERROR] 2025-08-13-09:33:47 (PID:3762151, Device:0, RankID:-1) ERR00100 PTA call acl api failed.
ERROR 08-13 09:33:47 [core.py:632] ', please check the stack trace above for the root cause
ERROR 08-13 09:33:58 [multiproc_executor.py:140] Worker proc VllmWorker-1 died unexpectedly, shutting down executor.
Process EngineCore_0:
Traceback (most recent call last):
File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/hyc/workspace3/vllm_813/vllm/v1/engine/core.py", line 636, in run_engine_core
raise e
File "/home/hyc/workspace3/vllm_813/vllm/v1/engine/core.py", line 623, in run_engine_core
engine_core = EngineCoreProc(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hyc/workspace3/vllm_813/vllm/v1/engine/core.py", line 441, in init
super().init(vllm_config, executor_class, log_stats,
File "/home/hyc/workspace3/vllm_813/vllm/v1/engine/core.py", line 86, in init
self._initialize_kv_caches(vllm_config)
File "/home/hyc/workspace3/vllm_813/vllm/v1/engine/core.py", line 158, in _initialize_kv_caches
self.model_executor.determine_available_memory())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hyc/workspace3/vllm_813/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
output = self.collective_rpc("determine_available_memory")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hyc/workspace3/vllm_813/vllm/v1/executor/multiproc_executor.py", line 237, in collective_rpc
result = get_response(w, dequeue_timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hyc/workspace3/vllm_813/vllm/v1/executor/multiproc_executor.py", line 224, in get_response
raise RuntimeError(
RuntimeError: Worker failed with error 'The Inner error is reported as above. The process exits for this inner error, and the current working operator name is aclnnAddRmsNormQuant.
Since the operator is called asynchronously, the stacktrace may be inaccurate. If you want to get the accurate stacktrace, please set the environment variable ASCEND_LAUNCH_BLOCKING=1.
Note: ASCEND_LAUNCH_BLOCKING=1 will force ops to run in synchronous mode, resulting in performance degradation. Please unset ASCEND_LAUNCH_BLOCKING in time after debugging.
[ERROR] 2025-08-13-09:33:47 (PID:3762151, Device:0, RankID:-1) ERR00100 PTA call acl api failed.
', please check the stack trace above for the root cause
Traceback (most recent call last):
File "/home/hyc/miniconda3/envs/vllm/bin/vllm", line 7, in
sys.exit(main())
^^^^^^
File "/home/hyc/workspace3/vllm_813/vllm/entrypoints/cli/main.py", line 54, in main
args.dispatch_function(args)
File "/home/hyc/workspace3/vllm_813/vllm/entrypoints/cli/serve.py", line 52, in cmd
uvloop.run(run_server(args))
File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/uvloop/init.py", line 105, in run
return runner.run(wrapper())
^^^^^^^^^^^^^^^^^^^^^
File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/site-packages/uvloop/init.py", line 61, in wrapper
return await main
^^^^^^^^^^
File "/home/hyc/workspace3/vllm_813/vllm/entrypoints/openai/api_server.py", line 1791, in run_server
await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
File "/home/hyc/workspace3/vllm_813/vllm/entrypoints/openai/api_server.py", line 1811, in run_server_worker
async with build_async_engine_client(args, client_config) as engine_client:
File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/contextlib.py", line 210, in aenter
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/home/hyc/workspace3/vllm_813/vllm/entrypoints/openai/api_server.py", line 158, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/contextlib.py", line 210, in aenter
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/home/hyc/workspace3/vllm_813/vllm/entrypoints/openai/api_server.py", line 194, in build_async_engine_client_from_engine_args
async_llm = AsyncLLM.from_vllm_config(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hyc/workspace3/vllm_813/vllm/v1/engine/async_llm.py", line 163, in from_vllm_config
return cls(
^^^^
File "/home/hyc/workspace3/vllm_813/vllm/v1/engine/async_llm.py", line 117, in init
self.engine_core = EngineCoreClient.make_async_mp_client(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hyc/workspace3/vllm_813/vllm/v1/engine/core_client.py", line 98, in make_async_mp_client
return AsyncMPClient(*client_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hyc/workspace3/vllm_813/vllm/v1/engine/core_client.py", line 677, in init
super().init(
File "/home/hyc/workspace3/vllm_813/vllm/v1/engine/core_client.py", line 408, in init
with launch_core_engines(vllm_config, executor_class,
File "/home/hyc/miniconda3/envs/vllm/lib/python3.11/contextlib.py", line 144, in exit
next(self.gen)
File "/home/hyc/workspace3/vllm_813/vllm/v1/engine/utils.py", line 697, in launch_core_engines
wait_for_engine_startup(
File "/home/hyc/workspace3/vllm_813/vllm/v1/engine/utils.py", line 750, in wait_for_engine_startup
raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[ERROR] 2025-08-13-09:34:05 (PID:3760997, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
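As the error message itself notes, the stack above may be imprecise because the failing op is launched asynchronously. A sketch of the debug re-run it suggests, with synchronous op launch enabled (slower; unset the variable again after debugging):

```bash
export VLLM_USE_V1=1
export ASCEND_LAUNCH_BLOCKING=1   # synchronous NPU op launch for an accurate stack trace
vllm serve /mnt/data/models/Qwen3-32B-W8A8 --port 8000 -tp 2 \
  --no-enable-prefix-caching --quantization ascend --max-num-batched-tokens 8192
```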