[Bug]: Offline Pangu Pro MoE tutorial fails with TypeError in RotaryEmbedding.forward_native()

### Your current environment

<details>
<summary>The output of `python collect_env.py`</summary>
PyTorch version: 2.7.1+cpu
Is debug build: False

OS: Ubuntu 22.04.5 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 4.1.0
Libc version: glibc-2.35

Python version: 3.11.13 (main, Jul 26 2025, 09:30:19) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.10.0-182.0.0.95.r1941_123.hce2.aarch64-aarch64-with-glibc2.35

CPU:
Architecture:                         aarch64
CPU op-mode(s):                       64-bit
Byte Order:                           Little Endian
CPU(s):                               320
On-line CPU(s) list:                  0-319
Vendor ID:                            HiSilicon
Model:                                0
Thread(s) per core:                   1
Core(s) per cluster:                  80
Socket(s):                            -
Cluster(s):                           4
Stepping:                             0x0
Frequency boost:                      disabled
CPU max MHz:                          3000.0000
CPU min MHz:                          400.0000
BogoMIPS:                             200.00
Flags:                                fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint svei8mm svef32mm svef64mm svebf16 i8mm bf16 dgh rng ecv
L1d cache:                            20 MiB (320 instances)
L1i cache:                            20 MiB (320 instances)
L2 cache:                             400 MiB (320 instances)
L3 cache:                             560 MiB (8 instances)
NUMA node(s):                         8
NUMA node0 CPU(s):                    0-39
NUMA node1 CPU(s):                    40-79
NUMA node2 CPU(s):                    80-119
NUMA node3 CPU(s):                    120-159
NUMA node4 CPU(s):                    160-199
NUMA node5 CPU(s):                    200-239
NUMA node6 CPU(s):                    240-279
NUMA node7 CPU(s):                    280-319
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; __user pointer sanitization
Vulnerability Spectre v2:             Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==27.1.0
[pip3] torch==2.7.1+cpu
[pip3] torch_npu==2.7.1.dev20250724
[pip3] torchvision==0.22.1
[pip3] transformers==4.56.2
[conda] Could not collect
vLLM Version: 0.11.0rc3
vLLM Ascend Version: 0.11.0rc0

ENV Variables:
ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=1
ATB_STREAM_SYNC_EVERY_RUNNER_ENABLE=0
ATB_OPSRUNNER_SETUP_CACHE_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0
ATB_DEVICE_TILING_BUFFER_BLOCK_NUM=32
ATB_STREAM_SYNC_EVERY_KERNEL_ENABLE=0
ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=5
PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256
ATB_HOME_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1
ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
ATB_COMPARE_TILING_EVERY_KERNEL=0
ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp
LD_LIBRARY_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:
ASCEND_AICPU_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_STREAM_SYNC_EVERY_OPERATION_ENABLE=0
ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_MATMUL_SHUFFLE_K_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=1
ATB_HOST_TILING_BUFFER_BLOCK_NUM=128
ATB_SHARE_MEMORY_NAME_SUFFIX=
TORCH_DEVICE_BACKEND_AUTOLOAD=1
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1


NPU:
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.rc3.7               Version: 24.1.rc3.7                                           |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip  Phy-ID              | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 6     Ascend910           | OK            | 192.1       37                0    / 0             |
| 0     12                  | 0000:85:00.0  | 0           0    / 0          3410 / 65536         |
+------------------------------------------------------------------------------------------------+
| 6     Ascend910           | OK            | -           36                0    / 0             |
| 1     13                  | 0000:87:00.0  | 0           0    / 0          3203 / 65536         |
+===========================+===============+====================================================+
| 7     Ascend910           | OK            | 187.1       37                0    / 0             |
| 0     14                  | 0000:81:00.0  | 0           0    / 0          3411 / 65536         |
+------------------------------------------------------------------------------------------------+
| 7     Ascend910           | OK            | -           36                0    / 0             |
| 1     15                  | 0000:83:00.0  | 0           0    / 0          3204 / 65536         |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU     Chip              | Process id    | Process name             | Process memory(MB)      |
+===========================+===============+====================================================+
| No running processes found in NPU 6                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 7                                                            |
+===========================+===============+====================================================+

CANN:
package_name=Ascend-cann-toolkit
version=8.2.RC1
innerversion=V100R001C22SPC001B231
compatible_version=[V100R001C15],[V100R001C18],[V100R001C19],[V100R001C20],[V100R001C21],[V100R001C23]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.2.RC1/aarch64-linux

</details>
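
The contents of `test1-0.py` are not included in this report. As a rough sketch only, a script consistent with the `non-default args` line in the log below might look like the following. The engine arguments are copied from that log line; the prompt, the sampling parameters, and the helper names `build_engine_kwargs` and `main` are assumptions made for illustration.

```python
# Sketch of an offline-inference script matching the logged non-default args.
# Only the engine arguments come from the log; everything else is hypothetical.

MODEL_PATH = "/root/.cache/modelscope/hub/models/IntervitensInc/pangu-pro-moe-model"


def build_engine_kwargs(model_path: str = MODEL_PATH) -> dict:
    """Return the engine arguments as they appear in the 'non-default args' log line."""
    return {
        "model": model_path,
        "trust_remote_code": True,
        "max_model_len": 1024,
        "distributed_executor_backend": "mp",
        "tensor_parallel_size": 4,
        "enable_expert_parallel": True,
        "additional_config": {
            "torchair_graph_config": {"enabled": True},
            "ascend_scheduler_config": {
                "enabled": True,
                "enable_chunked_prefill": False,
                "chunked_prefill_enabled": False,
            },
        },
    }


def main() -> None:
    # Imported lazily so the kwargs can be inspected without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**build_engine_kwargs())
    # Hypothetical prompt and sampling settings; the originals are unknown.
    outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
    for output in outputs:
        print(output.outputs[0].text)
```

Calling `main()` on the NPU host would start the engine. In this report the failure occurs during engine initialization (`profile_run`), before any prompt is processed.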

root@d362325f1846:/workspace# python test1-0.py 
INFO 10-20 11:38:13 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 10-20 11:38:13 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 10-20 11:38:13 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 10-20 11:38:13 [__init__.py:207] Platform plugin ascend is activated
WARNING 10-20 11:38:15 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
INFO 10-20 11:38:16 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 10-20 11:38:16 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
WARNING 10-20 11:38:16 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration.
WARNING 10-20 11:38:16 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration.
WARNING 10-20 11:38:16 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
WARNING 10-20 11:38:16 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 10-20 11:38:16 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
WARNING 10-20 11:38:16 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 10-20 11:38:16 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
WARNING 10-20 11:38:16 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM.
INFO 10-20 11:38:16 [utils.py:233] non-default args: {'trust_remote_code': True, 'max_model_len': 1024, 'distributed_executor_backend': 'mp', 'tensor_parallel_size': 4, 'enable_expert_parallel': True, 'disable_log_stats': True, 'additional_config': {'torchair_graph_config': {'enabled': True}, 'ascend_scheduler_config': {'enabled': True, 'enable_chunked_prefill': False, 'chunked_prefill_enabled': False}}, 'model': '/root/.cache/modelscope/hub/models/IntervitensInc/pangu-pro-moe-model'}
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
INFO 10-20 11:38:24 [model.py:547] Resolved architecture: PanguProMoEForCausalLM
`torch_dtype` is deprecated! Use `dtype` instead!
INFO 10-20 11:38:24 [model.py:1510] Using max model len 1024
INFO 10-20 11:38:24 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 10-20 11:38:24 [platform.py:141] Non-MLA LLMs forcibly disable the chunked prefill feature,as the performance of operators supporting this feature functionality is currently suboptimal.
INFO 10-20 11:38:24 [platform.py:194] Torchair compilation enabled on NPU. Setting CUDAGraphMode to NONE
WARNING 10-20 11:38:25 [tokenizer.py:253] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
INFO 10-20 11:38:30 [__init__.py:36] Available plugins for group vllm.platform_plugins:
INFO 10-20 11:38:30 [__init__.py:38] - ascend -> vllm_ascend:register
INFO 10-20 11:38:30 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 10-20 11:38:30 [__init__.py:207] Platform plugin ascend is activated
WARNING 10-20 11:38:32 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
(EngineCore_DP0 pid=7162) INFO 10-20 11:38:32 [core.py:644] Waiting for init message from front-end.
(EngineCore_DP0 pid=7162) INFO 10-20 11:38:32 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available.
(EngineCore_DP0 pid=7162) WARNING 10-20 11:38:32 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration.
(EngineCore_DP0 pid=7162) WARNING 10-20 11:38:32 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration.
(EngineCore_DP0 pid=7162) WARNING 10-20 11:38:32 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration.
(EngineCore_DP0 pid=7162) WARNING 10-20 11:38:32 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration.
(EngineCore_DP0 pid=7162) WARNING 10-20 11:38:32 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
(EngineCore_DP0 pid=7162) WARNING 10-20 11:38:32 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
(EngineCore_DP0 pid=7162) WARNING 10-20 11:38:32 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
(EngineCore_DP0 pid=7162) WARNING 10-20 11:38:32 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
(EngineCore_DP0 pid=7162) WARNING 10-20 11:38:32 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM.
(EngineCore_DP0 pid=7162) INFO 10-20 11:38:32 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc3) with config: model='/root/.cache/modelscope/hub/models/IntervitensInc/pangu-pro-moe-model', speculative_config=None, tokenizer='/root/.cache/modelscope/hub/models/IntervitensInc/pangu-pro-moe-model', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=1024, download_dir=None, load_format=auto, tensor_parallel_size=4, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/root/.cache/modelscope/hub/models/IntervitensInc/pangu-pro-moe-model, enable_prefix_caching=True, chunked_prefill_enabled=False, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":["all"],"splitting_ops":null,"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":0,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
(EngineCore_DP0 pid=7162) WARNING 10-20 11:38:32 [multiproc_executor.py:720] Reducing Torch parallelism from 320 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(EngineCore_DP0 pid=7162) INFO 10-20 11:38:32 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[0, 1, 2, 3], buffer_handle=(4, 16777216, 10, 'psm_2049e03e'), local_subscribe_addr='ipc:///tmp/a9b0d0b7-ae17-46ea-9879-dd653600ba9d', remote_subscribe_addr=None, remote_addr_ipv6=False)

(Worker_TP0_EP0 pid=7301) 
(Worker_TP0_EP0 pid=7301) INFO 10-20 11:39:03 [default_loader.py:267] Loading weights took 7.00 seconds
(Worker_TP1_EP1 pid=7302) INFO 10-20 11:39:03 [default_loader.py:267] Loading weights took 6.99 seconds
(Worker_TP0_EP0 pid=7301) INFO 10-20 11:39:04 [model_runner_v1.py:2661] Loading model weights took 33.7981 GB
(Worker_TP3_EP3 pid=7304) INFO 10-20 11:39:04 [default_loader.py:267] Loading weights took 7.89 seconds
(Worker_TP1_EP1 pid=7302) INFO 10-20 11:39:04 [model_runner_v1.py:2661] Loading model weights took 33.7981 GB
(Worker_TP3_EP3 pid=7304) INFO 10-20 11:39:04 [model_runner_v1.py:2661] Loading model weights took 33.7981 GB
(Worker_TP2_EP2 pid=7303) INFO 10-20 11:39:05 [default_loader.py:267] Loading weights took 8.43 seconds
(Worker_TP2_EP2 pid=7303) INFO 10-20 11:39:05 [model_runner_v1.py:2661] Loading model weights took 33.7981 GB
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671] WorkerProc hit an exception.
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671] Traceback (most recent call last):
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 666, in worker_busy_loop
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     output = func(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/torchair/torchair_worker.py", line 34, in determine_available_memory
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     available_kv_cache_memory = super().determine_available_memory()
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 205, in determine_available_memory
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     self.model_runner.profile_run()
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2509, in profile_run
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     hidden_states = self._dummy_run(self.max_num_tokens,
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2475, in _dummy_run
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     hidden_states = self._generate_dummy_run_hidden_states(
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/torchair/torchair_model_runner.py", line 152, in _generate_dummy_run_hidden_states
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     hidden_states = super()._generate_dummy_run_hidden_states(
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2320, in _generate_dummy_run_hidden_states
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     hidden_states = self.model(input_ids=input_ids,
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return self._call_impl(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return forward_call(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/torchair/models/torchair_pangu_moe.py", line 931, in forward
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     hidden_states = self.model(input_ids, positions, kv_caches,
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 225, in __call__
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return self.forward(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/torchair/models/torchair_pangu_moe.py", line 866, in forward
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     hidden_states, residual = layer(
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]                               ^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return self._call_impl(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return forward_call(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/torchair/models/torchair_pangu_moe.py", line 730, in forward
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     hidden_states = self.self_attn(
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]                     ^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return self._call_impl(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return forward_call(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/torchair/models/torchair_pangu_moe.py", line 626, in forward
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     q, k = self.rotary_emb(positions, q, k)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return self._call_impl(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return forward_call(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm/vllm/model_executor/custom_op.py", line 44, in forward
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return self._forward_method(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/torchair/ops/torchair_rotary_embedding.py", line 327, in rope_forward
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return rope_forward_oot(self, positions, query, key, offsets,
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/torchair/ops/torchair_rotary_embedding.py", line 47, in rope_forward_oot
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return self.forward_native(
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671] TypeError: RotaryEmbedding.forward_native() takes from 3 to 4 positional arguments but 5 were given
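
For reference, the argument counts in the `TypeError` above are consistent with a `forward_native(self, positions, query, key=None)` signature being called with a fourth positional argument. The snippet below is a self-contained sketch using a hypothetical stand-in class, not vLLM's code. The exact call at `torchair_rotary_embedding.py` line 47 is not visible in the traceback, so passing `offsets` positionally is an assumption.

```python
class RotaryEmbedding:
    """Hypothetical stand-in; only the parameter count mirrors the error message."""

    def forward_native(self, positions, query, key=None):
        # Stand-in body: return the inputs unchanged.
        return query, key


def rope_forward_oot(self, positions, query, key, offsets=None):
    # Assumed call shape: four positional arguments plus the bound `self`,
    # which Python counts as five.
    return self.forward_native(positions, query, key, offsets)


def reproduce() -> str:
    """Trigger the mismatch and return the resulting error message."""
    try:
        rope_forward_oot(RotaryEmbedding(), [0], "q", "k")
    except TypeError as exc:
        return str(exc)
    return ""
```

If this is what happens, the caller in vllm-ascend and the `RotaryEmbedding` class in the installed vLLM (0.11.0rc3 here) disagree on whether `forward_native` accepts `offsets`.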
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671] Traceback (most recent call last):
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 666, in worker_busy_loop
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     output = func(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/torchair/torchair_worker.py", line 34, in determine_available_memory
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     available_kv_cache_memory = super().determine_available_memory()
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 205, in determine_available_memory
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     self.model_runner.profile_run()
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2509, in profile_run
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     hidden_states = self._dummy_run(self.max_num_tokens,
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2475, in _dummy_run
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     hidden_states = self._generate_dummy_run_hidden_states(
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/torchair/torchair_model_runner.py", line 152, in _generate_dummy_run_hidden_states
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     hidden_states = super()._generate_dummy_run_hidden_states(
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 2320, in _generate_dummy_run_hidden_states
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     hidden_states = self.model(input_ids=input_ids,
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return self._call_impl(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return forward_call(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/torchair/models/torchair_pangu_moe.py", line 931, in forward
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     hidden_states = self.model(input_ids, positions, kv_caches,
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 225, in __call__
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return self.forward(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/torchair/models/torchair_pangu_moe.py", line 866, in forward
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     hidden_states, residual = layer(
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]                               ^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return self._call_impl(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return forward_call(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/torchair/models/torchair_pangu_moe.py", line 730, in forward
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     hidden_states = self.self_attn(
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]                     ^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return self._call_impl(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return forward_call(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/torchair/models/torchair_pangu_moe.py", line 626, in forward
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     q, k = self.rotary_emb(positions, q, k)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return self._call_impl(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return forward_call(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm/vllm/model_executor/custom_op.py", line 44, in forward
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return self._forward_method(*args, **kwargs)
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/torchair/ops/torchair_rotary_embedding.py", line 327, in rope_forward
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return rope_forward_oot(self, positions, query, key, offsets,
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]   File "/vllm-workspace/vllm-ascend/vllm_ascend/torchair/ops/torchair_rotary_embedding.py", line 47, in rope_forward_oot
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]     return self.forward_native(
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671]            ^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671] TypeError: RotaryEmbedding.forward_native() takes from 3 to 4 positional arguments but 5 were given
(Worker_TP0_EP0 pid=7301) ERROR 10-20 11:39:07 [multiproc_executor.py:671] 
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708]     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708]     self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708]   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 190, in _initialize_kv_caches
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708]     self.model_executor.determine_available_memory())
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708]   File "/vllm-workspace/vllm/vllm/v1/executor/abstract.py", line 85, in determine_available_memory
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708]     return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 264, in collective_rpc
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708]     result = get_response(w, dequeue_timeout,
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708]   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 248, in get_response
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708]     raise RuntimeError(
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:07 [core.py:708] RuntimeError: Worker failed with error 'RotaryEmbedding.forward_native() takes from 3 to 4 positional arguments but 5 were given', please check the stack trace above for the root cause
(EngineCore_DP0 pid=7162) ERROR 10-20 11:39:19 [multiproc_executor.py:154] Worker proc VllmWorker-0 died unexpectedly, shutting down executor.
(EngineCore_DP0 pid=7162) Process EngineCore_DP0:
(EngineCore_DP0 pid=7162) Traceback (most recent call last):
(EngineCore_DP0 pid=7162)   File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=7162)     self.run()
(EngineCore_DP0 pid=7162)   File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=7162)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=7162)   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=7162)     raise e
(EngineCore_DP0 pid=7162)   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=7162)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=7162)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=7162)   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=7162)     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=7162)   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=7162)     self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=7162)   File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 190, in _initialize_kv_caches
(EngineCore_DP0 pid=7162)     self.model_executor.determine_available_memory())
(EngineCore_DP0 pid=7162)     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=7162)   File "/vllm-workspace/vllm/vllm/v1/executor/abstract.py", line 85, in determine_available_memory
(EngineCore_DP0 pid=7162)     return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=7162)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=7162)   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 264, in collective_rpc
(EngineCore_DP0 pid=7162)     result = get_response(w, dequeue_timeout,
(EngineCore_DP0 pid=7162)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=7162)   File "/vllm-workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 248, in get_response
(EngineCore_DP0 pid=7162)     raise RuntimeError(
(EngineCore_DP0 pid=7162) RuntimeError: Worker failed with error 'RotaryEmbedding.forward_native() takes from 3 to 4 positional arguments but 5 were given', please check the stack trace above for the root cause
Traceback (most recent call last):
  File "/workspace/test1-0.py", line 36, in <module>
    llm = LLM(model="/root/.cache/modelscope/hub/models/IntervitensInc/pangu-pro-moe-model",
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vllm-workspace/vllm/vllm/entrypoints/llm.py", line 297, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vllm-workspace/vllm/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
    return cls(vllm_config=vllm_config,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vllm-workspace/vllm/vllm/v1/engine/llm_engine.py", line 114, in __init__
    self.engine_core = EngineCoreClient.make_client(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 80, in make_client
    return SyncMPClient(vllm_config, executor_class, log_stats)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 602, in __init__
    super().__init__(
  File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 448, in __init__
    with launch_core_engines(vllm_config, executor_class,
  File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 144, in __exit__
    next(self.gen)
  File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 732, in launch_core_engines
    wait_for_engine_startup(
  File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
    raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[ERROR] 2025-10-20-11:39:21 (PID:6892, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception


### 🐛 Describe the bug

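When I run the Pangu Pro MOE (Multi-NPU) example from the tutorials, online inference works, but offline inference fails with the error shown above. I think the number of arguments accepted by `RotaryEmbedding.forward_native()` has changed in vLLM: `rope_forward_oot` in vllm-ascend still passes five positional arguments (including `self`), while the method now accepts only three or four. Related PR: https://github.com/vllm-project/vllm/pull/24789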
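
To illustrate the suspected mismatch, here is a minimal sketch that reproduces the same kind of `TypeError` without vLLM installed. The `RotaryEmbedding` class below is a hypothetical stand-in, not the real vLLM class; its signature is only inferred from the error message ("takes from 3 to 4 positional arguments"). The helpers `call_with_stale_signature` and `call_compat` are likewise made up for illustration and are not part of vllm-ascend.

```python
import inspect


class RotaryEmbedding:
    """Hypothetical stand-in; NOT the real vLLM RotaryEmbedding."""

    # Assumed signature, inferred from the error text: self + 2 required
    # arguments + 1 optional argument, with no `offsets` parameter.
    def forward_native(self, positions, query, key=None):
        return query, key


def call_with_stale_signature(rope, positions, query, key, offsets=None):
    # Same call shape as the failing frame in the trace: `offsets` is still
    # passed positionally, so the method receives 5 arguments with `self`.
    return rope.forward_native(positions, query, key, offsets)


def call_compat(rope, positions, query, key, offsets=None):
    # Possible guard: forward `offsets` only if the method still declares it.
    params = inspect.signature(rope.forward_native).parameters
    if "offsets" in params:
        return rope.forward_native(positions, query, key, offsets)
    return rope.forward_native(positions, query, key)


rope = RotaryEmbedding()

try:
    call_with_stale_signature(rope, [0, 1], "q", "k")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Prints a message like the one in the log above.
print(error_message)

# The guarded call succeeds against the narrower signature.
compat_result = call_compat(rope, [0, 1], "q", "k")
```

If the signature really did change as suspected, the actual fix would presumably be to stop passing `offsets` from `rope_forward_oot`, or to guard the call as in `call_compat`. Which is appropriate depends on whether vllm-ascend still has to support older vLLM versions.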

[Bug]: Failed to run offline Pangu Pro MOE in tutorials #3563
