Conversation

@yiz-liu
Collaborator

yiz-liu commented Jun 17, 2025

What this PR does / why we need it?

Refactor the token-wise padding mechanism into a cleaner implementation, correcting the padding logic errors introduced by the previous multimodal commit #736.

This is a clean version of #1259 .
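
For readers, here is a minimal sketch of the restored behavior, assuming hypothetical names (`pad_to_captured_size`, `captured_sizes`); the actual attribute and method names in model_runner_v1.py may differ:

```python
# Minimal sketch of token-wise padding to the nearest captured aclgraph
# size, mirroring gpu_model_runner.py. All names here are illustrative,
# not the actual vllm-ascend attributes.
import bisect

def pad_to_captured_size(num_scheduled_tokens: int,
                         captured_sizes: list[int]) -> int:
    """Return the smallest captured graph size >= num_scheduled_tokens.

    captured_sizes must be sorted ascending. If the token count exceeds
    the largest captured size, return it unchanged (the step then runs
    without a captured graph).
    """
    idx = bisect.bisect_left(captured_sizes, num_scheduled_tokens)
    if idx == len(captured_sizes):
        return num_scheduled_tokens
    return captured_sizes[idx]

# With graphs captured for [8, 16, 32, 64, 128, 256, 512], a step with
# 300 scheduled tokens is padded to 512 so a captured graph can be replayed.
assert pad_to_captured_size(300, [8, 16, 32, 64, 128, 256, 512]) == 512
```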

Does this PR introduce any user-facing change?

How was this patch tested?

This reverts commit 73979f5.

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
…ation, correcting the padding logic errors introduced by the previous multimodal commit.

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
@yiz-liu
Collaborator Author

yiz-liu commented Jun 17, 2025

@wangxiyuan @ganyi1996ppo @ChenTaoyu-SJTU
Please review this pull request and let me know if I have overlooked anything. The logic should now be more closely aligned with that of gpu_model_runner.py.

@wangxiyuan
Collaborator

Thanks for the fix. Please create a fix for main as well.

@ChenTaoyu-SJTU
Collaborator

ChenTaoyu-SJTU commented Jun 17, 2025

@yiz-liu Hello, after I pulled your PR to my local machine, the commands I ran are below:

(vllm_dev) [root@ascend05 vllm-ascend]# git fetch upstream pull/1261/head:pr-1261
remote: Enumerating objects: 11, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 11 (delta 6), reused 10 (delta 6), pack-reused 0 (from 0)
Unpacking objects: 100% (11/11), 4.20 KiB | 717.00 KiB/s, done.
From https://github.com/vllm-project/vllm-ascend
 * [new ref]         refs/pull/1261/head -> pr-1261
(vllm_dev) [root@ascend05 vllm-ascend]# git checkout -b pr-1261-fix-padding pr-1261
Switched to a new branch 'pr-1261-fix-padding'
(vllm_dev) [root@ascend05 vllm-ascend]# git branch
  add_qwen2.5_vl_multimodal
  fix_name_redefine_error
  main
  pr-1259
  pr-1259-fix-padding-mechanism
  pr-1261
* pr-1261-fix-padding
  pr_950
  simple_pd_v1
  try_pass_accuracy_test_for_qwen2.5vl_in_vllm_ascend_v1

And when I use aclgraph (enforce_eager=False) to run "Qwen/Qwen2.5-3B-Instruct", I hit the problem below.
Do you see the same error on your side?

(VllmWorker rank=0 pid=3846447) INFO 06-17 14:07:42 [backends.py:472] Dynamo bytecode transform time: 8.16 s
(VllmWorker rank=0 pid=3846447) INFO 06-17 14:07:45 [backends.py:173] Compiling a graph for general shape takes 2.20 s
DEBUG 06-17 14:07:52 [utils.py:485] Waiting for 1 local, 0 remote core engine proc(s) to start.
(VllmWorker rank=0 pid=3846447) INFO 06-17 14:07:54 [monitor.py:34] torch.compile takes 10.37 s in total
INFO 06-17 14:07:55 [kv_cache_utils.py:715] GPU KV cache size: 1,388,032 tokens
INFO 06-17 14:07:55 [kv_cache_utils.py:719] Maximum concurrency for 32,768 tokens per request: 42.36x
(VllmWorker rank=0 pid=3846447) DEBUG 06-17 14:07:55 [config.py:4663] enabled custom ops: Counter({'rms_norm': 73, 'silu_and_mul': 36, 'rotary_embedding': 1})
(VllmWorker rank=0 pid=3846447) DEBUG 06-17 14:07:55 [config.py:4665] disabled custom ops: Counter()
(VllmWorker rank=0 pid=3846447) DEBUG 06-17 14:07:55 [piecewise_backend.py:159] Warming up 1/1 for shape 512
(VllmWorker rank=0 pid=3846447) DEBUG 06-17 14:07:55 [piecewise_backend.py:170] Capturing a aclgraph for shape 512
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523] WorkerProc hit an exception.
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523] Traceback (most recent call last):
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/uv/vllm_dev/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 518, in worker_busy_loop
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     output = func(*args, **kwargs)
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]              ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/vllm_dev/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 206, in compile_or_warm_up_model
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     self.model_runner.capture_model()
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/vllm_dev/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1769, in capture_model
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     self._dummy_run(num_tokens)
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/uv/vllm_dev/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     return func(*args, **kwargs)
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]            ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/vllm_dev/vllm-ascend/vllm_ascend/worker/model_runner_v1.py", line 1493, in _dummy_run
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     hidden_states = model(
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]                     ^^^^^^
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/uv/vllm_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/uv/vllm_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/uv/vllm_dev/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 477, in forward
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     hidden_states = self.model(input_ids, positions, intermediate_tensors,
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/uv/vllm_dev/lib/python3.11/site-packages/vllm/compilation/decorators.py", line 246, in __call__
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     model_output = self.forward(*args, **kwargs)
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/uv/vllm_dev/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 336, in forward
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     def forward(
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/uv/vllm_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/uv/vllm_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/uv/vllm_dev/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     return fn(*args, **kwargs)
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]            ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/uv/vllm_dev/lib/python3.11/site-packages/torch/fx/graph_module.py", line 784, in call_wrapped
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     return self._wrapped_call(self, *args, **kwargs)
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/uv/vllm_dev/lib/python3.11/site-packages/torch/fx/graph_module.py", line 361, in __call__
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     raise e
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/uv/vllm_dev/lib/python3.11/site-packages/torch/fx/graph_module.py", line 348, in __call__
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/uv/vllm_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/uv/vllm_dev/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "<eval_with_key>.74", line 262, in forward
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     submod_0 = self.submod_0(l_input_ids_, s0, l_self_modules_embed_tokens_parameters_weight_, l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_bias_, l_positions_, s1, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_);  l_input_ids_ = l_self_modules_embed_tokens_parameters_weight_ = l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_ = l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_bias_ = None
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/vllm_dev/vllm-ascend/vllm_ascend/compilation/piecewise_backend.py", line 192, in __call__
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     with torch.npu.graph(aclgraph, pool=self.graph_pool):
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/uv/vllm_dev/lib/python3.11/site-packages/torch_npu/npu/graphs.py", line 187, in __enter__
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     self.npu_graph.capture_begin(
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]   File "/root/uv/vllm_dev/lib/python3.11/site-packages/torch_npu/npu/graphs.py", line 96, in capture_begin
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523]     super().capture_begin(pool=pool, capture_error_mode=capture_error_mode)
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523] RuntimeError: Failed to find function aclmdlRICaptureBegin
(VllmWorker rank=0 pid=3846447) ERROR 06-17 14:07:56 [multiproc_executor.py:523] [ERROR] 2025-06-17-14:07:56 (PID:3846447, Device:0, RankID:-1) ERR00008 PTA resource not found

@yiz-liu
Collaborator Author

yiz-liu commented Jun 18, 2025

@ChenTaoyu-SJTU I believe this error occurs because you’re not using the latest CANN and torch_npu releases. In earlier versions, the ACL Graph API isn’t available, as evidenced by the log message: “Failed to find function aclmdlRICaptureBegin.”
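
If you want to confirm this up front, a rough probe is to attempt a trivial capture and catch the failure. This is only a sketch: `torch.npu.graph` appears in the traceback above, but the `NPUGraph` class name is assumed to mirror `torch.cuda.CUDAGraph`:

```python
# Rough probe for aclgraph capture support. torch.npu.graph(...) is the
# context manager seen in the traceback (torch_npu/npu/graphs.py); NPUGraph
# is assumed to mirror torch.cuda.CUDAGraph. On older CANN runtimes the
# capture fails with "Failed to find function aclmdlRICaptureBegin".
import torch
import torch_npu  # noqa: F401  (registers the torch.npu namespace)

def aclgraph_capture_supported() -> bool:
    try:
        g = torch.npu.NPUGraph()
        with torch.npu.graph(g):
            pass  # an empty capture is enough to trigger the symbol lookup
        return True
    except RuntimeError:
        return False

print("aclgraph capture supported:", aclgraph_capture_supported())
```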

@ChenTaoyu-SJTU
Collaborator

@yiz-liu Hello, can you recommend a version of CANN and torch_npu? My current CANN is 8.1.RC1.alpha001, and my torch_npu is 2.5.1. Do you think 8.2.RC1.alpha002 is a good choice?

(vllm_dev) [root@ascend05 vllm-ascend]# cat /usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/ascend_toolkit_install.info
package_name=Ascend-cann-toolkit
version=8.1.RC1.alpha001
(vllm_dev) [root@ascend05 vllm-ascend]# pip list | grep torch
torch                                    2.5.1
torch-npu                                2.5.1
torchvision                              0.20.1

@yiz-liu
Collaborator Author

yiz-liu commented Jun 18, 2025

Strange, 8.1.RC1.alpha001 should be more than enough, but please migrate to 8.2.RC1.alpha002, build a clean environment, and try again. For the record, I am using a Beta build that should be older than yours, but try this version anyway; it may be more stable.
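
To double-check which toolkit version a given environment actually picks up, here is a small sketch that parses the same install-info file shown above (the path is the default install location and may differ on your machine):

```python
# Parse the CANN toolkit version from the install-info file inspected
# above. The path is the default toolkit location; adjust it if CANN is
# installed elsewhere.
from pathlib import Path

INFO = Path("/usr/local/Ascend/ascend-toolkit/latest/aarch64-linux/"
            "ascend_toolkit_install.info")

def cann_version() -> str | None:
    if not INFO.exists():
        return None
    for line in INFO.read_text().splitlines():
        if line.startswith("version="):
            return line.split("=", 1)[1]
    return None

print(cann_version())  # e.g. "8.1.RC1.alpha001"
```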

@yiz-liu
Collaborator Author

yiz-liu commented Jun 18, 2025

@ganyi1996ppo Please review and check whether this affects DeepSeek; it should be merged ASAP.

@ganyi1996ppo ganyi1996ppo merged commit fc8905a into vllm-project:v0.9.1-dev Jun 18, 2025
16 checks passed
@yiz-liu yiz-liu deleted the fix-padding branch June 19, 2025 09:11
wangxiyuan added a commit that referenced this pull request Oct 14, 2025
…nalinaly (#3406)

I'd like to nominate 4 new maintainers for vllm-ascend: 

----

Yizhou Liu [@yiz-liu](https://github.com/yiz-liu)
----

**Review Quality**: He has completed [40+
reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Ayiz-liu)
and provided solutions or guides for [10+
issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20commenter%3Ayiz-liu),
which include many high-quality reviews such as
[#issue-3428408401](#3002 (comment)),
[#discussion_r2224572309](#1803 (comment)),
[#issuecomment-2982470226](#1261 (comment)),
[#issuecomment-2903621197](#836 (comment)),
[#issuecomment-2857678691](#778 (comment)).

**Sustained and High-Quality Contributions:** He has contributed more
than [30+
commits](https://github.com/vllm-project/vllm-ascend/commits?author=yiz-liu)
since Mar 2025; his aclgraph, DP, and EP related contributions are the
main reason I nominated him. As the owner of aclgraph support, he
continuously improves aclgraph stability and performance and fixes key
bugs. He also laid the groundwork for EP-related functionality and
delivered multiple foundational improvements.

**Community involvement:** He has a very good habit of logging
issues (#1649) and is also very active in [many
issues](https://github.com/vllm-project/vllm-ascend/issues?q=is%3Aissue%20state%3Aopen%20commenter%3Ayiz-liu%20-author%3Ayiz-liu),
helping users resolve their problems.

----

Peng Yu  [@paulyu12](https://github.com/paulyu12)
---
The main reasons for his nomination are his expertise in LoRA and his
sustained, major contributions (initial support/docs/bugfixes) around
LoRA.

**Sustained and Major Contributions:** @paulyu12 started his contribution
with [LoRA and Multi-LoRA
support](697908f)
in Apr 2025, and has since contributed [10+ commits and
bugfixes](697908f)
to vllm-ascend.
**Review Quality and Community Involvement:** He has also helped more than
10 users address [LoRA-related
issues](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Apaulyu12+-author%3Apaulyu12+is%3Aclosed).

I believe his addition will further improve vLLM Ascend LoRA support.

----

Jinqian Wei [@weijinqian0](https://github.com/weijinqian0)
---
The main reasons for his nomination are his key contributions to the RL
scene and the high quality of his code reviews.

**Review Quality:** He has completed [60+
reviews](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+commenter%3Aweijinqian0+is%3Aopen+-author%3Aweijinqian0)
since June 2025, including high-quality reviews such as
[#comment-3284055430](#2791 (comment)),
[discussion_r2332166704](#2817 (comment)), and
[discussion_r2343289692](#2846 (comment)).

**Sustained and Quality Contributions:** He has a deep understanding of
the vLLM and vLLM Ascend codebases and solid contributions to the RL scene
(about [10+ PRs
merged](https://github.com/vllm-project/vllm-ascend/pulls?q=is%3Apr+author%3Aweijinqian0+is%3Amerged+)
and 10+ more PRs merged as co-author).

- Code Refactor: As a co-author, he participated in the refactoring of
the MoE module: #2150,
#2706,
#2867.
- Performance Enhancement for RL: As a co-author, he participated in the
design and development of the solution, contributing to the planning of
core capabilities: #1547,
#2120, and so on.

So I think he's a great addition to the vLLM Ascend Maintainer team.

----

Chuanyu Qin  [@nalinaly](https://github.com/nalinaly)
---
The main reason I nominated Chuanyu Qin is that he is the initial
designer of aclgraph and torch-npu, two key components of vllm-ascend.
Considering that aclgraph will eventually become the main path for
vllm-ascend's graph mode, I propose to nominate him.

**Sustained and Major Contributions:** In fact, Chuanyu has actively helped
users and developers of vllm-ascend since Mar 2025
([vllm-discuss#162](https://discuss.vllm.ai/t/can-ascend-officially-draft-a-documentation-on-the-vllm-ascend-adaptation-for-graph-mode/162/5)),
and also helped early users of vllm-ascend understand aclgraph. He
provided a great deal of help in the process of integrating aclgraph with
vllm-ascend.

**Community Involvement:** As a speaker, he also gave talks that help users
understand aclgraph and torch_npu: [《The design philosophy of torch_npu
and the high-performance principles of
aclGraph》](https://github.com/PyTorch-China/pytorch-meetup/blob/main/beijing-2025/%E3%80%905%E3%80%91torch_npu%20%E7%9A%84%E8%AE%BE%E8%AE%A1%E5%93%B2%E5%AD%A6%E4%B8%8E%20aclGraph%20%E9%AB%98%E6%80%A7%E8%83%BD%E5%8E%9F%E7%90%86-%E7%A7%A6%E4%BC%A0%E7%91%9C-0920.pdf)

----

They have all made active contributions to vllm-ascend or have rich
experience with Ascend AI.

Welcome!
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
…nalinaly (vllm-project#3406)