Conversation

@yewentao256
Member

@yewentao256 yewentao256 commented Sep 24, 2025

Purpose

Added clear instructions for #25604 (comment)

Test

After upgrading to torch 2.8.0, the issue disappears:

```bash
vllm bench throughput --model Qwen/Qwen3-30B-A3B-FP8 --load-format dummy --input-len 1000 --output-len 100 --trust_remote_code --enable-expert-parallel
```

```
Throughput: 20.97 requests/s, 23004.43 total tokens/s, 2096.51 output tokens/s
Total num prompt tokens:  997271
Total num output tokens:  100000
```

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a workaround for an issue with TorchDynamo in PyTorch versions older than 2.8.0. The change modifies BasevLLMParameter.__torch_function__ to return NotImplemented for these older versions, preventing a crash. The implementation is correct, but I have one suggestion to improve the logging behavior to avoid potential performance issues and log spam.
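For context, here is a minimal sketch of the change this review describes, assuming a packaging-style version guard; the exact guard, helper names, and class internals in vLLM may differ:

```python
import torch
from packaging import version
from torch.nn import Parameter

# Hypothetical guard; vLLM has its own version-check helpers.
_TORCH_PRE_2_8 = version.parse(torch.__version__).release < (2, 8)


class BasevLLMParameter(Parameter):
    """Simplified stand-in for vLLM's parameter subclass."""

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        if _TORCH_PRE_2_8:
            # TorchDynamo in torch < 2.8.0 cannot trace the super()
            # call below, so the PR opts out of dispatch entirely here.
            return NotImplemented
        return super().__torch_function__(func, types, args, kwargs)
```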

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
@mgoin mgoin enabled auto-merge (squash) September 24, 2025 22:06
@github-actions github-actions bot added the `ready` (ONLY add when PR is ready to merge/full CI is needed) label Sep 24, 2025
@vllm-bot vllm-bot merged commit 4492e3a into main Sep 25, 2025
50 of 54 checks passed
@vllm-bot vllm-bot deleted the wentao-fix-torch-function-super-issue branch September 25, 2025 01:52
@ILikeIneine
Contributor

ILikeIneine commented Sep 25, 2025

@mgoin @yewentao256 This PR causes the DeepSeek model to crash while loading:

The traceback output:

```
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     self._init_executor()
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     self.collective_rpc("load_model")
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/utils/__init__.py", line 3120, in run_method
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     return func(*args, **kwargs)
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 221, in load_model
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2580, in load_model
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     self.model = model_loader.load_model(
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/model_loader/base_loader.py", line 45, in load_model
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     model = initialize_model(vllm_config=vllm_config,
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 820, in __init__
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     self.model = DeepseekV2Model(vllm_config=vllm_config,
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/compilation/decorators.py", line 201, in __init__
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 748, in __init__
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     self.start_layer, self.end_layer, self.layers = make_layers(
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 626, in make_layers
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     [PPMissingLayer() for _ in range(start_layer)] + [
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 627, in <listcomp>
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 750, in <lambda>
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     lambda prefix: DeepseekV2DecoderLayer(vllm_config, prefix),
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 639, in __init__
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     self.self_attn = attn_cls(
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 527, in __init__
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     self.kv_a_proj_with_mqa = ReplicatedLinear(
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 337, in __init__
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     self.quant_method.create_weights(self,
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 222, in create_weights
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     layer.register_parameter("weight", weight)
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 620, in register_parameter
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     elif param.grad_fn:
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   File "/opt/conda/lib/python3.10/site-packages/torch/utils/_device.py", line 104, in __torch_function__
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]     return func(*args, **kwargs)
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708] TypeError: Multiple dispatch failed for 'torch.Tensor.grad_fn.__get__'; all __torch_function__ handlers returned NotImplemented:
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708] 
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708]   - tensor subclass <class 'vllm.model_executor.parameter.ModelWeightParameter'>
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708] 
(EngineCore_DP0 pid=1434551) ERROR 09-25 19:00:32 [core.py:708] For more information, try re-running with TORCH_LOGS=not_implemented
(EngineCore_DP0 pid=1434551) Process EngineCore_DP0:
(EngineCore_DP0 pid=1434551) Traceback (most recent call last):
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=1434551)     self.run()
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=1434551)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=1434551)     raise e
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=1434551)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=1434551)     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=1434551)     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=1434551)     self._init_executor()
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=1434551)     self.collective_rpc("load_model")
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=1434551)     return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/utils/__init__.py", line 3120, in run_method
(EngineCore_DP0 pid=1434551)     return func(*args, **kwargs)
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 221, in load_model
(EngineCore_DP0 pid=1434551)     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2580, in load_model
(EngineCore_DP0 pid=1434551)     self.model = model_loader.load_model(
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/model_loader/base_loader.py", line 45, in load_model
(EngineCore_DP0 pid=1434551)     model = initialize_model(vllm_config=vllm_config,
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_DP0 pid=1434551)     return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 820, in __init__
(EngineCore_DP0 pid=1434551)     self.model = DeepseekV2Model(vllm_config=vllm_config,
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/compilation/decorators.py", line 201, in __init__
(EngineCore_DP0 pid=1434551)     old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 748, in __init__
(EngineCore_DP0 pid=1434551)     self.start_layer, self.end_layer, self.layers = make_layers(
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 626, in make_layers
(EngineCore_DP0 pid=1434551)     [PPMissingLayer() for _ in range(start_layer)] + [
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 627, in <listcomp>
(EngineCore_DP0 pid=1434551)     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 750, in <lambda>
(EngineCore_DP0 pid=1434551)     lambda prefix: DeepseekV2DecoderLayer(vllm_config, prefix),
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 639, in __init__
(EngineCore_DP0 pid=1434551)     self.self_attn = attn_cls(
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/deepseek_v2.py", line 527, in __init__
(EngineCore_DP0 pid=1434551)     self.kv_a_proj_with_mqa = ReplicatedLinear(
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 337, in __init__
(EngineCore_DP0 pid=1434551)     self.quant_method.create_weights(self,
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 222, in create_weights
(EngineCore_DP0 pid=1434551)     layer.register_parameter("weight", weight)
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 620, in register_parameter
(EngineCore_DP0 pid=1434551)     elif param.grad_fn:
(EngineCore_DP0 pid=1434551)   File "/opt/conda/lib/python3.10/site-packages/torch/utils/_device.py", line 104, in __torch_function__
(EngineCore_DP0 pid=1434551)     return func(*args, **kwargs)
(EngineCore_DP0 pid=1434551) TypeError: Multiple dispatch failed for 'torch.Tensor.grad_fn.__get__'; all __torch_function__ handlers returned NotImplemented:
(EngineCore_DP0 pid=1434551) 
(EngineCore_DP0 pid=1434551)   - tensor subclass <class 'vllm.model_executor.parameter.ModelWeightParameter'>
(EngineCore_DP0 pid=1434551) 
(EngineCore_DP0 pid=1434551) For more information, try re-running with TORCH_LOGS=not_implemented
```

Could this be checked or tested further? (The vllm-metax plugin with torch 2.6 crashed on its e2e test; without this modification, everything worked well.)
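For anyone reproducing this, a minimal standalone sketch of the failure mode in the traceback above (the class name is illustrative; it mimics the torch < 2.8 branch of this PR): once a tensor subclass's `__torch_function__` returns `NotImplemented` for every op, even attribute access such as `.grad_fn` fails to dispatch, which is exactly what `register_parameter` triggers.

```python
import torch


class OptedOutParam(torch.nn.Parameter):
    """Mimics the torch < 2.8 branch of this PR: refuse all dispatch."""

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        return NotImplemented


p = OptedOutParam(torch.empty(2, 2))
try:
    # nn.Module.register_parameter() evaluates `param.grad_fn`, and
    # attribute access on a __torch_function__ subclass goes through
    # dispatch, so every handler returning NotImplemented is fatal.
    _ = p.grad_fn
except TypeError as err:
    print(err)  # Multiple dispatch failed for 'torch.Tensor.grad_fn.__get__'
```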

@wangxiyuan
Contributor

+1, hit a similar issue.

@yewentao256
Member Author

@ILikeIneine Thanks for the feedback! Taking a look now

@mgoin
Member

mgoin commented Sep 25, 2025

Hey @ILikeIneine @wangxiyuan, is there a reason your hardware backends are still using torch 2.6 and 2.7? Generally we expect vLLM to be compatible with only the latest version of torch, and we haven't needed to support older torch versions before. If we must consider this, then we need to change our policy and CI to help catch these issues upstream.

If I understand your comments correctly, you only have an issue with this "bugfix" PR, right? We can revert it, as it only tries to give a helpful log.

@ILikeIneine
Contributor

> is there a reason your hardware backends are still using torch 2.6 and 2.7? Generally we expect vLLM to be compatible with only the latest version of torch, and we haven't needed to support older torch versions before.

@mgoin Sorry for the delayed response. Our hardware backend sometimes can't catch up with the latest torch release; it takes some time to adapt. So the plugin's torch version may not always match vLLM's (sometimes it's one or two versions behind).

@wangxiyuan
Contributor

Yes, we use torch 2.7 now.
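For reference, a hedged sketch of the direction the follow-up commit below appears to take (its truncated subject mentions a disabled `super()`): rather than returning `NotImplemented` on older torch, run the op with subclass dispatch disabled so plain attribute access like `.grad_fn` keeps working. The class name is illustrative and this is not the exact vLLM code:

```python
import torch
from torch.nn import Parameter


class SafeParam(Parameter):

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        # Run func with subclass dispatch disabled, much like the default
        # Tensor.__torch_function__ does, instead of returning
        # NotImplemented; `.grad_fn` access then works on torch 2.6/2.7.
        with torch._C.DisableTorchFunctionSubclass():
            return func(*args, **kwargs)
```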

iboiko-habana pushed a commit to iboiko-habana/vllm-gaudi that referenced this pull request Oct 2, 2025
vllm-project/vllm#23991
vllm-project/vllm#25613

---------

Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
Signed-off-by: Iryna Boiko <iboiko@habana.ai>
yewentao256 added a commit that referenced this pull request Oct 3, 2025
…lling disabled super() (#25613)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…lling disabled super() (vllm-project#25613)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
…lling disabled super() (vllm-project#25613)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…lling disabled super() (vllm-project#25613)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…lling disabled super() (vllm-project#25613)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>