[MM][Bugfix] Replace PatchEmbed's conv3d to linear layer
#27418
Conversation
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Code Review
This pull request replaces Conv3d layers with ReplicatedLinear layers in several model files (glm4_1v.py, qwen2_5_vl.py, qwen2_vl.py, qwen3_omni_moe_thinker.py, qwen3_vl.py) and introduces a utility function conv3d_to_linear_weight in vision.py to handle weight conversion. The change aims to provide a temporary solution for issue #27406, leveraging the specific case where kernel_size equals stride in the Conv3d layers, allowing for replacement with a linear layer. The review focuses on the correctness of the weight conversion and the impact on model functionality.
```
        return_bias=False,
    )
```
The original code reshaped the input tensor x before applying the convolutional layer and then reshaped the output. With nn.Conv3d replaced by ReplicatedLinear, those reshaping operations have been removed. However, the input tensor x must now be directly compatible with the linear layer's expected input shape: feeding x into self.proj without the previous reshaping could cause a dimension mismatch, leading to incorrect computations or runtime errors.
Can you confirm that the input tensor x is correctly preprocessed to match the expected input shape of the ReplicatedLinear layer? If not, this could lead to a critical error.
```diff
 def forward(self, x: torch.Tensor) -> torch.Tensor:
     L, C = x.shape
-    x = x.view(L, -1, self.temporal_patch_size, self.patch_size, self.patch_size)
-    x = self.proj(x).view(L, self.hidden_size)
+    x = self.proj(x)
     return x
```
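For reference, a quick sanity check of why no reshape should be needed here: in these vision encoders the pixel-values tensor reaches this forward already flattened to (L, C), and C equals the linear projection's in_features. The sizes below are illustrative Qwen2.5-VL-style defaults, not taken from the PR.

```python
# Illustrative check: the flattened patch width C already equals the
# linear projection's in_features, so x can go to self.proj directly.
in_channels = 3
temporal_patch_size = 2
patch_size = 14
in_features = in_channels * temporal_patch_size * patch_size * patch_size
assert in_features == 1176  # 3 * 2 * 14 * 14
```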
```diff
 def forward(self, x: torch.Tensor) -> torch.Tensor:
     L, C = x.shape
-    x = x.view(L, -1, self.temporal_patch_size, self.patch_size, self.patch_size)
-    x = self.proj(x).view(L, self.embed_dim)
+    x = self.proj(x)
     return x
```
```diff
 def forward(self, x: torch.Tensor) -> torch.Tensor:
     L, C = x.shape
-    x = x.view(L, -1, self.temporal_patch_size, self.patch_size, self.patch_size)
-    x = self.proj(x).view(L, self.hidden_size)
+    x = self.proj(x)
```
```
    x = self.proj(x)
    return x
```
```
    )

def forward(self, x: torch.Tensor) -> torch.Tensor:
    L, C = x.shape
```
```
    return llm_pos_ids
```

```
def conv3d_to_linear_weight(conv3d_weight: torch.Tensor) -> torch.Tensor:
```
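A hedged sketch of the conversion such a helper would perform (illustrative sizes, and a simplified stand-in rather than the PR's actual vision.py implementation): a Conv3d weight of shape (out_channels, in_channels, kT, kH, kW) flattens to a Linear weight of shape (out_channels, in_channels * kT * kH * kW). This is only valid because kernel_size == stride, so patches never overlap.

```python
import torch

def conv3d_to_linear_weight(conv3d_weight: torch.Tensor) -> torch.Tensor:
    # Flatten (out_c, in_c, kT, kH, kW) -> (out_c, in_c * kT * kH * kW).
    return conv3d_weight.flatten(start_dim=1)

# Check equivalence on random non-overlapping patches.
out_c, in_c, kt, kh, kw = 8, 3, 2, 14, 14
conv = torch.nn.Conv3d(in_c, out_c, kernel_size=(kt, kh, kw),
                       stride=(kt, kh, kw), bias=False)
x = torch.randn(4, in_c, kt, kh, kw)        # 4 patches
conv_out = conv(x).flatten(1)               # (4, out_c)
lin_out = x.flatten(1) @ conv3d_to_linear_weight(conv.weight).t()
assert torch.allclose(conv_out, lin_out, atol=1e-4)
```

Both flattenings traverse (c, t, h, w) in the same C-contiguous order, which is why the dot products line up element for element.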
Can you add a comment explaining why we need this?
@zRzRzRzRzRzRzR can you help verify the correctness (w.r.t. pytorch 2.8)?
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
I have also communicated with the Qwen team about this - let's get this in first.
Just to check, does this get all of the performance back?
@zou3519 Yea it's lucky that
…ect#27418) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Roger Wang <hey@rogerw.io>
Sounds good, I'll see what we can do on the PyTorch side (given it is a cudnn issue, the work might be on the Nvidia side)
…ect#27418) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Roger Wang <hey@rogerw.io> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
### What this PR does / why we need it?
vllm-project/vllm@c9461e0
- Fix spec decode rejection sampler, caused by vllm-project/vllm#26060
- Fix some imports, caused by vllm-project/vllm#27374
- Fix scheduler_config.send_delta_data, caused by #3719
- Fix init_with_cudagraph_sizes, caused by vllm-project/vllm#26016
- Fix vl models after replacing PatchEmbed's conv3d with a linear layer, caused by vllm-project/vllm#27418

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with newly added/existing tests.
- vLLM version: v0.11.0rc3
- vLLM main: vllm-project/vllm@c9461e0

Signed-off-by: Icey <1790571317@qq.com>
Purpose
PatchEmbed's Conv3d has kernel_size == stride, so we can replace it with a linear layer.

TODO
- Add some benchmark results. (Not really: the code on the main branch is terribly slow, and it should work now.)

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for a new model.
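For the benchmark TODO above, here is a hypothetical micro-benchmark sketch, not the PR's actual measurement setup: it times the old Conv3d patch projection against the new Linear one on random data, with illustrative Qwen-style sizes.

```python
import time
import torch

in_c, out_c, kt, kh, kw = 3, 1280, 2, 14, 14
n_patches = 4096
conv = torch.nn.Conv3d(in_c, out_c, kernel_size=(kt, kh, kw),
                       stride=(kt, kh, kw), bias=False)
lin = torch.nn.Linear(in_c * kt * kh * kw, out_c, bias=False)
# Reuse the conv weights so both paths compute the same projection.
lin.weight.data = conv.weight.data.flatten(start_dim=1).clone()

x = torch.randn(n_patches, in_c * kt * kh * kw)

def bench(fn, reps=10):
    t0 = time.perf_counter()
    for _ in range(reps):
        fn()
    return (time.perf_counter() - t0) / reps

t_conv = bench(lambda: conv(x.view(n_patches, in_c, kt, kh, kw)).flatten(1))
t_lin = bench(lambda: lin(x))
print(f"conv3d: {t_conv * 1e3:.2f} ms  linear: {t_lin * 1e3:.2f} ms")

# Both paths agree up to float tolerance.
assert torch.allclose(conv(x.view(n_patches, in_c, kt, kh, kw)).flatten(1),
                      lin(x), atol=1e-4)
```

Actual numbers will depend heavily on hardware and backend (the slowdown being worked around here was reported as a cudnn issue), so this only shows the shape of a comparison, not expected results.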