[MM][Bugfix] Replace PatchEmbed's conv3d to linear layer
#27418
Conversation
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Code Review
This pull request replaces Conv3d layers with ReplicatedLinear layers in several model files (glm4_1v.py, qwen2_5_vl.py, qwen2_vl.py, qwen3_omni_moe_thinker.py, qwen3_vl.py) and introduces a utility function conv3d_to_linear_weight in vision.py to handle weight conversion. The change aims to provide a temporary solution for issue #27406, leveraging the specific case where kernel_size equals stride in the Conv3d layers, allowing for replacement with a linear layer. The review focuses on the correctness of the weight conversion and the impact on model functionality.
```
        return_bias=False,
    )
```
The original code reshaped the input tensor x before applying the convolutional layer and then reshaped the output. With nn.Conv3d replaced by ReplicatedLinear, those reshaping operations have been removed. However, the input tensor x must now be directly compatible with the linear layer's expected input shape: feeding x into self.proj without the previous reshaping could cause a dimension mismatch, leading to incorrect computations or runtime errors.
Can you confirm that the input tensor x is correctly preprocessed to match the expected input shape of the ReplicatedLinear layer? If not, this could lead to a critical error.
```diff
 def forward(self, x: torch.Tensor) -> torch.Tensor:
     L, C = x.shape
-    x = x.view(L, -1, self.temporal_patch_size, self.patch_size, self.patch_size)
-    x = self.proj(x).view(L, self.hidden_size)
+    x = self.proj(x)
     return x
```
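For reference, a quick sanity check of why no reshape should be needed here: in these vision encoders the pixel-values tensor reaches this forward already flattened to (L, C), and C equals the linear projection's in_features. The sizes below are illustrative Qwen2.5-VL-style defaults, not taken from the PR.

```python
# Illustrative check: the flattened patch width C already equals the
# linear projection's in_features, so x can go to self.proj directly.
in_channels = 3
temporal_patch_size = 2
patch_size = 14
in_features = in_channels * temporal_patch_size * patch_size * patch_size
assert in_features == 1176  # 3 * 2 * 14 * 14
```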
```diff
 def forward(self, x: torch.Tensor) -> torch.Tensor:
     L, C = x.shape
-    x = x.view(L, -1, self.temporal_patch_size, self.patch_size, self.patch_size)
-    x = self.proj(x).view(L, self.embed_dim)
+    x = self.proj(x)
     return x
```
```diff
 def forward(self, x: torch.Tensor) -> torch.Tensor:
     L, C = x.shape
-    x = x.view(L, -1, self.temporal_patch_size, self.patch_size, self.patch_size)
-    x = self.proj(x).view(L, self.hidden_size)
+    x = self.proj(x)
```
```
    x = self.proj(x)
    return x
```
```
    )

def forward(self, x: torch.Tensor) -> torch.Tensor:
    L, C = x.shape
```
```
    return llm_pos_ids
```

```
def conv3d_to_linear_weight(conv3d_weight: torch.Tensor) -> torch.Tensor:
```
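A hedged sketch of the conversion such a helper would perform (illustrative sizes, and a simplified stand-in rather than the PR's actual vision.py implementation): a Conv3d weight of shape (out_channels, in_channels, kT, kH, kW) flattens to a Linear weight of shape (out_channels, in_channels * kT * kH * kW). This is only valid because kernel_size == stride, so patches never overlap.

```python
import torch

def conv3d_to_linear_weight(conv3d_weight: torch.Tensor) -> torch.Tensor:
    # Flatten (out_c, in_c, kT, kH, kW) -> (out_c, in_c * kT * kH * kW).
    return conv3d_weight.flatten(start_dim=1)

# Check equivalence on random non-overlapping patches.
out_c, in_c, kt, kh, kw = 8, 3, 2, 14, 14
conv = torch.nn.Conv3d(in_c, out_c, kernel_size=(kt, kh, kw),
                       stride=(kt, kh, kw), bias=False)
x = torch.randn(4, in_c, kt, kh, kw)        # 4 patches
conv_out = conv(x).flatten(1)               # (4, out_c)
lin_out = x.flatten(1) @ conv3d_to_linear_weight(conv.weight).t()
assert torch.allclose(conv_out, lin_out, atol=1e-4)
```

Both flattenings traverse (c, t, h, w) in the same C-contiguous order, which is why the dot products line up element for element.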
Can you add a comment explaining why we need this?
@zRzRzRzRzRzRzR can you help verify the correctness (w.r.t. pytorch 2.8)?
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
I have also communicated with the Qwen team about this - let's get this in first.
Just to check, does this get all of the performance back?
@zou3519 Yea it's lucky that
…ect#27418) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Roger Wang <hey@rogerw.io>
Sounds good, I'll see what we can do on the PyTorch side (given it is a cudnn issue, the work might be on the Nvidia side)
…ect#27418) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Roger Wang <hey@rogerw.io> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
### What this PR does / why we need it?
vllm-project/vllm@c9461e0
- Fix spec decode rejection sampler, caused by vllm-project/vllm#26060
- Fix some imports, caused by vllm-project/vllm#27374
- Fix scheduler_config.send_delta_data, caused by #3719
- Fix init_with_cudagraph_sizes, caused by vllm-project/vllm#26016
- Fix vl models after replacing PatchEmbed's conv3d with a linear layer, caused by vllm-project/vllm#27418

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with newly added/existing tests.
- vLLM version: v0.11.0rc3
- vLLM main: vllm-project/vllm@c9461e0

Signed-off-by: Icey <1790571317@qq.com>
Purpose
PatchEmbed's Conv3d has kernel_size == stride, so we can replace it with a linear layer.

TODO
- Add some benchmark results. (Not really: the code on the main branch is terribly slow, and it should work now.)

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for a new model.
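For the benchmark TODO above, here is a hypothetical micro-benchmark sketch, not the PR's actual measurement setup: it times the old Conv3d patch projection against the new Linear one on random data, with illustrative Qwen-style sizes.

```python
import time
import torch

in_c, out_c, kt, kh, kw = 3, 1280, 2, 14, 14
n_patches = 4096
conv = torch.nn.Conv3d(in_c, out_c, kernel_size=(kt, kh, kw),
                       stride=(kt, kh, kw), bias=False)
lin = torch.nn.Linear(in_c * kt * kh * kw, out_c, bias=False)
# Reuse the conv weights so both paths compute the same projection.
lin.weight.data = conv.weight.data.flatten(start_dim=1).clone()

x = torch.randn(n_patches, in_c * kt * kh * kw)

def bench(fn, reps=10):
    t0 = time.perf_counter()
    for _ in range(reps):
        fn()
    return (time.perf_counter() - t0) / reps

t_conv = bench(lambda: conv(x.view(n_patches, in_c, kt, kh, kw)).flatten(1))
t_lin = bench(lambda: lin(x))
print(f"conv3d: {t_conv * 1e3:.2f} ms  linear: {t_lin * 1e3:.2f} ms")

# Both paths agree up to float tolerance.
assert torch.allclose(conv(x.view(n_patches, in_c, kt, kh, kw)).flatten(1),
                      lin(x), atol=1e-4)
```

Actual numbers will depend heavily on hardware and backend (the slowdown being worked around here was reported as a cudnn issue), so this only shows the shape of a comparison, not expected results.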