[Bugfix] Fix qwen2.5-vl-without-padding #2623
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request corrects the class structure for AscendQwen2_5_VLForConditionalGeneration_Without_Padding by properly including the _process_image_input and _process_video_input methods within the class, fixing what appears to be an indentation issue. It also adds tests to verify that these methods are correctly overridden from the parent class.
The changes are correct and address the bug. I have one suggestion to improve the efficiency of the __init__ method in AscendQwen2_5_VLForConditionalGeneration_Without_Padding to avoid unnecessary object instantiation during model loading.
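For readers skimming the diff, here is a minimal, illustrative sketch of the structural issue and of an override check in the spirit of the added tests. The import path, the stub method bodies, and the test name below are assumptions for illustration and need not match the PR's actual code.

```python
# Illustrative sketch only; the real classes live in vllm-ascend / vLLM and
# have many more members. The import path is assumed from upstream vLLM.
from vllm.model_executor.models.qwen2_5_vl import (
    Qwen2_5_VLForConditionalGeneration)


class AscendQwen2_5_VLForConditionalGeneration_Without_Padding(
        Qwen2_5_VLForConditionalGeneration):

    # The fix amounts to keeping these two methods indented inside the class
    # body; previously they sat at module level and never overrode the parent.
    def _process_image_input(self, image_input):
        ...  # Ascend-specific image handling without padding

    def _process_video_input(self, video_input):
        ...  # Ascend-specific video handling without padding


def test_process_methods_are_overridden():
    # Hypothetical check: the subclass attributes should no longer resolve to
    # the parent's implementations once the indentation is fixed.
    cls = AscendQwen2_5_VLForConditionalGeneration_Without_Padding
    base = Qwen2_5_VLForConditionalGeneration
    assert cls._process_image_input is not base._process_image_input
    assert cls._process_video_input is not base._process_video_input
```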
```python
    def __init__(self, *, vllm_config: VllmConfig, prefix: str = ""):
        super().__init__(vllm_config=vllm_config, prefix=prefix)
        config: Qwen2_5_VLConfig = vllm_config.model_config.hf_config
        quant_config = vllm_config.quant_config
        self.visual = AscendQwen2_5_VisionTransformer_Without_Padding(
            vision_config=config.vision_config,
            norm_eps=getattr(config, "rms_norm_eps", 1e-6),
            quant_config=self._maybe_ignore_quant_config(quant_config),
            prefix=maybe_prefix(prefix, "visual"),
        )
```
The current __init__ method calls super().__init__, which instantiates the parent's Qwen2_5_VisionTransformer. This object is then immediately discarded and replaced by an AscendQwen2_5_VisionTransformer_Without_Padding. This is inefficient and can impact model loading performance and memory usage.
A better approach is to bypass the parent's __init__ and replicate its essential logic. This involves calling nn.Module.__init__() directly and then initializing the language_model attribute, as done for self.visual. This avoids the unnecessary object creation.
```python
def __init__(self, *, vllm_config: VllmConfig, prefix: str = ""):
    # Call `nn.Module.__init__` directly to avoid instantiating the base
    # vision model, which is immediately replaced. This improves efficiency.
    nn.Module.__init__(self)
    config: Qwen2_5_VLConfig = vllm_config.model_config.hf_config
    quant_config = vllm_config.quant_config
    self.visual = AscendQwen2_5_VisionTransformer_Without_Padding(
        vision_config=config.vision_config,
        norm_eps=getattr(config, "rms_norm_eps", 1e-6),
        quant_config=self._maybe_ignore_quant_config(quant_config),
        prefix=maybe_prefix(prefix, "visual"),
    )
    # Replicate language_model creation from the parent class.
    # A local import is used to avoid modifying file-level imports.
    from vllm.model_executor.models.qwen2 import Qwen2ForCausalLM
    self.language_model = Qwen2ForCausalLM(config,
                                           quant_config=quant_config,
                                           prefix=prefix)
```
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #2623      +/-   ##
==========================================
+ Coverage   72.35%   72.42%   +0.06%
==========================================
  Files         146      146
  Lines       21622    21608      -14
==========================================
+ Hits        15645    15649       +4
+ Misses       5977     5959      -18
==========================================
```

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
### What this PR does / why we need it?
Correct `AscendQwen2_5_VLForConditionalGeneration_Without_Padding` override methods.

### Does this PR introduce any user-facing change?

### How was this patch tested?
- vLLM version: v0.10.1.1
- vLLM main: vllm-project/vllm@42dc59d