[0.7.3] optimize qwen2_vl and qwen2_5_vl #702
Conversation
Signed-off-by: zouyida2052 <zouyida@huawei.com>
Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
vllm_ascend/models/qwen2_5_vl.py (outdated)

Diff under review:

      context_layer = torch.torch.empty_like(q)
    - # operator requires pta version >= 2.5.1
    + # operator requires pta version >= 2.5.1.dev20250226

Suggested change:

    - # operator requires pta version >= 2.5.1.dev20250226
    + # operator requires pta version >= 2.5.1

Thanks for the suggestion! I've made some updates based on your advice.
vllm_ascend/models/qwen2_vl.py (outdated)

Diff under review:

      context_layer = torch.torch.empty_like(q)
    - # operator requires pta version >= 2.5.1
    + # operator requires pta version >= 2.5.1.dev20250226

Suggested change:

    - # operator requires pta version >= 2.5.1.dev20250226
    + # operator requires pta version >= 2.5.1

Thanks for the suggestion! I've made some updates based on your advice.
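The comments above gate a fused attention operator on a minimum Ascend PyTorch (pta) version. A minimal sketch of how such a gate might look, assuming a hypothetical `supports_fused_attention` helper and a simple dev-build ordering rule (this is illustrative, not the actual vllm-ascend code):

```python
# Hedged sketch of a minimum-version gate for the fused attention operator.
# `parse_version` and `supports_fused_attention` are hypothetical helpers;
# the required version string mirrors the comment in the diff.

def parse_version(v: str) -> tuple:
    """Split a version like '2.5.1.dev20250226' into a comparable tuple.

    A dev build of release X is ordered *before* the final release of X.
    """
    release, _, dev = v.partition(".dev")
    parts = tuple(int(p) for p in release.split("."))
    return parts + ((0, int(dev)) if dev else (1, 0))

def supports_fused_attention(pta_version: str, required: str = "2.5.1") -> bool:
    """True when the installed pta build is new enough for the fused op."""
    return parse_version(pta_version) >= parse_version(required)
```

The suggested change in the review tightens the documented floor from a specific dev build back to the plain 2.5.1 release.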
@zouyida2052 Can you paste the optimized performance here compared with the v0.7.3 branch? That would give a clear overview.
vllm_ascend/models/qwen2_vl.py (outdated)

Context:

    from vllm.multimodal import MULTIMODAL_REGISTRY

    MIN_PAD_SIZE = 64
    MAX_PAD_SIZE = 128

Can you add comments for those 2 magic numbers? Are they caused by kernel requirements?

OK, I've added an explanation of these 2 numbers.
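For context, one plausible reading of those two constants (an illustrative sketch under assumed kernel requirements, not the actual vllm-ascend code) is that the NPU fused-attention kernel wants head dims aligned to 128, so any head dim strictly between 64 and 128 gets padded up:

```python
# Hedged sketch of a padding rule that two such constants could encode.
# The kernel-alignment assumption and `padded_head_dim` are hypothetical.
MIN_PAD_SIZE = 64   # assumed: largest head dim the kernel takes unpadded
MAX_PAD_SIZE = 128  # assumed: alignment target required by the fused kernel

def padded_head_dim(head_dim: int) -> int:
    """Return the head dim after padding for the fused attention kernel."""
    if MIN_PAD_SIZE < head_dim < MAX_PAD_SIZE:
        return MAX_PAD_SIZE
    return head_dim
```

Under this rule Qwen2-VL's head dim of 80, for example, would be padded to 128, while 64 and 128 pass through unchanged.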
Diff under review:

      ModelRegistry.register_model(
          "Qwen2VLForConditionalGeneration",
    -     "vllm_ascend.models.qwen2_vl:CustomQwen2VLForConditionalGeneration")
    +     "vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration")

Qwen is named AscendXXX while DeepSeek is named CustomXXX; that's a little confusing for users. Let's keep them consistent in the future.

Initially, our "Custom" naming wasn't very elegant and didn't align well with our own logic. I think we should switch to the "Ascend" prefix instead. What do you think?

I'm fine with the Ascend prefix.

I've added it in my comment, please take a look.
Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
ready to go
What this PR does / why we need it?
Optimize qwen2_vl and qwen2_5_vl.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Tested this PR on a 1080p picture with tp=1, bs=1 on Qwen2-VL and Qwen2.5-VL: each flash-attention op's duration dropped from 11 ms to 9 ms, for roughly a 22% perf boost.
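Per-op figures like 11 ms vs. 9 ms are typically gathered with a small timing harness. A minimal CPU-side sketch (a real NPU measurement would synchronize the device stream before reading the clock; `mean_latency_ms` is a hypothetical helper, not the harness actually used):

```python
# Hedged sketch of a per-op latency harness. `op` is any zero-argument
# callable standing in for the attention op under test.
import time

def mean_latency_ms(op, warmup: int = 3, iters: int = 10) -> float:
    """Average wall-clock time of `op` in milliseconds."""
    for _ in range(warmup):          # discard warm-up/caching effects
        op()
    start = time.perf_counter()
    for _ in range(iters):
        op()
    return (time.perf_counter() - start) / iters * 1e3
```

Note that an 11 ms to 9 ms drop corresponds to an 11/9 ≈ 1.22x per-op speedup, which matches the quoted ~22% figure.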