[0.7.3] optimize qwen2_vl and qwen2_5_vl #702
Conversation
Signed-off-by: zouyida2052 <zouyida@huawei.com>
Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
vllm_ascend/models/qwen2_5_vl.py (outdated)

Diff under review:

      context_layer = torch.torch.empty_like(q)
    - # operator requires pta version >= 2.5.1
    + # operator requires pta version >= 2.5.1.dev20250226

Suggested change:

    - # operator requires pta version >= 2.5.1.dev20250226
    + # operator requires pta version >= 2.5.1

Thanks for the suggestion! I've made some updates based on your advice.
vllm_ascend/models/qwen2_vl.py (outdated)

Diff under review:

      context_layer = torch.torch.empty_like(q)
    - # operator requires pta version >= 2.5.1
    + # operator requires pta version >= 2.5.1.dev20250226

Suggested change:

    - # operator requires pta version >= 2.5.1.dev20250226
    + # operator requires pta version >= 2.5.1

Thanks for the suggestion! I've made some updates based on your advice.
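The comments above gate a fused attention operator on a minimum Ascend PyTorch (pta) version. A minimal sketch of how such a gate might look, assuming a hypothetical `supports_fused_attention` helper and a simple dev-build ordering rule (this is illustrative, not the actual vllm-ascend code):

```python
# Hedged sketch of a minimum-version gate for the fused attention operator.
# `parse_version` and `supports_fused_attention` are hypothetical helpers;
# the required version string mirrors the comment in the diff.

def parse_version(v: str) -> tuple:
    """Split a version like '2.5.1.dev20250226' into a comparable tuple.

    A dev build of release X is ordered *before* the final release of X.
    """
    release, _, dev = v.partition(".dev")
    parts = tuple(int(p) for p in release.split("."))
    return parts + ((0, int(dev)) if dev else (1, 0))

def supports_fused_attention(pta_version: str, required: str = "2.5.1") -> bool:
    """True when the installed pta build is new enough for the fused op."""
    return parse_version(pta_version) >= parse_version(required)
```

The suggested change in the review tightens the documented floor from a specific dev build back to the plain 2.5.1 release.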
@zouyida2052 Can you paste the optimized performance here compared with the v0.7.3 branch? That would give a clear overview.
vllm_ascend/models/qwen2_vl.py (outdated)

Context:

    from vllm.multimodal import MULTIMODAL_REGISTRY

    MIN_PAD_SIZE = 64
    MAX_PAD_SIZE = 128

Can you add comments for those 2 magic numbers? Are they caused by kernel requirements?

OK, I've added an explanation of these 2 numbers.
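For context, one plausible reading of those two constants (an illustrative sketch under assumed kernel requirements, not the actual vllm-ascend code) is that the NPU fused-attention kernel wants head dims aligned to 128, so any head dim strictly between 64 and 128 gets padded up:

```python
# Hedged sketch of a padding rule that two such constants could encode.
# The kernel-alignment assumption and `padded_head_dim` are hypothetical.
MIN_PAD_SIZE = 64   # assumed: largest head dim the kernel takes unpadded
MAX_PAD_SIZE = 128  # assumed: alignment target required by the fused kernel

def padded_head_dim(head_dim: int) -> int:
    """Return the head dim after padding for the fused attention kernel."""
    if MIN_PAD_SIZE < head_dim < MAX_PAD_SIZE:
        return MAX_PAD_SIZE
    return head_dim
```

Under this rule Qwen2-VL's head dim of 80, for example, would be padded to 128, while 64 and 128 pass through unchanged.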
Diff under review:

      ModelRegistry.register_model(
          "Qwen2VLForConditionalGeneration",
    -     "vllm_ascend.models.qwen2_vl:CustomQwen2VLForConditionalGeneration")
    +     "vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration")

Qwen is named AscendXXX while DeepSeek is named CustomXXX; that's a little confusing for users. Let's keep them consistent in the future.

Initially, our "Custom" naming wasn't very elegant and didn't align well with our own logic. I think we should switch to the "Ascend" prefix instead. What do you think?

I'm fine with the Ascend prefix.

I've added it in my comment, please take a look.
Signed-off-by: zouyida2052 <zouyida2002@gmail.com>
ready to go
What this PR does / why we need it?
Optimize qwen2_vl and qwen2_5_vl.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Tested this PR on a 1080p picture with tp=1, bs=1 on Qwen2-VL and Qwen2.5-VL: each flash-attention op's duration dropped from 11 ms to 9 ms, for roughly a 22% perf boost.
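Per-op figures like 11 ms vs. 9 ms are typically gathered with a small timing harness. A minimal CPU-side sketch (a real NPU measurement would synchronize the device stream before reading the clock; `mean_latency_ms` is a hypothetical helper, not the harness actually used):

```python
# Hedged sketch of a per-op latency harness. `op` is any zero-argument
# callable standing in for the attention op under test.
import time

def mean_latency_ms(op, warmup: int = 3, iters: int = 10) -> float:
    """Average wall-clock time of `op` in milliseconds."""
    for _ in range(warmup):          # discard warm-up/caching effects
        op()
    start = time.perf_counter()
    for _ in range(iters):
        op()
    return (time.perf_counter() - start) / iters * 1e3
```

Note that an 11 ms to 9 ms drop corresponds to an 11/9 ≈ 1.22x per-op speedup, which matches the quoted ~22% figure.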