Conversation

@zouyida2052 (Contributor) commented Apr 28, 2025

What this PR does / why we need it?

Optimize qwen2_vl and qwen2_5_vl.

Does this PR introduce any user-facing change?

no

How was this patch tested?

Tested this PR on a 1080p image with tp=1, bs=1 on Qwen2-VL and Qwen2.5-VL: each flash-attention (fa) op's duration dropped from 11 ms to 9 ms, roughly a 22% per-op speedup.
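
As a rough illustration of how such per-op timings are typically collected on Ascend NPUs, here is a minimal sketch of a timing harness. It is an assumption, not the benchmark the author used, and `time_op_ms` is a hypothetical helper:

```python
import time

import torch
import torch_npu  # noqa: F401  # Ascend PyTorch adapter; patches torch.npu (assumed installed)

def time_op_ms(fn, *args, warmup: int = 10, iters: int = 100) -> float:
    # Average wall-clock duration of fn(*args) in milliseconds.
    for _ in range(warmup):
        fn(*args)
    torch.npu.synchronize()  # drain queued NPU work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    torch.npu.synchronize()  # make sure all timed kernels have finished
    return (time.perf_counter() - start) / iters * 1e3
```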

zouyida2002 and others added 7 commits April 28, 2025 12:14
```diff
 context_layer = torch.torch.empty_like(q)

-# operator requires pta version >= 2.5.1
+# operator requires pta version >= 2.5.1.dev20250226
```
Collaborator:

Suggested change:

```diff
-# operator requires pta version >= 2.5.1.dev20250226
+# operator requires pta version >= 2.5.1
```

Contributor (author):

Thanks for the suggestion! I’ve made some updates based on your advice.
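
As background on why the minimum pta (torch_npu) version matters, a guard like the following could skip the fused path when the operator is absent. This is a hedged sketch, not the PR's actual check; it probes for the `npu_fusion_attention` attribute instead of parsing version strings:

```python
import torch_npu  # Ascend PyTorch adapter ("pta")

def fused_attention_available() -> bool:
    # npu_fusion_attention only exists in sufficiently new torch_npu releases,
    # so probing for the attribute avoids fragile version-string comparisons.
    return hasattr(torch_npu, "npu_fusion_attention")
```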


@ganyi1996ppo (Collaborator) commented

@zouyida2052 Can you paste the optimized performance here compared with the v0.7.3 branch? That would give a straightforward overview.

```python
from vllm.multimodal import MULTIMODAL_REGISTRY

MIN_PAD_SIZE = 64
MAX_PAD_SIZE = 128
```
Collaborator:

Can you add comments for those two magic numbers? Are they required by the kernel?

Contributor (author):

OK, I've added an explanation of these two numbers.
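
For readers without the diff context, the two constants bound a head-dim padding step: Qwen2-VL's vision attention uses a head dim of 80 (hidden size 1280 over 16 heads), which falls strictly between 64 and 128. A minimal sketch of the padding they imply follows; `maybe_pad_head_dim` is a hypothetical helper and the PR's real code may differ:

```python
import torch
import torch.nn.functional as F

MIN_PAD_SIZE = 64   # assumed: largest head dim the fused kernel handles unpadded
MAX_PAD_SIZE = 128  # assumed: head dims in (64, 128) are zero-padded up to 128

def maybe_pad_head_dim(x: torch.Tensor) -> torch.Tensor:
    # x: [..., head_dim]; zero-pad the last dim to a kernel-friendly size.
    head_dim = x.size(-1)
    if MIN_PAD_SIZE < head_dim < MAX_PAD_SIZE:
        x = F.pad(x, (0, MAX_PAD_SIZE - head_dim))
    return x

# e.g. Qwen2-VL vision head dim 80 -> padded to 128 before the fused attention op
```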

```diff
 ModelRegistry.register_model(
     "Qwen2VLForConditionalGeneration",
-    "vllm_ascend.models.qwen2_vl:CustomQwen2VLForConditionalGeneration")
+    "vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration")
```
Collaborator:

Qwen is named AscendXXX while DeepSeek is named CustomXXX; it's a little confusing for users. Let's keep the naming consistent going forward.

Contributor (author):

> Qwen is named AscendXXX while DeepSeek is named CustomXXX; it's a little confusing for users. Let's keep the naming consistent going forward.

Initially, our "Custom" naming wasn't very elegant and didn't align well with our own logic. I think we should switch to the "Ascend" prefix instead. What do you think?

Collaborator:

I'm fine with the Ascend prefix.
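
For reference, once the plugin registers the Ascend class under the stock architecture name, loading is unchanged for users. A minimal usage sketch (the model ID is illustrative):

```python
from vllm import LLM

# vLLM reads "Qwen2VLForConditionalGeneration" from the HF config and resolves
# it to the registered vllm_ascend implementation; no user-facing change.
llm = LLM(model="Qwen/Qwen2-VL-7B-Instruct")
```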

@zouyida2052 (Contributor, author) commented

> @zouyida2052 Can you paste the optimized performance here compared with the v0.7.3 branch? That would give a straightforward overview.

I've added it to my comment above; please take a look.

@wangxiyuan merged commit b9528e6 into vllm-project:v0.7.3-dev on Apr 28, 2025. 11 checks passed.
@wangxiyuan (Collaborator) commented

Ready to go.
