Conversation

@Angazenn
Collaborator

What this PR does / why we need it?

Currently, the MLA V1 implementation pads q, k, and v to a `head_dim` of 256 to conform to the early MLA kernel. The new MLA kernel supports head dimensions that are not divisible by 128, so we can remove those unnecessary paddings to improve performance.
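To make the removed work concrete, here is a minimal, hypothetical NumPy sketch (illustrative names only, not the actual vllm-ascend code) of the kind of zero-padding the old path performed before calling the early kernel:

```python
import numpy as np

def pad_to_head_dim(x: np.ndarray, target: int = 256) -> np.ndarray:
    """Zero-pad the last (head_dim) axis up to `target`.

    Hypothetical stand-in for the padding the old MLA V1 path applied
    so the early kernel saw a head_dim of 256.
    """
    pad = target - x.shape[-1]
    if pad <= 0:
        return x
    widths = [(0, 0)] * (x.ndim - 1) + [(0, pad)]
    return np.pad(x, widths)

# A DeepSeek-style MLA query head_dim of 192 (128 nope + 64 rope) is not
# a multiple of 128, so the old path padded it up to 256.
q = np.ones((4, 8, 192), dtype=np.float16)  # (tokens, heads, head_dim)
padded = pad_to_head_dim(q)
print(padded.shape)  # (4, 8, 256)
```

With a kernel that accepts `head_dim` 192 directly, the pad (and the extra memory traffic on the zero tail) can simply be skipped.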

Does this PR introduce any user-facing change?

No.

How was this patch tested?

@Angazenn Angazenn changed the title [WIP][Perf]remove unnecessary padding before mla prefill v1 [WIP][Perf]remove unnecessary padding before MLA V1 prefill May 21, 2025
Signed-off-by: angazenn <zengyanjia@huawei.com>
@ganyi1996ppo ganyi1996ppo merged commit a970b27 into vllm-project:main May 23, 2025
15 checks passed
@ganyi1996ppo ganyi1996ppo changed the title [WIP][Perf]remove unnecessary padding before MLA V1 prefill [Perf]remove unnecessary padding before MLA V1 prefill May 23, 2025
momo609 pushed a commit to momo609/vllm-ascend that referenced this pull request May 30, 2025
@Yikun Yikun mentioned this pull request Jun 28, 2025
@Angazenn Angazenn deleted the unpad branch September 8, 2025 03:16
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
Angazenn added a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025