[Attention] Tune CUTLASS MLA num_splits #26846

MatthewBonanni · 2025-10-14T21:58:11Z

Purpose

Tune the num_splits heuristic for CUTLASS_MLA to improve performance now that #26026 has fixed the hang. Experiments performed with the tools introduced in #26835 indicate the following optimal num_splits policy:

Following the optimal policy would yield this speedup:

As a simpler alternative, we implement a heuristic yielding the following policy:

This results in the following speedup:

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

gemini-code-assist

Code Review

This pull request introduces a new heuristic for determining num_splits in CUTLASS MLA to improve performance. The new logic is based on the ratio of sequence length to batch size. While this is a reasonable approach for performance tuning, my review has identified a critical concern. The change removes a safeguard that was in place to prevent kernel hangs when the batch size is greater than one. Reintroducing this hang would be a critical issue, and it's not clear from the pull request description if the underlying problem has been resolved. I have left a comment detailing this concern.
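To illustrate what a heuristic "based on the ratio of sequence length to batch size" can look like, here is a minimal Python sketch. It is not the code merged in this PR (which lives in the C++ header sm100_mla.hpp), and the function name `pick_num_splits` and the parameters `ratio_per_split` and `max_splits` are hypothetical placeholders rather than the tuned values.

```python
def pick_num_splits(seq_len: int, batch_size: int,
                    ratio_per_split: int = 1024,
                    max_splits: int = 16) -> int:
    """Hypothetical ratio-based split heuristic (illustrative only).

    The defaults for ratio_per_split and max_splits are placeholders,
    NOT the thresholds tuned in the PR.
    """
    if seq_len <= 0 or batch_size <= 0:
        raise ValueError("seq_len and batch_size must be positive")

    # Long sequences with small batches leave most of the GPU idle,
    # so they benefit from splitting the KV dimension across more blocks.
    ratio = seq_len // batch_size

    # One extra split per `ratio_per_split` of ratio, clamped to [1, max_splits].
    return max(1, min(max_splits, ratio // ratio_per_split))


if __name__ == "__main__":
    for seq_len, batch_size in [(1024, 64), (8192, 2), (65536, 1)]:
        print(seq_len, batch_size, pick_num_splits(seq_len, batch_size))
```

The general shape is the point: large batches already expose enough parallelism and stay at one split, while a single long sequence is split more aggressively up to a cap.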

csrc/attention/mla/cutlass_sm100_mla/device/sm100_mla.hpp

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

csrc/attention/mla/cutlass_sm100_mla/device/sm100_mla.hpp

LucasWilkinson

LGTM! thanks for doing this!

tune num_splits

4ba352b

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

gemini-code-assist bot reviewed Oct 14, 2025

csrc/attention/mla/cutlass_sm100_mla/device/sm100_mla.hpp (outdated, resolved)

chatgpt-codex-connector bot reviewed Oct 14, 2025

csrc/attention/mla/cutlass_sm100_mla/device/sm100_mla.hpp (resolved)

MatthewBonanni added 2 commits October 15, 2025 13:46

implement new policy

f8b9780

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

remove comments

4bd6526

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

LucasWilkinson approved these changes Oct 15, 2025

LucasWilkinson enabled auto-merge (squash) October 15, 2025 18:28

github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Oct 15, 2025

Merge branch 'main' into tune_num_splits

89b8cd0

vllm-bot merged commit 314fa8a into vllm-project:main Oct 16, 2025
83 of 85 checks passed

albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 16, 2025

[Attention] Tune CUTLASS MLA num_splits (vllm-project#26846)

a36234c

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

Zhuul pushed a commit to Zhuul/vllm that referenced this pull request Oct 17, 2025

[Attention] Tune CUTLASS MLA num_splits (vllm-project#26846)

f0b20c0

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

BoyuanFeng pushed a commit to BoyuanFeng/vllm that referenced this pull request Oct 17, 2025

[Attention] Tune CUTLASS MLA num_splits (vllm-project#26846)

15f42f5

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025

[Attention] Tune CUTLASS MLA num_splits (vllm-project#26846)

0d2814f

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025

[Attention] Tune CUTLASS MLA num_splits (vllm-project#26846)

e961ea1

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

[Attention] Tune CUTLASS MLA num_splits (vllm-project#26846)

9aa4055

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

[Attention] Tune CUTLASS MLA num_splits (vllm-project#26846)

db9bea8

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>

0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025

[Attention] Tune CUTLASS MLA num_splits (vllm-project#26846)

5390be1

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>

0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025

[Attention] Tune CUTLASS MLA num_splits (vllm-project#26846)

57b97e0

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>

MatthewBonanni mentioned this pull request Nov 4, 2025

Add attention benchmarking tools #26835

Open

5 tasks

MatthewBonanni deleted the tune_num_splits branch November 4, 2025 16:25

rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025

[Attention] Tune CUTLASS MLA num_splits (vllm-project#26846)

f38a182

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Nov 12, 2025

[Attention] Tune CUTLASS MLA num_splits (vllm-project#26846)

d64d3d5

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
