[Kernel] Support decode context parallelism on Blackwell with CUTLASS MLA #24385
Conversation
Signed-off-by: Ming Yang <minos.future@gmail.com>
Code Review
This pull request adds support for returning log-sum-exp (LSE) values from the CUTLASS MLA decode kernel, which is a key requirement for enabling decode context parallelism on GB200. The changes are well-contained and correctly plumb the new lse tensor through the C++ kernel, PyTorch bindings, and the Python attention backend. I have identified one critical issue related to tensor shape consistency that needs to be addressed.
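For readers not following the diff, the "plumbing" amounts to allocating an LSE buffer next to the attention output and letting the decode kernel write both. A minimal Python-level sketch is below; the function name, `decode_op` placeholder, and argument order are illustrative assumptions, not the actual vLLM binding.

```python
import torch

def mla_decode_with_lse(q, kv_cache, block_table, head_dim_v, decode_op):
    """Illustrative sketch: allocate output and LSE buffers, then call the
    decode kernel so it writes both. `decode_op` stands in for the CUTLASS
    MLA decode binding; the real vLLM signature may differ."""
    num_tokens, num_heads = q.shape[0], q.shape[1]
    out = torch.empty(num_tokens, num_heads, head_dim_v,
                      dtype=q.dtype, device=q.device)
    # One LSE value per (token, head), kept in fp32. The reviewer's
    # shape-consistency point is about keeping this layout identical in the
    # C++ kernel, the PyTorch binding, and the Python attention backend.
    lse = torch.empty(num_tokens, num_heads,
                      dtype=torch.float32, device=q.device)
    decode_op(out, lse, q, kv_cache, block_table)
    return out, lse
```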
Co-authored-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Ming Yang <minos.future@gmail.com>
Locally verified that tests/distributed/test_context_parallel.py now passes on B200. Thanks for the great work!
Signed-off-by: youkaichao <youkaichao@gmail.com>
LGTM! Thanks for doing this!
… MLA (vllm-project#24385) Signed-off-by: Ming Yang <minos.future@gmail.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>
… MLA (vllm-project#24385) Signed-off-by: Ming Yang <minos.future@gmail.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
This PR adds support for decode context parallelism with the CUTLASS MLA kernel on GB200.
Credits to #22789 from @LucasWilkinson, and to @youkaichao.
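For context, decode context parallelism shards the KV cache across ranks: each rank computes a partial attention output plus the log-sum-exp (LSE) of its local scores, and the partials are then merged into one result. The sketch below shows that LSE-based merge in plain PyTorch; it is an assumption-laden illustration of the math, not the actual vLLM implementation (which performs the combine after gathering the per-rank partials).

```python
import torch

def merge_partial_attention(outs, lses):
    """Merge per-rank partial attention outputs using their LSE values.

    outs[r]: (num_tokens, num_heads, head_dim) partial output over rank r's
             shard of the KV cache.
    lses[r]: (num_tokens, num_heads) log-sum-exp of rank r's attention scores.
    """
    lse_stack = torch.stack(lses)                   # (R, T, H)
    global_lse = torch.logsumexp(lse_stack, dim=0)  # (T, H)
    # Rescale each rank's partial output by exp(lse_r - lse_global) so the
    # per-shard softmaxes combine into one softmax over the full context.
    weights = torch.exp(lse_stack - global_lse)     # (R, T, H)
    out_stack = torch.stack(outs)                   # (R, T, H, D)
    return (weights.unsqueeze(-1) * out_stack).sum(dim=0)
```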
Test Plan
pytest -v -s tests/distributed/test_context_parallel.py
Note: on GB200, this unit test needs to be modified to use only two GPUs; that change can land later in a follow-up PR.
Test Result
Both `-tp 2 -dcp 2` and `-tp 2` work.
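For anyone wanting to try this end to end, a decode-context-parallel engine could be configured roughly as below. The keyword-argument names are assumptions and may differ across vLLM versions; the model name is just an example of an MLA model, not something tested in this PR.

```python
# Illustrative only: check the engine arguments for the vLLM release you use.
from vllm import LLM

llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Lite",  # example MLA model (assumption)
    tensor_parallel_size=2,                # corresponds to "-tp 2"
    decode_context_parallel_size=2,        # corresponds to "-dcp 2" (assumed kwarg name)
)
```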