Contributor

@minosfuture minosfuture commented Sep 7, 2025

Purpose

This PR adds support for decode context parallelism with CUTLASS MLA kernels on GB200.

Credit to #22789 from @LucasWilkinson, and @youkaichao.

Test Plan

pytest -v -s tests/distributed/test_context_parallel.py

Note: on GB200, this unit test needs to be modified to use only two GPUs. That change can be made later or in a follow-up PR.

Test Result

Both -tp 2 -dcp 2 and -tp 2 work.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Ming Yang <minos.future@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds support for returning log-sum-exp (LSE) values from the CUTLASS MLA decode kernel, which is a key requirement for enabling decode context parallelism on GB200. The changes are well-contained and correctly plumb the new lse tensor through the C++ kernel, PyTorch bindings, and the Python attention backend. I have identified one critical issue related to tensor shape consistency that needs to be addressed.
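To illustrate why the LSE values matter for decode context parallelism, here is a minimal sketch in plain Python. It is not the vLLM or CUTLASS implementation; the function names `attend` and `merge_partials` and the toy data are hypothetical. It assumes each rank attends over its own shard of the KV cache and returns a partial output plus the log-sum-exp of its scores, and shows that those pairs are enough to recover the full-context softmax result.

```python
import math


def attend(scores, values):
    """Softmax attention for one query over one KV shard.

    Returns (output vector, log-sum-exp of the scores).
    """
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    lse = m + math.log(total)
    dim = len(values[0])
    out = [
        sum(w * v[d] for w, v in zip(weights, values)) / total
        for d in range(dim)
    ]
    return out, lse


def merge_partials(partials):
    """Combine per-shard (out, lse) pairs into the full-context result."""
    m = max(lse for _, lse in partials)
    total = sum(math.exp(lse - m) for _, lse in partials)
    merged_lse = m + math.log(total)
    dim = len(partials[0][0])
    merged = [0.0] * dim
    for out, lse in partials:
        # Each shard is weighted by its share of the global softmax mass.
        w = math.exp(lse - merged_lse)
        for d in range(dim):
            merged[d] += w * out[d]
    return merged, merged_lse


# Toy data (hypothetical): one query, four KV positions, head dim 2.
scores = [0.5, -1.0, 2.0, 0.1]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]

full_out, full_lse = attend(scores, values)

# Split the context across two "ranks" and merge their partial results.
shard_a = attend(scores[:2], values[:2])
shard_b = attend(scores[2:], values[2:])
merged_out, merged_lse = merge_partials([shard_a, shard_b])
```

Without the per-shard LSE, the partial outputs could not be reweighted correctly, because each one is normalized only over its own shard.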

@youkaichao youkaichao changed the title from "[Kernel] Support decode context parallelism on GB200 with CUTLASS MLA" to "[Kernel] Support decode context parallelism on Blackwell with CUTLASS MLA" Sep 7, 2025
Co-authored-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Ming Yang <minos.future@gmail.com>
Member

@youkaichao youkaichao left a comment

Locally verified that tests/distributed/test_context_parallel.py now passes on B200. Thanks for the great work!

@youkaichao youkaichao enabled auto-merge (squash) September 7, 2025 06:54
@github-actions github-actions bot added the ready label Sep 7, 2025
Signed-off-by: youkaichao <youkaichao@gmail.com>
Collaborator

@LucasWilkinson LucasWilkinson left a comment

LGTM! Thanks for doing this!

@youkaichao youkaichao disabled auto-merge September 8, 2025 01:27
@youkaichao youkaichao merged commit 86173ad into vllm-project:main Sep 8, 2025
66 of 71 checks passed
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
… MLA (vllm-project#24385)

Signed-off-by: Ming Yang <minos.future@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>