@MatthewBonanni (Contributor) commented Aug 29, 2025

Purpose

Add full CUDA graph (cudagraph) support to the FlashAttention MLA backend.

Test Plan

VLLM_ATTENTION_BACKEND=FLASH_ATTN_MLA vllm bench throughput --model=deepseek-ai/DeepSeek-V2-Lite-Chat --dataset-name=random --input-len=128 --output-len=128 --num-prompts=10 --kv-cache-dtype=auto --compilation-config='{"cudagraph_mode": "full_decode_only"}'

Test Result

Functional

@MatthewBonanni changed the title from "[Attention] FlashAttention MLA cudagraph support" to "[WIP][Attention] FlashAttention MLA cudagraph support" Aug 29, 2025
@mergify bot added the ci/build, rocm, and v1 labels Aug 29, 2025
@MatthewBonanni force-pushed the feature/fa_mla_cudagraph branch 2 times, most recently from 559b62e to b916531, September 2, 2025 20:25
@MatthewBonanni changed the title from "[WIP][Attention] FlashAttention MLA cudagraph support" to "[Attention] FlashAttention MLA cudagraph support" Sep 2, 2025
@MatthewBonanni force-pushed the feature/fa_mla_cudagraph branch 2 times, most recently from 8fe50f8 to a7224af, September 3, 2025 17:47
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>

@LucasWilkinson (Collaborator) left a comment

LGTM! Thanks for doing this!

@LucasWilkinson enabled auto-merge (squash) September 8, 2025 19:55
@github-actions bot added the ready label Sep 8, 2025
@LucasWilkinson merged commit 620db1f into vllm-project:main Sep 8, 2025 (46 checks passed)
@MatthewBonanni deleted the feature/fa_mla_cudagraph branch September 9, 2025 03:02
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

Labels: ci/build, ready (only add when the PR is ready to merge / full CI is needed), rocm (related to AMD ROCm), v1

3 participants