[WIP] Support persistent MLA for ROCm MLA backend #739

ganyi1996ppo · 2025-10-16T07:05:18Z

Purpose

Support the persistent MLA kernel.
Support FP8 MLA.
aiter branch: mla_splitkv_enhance_split_alg_inte

Test Plan

Serving script:

export VLLM_USE_V1=1
export SAFETENSORS_FAST_GPU=1
export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_MOE=1
export VLLM_USE_TRITON_FLASH_ATTN=0
export NCCL_DEBUG=WARN
export VLLM_RPC_TIMEOUT=1800000
export VLLM_ROCM_USE_AITER_ASMMOE=1
export VLLM_ROCM_USE_AITER_MHA=0
export VLLM_ROCM_USE_TRITON_ROPE=1

# for profiling
# export VLLM_TORCH_PROFILER_DIR="deepseek_in3k_out1k"
# export VLLM_TORCH_PROFILER_WITH_STACK=1
# export VLLM_TORCH_PROFILER_RECORD_SHAPES=1

# --kv-cache-dtype fp8 enables the FP8 KV cache; remove that flag to use bf16 for MLA.
model_path="/mnt/raid0/zhangguopeng/deepseek-r1-FP8-Dynamic"
vllm serve "$model_path" \
  --tensor-parallel-size 8 \
  --max-num-batched-tokens 32768 \
  --trust-remote-code \
  --no-enable-prefix-caching \
  --disable-log-requests \
  --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
  --gpu_memory_utilization 0.9 \
  --block-size 1 \
  --kv-cache-dtype fp8
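
The `--kv-cache-dtype fp8` flag is the only difference between the FP8 and bf16 cache runs of the serving script. A minimal sketch of making that toggle explicit is shown here; the `KV_CACHE_DTYPE` variable and the `kv_cache_flag` helper are hypothetical names introduced for illustration and are not part of this PR or of vLLM.

```shell
# Hypothetical helper (not part of vLLM or this PR): prints the optional
# KV-cache flag so one script can serve with or without the FP8 cache.
kv_cache_flag() {
  # KV_CACHE_DTYPE is an assumed variable name. Leave it unset or empty
  # to omit the flag, which corresponds to the bf16 case in the script comment.
  if [ -n "${KV_CACHE_DTYPE:-}" ]; then
    printf '%s %s\n' "--kv-cache-dtype" "$KV_CACHE_DTYPE"
  fi
}

# Usage sketch (the unquoted substitution is split into two words on purpose):
#   KV_CACHE_DTYPE=fp8
#   vllm serve "$model_path" --block-size 1 $(kv_cache_flag)
```

With `KV_CACHE_DTYPE` unset, the helper prints nothing and the `vllm serve` command line is unchanged apart from the missing flag.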

Test Result

Accuracy test result:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9522|±  |0.0059|
|     |       |strict-match    |     5|exact_match|↑  |0.9507|±  |0.0060|

Accuracy result for FP8 MLA:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.953|±  |0.0058|
|     |       |strict-match    |     5|exact_match|↑  |0.953|±  |0.0058|
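
As a rough sanity check on the two tables, the gap between the runs can be compared against their reported standard errors. The sketch below assumes the first table is the run without the FP8 cache, and it combines the two standard errors in quadrature, which is only an approximation here because both runs use the same gsm8k test set. The `compare_acc` function name is made up for this example.

```shell
# Hypothetical helper: reports whether two accuracy values differ by no more
# than their standard errors combined in quadrature.
# Arguments: value_a stderr_a value_b stderr_b
compare_acc() {
  awk -v a="$1" -v ea="$2" -v b="$3" -v eb="$4" 'BEGIN {
    diff = b - a
    if (diff < 0) diff = -diff
    combined = sqrt(ea * ea + eb * eb)
    if (diff <= combined) print "within"; else print "outside"
  }'
}

# Values copied from the flexible-extract rows of the two tables above.
compare_acc 0.9522 0.0059 0.953 0.0058
```

For these values the difference is 0.0008 against a combined standard error of about 0.0083, so the call prints `within`.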

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: ganyi <ygan@amd.com>

sunway513 · 2025-10-22T15:33:41Z

Was this PR also submitted to upstream vLLM?

ganyi1996ppo · 2025-10-23T01:48:22Z

> Was this PR also submitted to upstream vLLM?

The upstream PR is vllm-project#27380.
FYI, the upstream PR will remain a draft until the persistent kernel is merged into aiter's main branch.

ganyi1996ppo requested review from kliuae-amd, tjtanaavllm, wuhuikx and zejunchen-zejun as code owners October 16, 2025 07:05

ganyi1996ppo added 5 commits October 22, 2025 05:34

enable persistent mla kernel

ff33b31

Signed-off-by: ganyi <ygan@amd.com>

workable

e33bcff

Signed-off-by: ganyi <ygan@amd.com>

acc verified

f5bec92

Signed-off-by: ganyi <ygan@amd.com>

fp8 mla support

c6f2fd2

Signed-off-by: ganyi <ygan@amd.com>

lint fix

5515fcb

Signed-off-by: ganyi <ygan@amd.com>

ganyi1996ppo force-pushed the ganyi/persistent_mla branch from 7ba9a55 to 5515fcb October 22, 2025 05:35
