[Attention] Add missing kv cache scale setup #27490
Conversation
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Code Review
This pull request correctly adds the missing KV cache scale and quantization setup to the MLAAttention layer. The implementation appears to be a direct copy of the logic from the Attention layer. While this fixes the immediate issue, it introduces significant code duplication. My review includes a suggestion to refactor this common initialization logic into a shared helper function to improve maintainability and avoid potential inconsistencies in the future.
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
LGTM, thanks for the quick fix!
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Purpose
#25103 missed the KV cache scale setup when breaking out `MLAAttention`. This PR adds that setup to `MLAAttention.__init__`.

cc @pavanimajety
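For context, a minimal sketch of the kind of setup now mirrored in `MLAAttention.__init__`. The attribute names and the `cache_config` flag are illustrative assumptions; the actual change copies the existing setup from `Attention.__init__`:

```python
# Illustrative sketch only, assuming attribute and flag names; the real PR
# reuses the Attention layer's existing KV cache scale/quantization setup.
import torch
from torch import nn


class MLAAttention(nn.Module):
    def __init__(self, cache_config=None):
        super().__init__()
        # Previously missing: the per-layer KV cache scale setup that
        # Attention.__init__ already performs.
        self.calculate_kv_scales = getattr(cache_config, "calculate_kv_scales", False)
        # Default scales of 1.0, overwritten when a quantized checkpoint
        # provides calibrated values.
        self._q_scale = torch.tensor(1.0, dtype=torch.float32)
        self._k_scale = torch.tensor(1.0, dtype=torch.float32)
        self._v_scale = torch.tensor(1.0, dtype=torch.float32)
        self._prob_scale = torch.tensor(1.0, dtype=torch.float32)


# Quick sanity check (hypothetical): the scale attributes now exist after construction.
layer = MLAAttention()
assert hasattr(layer, "_k_scale") and hasattr(layer, "_v_scale")
```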
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.