[Core] Separate out attention metadata building logic from prepare inputs #26764
Conversation
Force-pushed from 3bdfe10 to e48ab54
fhl2000 left a comment:
Hi @LucasWilkinson, thank you for refactoring this! Just left one question, otherwise looking great.
Force-pushed from db8f24c to e9387ea
fhl2000 left a comment:
Overall looks good! Any chance to also separate the attn_metadata building logic into a function for the Eagle drafter, so we can later reuse it for both the drafter execution and its dummy run (in preparation for full CUDA graph support)?
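A minimal sketch of the suggested shape (hypothetical names, not the actual Eagle drafter code): one helper builds the drafter's attention metadata, and both the real draft step and a dummy run call it.

```python
def build_drafter_attn_metadata(num_tokens: int, num_reqs: int) -> dict:
    """Single place that builds the drafter's attention metadata (toy stand-in)."""
    return {"num_tokens": num_tokens, "num_reqs": num_reqs}


def run_drafter(num_tokens: int, num_reqs: int) -> dict:
    # Real drafter execution path.
    attn_metadata = build_drafter_attn_metadata(num_tokens, num_reqs)
    # ... run the Eagle draft model with attn_metadata ...
    return attn_metadata


def dummy_run_drafter(padded_num_tokens: int) -> dict:
    # Same helper with a padded shape, which full CUDA graph capture would need.
    return build_drafter_attn_metadata(padded_num_tokens, padded_num_tokens)
```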
Force-pushed from 6d1a1d4 to 4787566
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Force-pushed from 2814f5f to 651b9de
mgoin left a comment:
LGTM, nothing clearly stood out as missed
[Core] Separate out attention metadata building logic from prepare inputs (vllm-project#26764)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Preparatory refactor PR for: #24002 (#23789)
Separate the attention metadata building logic out of prepare inputs so we can reuse it for dummy runs, and so we can set up padding for CUDA graphs before building attention metadata (see #23789 for motivation).
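A rough sketch of the intended shape (illustrative names only, not the actual GPUModelRunner API): attention metadata building becomes its own method that both the prepare-inputs path and dummy runs call, so a dummy run can build metadata for a padded batch shape without any real inputs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetadataBuilder:
    """Stand-in for a per-KV-cache-group attention metadata builder."""
    name: str

    def build(self, num_tokens: int, num_reqs: int) -> dict:
        return {"backend": self.name, "num_tokens": num_tokens, "num_reqs": num_reqs}


@dataclass
class ToyRunner:
    builders: list = field(default_factory=lambda: [ToyMetadataBuilder("flash")])

    def _build_attn_metadata(self, num_tokens: int, num_reqs: int) -> dict:
        # Depends only on the batch shape, so it can be called before (or
        # without) real inputs -- exactly what a dummy run needs.
        return {gid: b.build(num_tokens, num_reqs)
                for gid, b in enumerate(self.builders)}

    def prepare_inputs(self, num_scheduled_tokens: int, num_reqs: int) -> dict:
        # ... gather token ids, positions, slot mappings, etc. ...
        return self._build_attn_metadata(num_scheduled_tokens, num_reqs)

    def dummy_run(self, padded_num_tokens: int) -> dict:
        # Reuses the same builder with a padded shape, e.g. for CUDA graph capture.
        return self._build_attn_metadata(padded_num_tokens, padded_num_tokens)
```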
Moving `_compute_cascade_attn_prefix_lens` out of the metadata building loops, since whether or not we are using cascade attention is needed by the cudagraph dispatcher, which will run before attention metadata building in #24002.

NOTE: renamed `kv_cache_group_id` to `kv_cache_gid` ("gid" is a common acronym in Linux) to make more lines fit within the formatter's max line width.
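A sketch of why the cascade check moves up front (again with illustrative names, not the real vLLM code): the cudagraph dispatcher planned in #24002 needs to know whether cascade attention is in use before any metadata is built, so the shared-prefix computation happens once, outside the per-group loop.

```python
def compute_cascade_attn_prefix_len(prefix_lens: list[int]) -> int:
    """Toy stand-in for _compute_cascade_attn_prefix_lens: return the shared
    prefix length if cascade attention should be used, else 0."""
    shared = min(prefix_lens, default=0)
    return shared if shared > 0 else 0


def build_all_attn_metadata(builders, prefix_lens: list[int], num_tokens: int) -> dict:
    # Decide cascade usage once, before the build loop (and before the
    # cudagraph dispatch that will precede metadata building in #24002).
    common_prefix_len = compute_cascade_attn_prefix_len(prefix_lens)
    use_cascade = common_prefix_len > 0

    attn_metadata = {}
    # kv_cache_gid: the identifier renamed from kv_cache_group_id in this PR.
    for kv_cache_gid, build in enumerate(builders):
        attn_metadata[kv_cache_gid] = build(num_tokens, common_prefix_len, use_cascade)
    return attn_metadata
```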