
Conversation

@fhl2000
Contributor

@fhl2000 fhl2000 commented Oct 1, 2025

Purpose

Part of the CompilationConfig improvements (#20283), asked by @ProExpertProg and @WoosukKwon:

  • Rename max_capture_size -> max_cudagraph_capture_size: used to specify a single maximum size; the rest of the list is filled in automatically.
  • cudagraph_capture_sizes -> stays the same; used only to specify a full list of sizes.
  • SchedulerConfig.cuda_graph_sizes -> removed.
  • Also stop reversing the sizes: always store them in ascending order (already on the issue), and always sort when using them so we don't rely on them being pre-sorted.

Updated:

  • The assignment of cudagraph_capture_sizes changed from a uniform step size of 8 to a 2-level step size: step_size = 8 for sizes < 256 and step_size = 16 for sizes >= 256. (Asked by @ProExpertProg) [MISC] cudagraph_capture_sizes related improvements #26016 (comment)
  • New CLI flags --cudagraph-capture-sizes and --max-cudagraph-capture-size map to the corresponding settings in compilation_config.
  • The CLI flag --cuda-graph-sizes is deprecated (to be removed in 0.13.0 or 1.0.0) and is now equivalent to --cudagraph-capture-sizes.
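A minimal Python sketch of the 2-level schedule described above. The helper name, the explicit small sizes (1, 2, 4), and the exact boundary handling are assumptions for illustration, not vLLM's actual implementation:

```python
def capture_sizes(max_cudagraph_capture_size: int) -> list[int]:
    """Generate CUDA graph capture sizes in ascending order.

    Small batch sizes (1, 2, 4) are kept explicitly, then the list is
    filled with step 8 up to 256 and step 16 beyond, capped at the max.
    """
    sizes = [1, 2, 4]
    sizes += list(range(8, min(256, max_cudagraph_capture_size) + 1, 8))
    if max_cudagraph_capture_size >= 256:
        sizes += list(range(256 + 16, max_cudagraph_capture_size + 1, 16))
    # Always include the max itself, and store ascending (no reversing).
    sizes.append(max_cudagraph_capture_size)
    return sorted({s for s in sizes if s <= max_cudagraph_capture_size})
```

For example, with a max of 512 this yields 8, 16, ..., 256, then 272, 288, ..., 512, matching the coarser spacing for large batches where per-size capture cost outweighs padding waste.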

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: fhl <2410591650@qq.com>
@mergify

mergify bot commented Oct 1, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @fhl2000.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 1, 2025
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the configuration for CUDA graph capture sizes, improving clarity and consistency. Key changes include renaming max_capture_size to max_cudagraph_capture_size, centralizing this configuration within CompilationConfig by removing it from SchedulerConfig, and standardizing the storage of capture sizes to an ascending order. These modifications streamline the configuration process and enhance code maintainability. The implementation is solid and aligns with the stated objectives. I have reviewed the changes and found no issues of high or critical severity.

@mergify mergify bot removed the needs-rebase label Oct 1, 2025
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
@mergify

mergify bot commented Oct 1, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @fhl2000.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 1, 2025
  Warning: This flag is deprecated and will be removed in the next major or
- minor release, i.e. v0.11.0 or v1.0.0. Please use cudagraph_mode=PIECEWISE instead.
+ minor release, i.e. v0.12.0 or v1.0.0. Please use cudagraph_mode=FULL_AND
Member


If this was already supposed to have been removed, can we just remove it?

  performance benefits for smaller models.
  Warning: This flag is deprecated and will be removed in the next major or
- minor release, i.e. v0.11.0 or v1.0.0. Please use cudagraph_mode=
+ minor release, i.e. v0.12.0 or v1.0.0. Please use cudagraph_mode=
Member


If this was already supposed to have been removed, can we just remove it?

@mergify

mergify bot commented Oct 23, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @fhl2000.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 23, 2025
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
@mergify mergify bot removed the needs-rebase label Oct 23, 2025
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Member

@hmellor hmellor left a comment


LGTM

My only remaining nit is why are we delaying the deletion of a deprecated field that should already have been deleted?

Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
@fhl2000
Contributor Author

fhl2000 commented Oct 23, 2025

My only remaining nit is why are we delaying the deletion of a deprecated field that should already have been deleted?

Okay, I have changed the comments back to v0.11.0. I think the deletion can be done in another PR, if you'd like to do that.

@hmellor
Member

hmellor commented Oct 24, 2025

LM eval test fails on main, rerunning the entrypoint test because I suspect it's flaky

@hmellor
Member

hmellor commented Oct 24, 2025

I can do the follow-up removing the v0.11.0 deprecations.

@vllm-bot vllm-bot merged commit 284cc92 into vllm-project:main Oct 24, 2025
63 of 65 checks passed
atalhens pushed a commit to atalhens/vllm that referenced this pull request Oct 24, 2025
…6016)

Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@fhl2000 fhl2000 deleted the simple_clean_up branch October 24, 2025 15:06
kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Oct 25, 2025
rohin-garg pushed a commit to rohin-garg/vllm that referenced this pull request Oct 25, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Oct 28, 2025
### What this PR does / why we need it?

vllm-project/vllm@c9461e0

- Fix `spec decode rejection sampler`, caused by vllm-project/vllm#26060
- Fix some `import` statements, caused by vllm-project/vllm#27374
- Fix `scheduler_config.send_delta_data`, caused by #3719
- Fix `init_with_cudagraph_sizes`, caused by vllm-project/vllm#26016
- Fix `vl model` replacing PatchEmbed's conv3d with a linear layer, caused by vllm-project/vllm#27418

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with new added/existing test.


- vLLM version: v0.11.0rc3
- vLLM main:
vllm-project/vllm@c9461e0

---------

Signed-off-by: Icey <1790571317@qq.com>
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
yiz-liu added a commit to yiz-liu/vllm-ascend that referenced this pull request Nov 17, 2025
…vLLM [#26016](vllm-project/vllm#26016)

Ensures batch sizes for aclgraph are sorted ascending when aclgraph
mode is enabled, improving consistency and compatibility with later logic that may depend on order.

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>

Labels

ready · speculative-decoding · v1


Development

Successfully merging this pull request may close these issues.

[RFC][UX][torch.compile][CUDAGraph]: Overhaul CompilationConfig and improve CLI -O<n>

5 participants