
Conversation

@fhl2000
Contributor

@fhl2000 fhl2000 commented Oct 1, 2025

Purpose

Part of the CompilationConfig improvements (#20283), asked by @ProExpertProg and @WoosukKwon:

  • Rename max_capture_size -> max_cudagraph_capture_size: used to specify a single maximum size; the rest of the list is filled in automatically.
  • cudagraph_capture_sizes -> stays the same; used only to specify a full list of sizes.
  • SchedulerConfig.cuda_graph_sizes -> removed.
  • Also stop reversing the sizes: always store them in ascending order (already on the issue), and always sort when using them so we don't rely on them being pre-sorted.

Updated:

  • The assignment of cudagraph_capture_sizes changed from a uniform step size of 8 to a 2-level step size: step_size = 8 for sizes < 256 and step_size = 16 for sizes >= 256. (Asked by @ProExpertProg) [MISC] cudagraph_capture_sizes related improvements #26016 (comment)
  • New CLI flags --cudagraph-capture-sizes and --max-cudagraph-capture-size map to the corresponding settings in compilation_config.
  • The CLI flag --cuda-graph-sizes is deprecated (to be removed in 0.13.0 or 1.0.0) and is now equivalent to --cudagraph-capture-sizes.
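A minimal Python sketch of the 2-level schedule described above. The helper name, the explicit small sizes (1, 2, 4), and the exact boundary handling are assumptions for illustration, not vLLM's actual implementation:

```python
def capture_sizes(max_cudagraph_capture_size: int) -> list[int]:
    """Generate CUDA graph capture sizes in ascending order.

    Small batch sizes (1, 2, 4) are kept explicitly, then the list is
    filled with step 8 up to 256 and step 16 beyond, capped at the max.
    """
    sizes = [1, 2, 4]
    sizes += list(range(8, min(256, max_cudagraph_capture_size) + 1, 8))
    if max_cudagraph_capture_size >= 256:
        sizes += list(range(256 + 16, max_cudagraph_capture_size + 1, 16))
    # Always include the max itself, and store ascending (no reversing).
    sizes.append(max_cudagraph_capture_size)
    return sorted({s for s in sizes if s <= max_cudagraph_capture_size})
```

For example, with a max of 512 this yields 8, 16, ..., 256, then 272, 288, ..., 512, matching the coarser spacing for large batches where per-size capture cost outweighs padding waste.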

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: fhl <2410591650@qq.com>
@mergify

mergify bot commented Oct 1, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @fhl2000.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 1, 2025
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the configuration for CUDA graph capture sizes, improving clarity and consistency. Key changes include renaming max_capture_size to max_cudagraph_capture_size, centralizing this configuration within CompilationConfig by removing it from SchedulerConfig, and standardizing the storage of capture sizes to an ascending order. These modifications streamline the configuration process and enhance code maintainability. The implementation is solid and aligns with the stated objectives. I have reviewed the changes and found no issues of high or critical severity.

@mergify mergify bot removed the needs-rebase label Oct 1, 2025
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
@mergify

mergify bot commented Oct 1, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @fhl2000.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 1, 2025
  Warning: This flag is deprecated and will be removed in the next major or
- minor release, i.e. v0.11.0 or v1.0.0. Please use cudagraph_mode=PIECEWISE instead.
+ minor release, i.e. v0.12.0 or v1.0.0. Please use cudagraph_mode=FULL_AND
Member


If this was already supposed to have been removed, can we just remove it?

  performance benefits for smaller models.
  Warning: This flag is deprecated and will be removed in the next major or
- minor release, i.e. v0.11.0 or v1.0.0. Please use cudagraph_mode=
+ minor release, i.e. v0.12.0 or v1.0.0. Please use cudagraph_mode=
Member


If this was already supposed to have been removed, can we just remove it?

@mergify

mergify bot commented Oct 23, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @fhl2000.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 23, 2025
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
@mergify mergify bot removed the needs-rebase label Oct 23, 2025
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Member

@hmellor hmellor left a comment


LGTM

My only remaining nit is why are we delaying the deletion of a deprecated field that should already have been deleted?

Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
@fhl2000
Contributor Author

fhl2000 commented Oct 23, 2025

My only remaining nit is why are we delaying the deletion of a deprecated field that should already have been deleted?

Okay, I have changed the comments back to v0.11.0. I think the deletion can be done in another PR, if you'd like to do that.

@hmellor
Member

hmellor commented Oct 24, 2025

LM eval test fails on main, rerunning the entrypoint test because I suspect it's flaky

@hmellor
Member

hmellor commented Oct 24, 2025

I can do the follow-up removing the v0.11.0 deprecations.

@vllm-bot vllm-bot merged commit 284cc92 into vllm-project:main Oct 24, 2025
63 of 65 checks passed
atalhens pushed a commit to atalhens/vllm that referenced this pull request Oct 24, 2025
…6016)

Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@fhl2000 fhl2000 deleted the simple_clean_up branch October 24, 2025 15:06
kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Oct 25, 2025
rohin-garg pushed a commit to rohin-garg/vllm that referenced this pull request Oct 25, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Oct 28, 2025
### What this PR does / why we need it?

vllm-project/vllm@c9461e0

- Fix `spec decode rejection sampler`, caused by vllm-project/vllm#26060
- Fix some `import` statements, caused by vllm-project/vllm#27374
- Fix `scheduler_config.send_delta_data`, caused by #3719
- Fix `init_with_cudagraph_sizes`, caused by vllm-project/vllm#26016
- Fix `vl model` replacing PatchEmbed's conv3d with a linear layer, caused by vllm-project/vllm#27418

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with new added/existing test.


- vLLM version: v0.11.0rc3
- vLLM main:
vllm-project/vllm@c9461e0

---------

Signed-off-by: Icey <1790571317@qq.com>
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
yiz-liu added a commit to yiz-liu/vllm-ascend that referenced this pull request Nov 17, 2025
…vLLM [#26016](vllm-project/vllm#26016)

Ensures batch sizes for aclgraph are sorted ascending when aclgraph
mode is enabled, improving consistency and compatibility with later logic that may depend on order.

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>

Labels

ready · speculative-decoding · v1


Development

Successfully merging this pull request may close these issues.

[RFC][UX][torch.compile][CUDAGraph]: Overhaul CompilationConfig and improve CLI -O<n>

5 participants