[MISC] cudagraph_capture_sizes related improvements
#26016
Conversation
Signed-off-by: fhl <2410591650@qq.com>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Code Review
This pull request refactors the configuration for CUDA graph capture sizes, improving clarity and consistency. Key changes include renaming max_capture_size to max_cudagraph_capture_size, centralizing this configuration within CompilationConfig by removing it from SchedulerConfig, and standardizing the storage of capture sizes to an ascending order. These modifications streamline the configuration process and enhance code maintainability. The implementation is solid and aligns with the stated objectives. I have reviewed the changes and found no issues of high or critical severity.
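For context, a minimal sketch of the configuration shape the review describes: capture sizes and their maximum live together on a CompilationConfig-like object, and the sizes are stored in ascending order. The class, defaults, and normalization logic below are illustrative assumptions, not the actual vLLM implementation.

```python
# Illustrative sketch only; not vLLM's actual CompilationConfig.
from dataclasses import dataclass, field


@dataclass
class CompilationConfigSketch:
    # Hypothetical fields mirroring the renamed/centralized settings
    # (max_cudagraph_capture_size was previously max_capture_size).
    cudagraph_capture_sizes: list[int] = field(default_factory=list)
    max_cudagraph_capture_size: int | None = None

    def __post_init__(self) -> None:
        # Store capture sizes ascending, deduplicated, and capped at the max.
        sizes = sorted(set(self.cudagraph_capture_sizes))
        if self.max_cudagraph_capture_size is not None:
            sizes = [s for s in sizes if s <= self.max_cudagraph_capture_size]
        self.cudagraph_capture_sizes = sizes


cfg = CompilationConfigSketch(
    cudagraph_capture_sizes=[512, 1, 2, 4, 256, 8],
    max_cudagraph_capture_size=256,
)
print(cfg.cudagraph_capture_sizes)  # [1, 2, 4, 8, 256]
```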
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
This pull request has merge conflicts that must be resolved before it can be merged.
vllm/config/compilation.py
Outdated
  Warning: This flag is deprecated and will be removed in the next major or
- minor release, i.e. v0.11.0 or v1.0.0. Please use cudagraph_mode=PIECEWISE
+ minor release, i.e. v0.12.0 or v1.0.0. Please use cudagraph_mode=FULL_AND
  instead.
If this was already supposed to have been removed, can we just remove it?
vllm/config/compilation.py
Outdated
  performance benefits for smaller models.
  Warning: This flag is deprecated and will be removed in the next major or
- minor release, i.e. v0.11.0 or v1.0.0. Please use cudagraph_mode=
+ minor release, i.e. v0.12.0 or v1.0.0. Please use cudagraph_mode=
If this was already supposed to have been removed, can we just remove it?
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
hmellor left a comment
LGTM
My only remaining nit: why are we delaying the deletion of a deprecated field that should already have been deleted?
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Okay, I have changed the comments back to v0.11.0. I think the deletion can be done in another PR, if you'd like to do that.
The LM eval test fails on main; rerunning the entrypoint test because I suspect it's flaky.
I can do the follow-up removing the v0.11.0 deprecations.
…6016) Signed-off-by: fhl <2410591650@qq.com> Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
…6016) Signed-off-by: fhl <2410591650@qq.com> Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
### What this PR does / why we need it?
vllm-project/vllm@c9461e0
Fix `spec decode rejection sampler`, caused by vllm-project/vllm#26060
Fix some `import`s, caused by vllm-project/vllm#27374
Fix `scheduler_config.send_delta_data`, caused by #3719
Fix `init_with_cudagraph_sizes`, caused by vllm-project/vllm#26016
Fix `vl model` replacing PatchEmbed's conv3d with a linear layer, caused by vllm-project/vllm#27418

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with newly added and existing tests.
- vLLM version: v0.11.0rc3
- vLLM main: vllm-project/vllm@c9461e0

Signed-off-by: Icey <1790571317@qq.com>
…vLLM [#26016](vllm-project/vllm#26016) Ensures batch sizes for aclgraph are sorted ascending when aclgraph mode is enabled, improving consistency and compatibility with later logic that may depend on order. Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>
Purpose
Part of the CompilationConfig improvements (#20283), requested by @ProExpertProg and @WoosukKwon:
Updated:
- `cudagraph_capture_sizes` related improvements #26016 (comment)
- Added `--cudagraph-capture-sizes` and `--max-cudagraph-capture-size`, for the corresponding settings in `compilation_config` respectively (usage sketch below).
- `--cuda-graph-sizes` is deprecated and will be removed in 0.13.0 or 1.0.0, and is now equivalent to `--cudagraph-capture-sizes`.
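A hedged usage sketch of how the new settings might be passed through `compilation_config` from the Python API. The field names come from this PR; the model name, size list, and exact `LLM` constructor behavior are assumptions for illustration, so check the released docs before relying on them.

```python
# Assumed usage sketch; field names are from this PR, everything else
# (model choice, size list, constructor details) is illustrative.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    compilation_config={
        # Sizes at which CUDA graphs are captured; kept in ascending order.
        "cudagraph_capture_sizes": [1, 2, 4, 8, 16, 32, 64],
        # Upper bound on capture size (renamed from max_capture_size).
        "max_cudagraph_capture_size": 64,
    },
)
```

On the CLI, the equivalent flags introduced here are `--cudagraph-capture-sizes` and `--max-cudagraph-capture-size`, with `--cuda-graph-sizes` kept only as a deprecated alias.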
Essential Elements of an Effective PR Description Checklist
- supported_models.md and examples for a new model.