[MISC] Fix misleading batch_size_capture_list when cuda_graph_sizes < 4 #25829
Conversation
Code Review
This pull request correctly addresses an issue where the batch_size_capture_list for CUDA graphs was generated with misleading values when the maximum graph size was small. The fix is clear and effective, filtering the default batch sizes [1, 2, 4] against the specified maximum. The introduction of the max_graph_size variable improves readability. The change is well-contained and the logic is sound. I have no further suggestions.
yewentao256
left a comment
Thanks for the work!
Force-pushed from ec08f20 to d4e0a16
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Signed-off-by: billishyahao <bill.he@amd.com>
Signed-off-by: billishyahao <bill.he@amd.com>
yewentao256
left a comment
LGTM, thanks for the work!
Actually I would expect
Yeah, I think that would be fine too; feel free to ask for it in #26016
… 4 (vllm-project#25829) Signed-off-by: billishyahao <bill.he@amd.com> Co-authored-by: Luka Govedic <ProExpertProg@users.noreply.github.com>
Purpose
Previously, the logic for generating batch_size_capture_list always included [1, 2, 4] by default; refer to vllm/vllm/config/__init__.py, lines 619 to 622 at commit f4e4088.
This was misleading when cuda_graph_sizes < 4 because the list contained batch sizes that exceeded the actual maximum.
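The misleading behavior can be illustrated with a small sketch (a paraphrase under assumed defaults, not the exact vLLM source; the helper name old_capture_list and the multiples-of-8 step are illustrative):

```python
def old_capture_list(max_graph_size: int) -> list[int]:
    """Sketch of the pre-fix behavior: the defaults [1, 2, 4] were always
    included, then multiples of 8 were appended up to the maximum."""
    batch_sizes = [1, 2, 4]
    batch_sizes += [8 * i for i in range(1, max_graph_size // 8 + 1)]
    return batch_sizes

# With a small maximum such as cuda_graph_sizes = [2], the list still
# contains 4, which exceeds the actual maximum:
print(old_capture_list(2))   # -> [1, 2, 4]
print(old_capture_list(16))  # -> [1, 2, 4, 8, 16]
```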
How to fix in this patch
Filter out any of the default [1, 2, 4] values that are larger than cuda_graph_sizes[0].
The list now correctly reflects the allowed batch sizes, even for small maximum values.
Examples:
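As a hedged sketch of the patched logic (the helper fixed_capture_list is hypothetical; the actual change lives in vllm/config/__init__.py):

```python
def fixed_capture_list(max_graph_size: int) -> list[int]:
    """Sketch of the fix: filter the default sizes against the maximum
    before appending the larger multiples of 8."""
    batch_sizes = [size for size in (1, 2, 4) if size <= max_graph_size]
    batch_sizes += [8 * i for i in range(1, max_graph_size // 8 + 1)]
    return batch_sizes

# The capture list no longer contains sizes above the maximum:
print(fixed_capture_list(2))   # -> [1, 2]
print(fixed_capture_list(16))  # -> [1, 2, 4, 8, 16]
```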
Test Result
Before fix:
After fix:
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.