Reduce the Cuda Graph memory footprint when running with DBO #25779
Code Review
This pull request reduces the memory footprint of CUDA graphs when using Dual Batch Overlap (DBO). Instead of capturing both a microbatched and a non-microbatched graph for the same shape, it captures only the graph type appropriate for that shape, which directly lowers memory usage. A corresponding change handles the runtime case where microbatching is aborted by falling back to eager execution, which prevents graph mismatches. The implementation is clean, logical, and appears to be correct. I have not identified any issues of high or critical severity.
…dbo-cudagraph-size
Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: simon-mo <simon.mo@hey.com>
Purpose
This PR reduces the memory footprint of cudagraphs when running with DBO. Non-DBO cudagraphs are now constructed only for shapes that DBO doesn't support, so each shape gets a single cudagraph instead of both a DBO and a non-DBO one.
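As a rough illustration of the idea (not the code in this PR), the selection could be sketched as below. All names here (`GraphKind`, `dbo_supports`, `plan_captures`, `dispatch`) and the token-threshold support rule are assumptions made for the example, not vLLM APIs.

```python
from enum import Enum


class GraphKind(Enum):
    DBO = "dbo"          # graph captured with microbatching
    NON_DBO = "non_dbo"  # graph captured without microbatching
    EAGER = "eager"      # no graph replay, run eagerly


def dbo_supports(num_tokens: int, min_dbo_tokens: int) -> bool:
    # Hypothetical support rule: DBO applies at or above a token threshold.
    return num_tokens >= min_dbo_tokens


def plan_captures(capture_sizes, min_dbo_tokens):
    """Choose exactly one graph kind per capture size instead of both."""
    return {
        size: GraphKind.DBO if dbo_supports(size, min_dbo_tokens) else GraphKind.NON_DBO
        for size in capture_sizes
    }


def dispatch(num_tokens, microbatching_active, captured):
    """Pick what to run for a batch; fall back to eager on a mismatch."""
    wanted = GraphKind.DBO if microbatching_active else GraphKind.NON_DBO
    if captured.get(num_tokens) is wanted:
        return wanted
    # Either no graph exists for this shape, or only the other kind was
    # captured (e.g. microbatching was aborted for a DBO-only shape).
    return GraphKind.EAGER


captured = plan_captures([1, 8, 64, 256], min_dbo_tokens=64)
```

In this sketch, a shape that only has a DBO graph runs eagerly when microbatching is aborted at runtime, trading some speed on that step for not holding a second graph in memory.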
Test Plan
lm_eval
deepseek-ai/DeepSeek-V2-Lite
Test Result
Sizes before
Sizes after
This size can be further reduced by running with full cudagraphs only:
That is roughly the footprint of a non-DBO run with both styles of cudagraphs turned on.