[Bugfix][DCP] Set default CUDAGraphMode to PIECEWISE for DCP #26574
Signed-off-by: FENP <32334296+FENP@users.noreply.github.com>
Code Review
This pull request addresses an issue where enabling Decode Context Parallelism (DCP) was incompatible with full CUDA graph modes. The proposed fix forces the cudagraph_mode to PIECEWISE when DCP is active. While the intent is correct, the implementation is overly aggressive and will override a user's explicit choice to disable CUDA graphs entirely (cudagraph_mode=NONE). My review includes a critical comment to refine this logic, ensuring it only downgrades from FULL modes to PIECEWISE and warns the user, without affecting NONE mode.
if self.parallel_config.decode_context_parallel_size > 1:
    self.compilation_config.cudagraph_mode = CUDAGraphMode.PIECEWISE
This implementation unconditionally sets cudagraph_mode to PIECEWISE if decode context parallelism (DCP) is enabled. This is too aggressive as it will override a user's explicit choice to disable CUDA graphs (e.g., cudagraph_mode=NONE), which might be done for debugging purposes.
A better approach is to only downgrade the mode to PIECEWISE if a FULL CUDA graph mode was requested, as those are the ones incompatible with DCP. This change also adds a warning to inform the user about the automatic adjustment.
if (self.parallel_config.decode_context_parallel_size > 1
        and self.compilation_config.cudagraph_mode.has_full_cudagraphs()):
    logger.warning(
        "Decode context parallel (DCP) is enabled, which is "
        "incompatible with full CUDA graphs. Downgrading "
        "cudagraph_mode from %s to PIECEWISE.",
        self.compilation_config.cudagraph_mode.name)
    self.compilation_config.cudagraph_mode = CUDAGraphMode.PIECEWISE
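A minimal, self-contained sketch of the downgrade policy proposed in this review comment, useful for checking which modes it touches. `CUDAGraphMode` here is a simplified stand-in enum (an assumption, not vLLM's actual class), and `adjust_for_dcp` is a hypothetical helper rather than vLLM code:

```python
import enum
import logging

logger = logging.getLogger(__name__)


class CUDAGraphMode(enum.Enum):
    """Simplified stand-in for vLLM's CUDAGraphMode (assumption, not the real class)."""
    NONE = "none"
    PIECEWISE = "piecewise"
    FULL = "full"
    FULL_DECODE_ONLY = "full_decode_only"
    FULL_AND_PIECEWISE = "full_and_piecewise"

    def has_full_cudagraphs(self) -> bool:
        # True for any mode that captures at least one full CUDA graph.
        return self in (CUDAGraphMode.FULL,
                        CUDAGraphMode.FULL_DECODE_ONLY,
                        CUDAGraphMode.FULL_AND_PIECEWISE)


def adjust_for_dcp(mode: CUDAGraphMode, dcp_size: int) -> CUDAGraphMode:
    """Hypothetical helper: downgrade full modes to PIECEWISE when DCP is enabled."""
    if dcp_size > 1 and mode.has_full_cudagraphs():
        logger.warning(
            "Decode context parallel (DCP) is enabled, which is "
            "incompatible with full CUDA graphs. Downgrading "
            "cudagraph_mode from %s to PIECEWISE.", mode.name)
        return CUDAGraphMode.PIECEWISE
    # NONE and PIECEWISE pass through unchanged, as does every mode when DCP is off.
    return mode
```

Under this policy an explicit NONE survives with DCP enabled, while any mode containing full graphs is downgraded with a warning.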
This code only executes when cudagraph_mode is not explicitly set by the user.
cc @LucasWilkinson @youzhedian It should be possible to make DCP compatible with full CUDA graphs.
…oject#26574) Signed-off-by: FENP <32334296+FENP@users.noreply.github.com> Signed-off-by: 1994 <1994@users.noreply.github.com>
…oject#26574) Signed-off-by: FENP <32334296+FENP@users.noreply.github.com> Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
…oject#26574) Signed-off-by: FENP <32334296+FENP@users.noreply.github.com> Signed-off-by: bbartels <benjamin@bartels.dev>
…oject#26574) Signed-off-by: FENP <32334296+FENP@users.noreply.github.com>
…oject#26574) Signed-off-by: FENP <32334296+FENP@users.noreply.github.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
…oject#26574) Signed-off-by: FENP <32334296+FENP@users.noreply.github.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Purpose
#25444 changed the default CUDAGraphMode from PIECEWISE to FULL_AND_PIECEWISE. However, DCP does not currently support full CUDA graphs (#26022 (comment)). This PR sets the default CUDAGraphMode to PIECEWISE when DCP is enabled.
cc @youzhedian @youkaichao @LucasWilkinson
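The default selection described above can be sketched as follows, assuming (per the author's reply in the review thread) that it only applies when the user has not set `cudagraph_mode`. `CUDAGraphMode` is a simplified stand-in enum and `resolve_cudagraph_mode` is a hypothetical helper, not vLLM's implementation:

```python
import enum
from typing import Optional


class CUDAGraphMode(enum.Enum):
    """Simplified stand-in for vLLM's CUDAGraphMode (assumption, not the real class)."""
    NONE = "none"
    PIECEWISE = "piecewise"
    FULL_AND_PIECEWISE = "full_and_piecewise"


def resolve_cudagraph_mode(user_mode: Optional[CUDAGraphMode],
                           decode_context_parallel_size: int) -> CUDAGraphMode:
    """Hypothetical helper mirroring the default selection described in this PR."""
    if user_mode is not None:
        # An explicit user choice (including NONE) is returned unchanged.
        return user_mode
    if decode_context_parallel_size > 1:
        # DCP does not currently support full CUDA graphs.
        return CUDAGraphMode.PIECEWISE
    # Otherwise keep the FULL_AND_PIECEWISE default introduced by #25444.
    return CUDAGraphMode.FULL_AND_PIECEWISE
```

Keying the fallback on `user_mode is None` is what keeps an explicit `cudagraph_mode=NONE` intact in this sketch.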
Test Plan
Test Result