
Commit a36468b

[Core] Force PIECEWISE CUDAGraph mode for encoder-decoder
Whisper does not work with full cudagraphs. That is being worked on in PR #25208. The failure can be reproduced reliably via `tests/models/multimodal/generation/test_whisper.py`, at least in my H100 development environment. The tests passed on the PR and I'm not sure why. Regardless, this seems like the right change to make until #25208 sorts out exactly what changes are needed.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
1 parent: 3468f17

File tree

1 file changed: +4 additions, -2 deletions


vllm/config/__init__.py

Lines changed: 4 additions & 2 deletions
@@ -364,9 +364,11 @@ def __post_init__(self):
                 self.compilation_config.cudagraph_mode = \
                     CUDAGraphMode.FULL_AND_PIECEWISE

-                # pooling model does not support full cudagraphs
+                # pooling models and encoder-decoder models
+                # do not support full cudagraphs
                 if self.model_config is not None and \
-                    self.model_config.pooler_config is not None:
+                    (self.model_config.pooler_config is not None
+                     or self.model_config.is_encoder_decoder is not None):
                     self.compilation_config.cudagraph_mode = \
                         CUDAGraphMode.PIECEWISE
             else:
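
The change amounts to a small fallback rule: start from FULL_AND_PIECEWISE and drop to PIECEWISE when the model is a pooling model or an encoder-decoder model such as Whisper. Below is a minimal standalone sketch of that rule, not vLLM's actual code: the CUDAGraphMode, pooler_config, and is_encoder_decoder names mirror the diff, while the simplified dataclass and the boolean treatment of is_encoder_decoder (the diff compares it against None) are illustrative assumptions.

# Standalone sketch of the mode-selection rule above (assumed names;
# not vLLM's real config classes).
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class CUDAGraphMode(Enum):
    PIECEWISE = auto()
    FULL_AND_PIECEWISE = auto()


@dataclass
class FakeModelConfig:
    # Stand-in for vLLM's ModelConfig: only the two attributes the diff reads.
    pooler_config: Optional[object] = None
    is_encoder_decoder: bool = False


def choose_cudagraph_mode(
        model_config: Optional[FakeModelConfig]) -> CUDAGraphMode:
    """Mirror the diff: default to FULL_AND_PIECEWISE, fall back to
    PIECEWISE for pooling and encoder-decoder models."""
    mode = CUDAGraphMode.FULL_AND_PIECEWISE
    if model_config is not None and (
            model_config.pooler_config is not None
            or model_config.is_encoder_decoder):  # treated as a plain bool here
        mode = CUDAGraphMode.PIECEWISE
    return mode


# An encoder-decoder model such as Whisper now gets piecewise capture only.
assert choose_cudagraph_mode(
    FakeModelConfig(is_encoder_decoder=True)) is CUDAGraphMode.PIECEWISE
# A decoder-only LLM keeps the full-and-piecewise default.
assert choose_cudagraph_mode(
    FakeModelConfig()) is CUDAGraphMode.FULL_AND_PIECEWISE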
