Commit 3ccf486

Author: Varun Sundar Rabindranath

assert num_tokens_after_padding bounds

Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
1 parent 3edcca7 commit 3ccf486

1 file changed, 2 additions and 0 deletions

vllm/v1/worker/gpu_model_runner.py

Lines changed: 2 additions & 0 deletions
@@ -3401,6 +3401,8 @@ def _dummy_run(
         with self.maybe_dummy_run_with_lora(
             self.lora_config, num_scheduled_tokens, remove_lora
         ):
+            # Make sure padding doesn't exceed max_num_tokens
+            assert num_tokens_after_padding <= self.max_num_tokens
             model_kwargs = self._init_model_kwargs(num_tokens_after_padding)
             if self.supports_mm_inputs and not self.model_config.is_encoder_decoder:
                 input_ids = None
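
For readers unfamiliar with the invariant, here is a minimal standalone sketch of the kind of bounds check this commit adds. It is not vLLM's padding logic: `pad_num_tokens`, `checked_padding`, and the round-up-to-a-size-list scheme are hypothetical and exist only for illustration. The only part taken from the diff is the assertion that the padded token count must not exceed `max_num_tokens`.

```python
import bisect


def pad_num_tokens(num_tokens: int, sizes: list[int]) -> int:
    """Hypothetical helper: round num_tokens up to the nearest allowed size.

    If no size is large enough, the count is returned unpadded.
    """
    ordered = sorted(sizes)
    idx = bisect.bisect_left(ordered, num_tokens)
    return ordered[idx] if idx < len(ordered) else num_tokens


def checked_padding(num_tokens: int, sizes: list[int], max_num_tokens: int) -> int:
    """Pad, then enforce the same invariant as the assert in this commit."""
    num_tokens_after_padding = pad_num_tokens(num_tokens, sizes)
    # Make sure padding doesn't exceed max_num_tokens
    assert num_tokens_after_padding <= max_num_tokens
    return num_tokens_after_padding
```

In this sketch, `checked_padding(5, [1, 2, 4, 8], 8)` pads 5 up to 8 and passes, while `checked_padding(9, [8, 16], 12)` pads 9 up to 16 and trips the assertion.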
