vllm/multimodal/profiling.py
Lines changed: 0 additions & 29 deletions
@@ -234,19 +234,6 @@ def get_decoder_dummy_data(
         prompt_token_ids = mm_inputs["prompt_token_ids"]
         total_len = len(prompt_token_ids)
 
-        # V0 does not support chunked prefill.
-        if total_len > seq_len and not envs.VLLM_USE_V1:
-            # `max_num_batched_tokens` is defined by `SchedulerConfig`
-            logger.warning_once(
-                "The sequence length used for profiling (max_num_batched_tokens / max_num_seqs = %d) "  # noqa: E501
-                "is too short to hold the multi-modal embeddings in the worst case (%d tokens in total, out of which %s are reserved for multi-modal embeddings). "  # noqa: E501
-                "This may cause certain multi-modal inputs to fail during inference, even when the input text is short. "  # noqa: E501
-                "To avoid this, you should increase `max_model_len`, reduce `max_num_seqs`, and/or reduce `mm_counts`.",  # noqa: E501