diff --git a/docs/source/models/supported_models.md b/docs/source/models/supported_models.md
index 3ba34c77205e5..acbe27a22a679 100644
--- a/docs/source/models/supported_models.md
+++ b/docs/source/models/supported_models.md
@@ -322,7 +322,7 @@ See [this page](#generative-models) for more information on how to use generativ
   - ✅︎
   - ✅︎
 * - `Qwen2ForCausalLM`
-  - Qwen2
+  - QwQ, Qwen2
   - `Qwen/QwQ-32B-Preview`, `Qwen/Qwen2-7B-Instruct`, `Qwen/Qwen2-7B`, etc.
   - ✅︎
   - ✅︎
@@ -436,7 +436,7 @@ loaded. See [relevant issue on HF Transformers](https://github.com/huggingface/t
 ```
 
 If your model is not in the above list, we will try to automatically convert the model using
-{func}`vllm.model_executor.models.adapters.as_embedding_model`. By default, the embeddings
+{func}`~vllm.model_executor.models.adapters.as_embedding_model`. By default, the embeddings
 of the whole prompt are extracted from the normalized hidden state corresponding to the last token.
 
 #### Reward Modeling (`--task reward`)
@@ -468,7 +468,7 @@ of the whole prompt are extracted from the normalized hidden state corresponding
 ```
 
 If your model is not in the above list, we will try to automatically convert the model using
-{func}`vllm.model_executor.models.adapters.as_reward_model`. By default, we return the hidden states of each token directly.
+{func}`~vllm.model_executor.models.adapters.as_reward_model`. By default, we return the hidden states of each token directly.
 
 ```{important}
 For process-supervised reward models such as `peiyi9979/math-shepherd-mistral-7b-prm`, the pooling config should be set explicitly,
@@ -499,7 +499,7 @@ e.g.: `--override-pooler-config '{"pooling_type": "STEP", "step_tag_id": 123, "r
 ```
 
 If your model is not in the above list, we will try to automatically convert the model using
-{func}`vllm.model_executor.models.adapters.as_classification_model`. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.
+{func}`~vllm.model_executor.models.adapters.as_classification_model`. By default, the class probabilities are extracted from the softmaxed hidden state corresponding to the last token.
 
 #### Sentence Pair Scoring (`--task score`)
 
@@ -550,6 +550,28 @@ On the other hand, modalities separated by `/` are mutually exclusive.
 
 See [this page](#multimodal-inputs) on how to pass multi-modal inputs to the model.
 
+````{important}
+To enable multiple multi-modal items per text prompt, you have to set `limit_mm_per_prompt` (offline inference)
+or `--limit-mm-per-prompt` (online inference). For example, to enable passing up to 4 images per text prompt:
+
+Offline inference:
+```python
+llm = LLM(
+    model="Qwen/Qwen2-VL-7B-Instruct",
+    limit_mm_per_prompt={"image": 4},
+)
+```
+
+Online inference:
+```bash
+vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4
+```
+````
+
+```{note}
+vLLM currently only supports adding LoRA to the language backbone of multimodal models.
+```
+
 ### Generative Models
 
 See [this page](#generative-models) for more information on how to use generative models.
@@ -689,14 +711,14 @@ See [this page](#generative-models) for more information on how to use generativ
 * - `Phi3VForCausalLM`
   - Phi-3-Vision, Phi-3.5-Vision
   - T + IE+
-  - `microsoft/Phi-3-vision-128k-instruct`, `microsoft/Phi-3.5-vision-instruct` etc.
+  - `microsoft/Phi-3-vision-128k-instruct`, `microsoft/Phi-3.5-vision-instruct`, etc.
   -
   - ✅︎
   - ✅︎
 * - `PixtralForConditionalGeneration`
   - Pixtral
   - T + I+
-  - `mistralai/Pixtral-12B-2409`, `mistral-community/pixtral-12b` etc.
+  - `mistralai/Pixtral-12B-2409`, `mistral-community/pixtral-12b` (see note), etc.
   -
   - ✅︎
   - ✅︎
@@ -715,7 +737,7 @@ See [this page](#generative-models) for more information on how to use generativ
   - ✅︎
   - ✅︎
 * - `Qwen2VLForConditionalGeneration`
-  - Qwen2-VL
+  - QVQ, Qwen2-VL
   - T + IE+ + VE+
   - `Qwen/QVQ-72B-Preview`, `Qwen/Qwen2-VL-7B-Instruct`, `Qwen/Qwen2-VL-72B-Instruct`, etc.
   - ✅︎
@@ -733,26 +755,6 @@ See [this page](#generative-models) for more information on how to use generativ
 E Pre-computed embeddings can be inputted for this modality.
 + Multiple items can be inputted per text prompt for this modality.
 
-````{important}
-To enable multiple multi-modal items per text prompt, you have to set `limit_mm_per_prompt` (offline inference)
-or `--limit-mm-per-prompt` (online inference). For example, to enable passing up to 4 images per text prompt:
-
-```python
-llm = LLM(
-    model="Qwen/Qwen2-VL-7B-Instruct",
-    limit_mm_per_prompt={"image": 4},
-)
-```
-
-```bash
-vllm serve Qwen/Qwen2-VL-7B-Instruct --limit-mm-per-prompt image=4
-```
-````
-
-```{note}
-vLLM currently only supports adding LoRA to the language backbone of multimodal models.
-```
-
 ```{note}
 To use `TIGER-Lab/Mantis-8B-siglip-llama3`, you have pass `--hf_overrides '{"architectures": ["MantisForConditionalGeneration"]}'` when running vLLM.
 ```
@@ -762,6 +764,11 @@ The official `openbmb/MiniCPM-V-2` doesn't work yet, so we need to use a fork (`
 For more details, please see: 
 ```
 
+```{note}
+The chat template for Pixtral-HF is incorrect (see [discussion](https://huggingface.co/mistral-community/pixtral-12b/discussions/22)).
+A corrected version is available at <gh-file:examples/template_pixtral_hf.jinja>.
+```
+
 ### Pooling Models
 
 See [this page](pooling-models) for more information on how to use pooling models.
diff --git a/examples/template_pixtral_hf.jinja b/examples/template_pixtral_hf.jinja
new file mode 100644
index 0000000000000..e94661cb39071
--- /dev/null
+++ b/examples/template_pixtral_hf.jinja
@@ -0,0 +1,38 @@
+{%- if messages[0]["role"] == "system" %}
+    {%- set system_message = messages[0]["content"] %}
+    {%- set loop_messages = messages[1:] %}
+{%- else %}
+    {%- set loop_messages = messages %}
+{%- endif %}
+
+{{- bos_token }}
+{%- for message in loop_messages %}
+    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
+        {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}
+    {%- endif %}
+    {%- if message["role"] == "user" %}
+        {%- if loop.last and system_message is defined %}
+            {{- "[INST]" + system_message + "\n" }}
+        {%- else %}
+            {{- "[INST]" }}
+        {%- endif %}
+        {%- if message["content"] is not string %}
+            {%- for chunk in message["content"] %}
+                {%- if chunk["type"] == "text" %}
+                    {{- chunk["text"] }}
+                {%- elif chunk["type"] == "image" %}
+                    {{- "[IMG]" }}
+                {%- else %}
+                    {{- raise_exception("Unrecognized content type!") }}
+                {%- endif %}
+            {%- endfor %}
+        {%- else %}
+            {{- message["content"] }}
+        {%- endif %}
+        {{- "[/INST]" }}
+    {%- elif message["role"] == "assistant" %}
+        {{- message["content"] + eos_token}}
+    {%- else %}
+        {{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
+    {%- endif %}
+{%- endfor %}
diff --git a/tests/entrypoints/test_chat_utils.py b/tests/entrypoints/test_chat_utils.py
index d63b963522e73..8f242df4a60e3 100644
--- a/tests/entrypoints/test_chat_utils.py
+++ b/tests/entrypoints/test_chat_utils.py
@@ -758,6 +758,7 @@ def test_resolve_content_format_hf_defined(model, expected_format):
     ("template_falcon.jinja", "string"),
     ("template_inkbot.jinja", "string"),
     ("template_llava.jinja", "string"),
+    ("template_pixtral_hf.jinja", "openai"),
     ("template_vlm2vec.jinja", "openai"),
     ("tool_chat_template_granite_20b_fc.jinja", "string"),
     ("tool_chat_template_hermes.jinja", "string"),
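
For anyone who wants to exercise the corrected template end to end, here is a minimal sketch. It assumes vLLM's standard `--chat-template` flag on the OpenAI-compatible server; the model name, template path, and `--limit-mm-per-prompt` usage are taken from the files changed above.

```bash
# Serve Pixtral-HF with the corrected chat template and allow up to 4 images
# per prompt; the template path is relative to the vLLM repository root.
vllm serve mistral-community/pixtral-12b \
    --chat-template examples/template_pixtral_hf.jinja \
    --limit-mm-per-prompt image=4
```

Chat requests can then include several image chunks per user message, which the template renders as `[IMG]` placeholders inside the `[INST] ... [/INST]` block, matching the "openai" content format asserted in the new test case.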