Commit f23871e

[Doc] Add notice about breaking changes to VLMs (#5818)
1 parent: e9de9dd


docs/source/models/vlm.rst

Lines changed: 13 additions & 0 deletions
@@ -5,6 +5,9 @@ Using VLMs
 
 vLLM provides experimental support for Vision Language Models (VLMs). This document shows you how to run and serve these models using vLLM.
 
+.. important::
+    We are actively iterating on VLM support. Expect breaking changes to VLM usage and development in upcoming releases without prior deprecation.
+
 Engine Arguments
 ----------------

@@ -39,6 +42,10 @@ To initialize a VLM, the aforementioned arguments must be passed to the ``LLM``
         image_feature_size=576,
     )
 
+.. important::
+    We will remove most of the vision-specific arguments in a future release as they can be inferred from the HuggingFace configuration.
+
+
 To pass an image to the model, note the following in :class:`vllm.inputs.PromptStrictInputs`:
 
 * ``prompt``: The prompt should have a number of ``<image>`` tokens equal to ``image_feature_size``.
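
The hunk above shows only the tail of the initialization example. For context, here is a minimal sketch of how these pieces could fit together at this revision, assembled from the diff context; the ``ImagePixelData`` wrapper, the ``multi_modal_data`` key, and the argument values other than ``image_feature_size=576`` are assumptions not shown in this hunk:

.. code-block:: python

    from PIL import Image

    from vllm import LLM
    from vllm.multimodal.image import ImagePixelData  # assumed import path at this revision

    # Vision-specific arguments; values other than image_feature_size are assumed here.
    llm = LLM(
        model="llava-hf/llava-1.5-7b-hf",
        image_input_type="pixel_values",
        image_token_id=32000,
        image_input_shape="1,3,336,336",
        image_feature_size=576,
    )

    # The prompt must contain exactly image_feature_size (576) <image> tokens.
    prompt = "<image>" * 576 + "\nUSER: What is shown in this image?\nASSISTANT:"

    image = Image.open("example.jpg")  # hypothetical local image file
    outputs = llm.generate({
        "prompt": prompt,
        "multi_modal_data": ImagePixelData(image),
    })
    print(outputs[0].outputs[0].text)
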
@@ -63,6 +70,9 @@ To pass an image to the model, note the following in :class:`vllm.inputs.PromptS
 
 A code example can be found in `examples/llava_example.py <https://github.com/vllm-project/vllm/blob/main/examples/llava_example.py>`_.
 
+.. important::
+    We will remove the need to format image tokens in a future release. Afterwards, the input text will follow the same format as that for the original HuggingFace model.
+
 Online OpenAI Vision API Compatible Inference
 ----------------------------------------------
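
To make the planned change concrete: the current scheme repeats the placeholder once per image feature, while the original HuggingFace LLaVA format uses a single ``<image>`` token per image. A sketch of the contrast (the future format shown is an assumption based on the HuggingFace LLaVA prompt style, not on this diff):

.. code-block:: python

    # Current vLLM format: one <image> token per image feature (576 for LLaVA-1.5).
    prompt_current = "<image>" * 576 + "\nUSER: Describe the image.\nASSISTANT:"

    # Planned format: the single-token style of the original HuggingFace model.
    prompt_planned = "USER: <image>\nDescribe the image.\nASSISTANT:"
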

@@ -89,6 +99,9 @@ Below is an example on how to launch the same ``llava-hf/llava-1.5-7b-hf`` with
         --image-feature-size 576 \
         --chat-template template_llava.jinja
 
+.. important::
+    We will remove most of the vision-specific arguments in a future release as they can be inferred from the HuggingFace configuration.
+
 To consume the server, you can use the OpenAI client like in the example below:
 
 .. code-block:: python
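
The Python example that follows ``.. code-block:: python`` is cut off by the diff context. For completeness, here is a sketch of consuming the server with the official ``openai`` client; the server address, the ``EMPTY`` API key, and the image URL are placeholders assumed here:

.. code-block:: python

    from openai import OpenAI

    # Point the client at the vLLM OpenAI-compatible server (placeholder address).
    client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

    chat_response = client.chat.completions.create(
        model="llava-hf/llava-1.5-7b-hf",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                # Placeholder image URL; any web-accessible image works.
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
            ],
        }],
    )
    print(chat_response.choices[0].message.content)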
