``docs/source/models/vlm.rst`` (13 additions, 0 deletions)

Using VLMs
==========

vLLM provides experimental support for Vision Language Models (VLMs). This document shows you how to run and serve these models using vLLM.

.. important::

    We are actively iterating on VLM support. Expect breaking changes to VLM usage and development in upcoming releases without prior deprecation.

Engine Arguments
----------------

To initialize a VLM, the aforementioned arguments must be passed to the ``LLM`` class:

.. code-block:: python

            image_feature_size=576,
        )
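
For reference, here is a sketch of the full constructor call this excerpt is taken from. Everything except ``image_feature_size`` is an assumption based on the ``llava-hf/llava-1.5-7b-hf`` setup used elsewhere on this page, so treat it as illustrative rather than authoritative:

.. code-block:: python

    from vllm import LLM

    # Assumed values for the vision-specific engine arguments of
    # llava-hf/llava-1.5-7b-hf; verify them against your vLLM version.
    llm = LLM(
        model="llava-hf/llava-1.5-7b-hf",
        image_input_type="pixel_values",
        image_token_id=32000,
        image_input_shape="1,3,336,336",
        image_feature_size=576,
    )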

.. important::

    We will remove most of the vision-specific arguments in a future release as they can be inferred from the HuggingFace configuration.

To pass an image to the model, note the following in :class:`vllm.inputs.PromptStrictInputs`:

* ``prompt``: The prompt should have a number of ``<image>`` tokens equal to ``image_feature_size``.

A code example can be found in `examples/llava_example.py <https://github.com/vllm-project/vllm/blob/main/examples/llava_example.py>`_.
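
As a rough sketch of what that example does end to end (the ``ImagePixelData`` class, its import path, and the dictionary-style input are assumptions about this release's multi-modal API, so check ``llava_example.py`` for the authoritative version):

.. code-block:: python

    from PIL import Image

    from vllm.multimodal.image import ImagePixelData  # assumed import path

    # Build a prompt containing one <image> token per image feature
    # (576 for llava-1.5-7b), followed by the actual question.
    prompt = "<image>" * 576 + "\nUSER: What is shown in this image?\nASSISTANT:"

    image = Image.open("example.jpg")  # illustrative local file

    # `llm` is the engine instance constructed above.
    outputs = llm.generate({
        "prompt": prompt,
        "multi_modal_data": ImagePixelData(image),
    })
    print(outputs[0].outputs[0].text)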

.. important::

    We will remove the need to format image tokens in a future release. Afterwards, the input text will follow the same format as that for the original HuggingFace model.

Online OpenAI Vision API Compatible Inference
----------------------------------------------

Below is an example of how to launch the same ``llava-hf/llava-1.5-7b-hf`` with the vLLM API server:

.. code-block:: console

        --image-feature-size 576 \
        --chat-template template_llava.jinja
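
For context, the full launch command might look like the following; all flags other than the last two are assumptions mirroring the Python engine arguments shown above, so adjust them for your model and vLLM version:

.. code-block:: console

    $ python -m vllm.entrypoints.openai.api_server \
        --model llava-hf/llava-1.5-7b-hf \
        --image-input-type pixel_values \
        --image-token-id 32000 \
        --image-input-shape 1,3,336,336 \
        --image-feature-size 576 \
        --chat-template template_llava.jinja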

.. important::

    We will remove most of the vision-specific arguments in a future release as they can be inferred from the HuggingFace configuration.

To consume the server, you can use the OpenAI client as in the example below:
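
A minimal sketch of such a client call, assuming the server is running on its default port (8000); the image URL and the ``api_key="EMPTY"`` placeholder are illustrative:

.. code-block:: python

    from openai import OpenAI

    # vLLM's OpenAI-compatible server listens on port 8000 by default;
    # the API key is unused, so any placeholder value works.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    chat_response = client.chat.completions.create(
        model="llava-hf/llava-1.5-7b-hf",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/image.jpg"}},  # illustrative URL
            ],
        }],
    )
    print(chat_response.choices[0].message.content)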