docs/features/prompt_embeds.md (3 lines changed: 0 additions & 3 deletions)
@@ -6,9 +6,6 @@ This page teaches you how to pass prompt embedding inputs to vLLM.
The traditional flow of text data for a Large Language Model goes from text to token ids (via a tokenizer), and then from token ids to prompt embeddings. For a traditional decoder-only model (such as meta-llama/Llama-3.1-8B-Instruct), this step of converting token ids to prompt embeddings happens via a look-up from a learned embedding matrix, but the model is not limited to processing only the embeddings corresponding to its token vocabulary.
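As a minimal sketch of this flow using Hugging Face `transformers` (the example text and API calls are illustrative, not part of this diff):

```python
# Sketch of the text -> token ids -> prompt embeddings flow described above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Text -> token ids, via the tokenizer.
token_ids = tokenizer("Hello, world!", return_tensors="pt").input_ids

# Token ids -> prompt embeddings, via a look-up in the learned embedding matrix.
with torch.no_grad():
    prompt_embeds = model.get_input_embeddings()(token_ids)

# prompt_embeds has shape (1, sequence_length, hidden_size). The model can
# process any tensor of this shape, not just rows of the vocabulary matrix.
```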
-!!! note
-    Prompt embeddings are currently only supported in the v0 engine.
-
## Offline Inference
To input prompt embeddings, follow this schema in [vllm.inputs.EmbedsPrompt][]:
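The schema body itself is cut off in this diff view. As a hedged sketch of how an embeddings prompt is passed for offline inference (the `enable_prompt_embeds` flag, the embeddings file, and the exact tensor shape are assumptions; verify them against the [vllm.inputs.EmbedsPrompt][] reference for your vLLM version):

```python
# Hedged sketch: offline inference with precomputed prompt embeddings.
# enable_prompt_embeds and the {"prompt_embeds": ...} input dict are
# assumptions based on the EmbedsPrompt schema, not confirmed by this diff.
import torch
from vllm import LLM

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enable_prompt_embeds=True)

# A tensor of shape (sequence_length, hidden_size), e.g. produced by the
# embedding look-up sketched earlier (the file name here is hypothetical).
prompt_embeds = torch.load("my_prompt_embeds.pt")

outputs = llm.generate({"prompt_embeds": prompt_embeds})
print(outputs[0].outputs[0].text)
```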