@@ -45,14 +45,14 @@ Each pooling model in vLLM supports one or more of these tasks according to
 [Pooler.get_supported_tasks][vllm.model_executor.layers.pooler.Pooler.get_supported_tasks],
 enabling the corresponding APIs:
 
-| Task       | APIs               |
-|------------|--------------------|
-| `encode`   | `encode`           |
-| `embed`    | `embed`, `score`\* |
-| `classify` | `classify`         |
-| `score`    | `score`            |
+| Task       | APIs                                 |
+|------------|--------------------------------------|
+| `encode`   | `LLM.reward(...)`                    |
+| `embed`    | `LLM.embed(...)`, `LLM.score(...)`\* |
+| `classify` | `LLM.classify(...)`                  |
+| `score`    | `LLM.score(...)`                     |
 
-\* The `score` API falls back to `embed` task if the model does not support `score` task.
+\* The `LLM.score(...)` API falls back to the `embed` task if the model does not support the `score` task.
 
 ### Pooler Configuration
 
@@ -66,11 +66,11 @@ you can override some of its attributes via the `--override-pooler-config` option
 If the model has been converted via `--convert` (see above),
 the pooler assigned to each task has the following attributes by default:
 
-| Task       | Pooling Type   | Normalization | Softmax |
-|------------|----------------|---------------|---------|
-| `encode`   | `ALL`          | ❌            | ❌      |
-| `embed`    | `LAST`         | ✅︎           | ❌      |
-| `classify` | `LAST`         | ❌            | ✅︎     |
+| Task       | Pooling Type | Normalization | Softmax |
+|------------|--------------|---------------|---------|
+| `reward`   | `ALL`        | ❌            | ❌      |
+| `embed`    | `LAST`       | ✅︎           | ❌      |
+| `classify` | `LAST`       | ❌            | ✅︎     |
 
 When loading [Sentence Transformers](https://huggingface.co/sentence-transformers) models,
 its Sentence Transformers configuration file (`modules.json`) takes priority over the model's defaults.
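The defaults in the table above can be sketched in plain Python, assuming `hidden_states` is a list of per-token vectors. This is illustrative only, not vLLM's pooler code:

```python
import math

def pool(hidden_states, pooling_type="LAST", normalize=False, softmax=False):
    """Toy pooler mirroring the table's defaults:
    reward   -> ALL pooling, no normalization/softmax
    embed    -> LAST pooling + L2 normalization
    classify -> LAST pooling + softmax"""
    if pooling_type == "ALL":
        pooled = hidden_states       # one vector per token
    elif pooling_type == "LAST":
        pooled = hidden_states[-1]   # the last token's vector
    else:
        raise ValueError(f"unsupported pooling type: {pooling_type}")
    if normalize:
        norm = math.sqrt(sum(x * x for x in pooled))
        pooled = [x / norm for x in pooled]
    if softmax:
        exps = [math.exp(x) for x in pooled]
        total = sum(exps)
        pooled = [e / total for e in exps]
    return pooled

states = [[1.0, 2.0], [3.0, 4.0]]
print(pool(states, "LAST", normalize=True))  # [0.6, 0.8] (unit length)
print(pool(states, "LAST", softmax=True))    # probabilities summing to 1
```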
@@ -83,21 +83,6 @@ which takes priority over both the model's and Sentence Transformers's defaults.
 The [LLM][vllm.LLM] class provides various methods for offline inference.
 See [configuration][configuration] for a list of options when initializing the model.
 
-### `LLM.encode`
-
-The [encode][vllm.LLM.encode] method is available to all pooling models in vLLM.
-It returns the extracted hidden states directly, which is useful for reward models.
-
-```python
-from vllm import LLM
-
-llm = LLM(model="Qwen/Qwen2.5-Math-RM-72B", runner="pooling")
-(output,) = llm.encode("Hello, my name is")
-
-data = output.outputs.data
-print(f"Data: {data!r}")
-```
-
 ### `LLM.embed`
 
 The [embed][vllm.LLM.embed] method outputs an embedding vector for each prompt.
@@ -106,7 +91,7 @@ It is primarily designed for embedding models.
 ```python
 from vllm import LLM
 
-llm = LLM(model="intfloat/e5-mistral-7b-instruct", runner="pooling")
+llm = LLM(model="intfloat/e5-small", runner="pooling")
 (output,) = llm.embed("Hello, my name is")
 
 embeds = output.outputs.embedding
@@ -154,6 +139,46 @@ print(f"Score: {score}")
 
 A code example can be found here: <gh-file:examples/offline_inference/basic/score.py>
 
+### `LLM.reward`
+
+The [reward][vllm.LLM.reward] method is available to all reward models in vLLM.
+It returns the extracted hidden states directly.
+
+```python
+from vllm import LLM
+
+llm = LLM(model="internlm/internlm2-1_8b-reward", runner="pooling", trust_remote_code=True)
+(output,) = llm.reward("Hello, my name is")
+
+data = output.outputs.data
+print(f"Data: {data!r}")
+```
+
+A code example can be found here: <gh-file:examples/offline_inference/basic/reward.py>
+
+### `LLM.encode`
+
+The [encode][vllm.LLM.encode] method is available to all pooling models in vLLM.
+It returns the extracted hidden states directly.
+
+!!! note
+    Please use one of the more specific methods or set the task directly when using `LLM.encode`:
+
+    - For embeddings, use `LLM.embed(...)` or `pooling_task="embed"`.
+    - For classification logits, use `LLM.classify(...)` or `pooling_task="classify"`.
+    - For rewards, use `LLM.reward(...)` or `pooling_task="reward"`.
+    - For similarity scores, use `LLM.score(...)`.
+
+```python
+from vllm import LLM
+
+llm = LLM(model="intfloat/e5-small", runner="pooling")
+(output,) = llm.encode("Hello, my name is", pooling_task="embed")
+
+data = output.outputs.data
+print(f"Data: {data!r}")
+```
+
 ## Online Serving
 
 Our [OpenAI-Compatible Server](../serving/openai_compatible_server.md) provides endpoints that correspond to the offline APIs:
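The note added for `LLM.encode` above describes a simple routing rule: the generic entry point should behave like the task-specific method once the task is named. A hypothetical sketch of that dispatch (the `ToyPoolingModel` class and its `_run` helper are invented for illustration; they are not vLLM code):

```python
class ToyPoolingModel:
    """Illustrative dispatcher: calling `encode` with an explicit
    pooling_task routes to the same pooling path that the
    corresponding task-specific method would use."""

    def _run(self, prompt, task):
        # Stand-in for running the model with the given pooler.
        return f"{task}({prompt!r})"

    def embed(self, prompt):
        return self._run(prompt, "embed")

    def classify(self, prompt):
        return self._run(prompt, "classify")

    def reward(self, prompt):
        return self._run(prompt, "reward")

    def encode(self, prompt, pooling_task):
        # Generic entry point: the task must be named explicitly.
        return self._run(prompt, pooling_task)

llm = ToyPoolingModel()
# Explicitly setting the task matches the specific method.
assert llm.encode("hi", pooling_task="embed") == llm.embed("hi")
```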