@@ -45,14 +45,14 @@ Each pooling model in vLLM supports one or more of these tasks according to
 [Pooler.get_supported_tasks][vllm.model_executor.layers.pooler.Pooler.get_supported_tasks],
 enabling the corresponding APIs:
 
-| Task       | APIs               |
-|------------|--------------------|
-| `encode`   | `encode`           |
-| `embed`    | `embed`, `score`\* |
-| `classify` | `classify`         |
-| `score`    | `score`            |
+| Task       | APIs                                 |
+|------------|--------------------------------------|
+| `encode`   | `LLM.reward(...)`                    |
+| `embed`    | `LLM.embed(...)`, `LLM.score(...)`\* |
+| `classify` | `LLM.classify(...)`                  |
+| `score`    | `LLM.score(...)`                     |
 
-\* The `score` API falls back to `embed` task if the model does not support `score` task.
+\* The `LLM.score(...)` API falls back to the `embed` task if the model does not support the `score` task.
 
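+As a minimal sketch of this fallback (assuming `intfloat/e5-small`, an embedding-only model chosen purely for illustration), `LLM.score(...)` still works by embedding both texts and returning their similarity:
+
+```python
+from vllm import LLM
+
+# The model only supports the `embed` task, so `LLM.score(...)` falls back
+# to embedding both inputs and computing a similarity score between them.
+llm = LLM(model="intfloat/e5-small", runner="pooling")
+(output,) = llm.score("What is the capital of France?",
+                      "The capital of France is Paris.")
+
+score = output.outputs.score
+print(f"Score: {score}")
+```
+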
 ### Pooler Configuration
 
@@ -66,11 +66,11 @@ you can override some of its attributes via the `--override-pooler-config` optio
 If the model has been converted via `--convert` (see above),
 the pooler assigned to each task has the following attributes by default:
 
-| Task       | Pooling Type | Normalization | Softmax |
-|------------|--------------|---------------|---------|
-| `encode`   | `ALL`        | ❌            | ❌      |
-| `embed`    | `LAST`       | ✅︎            | ❌      |
-| `classify` | `LAST`       | ❌            | ✅︎      |
+| Task       | Pooling Type | Normalization | Softmax |
+|------------|--------------|---------------|---------|
+| `reward`   | `ALL`        | ❌            | ❌      |
+| `embed`    | `LAST`       | ✅︎            | ❌      |
+| `classify` | `LAST`       | ❌            | ✅︎      |
 
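+For example, a hedged sketch of overriding these defaults (the `PoolerConfig` fields shown mirror the columns above; the exact names are an assumption for illustration):
+
+```python
+from vllm import LLM
+from vllm.config import PoolerConfig
+
+# Swap the default `LAST` pooling for mean pooling and skip normalization;
+# the field names are assumed to correspond to the table above.
+llm = LLM(
+    model="intfloat/e5-small",
+    runner="pooling",
+    override_pooler_config=PoolerConfig(pooling_type="MEAN", normalize=False),
+)
+```
+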
 When loading [Sentence Transformers](https://huggingface.co/sentence-transformers) models,
 their Sentence Transformers configuration file (`modules.json`) takes priority over the model's defaults.
@@ -83,21 +83,6 @@ which takes priority over both the model's and Sentence Transformers's defaults.
 The [LLM][vllm.LLM] class provides various methods for offline inference.
 See [configuration][configuration] for a list of options when initializing the model.
 
-### `LLM.encode`
-
-The [encode][vllm.LLM.encode] method is available to all pooling models in vLLM.
-It returns the extracted hidden states directly, which is useful for reward models.
-
-```python
-from vllm import LLM
-
-llm = LLM(model="Qwen/Qwen2.5-Math-RM-72B", runner="pooling")
-(output,) = llm.encode("Hello, my name is")
-
-data = output.outputs.data
-print(f"Data: {data!r}")
-```
-
 ### `LLM.embed`
 
 The [embed][vllm.LLM.embed] method outputs an embedding vector for each prompt.
@@ -106,7 +91,7 @@ It is primarily designed for embedding models.
 ```python
 from vllm import LLM
 
-llm = LLM(model="intfloat/e5-mistral-7b-instruct", runner="pooling")
+llm = LLM(model="intfloat/e5-small", runner="pooling")
 (output,) = llm.embed("Hello, my name is")
 
 embeds = output.outputs.embedding
@@ -154,6 +139,46 @@ print(f"Score: {score}")
 
 A code example can be found here: <gh-file:examples/offline_inference/basic/score.py>
 
+### `LLM.reward`
+
+The [reward][vllm.LLM.reward] method is available to all reward models in vLLM.
+It returns the extracted hidden states directly.
+
+```python
+from vllm import LLM
+
+llm = LLM(model="internlm/internlm2-1_8b-reward", runner="pooling", trust_remote_code=True)
+(output,) = llm.reward("Hello, my name is")
+
+data = output.outputs.data
+print(f"Data: {data!r}")
+```
+
+A code example can be found here: <gh-file:examples/offline_inference/basic/reward.py>
+
+### `LLM.encode`
+
+The [encode][vllm.LLM.encode] method is available to all pooling models in vLLM.
+It returns the extracted hidden states directly.
+
+!!! note
+    Please use one of the more specific methods or set the task directly when using `LLM.encode`:
+
+    - For embeddings, use `LLM.embed(...)` or `pooling_task="embed"`.
+    - For classification logits, use `LLM.classify(...)` or `pooling_task="classify"`.
+    - For rewards, use `LLM.reward(...)` or `pooling_task="reward"`.
+    - For similarity scores, use `LLM.score(...)`.
+
+```python
+from vllm import LLM
+
+llm = LLM(model="intfloat/e5-small", runner="pooling")
+(output,) = llm.encode("Hello, my name is", pooling_task="embed")
+
+data = output.outputs.data
+print(f"Data: {data!r}")
+```
+
 ## Online Serving
 
 Our [OpenAI-Compatible Server](../serving/openai_compatible_server.md) provides endpoints that correspond to the offline APIs: