docs/design/io_processor_plugins.md (+1 -1: 1 addition & 1 deletion)
@@ -79,7 +79,7 @@ The `post_process*` methods take `PoolingRequestOutput` objects as input and gen
 The `validate_or_generate_params` method is used for validating with the plugin any `SamplingParameters`/`PoolingParameters` received with the user request, or to generate new ones if none are specified. The function always returns the validated/generated parameters.
 The `output_to_response` method is used only for online serving and converts the plugin output to the `IOProcessorResponse` type that is then returned by the API Server. The implementation of the `/pooling` serving endpoint is available here [vllm/entrypoints/openai/serving_pooling.py](../../vllm/entrypoints/openai/serving_pooling.py).
 
-An example implementation of a plugin that enables generating geotiff images with the PrithviGeospatialMAE model is available [here](https://github.com/IBM/terratorch/tree/main/terratorch/vllm/plugins/segmentation). Please, also refer to our online ([examples/online_serving/prithvi_geospatial_mae.py](../../examples/online_serving/prithvi_geospatial_mae.py)) and offline ([examples/offline_inference/prithvi_geospatial_mae_io_processor.py](../../examples/offline_inference/prithvi_geospatial_mae_io_processor.py)) inference examples.
+An example implementation of a plugin that enables generating geotiff images with the PrithviGeospatialMAE model is available [here](https://github.com/IBM/terratorch/tree/main/terratorch/vllm/plugins/segmentation). Please, also refer to our online ([examples/online_serving/pooling/prithvi_geospatial_mae.py](../../examples/online_serving/pooling/prithvi_geospatial_mae.py)) and offline ([examples/offline_inference/pooling/prithvi_geospatial_mae_io_processor.py](../../examples/offline_inference/pooling/prithvi_geospatial_mae_io_processor.py)) inference examples.
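
The `validate_or_generate_params` contract described in this hunk (validate what came with the request, generate parameters when none were given, always return the result) might look roughly like the sketch below. It is an illustration under stated assumptions: `StubPoolingParams`, its `task` field, the standalone function signature, and the rule that the plugin only accepts the `plugin` task are all hypothetical stand-ins, not vLLM's classes or the real method signature.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubPoolingParams:
    """Hypothetical stand-in for a pooling-parameters object (not vLLM's class)."""
    task: Optional[str] = None


def validate_or_generate_params(
    params: Optional[StubPoolingParams] = None,
) -> StubPoolingParams:
    """Validate params supplied with the request, or generate new ones."""
    if params is None:
        # Nothing came with the request: generate the plugin's defaults.
        return StubPoolingParams(task="plugin")
    if params.task not in (None, "plugin"):
        # Assumed rule for this sketch: the plugin only accepts its own task.
        raise ValueError(f"unsupported task for this plugin: {params.task!r}")
    # Fill in the task if the caller left it unset, then hand the object back.
    params.task = "plugin"
    return params
```

Either path ends with a returned parameters object, so the caller never has to branch on whether the user supplied parameters.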
 \* The `LLM.score(...)` API falls back to `embed` task if the model does not support `score` task.
 
@@ -144,7 +146,6 @@ A code example can be found here: [examples/offline_inference/basic/score.py](..
 ### `LLM.reward`
 
 The [reward][vllm.LLM.reward] method is available to all reward models in vLLM.
-It returns the extracted hidden states directly.
 
 ```python
 from vllm import LLM
@@ -161,15 +162,17 @@ A code example can be found here: [examples/offline_inference/basic/reward.py](.
 ### `LLM.encode`
 
 The [encode][vllm.LLM.encode] method is available to all pooling models in vLLM.
-It returns the extracted hidden states directly.
 
 !!! note
     Please use one of the more specific methods or set the task directly when using `LLM.encode`:
 
     - For embeddings, use `LLM.embed(...)` or `pooling_task="embed"`.
     - For classification logits, use `LLM.classify(...)` or `pooling_task="classify"`.
-    - For rewards, use `LLM.reward(...)` or `pooling_task="reward"`.
     - For similarity scores, use `LLM.score(...)`.
+    - For rewards, use `LLM.reward(...)` or `pooling_task="token_classify"`.
+    - For token classification, use `pooling_task="token_classify"`.
+    - For multi-vector retrieval, use `pooling_task="token_embed"`.
+    - For IO Processor Plugins, use `pooling_task="plugin"`.
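
The use-case list in this note reduces to a small lookup table. The sketch below transcribes it; `POOLING_TASK_BY_USE_CASE` and `pooling_task_for` are hypothetical names introduced for illustration and are not part of vLLM.

```python
# Use case -> pooling_task, transcribed from the note's updated list.
POOLING_TASK_BY_USE_CASE = {
    "embeddings": "embed",
    "classification logits": "classify",
    "rewards": "token_classify",
    "token classification": "token_classify",
    "multi-vector retrieval": "token_embed",
    "io processor plugins": "plugin",
}


def pooling_task_for(use_case: str) -> str:
    """Hypothetical helper: look up the pooling_task for a use case."""
    key = use_case.strip().lower()
    if key == "similarity scores":
        # The note lists no pooling_task for scoring, only LLM.score(...).
        raise ValueError("use LLM.score(...) for similarity scores")
    try:
        return POOLING_TASK_BY_USE_CASE[key]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None
```

The returned string is the value that would be passed as `pooling_task=...` when calling `LLM.encode`. Rewards and token classification deliberately share `token_classify`, as the updated list states.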
 
 ```python
 from vllm import LLM
@@ -185,10 +188,47 @@ print(f"Data: {data!r}")
 
 Our [OpenAI-Compatible Server](../serving/openai_compatible_server.md) provides endpoints that correspond to the offline APIs:
 
-- [Pooling API](../serving/openai_compatible_server.md#pooling-api) is similar to `LLM.encode`, being applicable to all types of pooling models.
 - [Embeddings API](../serving/openai_compatible_server.md#embeddings-api) is similar to `LLM.embed`, accepting both text and [multi-modal inputs](../features/multimodal_inputs.md) for embedding models.
 - [Classification API](../serving/openai_compatible_server.md#classification-api) is similar to `LLM.classify` and is applicable to sequence classification models.
 - [Score API](../serving/openai_compatible_server.md#score-api) is similar to `LLM.score` for cross-encoder models.
+- [Pooling API](../serving/openai_compatible_server.md#pooling-api) is similar to `LLM.encode`, being applicable to all types of pooling models.
+
+!!! note
+    Please use one of the more specific methods or set the task directly when using the [Pooling API](../serving/openai_compatible_server.md#pooling-api):
+
+    - For embeddings, use [Embeddings API](../serving/openai_compatible_server.md#embeddings-api) or `"task":"embed"`.
+    - For classification logits, use [Classification API](../serving/openai_compatible_server.md#classification-api) or `"task":"classify"`.
+    - For similarity scores, use [Score API](../serving/openai_compatible_server.md#score-api).
+    - For rewards, use `"task":"token_classify"`.
+    - For token classification, use `"task":"token_classify"`.
+    - For multi-vector retrieval, use `"task":"token_embed"`.
+    - For IO Processor Plugins, use `"task":"plugin"`.
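
For the online case, the note's `"task"` values end up as a field in the JSON request body. The following is a minimal sketch of building such a body. Only the `task` key and its values come from the note; the helper name, the `model` and `input` field names, and the model name `my-pooling-model` are assumptions made for illustration.

```python
import json

# Task values listed in the note for the Pooling API.
POOLING_API_TASKS = {"embed", "classify", "token_classify", "token_embed", "plugin"}


def build_pooling_request(model: str, text: str, task: str) -> str:
    """Hypothetical helper: serialize a JSON body for a Pooling API request."""
    if task not in POOLING_API_TASKS:
        raise ValueError(f"unsupported task: {task!r}")
    # "model" and "input" are assumed field names; only "task" is from the note.
    return json.dumps({"model": model, "input": text, "task": task})


# Placeholder model name; substitute whatever model the server was started with.
body = build_pooling_request("my-pooling-model", "Hello, world!", "token_classify")
print(body)
```

Validating the task on the client side before sending keeps a typo such as `"reward"` (no longer listed) from reaching the server.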
+
+```python
+# start a supported embeddings model server with `vllm serve`, e.g.
An OpenAI client example can be found here: [examples/online_serving/pooling/openai_embedding_matryoshka_fy.py](../../examples/online_serving/pooling/openai_embedding_matryoshka_fy.py)
+
+## Deprecated Features
+
+### Encode task
+
+We have split the `encode` task into two more specific token-wise tasks: `token_embed` and `token_classify`:
+
+- `token_embed` is the same as `embed`, using normalization as the activation.
+- `token_classify` is the same as `classify`, using softmax as the default activation.
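
To make the two token-wise activations concrete, here is a plain-Python sketch of what "normalize" and "softmax" mean when applied per token. It assumes "normalize" refers to L2 normalization, and it is not vLLM's implementation; the function names simply mirror the task names for readability.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one token's logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_embed(hidden: List[List[float]]) -> List[List[float]]:
    """Return one normalized vector per token."""
    return [l2_normalize(tok) for tok in hidden]


def token_classify(logits: List[List[float]]) -> List[List[float]]:
    """Return one probability distribution per token."""
    return [softmax(tok) for tok in logits]
```

Both functions keep one output row per input token, which is what distinguishes the token-wise tasks from the sequence-level `embed` and `classify` tasks.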
+
+### Remove softmax from PoolingParams
+
+We are going to remove `softmax` and `activation` from `PoolingParams`. Instead, you should set `use_activation`, since we actually allow `classify` and `token_classify` to use any activation function.
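
Callers that still pass the legacy names could be migrated with a small shim like the one sketched here. It assumes `softmax` and `activation` were on/off flags whose value carries over unchanged to `use_activation`. The helper `migrate_pooling_kwargs` is hypothetical and operates on a plain keyword dictionary, not on vLLM's `PoolingParams` class.

```python
from typing import Any, Dict


def migrate_pooling_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical shim: fold legacy `softmax`/`activation` into `use_activation`."""
    migrated = dict(kwargs)  # leave the caller's dict untouched
    # Pull out whichever legacy flags are present, ignoring unset (None) values.
    legacy = [migrated.pop(name) for name in ("softmax", "activation") if name in migrated]
    legacy = [value for value in legacy if value is not None]
    if not legacy:
        return migrated
    if len(set(legacy)) > 1:
        raise ValueError("conflicting legacy activation flags")
    if "use_activation" in migrated and migrated["use_activation"] != legacy[0]:
        raise ValueError("legacy flag conflicts with use_activation")
    migrated["use_activation"] = legacy[0]
    return migrated
```

The resulting dictionary contains only the new name, so it can be passed on once `softmax` and `activation` are gone.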
-Full example: [examples/online_serving/openai_cross_encoder_score_for_multimodal.py](../../examples/online_serving/openai_cross_encoder_score_for_multimodal.py)
+Full example: [examples/online_serving/pooling/openai_cross_encoder_score_for_multimodal.py](../../examples/online_serving/pooling/openai_cross_encoder_score_for_multimodal.py)