docs/models/pooling_models.md
+3 -3 (3 additions & 3 deletions)
@@ -7,9 +7,9 @@ These models use a [Pooler][vllm.model_executor.layers.pooler.Pooler] to extract
 before returning them.
 
 !!! note
-    We currently support pooling models primarily as a matter of convenience.
-    As shown in the [Compatibility Matrix](../features/compatibility_matrix.md), most vLLM features are not applicable to
-    pooling models as they only work on the generation or decode stage, so performance may not improve as much.
+    We currently support pooling models primarily as a matter of convenience. This is not guaranteed to have any performance improvement over using HF Transformers / Sentence Transformers directly.
+
+    We are now planning to optimize pooling models in vLLM. Please comment on <gh-issue:21796> if you have any suggestions!