Commit 95276b7

DarkLight1337 authored and jinzhen-lin committed
[Doc] Link to RFC for pooling optimizations (vllm-project#21806)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
1 parent b029541 commit 95276b7

1 file changed: +3 -3 lines changed

docs/models/pooling_models.md

Lines changed: 3 additions & 3 deletions
@@ -7,9 +7,9 @@ These models use a [Pooler][vllm.model_executor.layers.pooler.Pooler] to extract
 before returning them.
 
 !!! note
-We currently support pooling models primarily as a matter of convenience.
-As shown in the [Compatibility Matrix](../features/compatibility_matrix.md), most vLLM features are not applicable to
-pooling models as they only work on the generation or decode stage, so performance may not improve as much.
+We currently support pooling models primarily as a matter of convenience. This is not guaranteed to have any performance improvement over using HF Transformers / Sentence Transformers directly.
+
+We are now planning to optimize pooling models in vLLM. Please comment on <gh-issue:21796> if you have any suggestions!
 
 ## Configuration
 