Commit a248025

[Doc] Link to RFC for pooling optimizations (#21806)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
1 parent 7234fe2 commit a248025

1 file changed: +3 -3 lines changed

docs/models/pooling_models.md

Lines changed: 3 additions & 3 deletions
@@ -7,9 +7,9 @@ These models use a [Pooler][vllm.model_executor.layers.pooler.Pooler] to extract
 before returning them.
 
 !!! note
-We currently support pooling models primarily as a matter of convenience.
-As shown in the [Compatibility Matrix](../features/compatibility_matrix.md), most vLLM features are not applicable to
-pooling models as they only work on the generation or decode stage, so performance may not improve as much.
+We currently support pooling models primarily as a matter of convenience. This is not guaranteed to have any performance improvement over using HF Transformers / Sentence Transformers directly.
+
+We are now planning to optimize pooling models in vLLM. Please comment on <gh-issue:21796> if you have any suggestions!
 
 ## Configuration
 