[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. #25524

noooop · 2025-09-23T23:53:03Z

TL;DR

pooling_task is now required for llm.encode
The /pooling endpoint supports all pooling tasks
The softmax and activation parameters are renamed to use_activation

Improve all pooling task (0.11.1 cut)

[Model][0/N] Improve all pooling task | clean up #25817
[Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default). #26414
[Model][2/N] Improve all pooling task | Support multi-vector retrieval #25370
[Frontend][3/N] Improve all pooling task | Support binary embedding response #27066
[Frontend][4/N] Improve all pooling task | Add plugin pooling task #26973
[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. #25524

These PRs mostly conflict with each other, so presenting them as a series should make it easier for reviewers to follow what has changed and what still needs to be done afterwards.

Purpose

Following #25370

Split the encode task into two tasks, token_embed and token_classify:
- token_embed is the same as embed and uses normalize as its activation.
- token_classify is the same as classify and uses softmax as its default activation. (classify and token_classify can actually use any activation function by setting act_fn.)
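The two default activations described above can be sketched in plain Python. This is only an illustration of the math, not vLLM's implementation: the helper names are hypothetical, and using a single `use_activation` flag for both tasks is a simplification made for this sketch.

```python
import math


def normalize(vec):
    """L2-normalize one token vector (the activation described for token_embed)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def softmax(logits):
    """Numerically stable softmax over one token's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_token_activation(task, token_outputs, use_activation=True):
    """Hypothetical helper: apply the default per-token activation for a task."""
    if not use_activation:
        # Return the raw per-token outputs unchanged.
        return [list(row) for row in token_outputs]
    if task == "token_embed":
        return [normalize(row) for row in token_outputs]
    if task == "token_classify":
        return [softmax(row) for row in token_outputs]
    raise ValueError(f"unsupported task: {task}")
```

Each row of `token_outputs` stands for one token, so both tasks return one vector per token rather than a single pooled vector.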

Address: #27413 (comment)

The /pooling endpoint supports all pooling tasks.

Test Plan

Test Result

noooop · 2025-09-24T07:54:27Z

@DarkLight1337

Let's continue our previous discussion here.

Following #25370:

Split the encode task into two tasks, token_embed and token_classify:

- token_embed is the same as embed and uses normalize as its activation.
- token_classify is the same as classify and uses softmax as its default activation.

For online scenarios (/pooling), there are two options:

- Keep a single API (/pooling) for now, but adaptively select token_embed or token_classify somewhere, using something like an encode2pooling_task method.
- Split the /pooling API into /pooling_token_embed and /pooling_token_classify. (Personally, I think /pooling_token_embed and /pooling_token_classify look terrible, and the online /pooling API is not ready for major changes yet. We can collect usage scenarios for a while.)

DarkLight1337 · 2025-09-24T08:30:39Z

I think for the online API, the user should be able to pass the task in the request.
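A rough sketch of what passing the task in the request could look like on the client side. The field name `task`, the payload shape, and the helper name are assumptions made for illustration. The task names come from this thread, and whether a given server or model accepts each of them is not confirmed here.

```python
import json

# Task names mentioned in this PR discussion (assumed set for the sketch).
POOLING_TASKS = ("embed", "classify", "token_embed", "token_classify")


def build_pooling_request(model, text, task):
    """Hypothetical helper: build a JSON body for POST /pooling with an explicit task."""
    if task not in POOLING_TASKS:
        raise ValueError(f"unknown pooling task: {task!r}")
    # The "task" key is an assumed field name, not a confirmed API contract.
    return json.dumps({"model": model, "input": text, "task": task})
```

Making the task explicit in the request avoids having the server guess between token_embed and token_classify.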

mergify · 2025-10-08T14:41:56Z

Documentation preview: https://vllm--25524.org.readthedocs.build/en/25524/

noooop · 2025-10-28T10:29:47Z

/gemini review

noooop · 2025-10-28T15:02:48Z

/gemini review

gemini-code-assist

Code Review

This pull request refactors the pooling API by splitting the encode task into more specific tasks (token_embed, token_classify) and renaming the activation parameter to use_activation for clarity. The documentation, examples, and tests are updated to reflect these changes. The changes are generally well-executed and improve the API's usability. However, I've identified a critical issue in the task inference logic for the new generic /pooling endpoint that could lead to unexpected failures. My review includes a suggestion to fix this logic.

vllm/entrypoints/openai/serving_pooling.py

hmellor

Since activation was user-facing, we should probably deprecate it when changing it to use_activation.

noooop · 2025-10-29T13:06:08Z

> Since activation was user-facing, we should probably deprecate it when changing it to use_activation.

Anyway, I’d like to merge this PR quickly before the release. It’s mainly documentation changes.

This feature has been a bit of a mess, and it’s been changing with every recent release. It should finally be stable now after this latest change.

DarkLight1337

I agree with @hmellor; we should still keep the old field, with a deprecation warning.

DarkLight1337 · 2025-10-29T13:08:25Z

Otherwise we would break back-compatibility, which is definitely not what a "doc PR" should do
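One common way to keep an old field alive with a deprecation is a small shim like the following. This is a sketch under assumptions, not the code that landed in this PR: the helper name and the rule that the new field wins when both are set are choices made for illustration.

```python
import warnings


def resolve_use_activation(use_activation=None, activation=None):
    """Hypothetical helper: map the deprecated `activation` onto `use_activation`."""
    if activation is not None:
        warnings.warn(
            "`activation` is deprecated; use `use_activation` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumed precedence: the new field wins if both are provided.
        if use_activation is None:
            use_activation = activation
    return use_activation
```

Existing callers that still pass `activation` keep working and see a warning, while new callers use `use_activation` directly.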

noooop · 2025-10-30T10:23:49Z

cc @DarkLight1337

Let’s land this.

…g) api & Document. (vllm-project#25524) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

mergify bot added the documentation label Sep 23, 2025

noooop mentioned this pull request Sep 24, 2025

[Model][2/N] Improve all pooling task | Support multi-vector retrieval #25370

Merged

5 tasks

noooop changed the title ~~[Frontend][Doc] Consolidate encode (pooling) api & Document.~~ [Frontend][Doc][Last/N] Improve all pooling task | Polish encode (pooling) api & Document. Oct 11, 2025

This was referenced Oct 11, 2025

[Model][0/N] Improve all pooling task | clean up #25817

Merged

[Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default). #26414

Merged

noooop changed the title ~~[Frontend][Doc][Last/N] Improve all pooling task | Polish encode (pooling) api & Document.~~ [Frontend][Doc][3/N] Improve all pooling task | Polish encode (pooling) api & Document. Oct 14, 2025

noooop changed the title ~~[Frontend][Doc][3/N] Improve all pooling task | Polish encode (pooling) api & Document.~~ [Frontend][Doc][Last/N] Improve all pooling task | Polish encode (pooling) api & Document. Oct 16, 2025

noooop mentioned this pull request Oct 24, 2025

[Usage]: how to request a qwen2.5-VL-7B classify model served by vllm using openai SDK? #27413

Open

1 task

noooop force-pushed the update_pooling_docs branch from 18b0002 to b6b2e12 Compare October 28, 2025 06:13

noooop added 2 commits October 28, 2025 14:13

token_embed & token_classify

b6b2e12

Signed-off-by: wang.yuqi <noooop@126.com>

Merge branch 'main' into update_pooling_docs

fb5fdfa

mergify bot added the frontend label Oct 28, 2025

noooop changed the title ~~[Frontend][Doc][Last/N] Improve all pooling task | Polish encode (pooling) api & Document.~~ [Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. Oct 28, 2025

noooop force-pushed the update_pooling_docs branch from 976793c to d98bf46 Compare October 28, 2025 08:09

/pooling endpoint support all pooling tasks

dd06fe1

Signed-off-by: wang.yuqi <noooop@126.com>

noooop force-pushed the update_pooling_docs branch from e326cde to dd06fe1 Compare October 28, 2025 08:19

noooop added 3 commits October 28, 2025 17:44

update

0643461

Signed-off-by: wang.yuqi <noooop@126.com>

update examples

ce69d7b

Signed-off-by: wang.yuqi <noooop@126.com>

update examples

f9d85cf

Signed-off-by: wang.yuqi <noooop@126.com>

noooop marked this pull request as ready for review October 28, 2025 10:27

noooop requested review from aarnphm and chaunceyjiang as code owners October 28, 2025 10:27

noooop and others added 10 commits October 28, 2025 22:11

Update examples/online_serving/pooling/README.md

351d526

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: wang.yuqi <noooop@126.com>

Update examples/online_serving/pooling/README.md

12db9e3

Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Signed-off-by: wang.yuqi <noooop@126.com>

Pooling Tasks

4938636

Signed-off-by: wang.yuqi <noooop@126.com>

+ runner="pooling"

a7ba610

Signed-off-by: wang.yuqi <noooop@126.com>

Openai -> OpenAI

4188194

Signed-off-by: wang.yuqi <noooop@126.com>

activation -> use_activation

86ce4c4

Signed-off-by: wang.yuqi <noooop@126.com>

fix

44c7d8a

Signed-off-by: wang.yuqi <noooop@126.com>

fix

d46428a

Signed-off-by: wang.yuqi <noooop@126.com>

activation -> use_activation

90df794

Signed-off-by: wang.yuqi <noooop@126.com>

fix

90746ca

Signed-off-by: wang.yuqi <noooop@126.com>

gemini-code-assist bot reviewed Oct 28, 2025

vllm/entrypoints/openai/serving_pooling.py (resolved)

noooop added 2 commits October 29, 2025 00:19

fix

2cf3132

Signed-off-by: wang.yuqi <noooop@126.com>

fix

794669d

Signed-off-by: wang.yuqi <noooop@126.com>

noooop added the ready label Oct 28, 2025

noooop added 3 commits October 29, 2025 00:43

Merge branch 'main' into update_pooling_docs

f43249e

Merge branch 'main' into update_pooling_docs

37137cf

Merge branch 'main' into update_pooling_docs

0124f4f

noooop mentioned this pull request Oct 29, 2025

[CI Failure]: torch._inductor.exc.InductorError in Nightly build to run all tests #27724

Open

3 tasks

hmellor reviewed Oct 29, 2025

DarkLight1337 reviewed Oct 29, 2025

noooop added 2 commits October 30, 2025 15:58

add deprecated waring

4c2a98e

Signed-off-by: wang.yuqi <noooop@126.com>

Merge branch 'main' into update_pooling_docs

95e014b

DarkLight1337 approved these changes Oct 30, 2025

DarkLight1337 enabled auto-merge (squash) October 30, 2025 08:08

DarkLight1337 merged commit 4464723 into vllm-project:main Oct 30, 2025
61 checks passed

3 participants
