Conversation

@noooop
Collaborator

@noooop noooop commented Sep 23, 2025

TL;DR

  • pooling_task is now required for llm.encode
  • The /pooling endpoint supports all pooling tasks
  • The softmax and activation parameters are renamed to use_activation

Improve all pooling task (0.11.1 cut)

These PRs mostly conflict with each other, so combining them into a series gives reviewers a clearer picture of what has changed and what still needs to be done afterwards.

Purpose

Following #25370

  • Split the encode task into two tasks: token_embed and token_classify
    • token_embed is the same as embed, using normalize as the activation
    • token_classify is the same as classify, using softmax as the default activation (classify and token_classify can actually use any activation function by setting act_fn)
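
The split described above can be illustrated with a minimal, framework-free sketch. It assumes that "normalize" means L2 normalization of each token vector and that disabling use_activation returns the raw per-token outputs; pool_tokens is a hypothetical helper written for illustration, not a vLLM function.

```python
import math


def normalize(vec):
    """L2-normalize one token's vector (assumed activation for token_embed)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def softmax(logits):
    """Softmax over one token's class logits (default for token_classify)."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pool_tokens(per_token_outputs, task, use_activation=True):
    """Hypothetical helper: apply the per-token activation for a task."""
    if not use_activation:
        # Assumption: with the activation disabled, raw outputs are returned.
        return [list(v) for v in per_token_outputs]
    if task == "token_embed":
        return [normalize(v) for v in per_token_outputs]
    if task == "token_classify":
        return [softmax(v) for v in per_token_outputs]
    raise ValueError(f"unsupported task: {task!r}")
```

Both tasks return one output per token; only the activation differs.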

Address: #27413 (comment)

  • The /pooling endpoint supports all pooling tasks.

Test Plan

Test Result


@mergify mergify bot added the documentation Improvements or additions to documentation label Sep 23, 2025
@noooop
Collaborator Author

noooop commented Sep 24, 2025

@DarkLight1337

Let's continue our previous discussion here, following #25370.

Split the encode task into two tasks: token_embed and token_classify

  • token_embed is the same as embed, using normalize as the activation
  • token_classify is the same as classify, using softmax as the default activation

For online scenarios (/pooling), there are two options:

  • Keep a single API (/pooling) for now, but adaptively select token_embed or token_classify somewhere, using something like an encode2pooling_task method.

  • Split the /pooling API into /pooling_token_embed and /pooling_token_classify. (Personally, I think /pooling_token_embed and /pooling_token_classify look terrible, and the online /pooling API is not ready for major changes yet. We can collect usage scenarios for a while first.)
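
A rough sketch of the first option, under stated assumptions: the function below is only an illustration of what "something like an encode2pooling_task method" might do. The input (a set of task names the model supports) and the preference order (token_embed before token_classify) are assumptions made for this example, not the logic merged in this PR.

```python
def encode2pooling_task(supported_tasks):
    """Hypothetical mapping from the legacy `encode` task to a token-level task.

    `supported_tasks` is assumed to be a collection of task-name strings.
    """
    # Assumption: prefer token_embed when a model supports both tasks.
    if "token_embed" in supported_tasks:
        return "token_embed"
    if "token_classify" in supported_tasks:
        return "token_classify"
    raise ValueError(
        "model supports neither token_embed nor token_classify: "
        f"{sorted(supported_tasks)}"
    )
```

Raising an error when neither task is supported keeps an unsupported request from being silently routed to the wrong pooler.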

@DarkLight1337
Member

I think that for the online API, the user should be able to pass the task in the request.
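
As a sketch of what passing the task in the request could look like on the client side: the payload field names ("model", "input", "task"), the model name, and the helper below are assumptions for illustration only, and the task set is limited to the task names mentioned in this thread.

```python
import json

# Task names mentioned in this thread; the real set may be larger.
POOLING_TASKS = {"embed", "classify", "token_embed", "token_classify"}


def build_pooling_request(model, text, task):
    """Hypothetical helper: build a JSON-serializable body for /pooling."""
    if task not in POOLING_TASKS:
        raise ValueError(f"unknown pooling task: {task!r}")
    # Assumption: the task is passed as a top-level "task" field.
    return {"model": model, "input": text, "task": task}


# "my-model" is a placeholder, not a real model name.
body = json.dumps(build_pooling_request("my-model", "hello", "token_classify"))
```

Validating the task on the client is optional; the point is that the caller, rather than the server, decides which pooling task runs.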

@mergify

mergify bot commented Oct 8, 2025

Documentation preview: https://vllm--25524.org.readthedocs.build/en/25524/

@noooop noooop changed the title [Frontend][Doc] Consolidate encode (pooling) api & Document. [Frontend][Doc][Last/N] Improve all pooling task | Polish encode (pooling) api & Document. Oct 11, 2025
@noooop noooop changed the title [Frontend][Doc][Last/N] Improve all pooling task | Polish encode (pooling) api & Document. [Frontend][Doc][3/N] Improve all pooling task | Polish encode (pooling) api & Document. Oct 14, 2025
@noooop noooop changed the title [Frontend][Doc][3/N] Improve all pooling task | Polish encode (pooling) api & Document. [Frontend][Doc][Last/N] Improve all pooling task | Polish encode (pooling) api & Document. Oct 16, 2025
@noooop noooop force-pushed the update_pooling_docs branch from 18b0002 to b6b2e12 Compare October 28, 2025 06:13
@mergify mergify bot added the frontend label Oct 28, 2025
@noooop noooop changed the title [Frontend][Doc][Last/N] Improve all pooling task | Polish encode (pooling) api & Document. [Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. Oct 28, 2025
@noooop noooop force-pushed the update_pooling_docs branch from 976793c to d98bf46 Compare October 28, 2025 08:09
Signed-off-by: wang.yuqi <noooop@126.com>
@noooop noooop force-pushed the update_pooling_docs branch from e326cde to dd06fe1 Compare October 28, 2025 08:19
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <noooop@126.com>
@noooop noooop marked this pull request as ready for review October 28, 2025 10:27
@noooop
Collaborator Author

noooop commented Oct 28, 2025

/gemini review

noooop and others added 10 commits October 28, 2025 22:11
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <noooop@126.com>
@noooop
Collaborator Author

noooop commented Oct 28, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the pooling API by splitting the encode task into more specific tasks (token_embed, token_classify) and renaming the activation parameter to use_activation for clarity. The documentation, examples, and tests are updated to reflect these changes. The changes are generally well-executed and improve the API's usability. However, I've identified a critical issue in the task inference logic for the new generic /pooling endpoint that could lead to unexpected failures. My review includes a suggestion to fix this logic.

Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <noooop@126.com>
@noooop noooop added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 28, 2025
Member

@hmellor hmellor left a comment

Since activation was user facing, we should probably deprecate it when changing it to use_activation

@noooop
Collaborator Author

noooop commented Oct 29, 2025

Since activation was user facing, we should probably deprecate it when changing it to use_activation

Anyway, I’d like to merge this PR quickly before the release. It’s mainly documentation changes.

This feature has been a bit of a mess, and it’s been changing with every recent release. It should finally be stable now after this latest change.

Member

@DarkLight1337 DarkLight1337 left a comment

I agree with @hmellor: we should still keep the old field, with a deprecation warning.

@DarkLight1337
Member

Otherwise we would break backward compatibility, which is definitely not what a "doc PR" should do.
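
One way to keep the old fields working while steering users to the new one is a small compatibility shim. The sketch below is an assumption about how that could look, not the code merged in this PR; resolve_use_activation is a hypothetical helper, and the choice to let the new field win when both are set is also an assumption.

```python
import warnings


def resolve_use_activation(use_activation=None, activation=None, softmax=None):
    """Hypothetical shim: accept the deprecated fields but prefer the new one."""
    for name, old_value in (("activation", activation), ("softmax", softmax)):
        if old_value is None:
            continue
        warnings.warn(
            f"`{name}` is deprecated; use `use_activation` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumption: an explicitly set use_activation takes precedence.
        if use_activation is None:
            use_activation = old_value
    return use_activation
```

Existing callers that still pass activation or softmax keep working and see a DeprecationWarning, while new callers use use_activation directly.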

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 30, 2025 08:08
@noooop
Collaborator Author

noooop commented Oct 30, 2025

cc @DarkLight1337

Let’s land this.

@DarkLight1337 DarkLight1337 merged commit 4464723 into vllm-project:main Oct 30, 2025
61 checks passed
MatthewBonanni pushed a commit to MatthewBonanni/vllm that referenced this pull request Oct 30, 2025
…g) api & Document. (vllm-project#25524)

Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Labels

documentation Improvements or additions to documentation frontend ready ONLY add when PR is ready to merge/full CI is needed

3 participants