
Conversation

@noooop (Collaborator) commented Sep 22, 2025

Breaking change

  • Split the encode task into two tasks: token_embed and token_classify (see the sketch below)
    • token_embed is the same as embed, using normalize as the activation
    • token_classify is the same as classify, using softmax as the default activation (classify and token_classify can actually use any activation function by setting act_fn)
  • Completely remove softmax from PoolingParams in favor of activation, since classify and token_classify can actually use any activation function
  • pooling_task is now required for llm.encode and /pooling <- this will take full effect in the next PR ([Frontend][Doc][Last/N] Improve all pooling task | Polish encode (pooling) api & Document. #25524)
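
A minimal sketch of the split, assuming a model that supports token_embed (the model name is borrowed from the example further down; token_classify works analogously on a token-classification model):

from vllm import LLM

llm = LLM(model="jinaai/jina-embeddings-v4-vllm-text-matching",
          runner="pooling")

# Per-token embeddings, normalized like `embed`.
outputs = llm.encode(["The capital of France is"],
                     pooling_task="token_embed")

# Per-token classification (softmax by default, like `classify`) would use
# pooling_task="token_classify" instead.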

Improve all pooling task

These PRs mostly conflict with each other, so combining them into a series better informs reviewers about what has happened and what still needs to be done afterwards.

Purpose

After #21227 landed, we hope that pooling models can always use ALL pooling, without users having to enable it manually.

The current encode API (/pooling API) mainly targets the classify-each-token scenario (e.g. TokenClassification #24872 & reward models) and overlooks the embed-each-token scenario.

Let's support the embed-each-token scenario (multi-vector retrieval).

Partially fixes #25165

We are stepping closer to supporting ColBERT & ColPali.
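
For context, a minimal sketch of ColBERT-style MaxSim scoring over per-token embeddings, which is what multi-vector retrieval ultimately computes (illustrative only, not part of this PR):

import torch

def maxsim_score(query_vecs: torch.Tensor, doc_vecs: torch.Tensor) -> float:
    # query_vecs: (num_query_tokens, dim); doc_vecs: (num_doc_tokens, dim);
    # both are assumed L2-normalized, so dot products are cosine similarities.
    sim = query_vecs @ doc_vecs.T
    # For each query token, take its best-matching document token, then sum.
    return sim.max(dim=1).values.sum().item()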

cc @DarkLight1337 @maxdebayser

Test Plan

tests/models/language/pooling/test_multi_vector_retrieval.py
tests/test_pooling_params.py

Test Result

pass

Known Issues

  • Maybe we should find a way to support chunked prefill + ALL pooling (and MEAN pooling)
  • Support ColBERT & ColPali


@mergify mergify bot added documentation Improvements or additions to documentation qwen Related to Qwen models labels Sep 22, 2025
@noooop (Collaborator, Author) commented Sep 22, 2025

@jupyterjazz

Try this:

from vllm import LLM

llm = LLM(
    model="jinaai/jina-embeddings-v4-vllm-text-matching",
    enforce_eager=True,
    max_model_len=1024,
    enable_chunked_prefill=False,  # <- In order to use the encode api
    runner="pooling")

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

outputs = llm.embed(prompts)

for prompt, output in zip(prompts, outputs):
    embeds = output.outputs.embedding
    print(len(embeds))

outputs = llm.encode(prompts, pooling_task="encode")

for prompt, output in zip(prompts, outputs):
    multi_vector = output.outputs.data
    print(multi_vector.shape)
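
(For reference: embed should print one fixed-size vector length per prompt, while the encode output should have shape (num_tokens, hidden_size), i.e. one vector per token; the exact sizes depend on the model.)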

Are you OK with this API and these outputs?


There are still some broken features that need to be fixed, but the multi-vector feature is now testable.

@noooop (Collaborator, Author) commented Sep 23, 2025

@DarkLight1337

Ready for review

(Slight) breaking change

  • Split the encode task into two tasks: token_embed and token_classify
    • token_embed is the same as embed, using normalize as the activation
    • token_classify is the same as classify, using softmax as the default activation (classify and token_classify can actually use any activation function by setting act_fn)
    • Use the following code for compatibility (a usage sketch follows this list):
def encode2pooling_task(supported_tasks):
    # Currently no model supports both token_embed and token_classify.
    if "token_embed" in supported_tasks:
        return "token_embed"
    elif "token_classify" in supported_tasks:
        return "token_classify"
    else:
        raise ValueError(f"pooling_task must be one of {supported_tasks}.")
  • Completely remove softmax from PoolingParams in favor of activation, since classify and token_classify can actually use any activation function
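
A minimal usage sketch of the shim above, assuming llm and prompts from the earlier example and an illustrative supported_tasks value:

supported_tasks = ("token_embed",)  # illustrative; use the model's actual supported tasks
task = encode2pooling_task(supported_tasks)
outputs = llm.encode(prompts, pooling_task=task)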

@jupyterjazz commented Sep 23, 2025

Hi @noooop,

I just tested it and it works fine. The only thing missing was override_pooler_config=PoolerConfig(normalize=False). I don't have a strong opinion on this, but setting normalization to False by default during encode could make sense, because you often want the actual last hidden layer at this step. Other than that, the API looks good to me. Thank you for such a quick fix!
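
A minimal sketch of the workaround described above, assuming PoolerConfig is importable from the top-level vllm package:

from vllm import LLM, PoolerConfig

llm = LLM(
    model="jinaai/jina-embeddings-v4-vllm-text-matching",
    enforce_eager=True,
    max_model_len=1024,
    enable_chunked_prefill=False,
    runner="pooling",
    # Skip L2 normalization so encode returns the raw last hidden states.
    override_pooler_config=PoolerConfig(normalize=False),
)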

@DarkLight1337 (Member) left a review comment:

Please update this page https://docs.vllm.ai/en/latest/models/pooling_models.html#model-conversion to no longer use the encode task

@DarkLight1337 (Member) commented:

I'll delay merging this PR until after the release so we don't have to worry about backward-compatibility issues, which would further complicate future PRs

@maxdebayser (Contributor) left a review comment:

Overall the changes in this PR look good to me, but I would prefer to keep the generic "encode" task around for use cases that don't cleanly fit the old and new tasks introduced in this PR. As an example, the use case introduced by #22820 can't really be described as "token_embed", "token_classify", "embed", "classify" or "score".

@mergify bot commented Sep 26, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @noooop.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 26, 2025
@noooop (Collaborator, Author) commented Sep 27, 2025

Overall the changes in this PR look good to me, but I would prefer to keep the generic "encode" task around for use cases that don't cleanly fit the old and new tasks introduced in this PR. As an example, the use case introduced by #22820 can't really be described as "token_embed", "token_classify", "embed", "classify" or "score".

Now only mypy checks whether pooling_task belongs to PoolingTask, so the pooling_task in the encode API can accept any str at runtime.

In fact, we basically already support custom pooling tasks and poolers: a user just has to implement an out-of-tree (OOT) model with a pooler and use the corresponding pooling_task in encode. LOL
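
For reference, out-of-tree registration looks roughly like this (the architecture name and import path are hypothetical):

from vllm import ModelRegistry

# "MyPoolerModel" must match the architecture name in the model's HF config;
# the "module:Class" string lazily points at your own implementation.
ModelRegistry.register_model("MyPoolerModel", "my_package.models:MyPoolerModel")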


Do we need to move towards allowing users to use pooling task plugins?

@DarkLight1337 (Member) commented:

cc @christian-pinto

@noooop (Collaborator, Author) commented Oct 15, 2025

Can I temporarily skip this test?
(I think there is a dependency conflict between terratorch and vllm.)

No, we cannot skip it

It seems this PR can pass the test, but I don't know how that happened.

@DarkLight1337 (Member) commented:
Maybe you can try to pip install the specific version of terratorch

@mergify mergify bot added the multi-modality Related to multi-modality (#4194) label Oct 15, 2025
@noooop (Collaborator, Author) commented Oct 15, 2025

Maybe you can try to pip install the specific version of terratorch

terratorch @ git+https://github.com/IBM/terratorch.git@1.1.rc3 # required for PrithviMAE test

pip install git+https://github.com/IBM/terratorch.git@1.1.rc3

It seems I can only rely on CI.

@christian-pinto (Contributor) commented:

Probably this is happening because on CI we use the compiled list of dependencies rather than manually pip-installing the various packages?

Have you tried installing all the dependencies from here: https://github.com/vllm-project/vllm/blob/main/requirements/test.in
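
For example (requirements/test.txt is the compiled counterpart that CI installs):

pip install -r requirements/test.txt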

We are in the process of releasing a new version of TerraTorch, and then we will be able to pip install from PyPI rather than from a specific git tag. I will post a PR to fix that in vLLM soon.

@noooop (Collaborator, Author) commented Oct 15, 2025

By the way, this PR splits the encode task into two tasks: token_embed and token_classify.

  • token_embed is the same as embed, using normalize as the activation.
  • token_classify is the same as classify, using softmax as the default activation (classify and token_classify can actually use any activation function by setting act_fn).

prithvi_mae and test_io_processor_plugins don't fit any of "token_embed", "token_classify", "embed", "classify" or "score".

We no longer have a catch-all encode task that absorbs everything else.

@christian-pinto What do you think about a plugin pooling task?

@noooop (Collaborator, Author) commented Oct 15, 2025

@DarkLight1337

Are there any more modifications needed for this PR?

@DarkLight1337 (Member) commented:

As long as the test passes, I'm fine with it

@noooop noooop enabled auto-merge (squash) October 15, 2025 09:41
@christian-pinto (Contributor) commented:

When it comes to the IO processor plugins, the type of pooling activation function depends on the combination of model and plugin. As an example, for PrithviMAE we instantiate an All pooler and disable softmax, because the plugin expects to receive the raw output from the pooler. Other models in the future might behave differently and require a different pooling strategy and activation function, which is fine and should still work.
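
A minimal sketch of requesting raw (un-activated) outputs via the activation field this PR adds to PoolingParams; the model name is a placeholder:

from vllm import LLM, PoolingParams

llm = LLM(model="my-org/my-token-classifier",  # hypothetical pooling model
          runner="pooling")
outputs = llm.encode(["some input"],
                     pooling_task="token_classify",
                     pooling_params=PoolingParams(activation=False))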

However, I agree with @maxdebayser about keeping the encode task to support more generic cases like this one, without having to resort to the remaining tasks, which can be confusing for people.

@noooop (Collaborator, Author) commented Oct 15, 2025

As long as the test passes, I'm fine with it

(You forgot to approve it.)

@noooop noooop merged commit f54f851 into vllm-project:main Oct 15, 2025
60 checks passed
@noooop noooop deleted the multi_vector_retrieval branch October 15, 2025 11:20
Commits referencing this pull request were later pushed to bbartels/vllm, mandy-li/vllm, albertoperdomo2/vllm, lywa1998/vllm, alhridoy/vllm, xuebwang-amd/vllm, and 0xrushi/vllm (Oct 16 to Oct 26, 2025).

Labels

documentation, frontend, multi-modality (#4194), qwen, ready, v1
