[Frontend][4/N] Improve all pooling task | Add plugin pooling task #26973

noooop · 2025-10-16T01:28:43Z

Improve all pooling task

[Model][0/N] Improve all pooling task | clean up #25817
[Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default). #26414
[Model][2/N] Improve all pooling task | Support multi-vector retrieval #25370
[Frontend][3/N] Improve all pooling task | Support binary embedding response #27066
[Frontend][4/N] Improve all pooling task | Add plugin pooling task #26973
[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. #25524

These PRs are mostly conflicting with each other, so combining them into a series would better inform reviewers about what happened. And what else needs to be done after that?

Purpose

plugin task uses io_processor.parse_request to verify inputs, skipping PoolingParams verify

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: wang.yuqi <noooop@126.com>

mergify · 2025-10-16T01:29:19Z

Documentation preview: https://vllm--26973.org.readthedocs.build/en/26973/

gemini-code-assist

Code Review

This pull request correctly introduces a new 'plugin' pooling task, which is a valuable addition for models that utilize custom IO processors for pooling, such as Terratorch. The changes are well-implemented across the codebase, including updates to examples and tests. However, I have identified one critical bug in the new DummyPooler implementation that must be fixed.

vllm/model_executor/layers/pooler.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: wang.yuqi <noooop@126.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR.

@codex fix this CI failure
@codex address that feedback

vllm/model_executor/layers/pooler.py

noooop · 2025-10-16T01:33:41Z

When it comes to the IO processor plugins, the type of pooling activation function depends on the combination of model and plugin. As an example, for PrithviMAE we instantiate a All pooler and disable softmax because the plugin is expecting to receive the raw output from the pooler. Other models in the future might have a different behavior and require a different pooling strategy and activation function, which is fine and should still work.

However, I agree with @maxdebayser regarding keeping the encode task to support more generic cases like this one, without having to resort to using the remaining tasks that can be confusing for people.

We need to change at least the name to avoid confusion with the previous encode task.

@christian-pinto @maxdebayser @DarkLight1337

After careful consideration, we still need a plugin pooling task, which uses io_processor.parse_request to verify inputs, skipping the verification of PoolingParams.

Signed-off-by: wang.yuqi <noooop@126.com>

DarkLight1337 · 2025-10-16T03:13:02Z

parse_request is for OpenAI serving only. I think we might need to introduce a new method in the plugin explicitly to verify sampling/pooling params, or pass the params to pre_process method

christian-pinto · 2025-10-16T08:40:43Z

parse_request is for OpenAI serving only. I think we might need to introduce a new method in the plugin explicitly to verify sampling/pooling params, or pass the params to pre_process method

The request parsing is done in both offline and online mode, it's meant to verify that the input is appropriate for the plugin. I am fine with either methods to be honest. At the moment, only for the online case, the Pooling parameters are created in the IOProcessorRequest, while in the offline case the parameters are just coming from the request or defaults are given.

I can take care of this change and have the plugin generating/validating pooling parameters.

noooop · 2025-10-16T08:58:32Z

parse_request is for OpenAI serving only. I think we might need to introduce a new method in the plugin explicitly to verify sampling/pooling params, or pass the params to pre_process method

The request parsing is done in both offline and online mode, it's meant to verify that the input is appropriate for the plugin. I am fine with either methods to be honest. At the moment, only for the online case, the Pooling parameters are created in the IOProcessorRequest, while in the offline case the parameters are just coming from the request or defaults are given.

I can take care of this change and have the plugin generating/validating pooling parameters.

sorry, I can't run prithvi_geospatial_mae locally, so please help fix all possible errors in examples and/or all other places .

Will it be more convenient if I invite you to collaborate on this PR?

The devil is in the details; perhaps when we fix these details, we may discover bigger problems.

christian-pinto · 2025-10-16T09:08:31Z

parse_request is for OpenAI serving only. I think we might need to introduce a new method in the plugin explicitly to verify sampling/pooling params, or pass the params to pre_process method

The request parsing is done in both offline and online mode, it's meant to verify that the input is appropriate for the plugin. I am fine with either methods to be honest. At the moment, only for the online case, the Pooling parameters are created in the IOProcessorRequest, while in the offline case the parameters are just coming from the request or defaults are given.
I can take care of this change and have the plugin generating/validating pooling parameters.

sorry, I can't run prithvi_geospatial_mae locally, so please help fix all possible errors in examples and/or all other places .

Will it be more convenient if I invite you to collaborate on this PR?

The devil is in the details; perhaps when we fix these details, we will discover bigger problems.

Sure, invite me to this PR. I will be able to work on it starting on Monday though. Right now I have another task that I need to finish

DarkLight1337 · 2025-10-16T13:04:36Z

The request parsing is done in both offline and online mode, it's meant to verify that the input is appropriate for the plugin.

You're right, sorry I misremembered this (was afk when I sent the previous message so I couldn't do code search to check 😓 )

maxdebayser · 2025-10-16T20:42:31Z

I think this extra task makes sense. Should we also add extra_kwargs to the request classes in protocol.py and in PoolingParams? In this way we would be able to pass custom args to the plugin.

noooop · 2025-10-17T00:08:31Z

I think this extra task makes sense. Should we also add extra_kwargs to the request classes in protocol.py and in PoolingParams? In this way we would be able to pass custom args to the plugin.

IOProcessor uses a data-in, data-out, where the data can be any thing

Do we still need a separate extra_kwargs, or can extra_kwargs be part of the data? The user can actually use any data format.

noooop · 2025-10-17T02:20:43Z

@christian-pinto PTAL #27063 #27066

I hope io_processor_plugins can support binary response, it will definitely be more efficient compared to base64.

maxdebayser · 2025-10-17T16:34:09Z

IOProcessor uses a data-in, data-out, where the data can be any thing

Yes, you're right.

noooop · 2025-10-22T09:05:28Z

On the compression bit, I assume this is something that happens after the plugin was applied? If we compress the output of the pooler right away, then the plugin would have to de-compress in order to be applied and this is not ideal as we would just be wasting time. If the compression happens after the post_process is applied in the /pooling endpoint then the plugins are not affected. Only the client will have to know the payload is compressed. But this I assume would be either something requested by the client itself, or a feature of the deployment where users know that data is being compressed.

Compression occurs after post-processing is applied in the /pooling endpoint,

I just want to confirm if there are any compatibility issues or what improvements are needed to make the plugin task + binary response more efficient.

Signed-off-by: wang.yuqi <noooop@126.com>

noooop · 2025-10-22T10:54:15Z

@christian-pinto #27066 (binary response) has been merged.

Please help check if it can be used together with plugin task.

It would be best to have an example to tell users how to use binary response with prithvi_geospatial_mae .

…cessor plugin Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

…ngly Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

christian-pinto · 2025-10-22T13:13:33Z

I have added a new function to the plugins interface that can be used for validating or generating params. There were a few other changes needed to make the tests pass - i.e., the new DummyPooler return one less dimension in the output and the test plugin needed to be changed to account for that.

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

christian-pinto · 2025-10-22T15:31:28Z

mypy should be happy now 🤞

noooop · 2025-10-23T00:03:32Z

@christian-pinto Thanks for your help

Please confirm if it can still run after merging #27204

PTAL:

Please update the documentation for IO Processor Plugins

https://docs.vllm.ai/en/latest/design/io_processor_plugins.html

christian-pinto · 2025-10-23T06:59:38Z

After #27393 gets merged this one will pass fine too. Let me fix the documentation.

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

noooop · 2025-10-23T09:59:20Z

The new plugin pooling task looks great. @christian-pinto Thanks for your help!

cc @DarkLight1337

Are there any more modifications needed for this PR?

vllm/entrypoints/llm.py

vllm/plugins/io_processors/interface.py

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

…llm-project#26973) Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: Christian Pinto <christian.pinto@ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Christian Pinto <christian.pinto@ibm.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>

…llm-project#26973) Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: Christian Pinto <christian.pinto@ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Christian Pinto <christian.pinto@ibm.com>

…llm-project#26973) Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: Christian Pinto <christian.pinto@ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Christian Pinto <christian.pinto@ibm.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>

add plugin pooling task

7a9a647

Signed-off-by: wang.yuqi <noooop@126.com>

noooop requested review from aarnphm and chaunceyjiang as code owners October 16, 2025 01:28

mergify bot added documentation Improvements or additions to documentation frontend multi-modality Related to multi-modality (#4194) labels Oct 16, 2025

gemini-code-assist bot reviewed Oct 16, 2025

View reviewed changes

vllm/model_executor/layers/pooler.py Outdated Show resolved Hide resolved

noooop mentioned this pull request Oct 16, 2025

[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. #25524

Open

5 tasks

Update vllm/model_executor/layers/pooler.py

cefd6eb

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: wang.yuqi <noooop@126.com>

chatgpt-codex-connector bot reviewed Oct 16, 2025

View reviewed changes

vllm/model_executor/layers/pooler.py Outdated Show resolved Hide resolved

add plugin pooling task

75ad7dd

Signed-off-by: wang.yuqi <noooop@126.com>

noooop mentioned this pull request Oct 16, 2025

Remove unused functions #26971

Open

noooop mentioned this pull request Oct 17, 2025

Support bge-m3 sparse embeddings (lexical weights) #14526

Open

noooop changed the title ~~[Frontend][3/N] Improve all pooling task | Add plugin pooling task~~ [Frontend][4/N] Improve all pooling task | Add plugin pooling task Oct 17, 2025

noooop mentioned this pull request Oct 17, 2025

[Frontend][3/N] Improve all pooling task | Support binary embedding response #27066

Merged

5 tasks

noooop changed the title ~~[Frontend][4/N] Improve all pooling task | Add plugin pooling task~~ [Frontend][5/N] Improve all pooling task | Add plugin pooling task Oct 18, 2025

noooop changed the title ~~[Frontend][5/N] Improve all pooling task | Add plugin pooling task~~ [Frontend][4/N] Improve all pooling task | Add plugin pooling task Oct 22, 2025

noooop added 2 commits October 22, 2025 18:41

Merge branch 'main' into add_plugin_pooling_task

61bba0c

Signed-off-by: wang.yuqi <noooop@126.com>

fix

970fabd

Signed-off-by: wang.yuqi <noooop@126.com>

christian-pinto added 2 commits October 22, 2025 13:07

Added validation/generation of Pooling/Sampling parameters with IOPro…

db63ee3

…cessor plugin Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

Dummy pooler returns one less dimension. FIxed prithvi plugin accordi…

fe5ac87

…ngly Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

christian-pinto added 3 commits October 22, 2025 14:37

Solving pre-commit failures

c38faae

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

Solving pre-commit failures

c92753d

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

Still trying to please the type checker

5f0f625

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

Merge branch 'main' into add_plugin_pooling_task

8d96b70

noooop mentioned this pull request Oct 23, 2025

[Model] Add num_cached_tokens for PoolingRequestOutput #27378

Merged

5 tasks

christian-pinto and others added 2 commits October 23, 2025 07:29

Fixed documentation and updated examples requirements

71564fa

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

Merge branch 'main' into add_plugin_pooling_task

2acca21

noooop added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 23, 2025

DarkLight1337 reviewed Oct 23, 2025

View reviewed changes

vllm/entrypoints/llm.py Outdated Show resolved Hide resolved

vllm/plugins/io_processors/interface.py Outdated Show resolved Hide resolved

christian-pinto added 2 commits October 23, 2025 12:18

Small fixes after review

fd885c2

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>

Merge branch 'main' into add_plugin_pooling_task

782c90f

DarkLight1337 approved these changes Oct 23, 2025

View reviewed changes

DarkLight1337 enabled auto-merge (squash) October 23, 2025 12:54

DarkLight1337 merged commit 3fa2c12 into vllm-project:main Oct 23, 2025
59 checks passed

Uh oh!

Uh oh!

[Frontend][4/N] Improve all pooling task | Add plugin pooling task #26973

[Frontend][4/N] Improve all pooling task | Add plugin pooling task #26973

Uh oh!

Conversation

noooop commented Oct 16, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Improve all pooling task

Purpose

Test Plan

Test Result

Uh oh!

mergify bot commented Oct 16, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

noooop commented Oct 16, 2025

Uh oh!

DarkLight1337 commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

christian-pinto commented Oct 16, 2025

Uh oh!

noooop commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

christian-pinto commented Oct 16, 2025

Uh oh!

DarkLight1337 commented Oct 16, 2025

Uh oh!

maxdebayser commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

noooop commented Oct 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

noooop commented Oct 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maxdebayser commented Oct 17, 2025

Uh oh!

noooop commented Oct 22, 2025

Uh oh!

noooop commented Oct 22, 2025

Uh oh!

christian-pinto commented Oct 22, 2025

Uh oh!

christian-pinto commented Oct 22, 2025

Uh oh!

noooop commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

christian-pinto commented Oct 23, 2025

Uh oh!

noooop commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

noooop commented Oct 16, 2025 •

edited by github-actions bot

Loading

DarkLight1337 commented Oct 16, 2025 •

edited

Loading

noooop commented Oct 16, 2025 •

edited

Loading

maxdebayser commented Oct 16, 2025 •

edited

Loading

noooop commented Oct 17, 2025 •

edited

Loading

noooop commented Oct 17, 2025 •

edited

Loading

noooop commented Oct 23, 2025 •

edited

Loading

noooop commented Oct 23, 2025 •

edited

Loading