
Conversation

@noooop
Collaborator

@noooop noooop commented Oct 16, 2025

Improve all pooling task

These PRs mostly conflict with each other, so combining them into a series better informs reviewers about what has happened and what still needs to be done.

Purpose

The plugin task uses io_processor.parse_request to verify inputs, skipping PoolingParams verification.
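
To make the intent concrete, here is a minimal, purely illustrative sketch of the dispatch this PR aims for; the function name and signatures are assumptions, not the actual vLLM code paths:

```python
# Illustrative sketch only: for the "plugin" task, input validation is
# delegated to the IO processor instead of PoolingParams.
from typing import Any


def validate_pooling_input(task: str, raw_request: Any, pooling_params: Any,
                           io_processor: Any = None) -> Any:
    if task == "plugin":
        # The IO processor owns the input format, so it decides whether the
        # request is valid; the usual PoolingParams checks are skipped.
        return io_processor.parse_request(raw_request)
    # Built-in pooling tasks keep their existing PoolingParams verification.
    pooling_params.verify(task)
    return raw_request
```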

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: wang.yuqi <noooop@126.com>
@mergify

mergify bot commented Oct 16, 2025

Documentation preview: https://vllm--26973.org.readthedocs.build/en/26973/

@mergify mergify bot added documentation Improvements or additions to documentation frontend multi-modality Related to multi-modality (#4194) labels Oct 16, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly introduces a new 'plugin' pooling task, which is a valuable addition for models that utilize custom IO processors for pooling, such as Terratorch. The changes are well-implemented across the codebase, including updates to examples and tests. However, I have identified one critical bug in the new DummyPooler implementation that must be fixed.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: wang.yuqi <noooop@126.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


@noooop
Collaborator Author

noooop commented Oct 16, 2025

When it comes to IO processor plugins, the type of pooling activation function depends on the combination of model and plugin. As an example, for PrithviMAE we instantiate an All pooler and disable softmax because the plugin expects to receive the raw output from the pooler. Other models in the future might behave differently and require a different pooling strategy and activation function, which is fine and should still work.

However, I agree with @maxdebayser regarding keeping the encode task to support more generic cases like this one, without having to resort to using the remaining tasks that can be confusing for people.

We need to change at least the name to avoid confusion with the previous encode task.

@christian-pinto @maxdebayser @DarkLight1337

After careful consideration, we still need a plugin pooling task, which uses io_processor.parse_request to verify inputs, skipping the verification of PoolingParams.

Signed-off-by: wang.yuqi <noooop@126.com>
@DarkLight1337
Member

DarkLight1337 commented Oct 16, 2025

parse_request is for OpenAI serving only. I think we might need to introduce a new method in the plugin explicitly to verify sampling/pooling params, or pass the params to the pre_process method.
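
For illustration, the two options could look roughly like the sketch below; the method names and signatures are hypothetical, not the actual IO processor plugin interface:

```python
from abc import ABC, abstractmethod
from typing import Any


class IOProcessorSketch(ABC):
    # Option 1: a dedicated hook where the plugin validates (or fills in)
    # the pooling/sampling params for a request.
    def validate_params(self, params: Any) -> Any:
        return params

    # Option 2: hand the params to pre_process together with the parsed data,
    # so the plugin can check them while preparing the model inputs.
    @abstractmethod
    def pre_process(self, data: Any, params: Any | None = None) -> Any:
        ...
```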

@christian-pinto
Contributor

parse_request is for OpenAI serving only. I think we might need to introduce a new method in the plugin explicitly to verify sampling/pooling params, or pass the params to the pre_process method.

Request parsing is done in both offline and online mode; it's meant to verify that the input is appropriate for the plugin. I am fine with either method, to be honest. At the moment the pooling parameters are created in the IOProcessorRequest only in the online case, while in the offline case the parameters simply come from the request or fall back to defaults.

I can take care of this change and have the plugin generate/validate the pooling parameters.

@noooop
Collaborator Author

noooop commented Oct 16, 2025

parse_request is for OpenAI serving only. I think we might need to introduce a new method in the plugin explicitly to verify sampling/pooling params, or pass the params to the pre_process method.

Request parsing is done in both offline and online mode; it's meant to verify that the input is appropriate for the plugin. I am fine with either method, to be honest. At the moment the pooling parameters are created in the IOProcessorRequest only in the online case, while in the offline case the parameters simply come from the request or fall back to defaults.

I can take care of this change and have the plugin generate/validate the pooling parameters.

Sorry, I can't run prithvi_geospatial_mae locally, so please help fix any possible errors in the examples and anywhere else.

Would it be more convenient if I invited you to collaborate on this PR?

The devil is in the details; perhaps when we fix these details, we will discover bigger problems.

@christian-pinto
Contributor

parse_request is for OpenAI serving only. I think we might need to introduce a new method in the plugin explicitly to verify sampling/pooling params, or pass the params to the pre_process method.

Request parsing is done in both offline and online mode; it's meant to verify that the input is appropriate for the plugin. I am fine with either method, to be honest. At the moment the pooling parameters are created in the IOProcessorRequest only in the online case, while in the offline case the parameters simply come from the request or fall back to defaults.
I can take care of this change and have the plugin generate/validate the pooling parameters.

Sorry, I can't run prithvi_geospatial_mae locally, so please help fix any possible errors in the examples and anywhere else.

Would it be more convenient if I invited you to collaborate on this PR?

The devil is in the details; perhaps when we fix these details, we will discover bigger problems.

Sure, invite me to this PR. I will only be able to work on it starting on Monday, though; right now I have another task that I need to finish.

@DarkLight1337
Member

Request parsing is done in both offline and online mode; it's meant to verify that the input is appropriate for the plugin.

You're right, sorry, I misremembered this (I was AFK when I sent the previous message, so I couldn't do a code search to check 😓).

@maxdebayser
Contributor

maxdebayser commented Oct 16, 2025

I think this extra task makes sense. Should we also add extra_kwargs to the request classes in protocol.py and in PoolingParams? In this way we would be able to pass custom args to the plugin.
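
A rough sketch of what that could look like, with field names that are only assumptions:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PoolingParamsSketch:
    task: str | None = None
    # Free-form arguments that the frontend would forward untouched to the
    # IO processor plugin.
    extra_kwargs: dict[str, Any] = field(default_factory=dict)


params = PoolingParamsSketch(task="plugin", extra_kwargs={"tile_size": 256})
# A plugin could then read params.extra_kwargs["tile_size"] in pre_process.
```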

@noooop
Collaborator Author

noooop commented Oct 17, 2025

I think this extra task makes sense. Should we also add extra_kwargs to the request classes in protocol.py and in PoolingParams? In this way we would be able to pass custom args to the plugin.

IOProcessor uses a data-in, data-out interface, where the data can be anything.

Do we still need a separate extra_kwargs, or can extra_kwargs be part of the data? The user can actually use any data format.
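
In other words, the custom arguments could simply travel inside the request data itself; the payload shape below is invented purely for illustration:

```python
# Hypothetical request payload for an IO processor plugin: since the
# interface is data-in/data-out, plugin-specific arguments can be embedded
# in the data instead of a separate extra_kwargs field.
request_data = {
    "image_url": "https://example.com/tile.tif",
    "plugin_args": {"tile_size": 256, "bands": ["B04", "B03", "B02"]},
}
# The plugin's parse_request / pre_process can pull plugin_args out of the
# data without any change to protocol.py or PoolingParams.
```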

@noooop noooop changed the title [Frontend][3/N] Improve all pooling task | Add plugin pooling task [Frontend][4/N] Improve all pooling task | Add plugin pooling task Oct 17, 2025
@noooop
Collaborator Author

noooop commented Oct 17, 2025

@christian-pinto PTAL #27063 #27066

I hope io_processor_plugins can support binary responses; that would definitely be more efficient than base64.
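
As a back-of-the-envelope illustration of the overhead (not vLLM code), base64 inflates a raw float32 tensor by roughly a third:

```python
import base64

import numpy as np

embedding = np.zeros((1, 1024), dtype=np.float32)
raw = embedding.tobytes()
b64 = base64.b64encode(raw)

print(len(raw))  # 4096 bytes of raw float32 data
print(len(b64))  # 5464 bytes once base64-encoded (~33% larger)
```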

@maxdebayser
Contributor

IOProcessor uses a data-in, data-out interface, where the data can be anything.

Yes, you're right.

@noooop noooop changed the title [Frontend][4/N] Improve all pooling task | Add plugin pooling task [Frontend][5/N] Improve all pooling task | Add plugin pooling task Oct 18, 2025
@noooop
Collaborator Author

noooop commented Oct 22, 2025

On the compression bit, I assume this is something that happens after the plugin is applied? If we compress the output of the pooler right away, then the plugin would have to decompress it before it can be applied, which is not ideal, as we would just be wasting time. If the compression happens after post_process is applied in the /pooling endpoint, then the plugins are not affected. Only the client will have to know the payload is compressed. But I assume this would be either something requested by the client itself, or a feature of the deployment where users know that data is being compressed.

Compression occurs after post-processing is applied in the /pooling endpoint.

I just want to confirm whether there are any compatibility issues, and what improvements are needed to make the plugin task + binary response combination more efficient.
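
From the client's point of view, that would look roughly like the sketch below (purely illustrative; the actual endpoint behaviour and flags are not defined by this PR):

```python
import gzip


def decode_pooling_payload(body: bytes, compressed: bool) -> bytes:
    # The IO processor plugin never sees compressed data: compression is
    # applied after post_process on the server, so only the client needs
    # to undo it before decoding the pooling output.
    return gzip.decompress(body) if compressed else body
```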

@noooop noooop changed the title [Frontend][5/N] Improve all pooling task | Add plugin pooling task [Frontend][4/N] Improve all pooling task | Add plugin pooling task Oct 22, 2025
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <noooop@126.com>
@noooop
Collaborator Author

noooop commented Oct 22, 2025

@christian-pinto #27066 (binary response) has been merged.

Please help check whether it can be used together with the plugin task.

It would be best to have an example showing users how to use binary responses with prithvi_geospatial_mae.

…cessor plugin

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
…ngly

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
@christian-pinto
Contributor

I have added a new function to the plugin interface that can be used for validating or generating params. A few other changes were needed to make the tests pass, e.g. the new DummyPooler returns one fewer dimension in its output, and the test plugin needed to be changed to account for that.

Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
@christian-pinto
Contributor

mypy should be happy now 🤞

@noooop
Collaborator Author

noooop commented Oct 23, 2025

@christian-pinto
Contributor

After #27393 gets merged this one will pass fine too. Let me fix the documentation.

@noooop noooop added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 23, 2025
@noooop
Collaborator Author

noooop commented Oct 23, 2025

The new plugin pooling task looks great. @christian-pinto Thanks for your help!

cc @DarkLight1337

Are there any more modifications needed for this PR?

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 23, 2025 12:54
@DarkLight1337 DarkLight1337 merged commit 3fa2c12 into vllm-project:main Oct 23, 2025
59 checks passed
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 23, 2025
…llm-project#26973)

Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Oct 25, 2025
…llm-project#26973)

Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…llm-project#26973)

Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…llm-project#26973)

Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
