[Model][1/N] Support multiple poolers at model level #21227

DarkLight1337 · 2025-07-19T14:05:30Z

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

This PR enables models to support multiple poolers, by:

Splitting up the task validation part of Pooler.get_pooling_updates into Pooler.get_supported_tasks.
Introducing DispatchPooler which dispatches pooler calls to sub-poolers based on the pooling task.
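
The dispatch idea described above can be sketched roughly as follows. This is a minimal illustration, not vLLM's actual implementation: the class names `SketchPooler`, `AllTokensPooler`, `LastTokenPooler`, and `SketchDispatchPooler`, the `pool` method, and the use of plain float lists in place of tensors are all assumptions made for the example. Only the `get_supported_tasks` name and the task names are taken from the PR description.

```python
from abc import ABC, abstractmethod
from typing import List, Mapping, Sequence, Set


class SketchPooler(ABC):
    """Hypothetical pooler interface; not vLLM's actual Pooler class."""

    @abstractmethod
    def get_supported_tasks(self) -> Set[str]:
        """Return the pooling tasks this pooler can serve."""

    @abstractmethod
    def pool(self, hidden_states: Sequence[float], task: str) -> List[float]:
        """Reduce per-token values to the output for the given task."""


class AllTokensPooler(SketchPooler):
    """Serves "encode" by returning every token's value (ALL pooling)."""

    def get_supported_tasks(self) -> Set[str]:
        return {"encode"}

    def pool(self, hidden_states: Sequence[float], task: str) -> List[float]:
        return list(hidden_states)


class LastTokenPooler(SketchPooler):
    """Serves "embed" by keeping only the last token's value."""

    def get_supported_tasks(self) -> Set[str]:
        return {"embed"}

    def pool(self, hidden_states: Sequence[float], task: str) -> List[float]:
        return [hidden_states[-1]]


class SketchDispatchPooler(SketchPooler):
    """Routes each call to the sub-pooler registered for its task."""

    def __init__(self, poolers_by_task: Mapping[str, SketchPooler]) -> None:
        # Reject registrations where the sub-pooler cannot serve its task.
        for task, pooler in poolers_by_task.items():
            if task not in pooler.get_supported_tasks():
                raise ValueError(
                    f"{type(pooler).__name__} does not support {task!r}")
        self.poolers_by_task = dict(poolers_by_task)

    def get_supported_tasks(self) -> Set[str]:
        # Task validation is exposed separately from pooling itself.
        return set(self.poolers_by_task)

    def pool(self, hidden_states: Sequence[float], task: str) -> List[float]:
        if task not in self.poolers_by_task:
            raise ValueError(f"Unsupported pooling task: {task!r}")
        return self.poolers_by_task[task].pool(hidden_states, task)


dispatcher = SketchDispatchPooler({
    "encode": AllTokensPooler(),
    "embed": LastTokenPooler(),
})
token_values = [0.1, 0.2, 0.3]
encoded = dispatcher.pool(token_values, "encode")  # every token's value
embedded = dispatcher.pool(token_values, "embed")  # only the last value
```

Keeping `get_supported_tasks` separate from the pooling call lets a caller validate a requested task before any model work is done, which mirrors the split described in the first bullet.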

Apart from enabling models to customize each pooling task, this would eventually also let us output intermediate results within the same request, e.g.:

Hidden states via the encode pooler with ALL pooling.
Logits by adapting LogitsProcessor into a pooler.

Future work:

Multi-task support based on DispatchPooler is not enabled at the API level yet. We still check model_config.supported_tasks which is based on the value of --task. This will be addressed in the next PR.
To avoid confusion with the pooling task, we will rename --task to --convert and ask users to pass --runner pooling to use pooling models in the next PR, emitting a deprecation warning for users who still use --task.
Memory profiling is based on list(Pooler.get_supported_tasks())[0] instead of running profiling on every sub-pooler and getting the maximum memory usage. So, it is possible for OOM to occur during inference time. We can refine this in a later PR.
Disabling chunked prefill is still based on the pooling_type in the pooler config instead of the actual poolers being used, because the poolers are only created after model construction. Can we disable chunked prefill inside the model runner after the model is initialized?
For now, Sentence Transformers pooler config overrides the poolers for all tasks. We should only override the "main" pooler based on the model architecture.
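
As a rough illustration of the memory profiling refinement mentioned in the list above (profiling every sub-pooler and taking the maximum), the sketch below assumes a hypothetical callback `measure_task_memory` that runs a dummy pooling pass for one task and returns its memory usage in bytes. Neither that callback nor `profile_peak_pooling_memory` exists in vLLM; both are placeholders for this example, and the byte counts are made up.

```python
from typing import Callable, Iterable


def profile_peak_pooling_memory(
    supported_tasks: Iterable[str],
    measure_task_memory: Callable[[str], int],
) -> int:
    """Return the largest memory footprint (in bytes) across all tasks."""
    tasks = list(supported_tasks)
    if not tasks:
        raise ValueError("At least one pooling task is required")
    # Profile every task rather than only the first one, so the reserved
    # budget covers the most expensive sub-pooler.
    return max(measure_task_memory(task) for task in tasks)


# Made-up numbers for illustration only, not measurements.
fake_usage_bytes = {"encode": 4096, "embed": 1024, "classify": 512}
peak_bytes = profile_peak_pooling_memory(
    fake_usage_bytes, fake_usage_bytes.__getitem__)
```

Profiling only the first supported task would under-reserve memory here whenever that task is not "encode", which is the OOM risk the bullet describes.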

Notes:

The output of LLM.encode (Pooling API) for embedding and classification models (except for models with a Sentence Transformers pooler config) is changed by this PR because the default encode pooler differs from the default embed and classification poolers. While technically a breaking change, users aren't supposed to use this API for embedding and classification tasks in the first place.

Other changes:

Renamed --task pooling to --task encode. This was introduced very recently in [Core] Support multiple tasks per model #20771 and is not documented, so this change should barely affect any users.
Split up BertModel into a pooling variant and a non-pooling variant to keep the type checker happy.

cc @maxdebayser @noooop @22quinn

Test Plan

The existing tests should pass.

Test Result

(Optional) Documentation Update

Updated the Pooling Models page to explain the new mechanism.

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

github-actions · 2025-07-19T14:05:40Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request introduces a major refactoring to support multiple pooling tasks at the model level. It introduces a DispatchPooler to route requests to different sub-poolers based on the task. The Pooler interface is updated with get_supported_tasks and helper methods like for_encode, for_embed, and for_classify. Many model implementations are updated to use this new structure. The changes are well-structured and improve modularity. I've found one high-severity issue related to weight loading in the new BertPoolingModel.

vllm/model_executor/models/bert.py


Isotr0py

Overall LGTM! Just a nit.

vllm/model_executor/layers/pooler.py


noooop · 2025-07-21T07:46:42Z

> Disabling chunked prefill is still based on the pooling_type in the pooler config instead of the actual poolers being used, because the poolers are only created after model construction. Can we disable chunked prefill inside the model runner after the model is initialized?

Disabling chunked prefill and automatic prefix caching should be controlled by attn_type rather than pooling_type. For instance, Alibaba-NLP/gte-Qwen2-1.5B-instruct uses encoder-only attention, so it should have both chunked prefill and automatic prefix caching disabled.
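
The suggestion in this comment could look something like the sketch below. It is only an illustration of keying the decision on attention type: the function `prefill_features_for`, the returned dictionary keys, and the string labels `"decoder"` and `"encoder_only"` are assumptions for this example and are not taken from vLLM's code.

```python
from typing import Dict


def prefill_features_for(attn_type: str) -> Dict[str, bool]:
    """Decide chunked prefill and prefix caching from the attention type.

    Hypothetical helper: the attn_type labels are illustrative only.
    """
    # Encoder-only attention lets every token attend to later tokens, so
    # neither processing the prompt in chunks nor reusing a cached prefix
    # is safe for it in this sketch.
    allowed = attn_type != "encoder_only"
    return {
        "chunked_prefill": allowed,
        "prefix_caching": allowed,
    }


decoder_features = prefill_features_for("decoder")
encoder_only_features = prefill_features_for("encoder_only")
```

Under this scheme the pooling type plays no part in the decision, so a model with encoder-only attention has both features turned off regardless of which poolers it ends up using.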

### What this PR does / why we need it? Fix [#21227](vllm-project/vllm#21227) to make ci happy - vLLM version: v0.9.2 - vLLM main: vllm-project/vllm@6b46c4b --------- Signed-off-by: wangli <wangli858794774@gmail.com>

maxdebayser

LGTM


[Model][1/N] Support multiple pooling tasks at model level

4e2b024

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

DarkLight1337 requested a review from Isotr0py July 19, 2025 14:05

DarkLight1337 requested review from WoosukKwon, aarnphm, alexm-redhat, comaniac, hmellor, houseroad, mgoin, njhill, robertgshaw2-redhat, simon-mo, tlrmchlsmth, youkaichao and ywang96 as code owners July 19, 2025 14:05

mergify bot added the documentation, frontend, qwen, v1, and tpu labels Jul 19, 2025

gemini-code-assist bot reviewed Jul 19, 2025


vllm/model_executor/models/bert.py (outdated, resolved)

DarkLight1337 changed the title ~~[Model][1/N] Support multiple pooling tasks at model level~~ [Model][1/N] Support multiple poolers at model level Jul 19, 2025

DarkLight1337 added 2 commits July 19, 2025 14:24

Fix load_weights

9b47576

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Fix for V0

e4a2460

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Isotr0py approved these changes Jul 19, 2025


vllm/model_executor/layers/pooler.py (resolved)

Handle list inputs

a1c0bfe

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

DarkLight1337 added the ready label Jul 20, 2025

Increase precision

44a7a5f

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

noooop mentioned this pull request Jul 21, 2025

[CI Failure]: Classification test failure for Qwen2.5-1.5B-apeach model in half precision #21277

Closed

3 tasks

Fix pooler tasks

6a433f5

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

noooop mentioned this pull request Jul 21, 2025

[Model] Pooling models default to using chunked prefill & prefix caching if supported. #20930

Merged

4 tasks

DarkLight1337 enabled auto-merge (squash) July 21, 2025 07:29

vllm-bot merged commit 042af0c into vllm-project:main Jul 21, 2025
72 of 75 checks passed

DarkLight1337 deleted the dispatch-pooler branch July 21, 2025 09:26

Potabk mentioned this pull request Jul 21, 2025

[CI] Fix broken CI by supporting multiple poolers at model level vllm-project/vllm-ascend#1915

Merged

maxdebayser reviewed Jul 22, 2025


DarkLight1337 mentioned this pull request Jul 23, 2025

[Deprecation][2/N] Replace --task with --runner and --convert #21470

Merged

4 tasks

DarkLight1337 mentioned this pull request Jul 25, 2025

[V1] Get supported tasks from model runner instead of model config #21585

Merged

4 tasks

This was referenced Jul 26, 2025

[Bug] Fix Qwen3-Embedding model loading #21642

Closed

[Bug]: Qwen3 Embedding does not load in 0.10.0 - There is no module or parameter named 'layers' in Qwen3ForCausalLM #21614

Closed

x22x22 pushed a commit to x22x22/vllm that referenced this pull request Aug 5, 2025

[Model][1/N] Support multiple poolers at model level (vllm-project#21227)

d954ee4

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: x22x22 <wadeking@qq.com>

Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025

[Model][1/N] Support multiple poolers at model level (vllm-project#21227)

76edc2b

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025

[Model][1/N] Support multiple poolers at model level (vllm-project#21227)

e8544bd

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025

[Model][1/N] Support multiple poolers at model level (vllm-project#21227)

3e93ce6

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025

[Model][1/N] Support multiple poolers at model level (vllm-project#21227)

c3affca

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Paul Pak <paulpak58@gmail.com>

diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025

[Model][1/N] Support multiple poolers at model level (vllm-project#21227)

a634a52

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Diego-Castan <diego.castan@ibm.com>

epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 27, 2025

[Model][1/N] Support multiple poolers at model level (vllm-project#21227)

31e513d

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

noooop mentioned this pull request Sep 22, 2025

[Model][2/N] Improve all pooling task | Support multi-vector retrieval #25370

Merged

5 tasks
