

@maxdebayser (Contributor) commented Jul 31, 2025

This PR is yet another follow-up to #16188 and #21270. It adds support for models such as cross-encoder/ms-marco-MiniLM-L-6-v2 that require token_type_ids to be passed from the tokenizer to the model.
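For context, a BERT-style tokenizer encodes a query/document pair as `[CLS] query [SEP] document [SEP]` and marks the first segment with token type 0 and the second with token type 1; a cross-encoder relies on these ids to tell the two texts apart. A minimal sketch of that layout, assuming the standard BERT pair template (`pair_token_type_ids` is a hypothetical helper, not a transformers or vLLM function):

```python
def pair_token_type_ids(len_query: int, len_doc: int) -> list[int]:
    """Token types for a BERT-style pair: [CLS] query [SEP] doc [SEP].

    Segment A ([CLS], the query tokens and the first [SEP]) gets type 0.
    Segment B (the document tokens and the final [SEP]) gets type 1.
    """
    return [0] * (len_query + 2) + [1] * (len_doc + 1)


# A 2-token query and a 3-token document give 4 type-0 and 4 type-1 positions.
print(pair_token_type_ids(2, 3))
```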

Since passing the token_type_ids all the way from the entrypoints to the model runner touches a lot of code, I'm also exploring alternative implementations: #19988 and #20026.

PR #19988 takes the same approach as V0, but it has to touch too many places in the code. #20026 minimizes the code impact by passing the token types as multimodal args, which is admittedly a bit odd. This PR instead adds the token type ids to the pooling params, which removes the need to touch so many places in the code. It also avoids allocating persistent tensors by encoding the token types together with the token ids.
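The design choice of riding along in the pooling params can be illustrated with a toy sketch: per-request data lives inside a params object, so intermediate layers only forward the request and none of their signatures change. `ToyPoolingParams`, `ToyRequest`, `schedule`, and `model_runner_token_types` are hypothetical names for illustration, not vLLM's classes or fields:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyPoolingParams:
    """Hypothetical per-request params object (not vLLM's PoolingParams)."""

    # Extra payload; models that don't use token types leave this as None.
    token_type_ids: Optional[list[int]] = None


@dataclass
class ToyRequest:
    prompt_token_ids: list[int]
    pooling_params: ToyPoolingParams


def schedule(request: ToyRequest) -> ToyRequest:
    # Intermediate layers just pass the request along; their signatures
    # never need a separate token_type_ids argument.
    return request


def model_runner_token_types(request: ToyRequest) -> Optional[list[int]]:
    # Only the final consumer looks inside the params object.
    return request.pooling_params.token_type_ids
```

A request for a model that needs no token types keeps the default `None`, so nothing changes on its path.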

cc: @DarkLight1337

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
```python
    return full_prompt, engine_prompt


def compress_token_type_ids(token_type_ids: list[int]) -> int:
```
@maxdebayser (Contributor Author) commented

This minimizes the amount of data transferred between processes.
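The diff excerpt shows only the signature of `compress_token_type_ids`. One way a `list[int] -> int` compression can work, assuming the token types of a pair always form a run of 0s followed by a run of 1s, is to store just the index of the first 1. A minimal sketch under that assumption (this body and the `decompress_token_type_ids` helper are hypothetical, not the PR's code):

```python
def compress_token_type_ids(token_type_ids: list[int]) -> int:
    """Return the index of the first type-1 token (len(...) if there is none).

    Hypothetical body: only valid when the input is 0...0 followed by 1...1.
    """
    n = len(token_type_ids)
    first_one = next((i for i, tt in enumerate(token_type_ids) if tt == 1), n)
    # Reject inputs that a single index cannot represent losslessly.
    if token_type_ids != [0] * first_one + [1] * (n - first_one):
        raise ValueError("token_type_ids must be a run of 0s followed by a run of 1s")
    return first_one


def decompress_token_type_ids(first_one: int, length: int) -> list[int]:
    """Rebuild the full token type list from the compressed index."""
    return [0] * first_one + [1] * (length - first_one)
```

Sending one integer per request instead of one integer per token is what keeps the inter-process payload small.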

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a clever optimization for passing token_type_ids to models by bit-packing them into the input_ids tensor. This avoids changing many function signatures across the codebase. The overall approach is sound and the implementation appears correct.

My main feedback focuses on improving the maintainability of this new bit-packing mechanism in vllm/model_executor/models/bert.py. The functions for encoding and decoding token_type_ids have in-place side effects that are not obvious from their names, and the code could benefit from comments explaining the bit-packing logic. Addressing these points will make the code easier to understand and safer for future modifications.
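The bit-packing idea the review describes can be sketched in plain Python. This assumes every vocabulary id fits in 30 bits and that token types are only 0 or 1; the bit position and function names are hypothetical, not the ones in vllm/model_executor/models/bert.py. The sketch also contrasts a pure decode with an explicitly named in-place variant, which is the naming concern raised above:

```python
TOKEN_TYPE_SHIFT = 30  # hypothetical: assumes every vocab id fits in 30 bits
TOKEN_ID_MASK = (1 << TOKEN_TYPE_SHIFT) - 1


def encode_token_type_ids(input_ids: list[int], token_type_ids: list[int]) -> list[int]:
    """Return a new list with each token type stored in bit 30 of its token id."""
    if len(input_ids) != len(token_type_ids):
        raise ValueError("input_ids and token_type_ids must have the same length")
    packed = []
    for token_id, token_type in zip(input_ids, token_type_ids):
        if not 0 <= token_id <= TOKEN_ID_MASK:
            raise ValueError(f"token id {token_id} does not fit in {TOKEN_TYPE_SHIFT} bits")
        if token_type not in (0, 1):
            raise ValueError("only token types 0 and 1 are supported")
        packed.append(token_id | (token_type << TOKEN_TYPE_SHIFT))
    return packed


def decode_token_type_ids(packed: list[int]) -> tuple[list[int], list[int]]:
    """Pure decode: return (input_ids, token_type_ids); `packed` is left untouched."""
    input_ids = [x & TOKEN_ID_MASK for x in packed]
    token_type_ids = [x >> TOKEN_TYPE_SHIFT for x in packed]
    return input_ids, token_type_ids


def decode_token_type_ids_inplace(packed: list[int]) -> list[int]:
    """In-place decode: strip the type bit from `packed` and return the token types.

    The `_inplace` suffix makes the side effect on the caller's list explicit.
    """
    token_type_ids = [x >> TOKEN_TYPE_SHIFT for x in packed]
    packed[:] = [x & TOKEN_ID_MASK for x in packed]
    return token_type_ids
```

Because the type bit sits above every valid token id, the packed ids still fit in a signed 32-bit integer, and no separate per-token tensor has to be kept around.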

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@DarkLight1337 (Member) commented

Can you merge from main again? Thanks

@DarkLight1337 (Member) commented

Pooling models tests are failing, please check

@maxdebayser (Contributor Author) commented

> Pooling models tests are failing, please check

The failures came from a rename: in the BertWithRope model, the position_ids argument was renamed to positions.

@DarkLight1337 (Member) left a comment

@WoosukKwon is quite busy lately, so I'll just merge this, since the changes to the model runner are really minimal and the pooling model tests have passed.

@vllm-bot vllm-bot merged commit 39052db into vllm-project:main Aug 11, 2025
38 of 46 checks passed
@DarkLight1337 (Member) left a comment

Can you open a follow-up PR to update the docs accordingly?

paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Paul Pak <paulpak58@gmail.com>
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Diego-Castan <diego.castan@ibm.com>
yiliu30 pushed a commit to yiliu30/vllm-fork that referenced this pull request Aug 19, 2025
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Xiao Yu <xiao.yu@amd.com>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

Labels: frontend, ready, v1

3 participants