Change the default value of truncate_prompt_tokens in the embedding/rerank/pooling model to -1 #24235
base: main
Conversation
Code Review
This pull request changes the default behavior for handling long inputs in embedding-related requests by setting the default value of truncate_prompt_tokens to -1. This means inputs exceeding the model's maximum length will be automatically truncated, which is a good improvement for user experience. My review identifies an inconsistency where ClassificationRequest was not updated, and I recommend applying the same change there for consistency across all pooling-based endpoints.
vllm/entrypoints/openai/protocol.py (Outdated)

      dimensions: Optional[int] = None
      user: Optional[str] = None
-     truncate_prompt_tokens: Optional[Annotated[int, Field(ge=-1)]] = None
+     truncate_prompt_tokens: Optional[Annotated[int, Field(ge=-1)]] = -1
This change to default truncate_prompt_tokens to -1 is a great improvement for embedding-style requests. However, for consistency, this change should also be applied to ClassificationRequest on line 1661, which also performs a pooling operation but was not updated.
Additionally, the type hint for truncate_prompt_tokens in ClassificationRequest should be updated to match the one used here (Optional[Annotated[int, Field(ge=-1)]]) to include the ge=-1 validation.
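For reference, the suggested follow-up would look roughly like this (a sketch only; field names other than truncate_prompt_tokens are illustrative, and the real class derives from OpenAIBaseModel):

```python
from typing import Annotated, Optional

from pydantic import BaseModel, Field

# Sketch of the reviewer's suggestion; in vLLM the real ClassificationRequest
# carries more fields than shown here.
class ClassificationRequest(BaseModel):
    model: Optional[str] = None
    input: Optional[str] = None
    # Mirror EmbeddingRequest: allow -1 ("truncate to the model's maximum
    # length") and make it the default.
    truncate_prompt_tokens: Optional[Annotated[int, Field(ge=-1)]] = -1
```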
@chaunceyjiang @aarnphm

/cc @DarkLight1337 PTAL.
cc @maxdebayser @noooop would this cause any problems?
Thanks, #14776 is awesome work. I find always-on automatic prompt truncation very convenient, and sentence-transformers always truncates the prompt automatically, so changing the default value of truncate_prompt_tokens in the embedding/rerank/pooling models to -1 should be okay. By the way, I'm not sure whether the bug triggered by "truncate_prompt_tokens": -1 has been fixed. Can we do something about it?
I also like the idea of changing the default to -1, as it's convenient and compatible with sentence-transformers. Currently the default is not -1 because we try to reproduce the OpenAI API behavior for /v1/embeddings. How important is it for us to maintain the same behavior? Somehow I missed the notification on #24704. Regarding #2263, the actual problem is that we do not take the special tokens into account correctly. I started working on this, but the API server code has so many special cases that progress became really slow and I was pulled into something else. I need to look into this.
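As a hypothetical illustration of the special-token problem (using a Hugging Face tokenizer directly rather than the actual vLLM code path; the model name and limit are assumptions):

```python
from transformers import AutoTokenizer

# Hypothetical illustration: truncating the prompt to exactly max_model_len
# tokens and then adding the tokenizer's special tokens (BOS/EOS, CLS/SEP, ...)
# still overflows the model's limit by the number of special tokens.
tok = AutoTokenizer.from_pretrained("BAAI/bge-m3")
max_model_len = 8192

ids = tok("a very long document " * 4000, add_special_tokens=False).input_ids
truncated = ids[:max_model_len]                # naive truncation to the limit
final = tok.build_inputs_with_special_tokens(truncated)
print(len(final), len(final) > max_model_len)  # 8194 True for this tokenizer
```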
Thanks for this information. Maybe we can do it like #20538, using override_pooler_config to control the default truncate_prompt_tokens, for example:
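(A minimal sketch of that idea, assuming PoolerConfig grows a truncate_prompt_tokens field; that field is the proposal here, not something that already exists.)

```python
from vllm import LLM
from vllm.config import PoolerConfig

# Sketch: truncate_prompt_tokens as a PoolerConfig field is hypothetical at
# this point; override_pooler_config itself is an existing engine argument.
llm = LLM(
    model="BAAI/bge-m3",
    override_pooler_config=PoolerConfig(truncate_prompt_tokens=-1),
)

# The online server could expose the same default via something like
#   vllm serve BAAI/bge-m3 --override-pooler-config '{"truncate_prompt_tokens": -1}'
```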
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 07b1349 to 68c046a.
We need to make a decision:
I think truncating by default is reasonable, and this is also the default for Infinity, as mentioned in #26992. Nevertheless, this risks breaking people who depend on OpenAI-compatible behavior, so perhaps we should open an RFC and announce it on Slack to see if anyone has objections.
+1
Implementing override_pooler_config=PoolerConfig(truncate_prompt_tokens=-1) is a superset of this change: it changes the default value of truncate_prompt_tokens in the embedding/rerank/pooling models to -1 while also letting users set it arbitrarily in one place.
Change the default value of truncate_prompt_tokens in the embedding model to -1. By default, prompts are then truncated to the model's maximum length.
Purpose
The client no longer needs to worry about the maximum length supported by the model, and over-long input text no longer causes the request to fail with an error.
Test Plan
Works well on bge-m3 (see the sketch below).
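Roughly the check that was run (a sketch; assumes the server was started with something like `vllm serve BAAI/bge-m3` on the default port):

```python
from openai import OpenAI

# bge-m3 has an 8192-token limit; this input is far longer. Previously the
# request failed with a length error unless truncate_prompt_tokens was set;
# with the new -1 default it is truncated server-side and succeeds.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

too_long = "vLLM serves pooling models too. " * 5000
resp = client.embeddings.create(model="BAAI/bge-m3", input=too_long)
print(len(resp.data[0].embedding))  # e.g. 1024 for bge-m3, instead of a 400 error
```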
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.