Conversation

@zhaotyer
Contributor

@zhaotyer zhaotyer commented Sep 4, 2025

Change the default value of truncate_prompt_tokens in the embedding model to -1. With this default, the prompt is truncated to the model's maximum length.

Purpose

The client no longer needs to know the maximum length supported by the model, and the server will not return an error if the input text is too long.
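For illustration, a minimal sketch of what this enables, assuming a vLLM OpenAI-compatible server at a placeholder URL serving bge-m3: with the new default, the client embeds an over-length input without passing any truncation parameter.

```python
from openai import OpenAI

# Placeholder endpoint and model name; adjust for your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# With truncate_prompt_tokens defaulting to -1, this no longer errors even if
# the input exceeds the model's maximum sequence length; the server truncates it.
resp = client.embeddings.create(model="BAAI/bge-m3", input="very long text " * 10000)
print(len(resp.data[0].embedding))
```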

Test Plan

Works well on bge-m3.

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@zhaotyer zhaotyer requested a review from aarnphm as a code owner September 4, 2025 08:56
@zhaotyer zhaotyer changed the title from "Change the default value of truncate_prompt_tokens in the embedding model to -1" to "Change the default value of truncate_prompt_tokens in the embedding/rerank/pooling model to -1" on Sep 4, 2025
@mergify mergify bot added the frontend label Sep 4, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request changes the default behavior for handling long inputs in embedding-related requests by setting the default value of truncate_prompt_tokens to -1. This means inputs exceeding the model's maximum length will be automatically truncated, which is a good improvement for user experience. My review identifies an inconsistency where ClassificationRequest was not updated, and I recommend applying the same change there for consistency across all pooling-based endpoints.

  dimensions: Optional[int] = None
  user: Optional[str] = None
- truncate_prompt_tokens: Optional[Annotated[int, Field(ge=-1)]] = None
+ truncate_prompt_tokens: Optional[Annotated[int, Field(ge=-1)]] = -1

Severity: high

This change to default truncate_prompt_tokens to -1 is a great improvement for embedding-style requests. However, for consistency, this change should also be applied to ClassificationRequest on line 1661, which also performs a pooling operation but was not updated.

Additionally, the type hint for truncate_prompt_tokens in ClassificationRequest should be updated to match the one used here (Optional[Annotated[int, Field(ge=-1)]]) to include the ge=-1 validation.
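For reference, a minimal sketch of the reviewer's suggestion, with the surrounding fields of ClassificationRequest elided and pydantic's BaseModel standing in for vLLM's OpenAIBaseModel:

```python
from typing import Annotated, Optional

from pydantic import BaseModel, Field


class ClassificationRequest(BaseModel):  # vLLM's actual base class is OpenAIBaseModel
    # ... other fields unchanged ...
    # Suggested: match the embedding requests, with ge=-1 validation and a -1 default.
    truncate_prompt_tokens: Optional[Annotated[int, Field(ge=-1)]] = -1
```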

@zhaotyer
Contributor Author

@chaunceyjiang @aarnphm
The OpenAI and Jina APIs do not require the truncate_prompt_tokens option; I think setting the default value to -1 is appropriate.

@chaunceyjiang
Collaborator

/cc @DarkLight1337 PTAL.

Member

@DarkLight1337 DarkLight1337 left a comment


cc @maxdebayser @noooop would this cause any problems?

@noooop
Collaborator

noooop commented Oct 10, 2025

Thanks for #14776, awesome work. I find defaulting to automatic prompt truncation very convenient, and sentence-transformers always truncates the prompt automatically.

https://github.com/UKPLab/sentence-transformers/blob/051ce1a5f84f46f4b97b0074c3dc76fee2114ee4/sentence_transformers/models/Transformer.py#L318-L326

Changing the default value of truncate_prompt_tokens in the embedding/rerank/pooling model to -1 should be okay.
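A quick sketch of that sentence-transformers behavior (the model name is chosen only for illustration): encode() tokenizes with truncation up to model.max_seq_length, so an over-length input is silently truncated rather than raising.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")
print(model.max_seq_length)                # the truncation limit applied at tokenization
emb = model.encode("long text " * 100000)  # silently truncated, never errors
print(emb.shape)
```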


By the way, I'm not sure whether the bug triggered by "truncate_prompt_tokens": -1 has been fixed.

#24704 (comment)

#22635 (comment)

Can we do something?

@maxdebayser
Contributor

I also like the idea of changing the default to -1, as it's convenient and compatible with sentence-transformers. Currently the default is not -1 because we try to reproduce the OpenAI API behavior for /v1/embeddings. How important is it for us to maintain the same behavior?

Somehow I missed the notification on #24704. Regarding #22635, the actual problem is that we don't take the special tokens into account correctly. I started working on this, but the API server code has so many special cases that progress became really slow, and I was pulled into something else. I need to look into this.

@noooop
Collaborator

noooop commented Oct 11, 2025

Currently the default is not -1 because we try to reproduce the OpenAI API behavior for /v1/embeddings.

Thanks for this information.

Maybe we can do it like #20538, by using override_pooler_config to control the default truncate_prompt_tokens.

e.g.

override_pooler_config=PoolerConfig(truncate_prompt_tokens=-1)
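A sketch of the proposed usage, assuming a #20538-style change adds a truncate_prompt_tokens field to PoolerConfig (that field may not exist yet in vllm.config):

```python
from vllm import LLM
from vllm.config import PoolerConfig

# truncate_prompt_tokens here is the field this proposal would add, not
# necessarily an existing PoolerConfig attribute.
llm = LLM(
    model="BAAI/bge-m3",
    override_pooler_config=PoolerConfig(truncate_prompt_tokens=-1),
)
outputs = llm.embed(["a very long document " * 4096])
```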

@mergify

mergify bot commented Oct 14, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zhaotyer.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 14, 2025
@noooop
Collaborator

noooop commented Oct 16, 2025

We need to make a decision:

  • Change the default value of truncate_prompt_tokens in the embedding/rerank/pooling model to -1
  • Keep the current default (an error is raised if the input exceeds max_model_len, which is also the OpenAI API behavior for /v1/embeddings); using override_pooler_config=PoolerConfig(truncate_prompt_tokens=-1) enables always-on automatic prompt truncation.

cc @maxdebayser @DarkLight1337

@DarkLight1337
Member

DarkLight1337 commented Oct 16, 2025

I think truncating by default is reasonable, and this is also the default for Infinity, as mentioned in #26992. Nevertheless, this risks breaking people who depend on OpenAI-compatible behavior, so perhaps we should open an RFC and announce it on Slack to see if anyone has objections.

@noooop
Collaborator

noooop commented Oct 16, 2025

I think truncating by default is reasonable, and this is also the default for Infinity, as mentioned in #26992. Nevertheless, this risks breaking people who depend on OpenAI-compatible behavior, so perhaps we should open an RFC and announce it on Slack to see if anyone has objections.

+1

@noooop
Collaborator

noooop commented Oct 23, 2025

We need to make a decision:

  • Change the default value of truncate_prompt_tokens in the embedding/rerank/pooling model to -1
  • Keep the current default (an error is raised if the input exceeds max_model_len, which is also the OpenAI API behavior for /v1/embeddings); using override_pooler_config=PoolerConfig(truncate_prompt_tokens=-1) enables always-on automatic prompt truncation.

cc @maxdebayser @DarkLight1337

Implementing override_pooler_config=PoolerConfig(truncate_prompt_tokens=-1) is a superset of changing the default value of truncate_prompt_tokens in the embedding/rerank/pooling model to -1: it lets users set truncate_prompt_tokens to any value in one place.
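Until either option lands, a client can already opt in per request; a sketch using the OpenAI Python client's extra_body passthrough (endpoint and model are placeholders):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.embeddings.create(
    model="BAAI/bge-m3",
    input="a very long document " * 4096,
    extra_body={"truncate_prompt_tokens": -1},  # truncate to the model's max length
)
print(len(resp.data[0].embedding))
```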
