Change the default value of truncate_prompt_tokens in the embedding/rerank/pooling model to -1 #24235
base: main
Conversation
Code Review
This pull request changes the default behavior for handling long inputs in embedding-related requests by setting the default value of truncate_prompt_tokens to -1. This means inputs exceeding the model's maximum length will be automatically truncated, which is a good improvement for user experience. My review identifies an inconsistency where ClassificationRequest was not updated, and I recommend applying the same change there for consistency across all pooling-based endpoints.
vllm/entrypoints/openai/protocol.py (Outdated)

      dimensions: Optional[int] = None
      user: Optional[str] = None
-     truncate_prompt_tokens: Optional[Annotated[int, Field(ge=-1)]] = None
+     truncate_prompt_tokens: Optional[Annotated[int, Field(ge=-1)]] = -1
This change to default truncate_prompt_tokens to -1 is a great improvement for embedding-style requests. However, for consistency, this change should also be applied to ClassificationRequest on line 1661, which also performs a pooling operation but was not updated.
Additionally, the type hint for truncate_prompt_tokens in ClassificationRequest should be updated to match the one used here (Optional[Annotated[int, Field(ge=-1)]]) to include the ge=-1 validation.
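For reference, the suggested follow-up would look roughly like this (a sketch only; field names other than truncate_prompt_tokens are illustrative, and the real class derives from OpenAIBaseModel):

```python
from typing import Annotated, Optional

from pydantic import BaseModel, Field

# Sketch of the reviewer's suggestion; in vLLM the real ClassificationRequest
# carries more fields than shown here.
class ClassificationRequest(BaseModel):
    model: Optional[str] = None
    input: Optional[str] = None
    # Mirror EmbeddingRequest: allow -1 ("truncate to the model's maximum
    # length") and make it the default.
    truncate_prompt_tokens: Optional[Annotated[int, Field(ge=-1)]] = -1
```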
@chaunceyjiang @aarnphm

/cc @DarkLight1337 PTAL.
cc @maxdebayser @noooop would this cause any problems?
Thanks, #14776 is awesome work. I find always-on automatic prompt truncation very convenient, and sentence-transformers always truncates the prompt automatically, so changing the default value of truncate_prompt_tokens in the embedding/rerank/pooling models to -1 should be okay. By the way, I'm not sure whether the bug triggered by "truncate_prompt_tokens": -1 has been fixed. Can we do something about it?
I also like the idea of changing the default to -1, as it's convenient and compatible with sentence-transformers. Currently the default is not -1 because we try to reproduce the OpenAI API behavior for /v1/embeddings. How important is it for us to maintain the same behavior? Somehow I missed the notification on #24704. Regarding #2263, the actual problem is that we do not take the special tokens into account correctly. I started working on this, but the API server code has so many special cases that progress became really slow and I was pulled into something else. I need to look into this.
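As a hypothetical illustration of the special-token problem (using a Hugging Face tokenizer directly rather than the actual vLLM code path; the model name and limit are assumptions):

```python
from transformers import AutoTokenizer

# Hypothetical illustration: truncating the prompt to exactly max_model_len
# tokens and then adding the tokenizer's special tokens (BOS/EOS, CLS/SEP, ...)
# still overflows the model's limit by the number of special tokens.
tok = AutoTokenizer.from_pretrained("BAAI/bge-m3")
max_model_len = 8192

ids = tok("a very long document " * 4000, add_special_tokens=False).input_ids
truncated = ids[:max_model_len]                # naive truncation to the limit
final = tok.build_inputs_with_special_tokens(truncated)
print(len(final), len(final) > max_model_len)  # 8194 True for this tokenizer
```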
Thanks for this information. Maybe we can do it like #20538, using override_pooler_config to control the default truncate_prompt_tokens, for example:
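(A minimal sketch of that idea, assuming PoolerConfig grows a truncate_prompt_tokens field; that field is the proposal here, not something that already exists.)

```python
from vllm import LLM
from vllm.config import PoolerConfig

# Sketch: truncate_prompt_tokens as a PoolerConfig field is hypothetical at
# this point; override_pooler_config itself is an existing engine argument.
llm = LLM(
    model="BAAI/bge-m3",
    override_pooler_config=PoolerConfig(truncate_prompt_tokens=-1),
)

# The online server could expose the same default via something like
#   vllm serve BAAI/bge-m3 --override-pooler-config '{"truncate_prompt_tokens": -1}'
```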
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 07b1349 to 68c046a.
We need to make a decision:
I think truncating by default is reasonable, and this is also the default for Infinity, as mentioned in #26992. Nevertheless, this risks breaking people who depend on OpenAI-compatible behavior, so perhaps we should open an RFC and announce it on Slack to see if anyone has objections.
+1
Implementing override_pooler_config=PoolerConfig(truncate_prompt_tokens=-1) is a superset of this change: it changes the default value of truncate_prompt_tokens in the embedding/rerank/pooling models to -1 while also letting users set it arbitrarily in one place.
Change the default value of truncate_prompt_tokens in the embedding model to -1. By default, prompts are then truncated to the model's maximum length.
Purpose
The client no longer needs to worry about the maximum length supported by the model, and over-long input text no longer causes the request to fail with an error.
Test Plan
Works well on bge-m3 (see the sketch below).
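Roughly the check that was run (a sketch; assumes the server was started with something like `vllm serve BAAI/bge-m3` on the default port):

```python
from openai import OpenAI

# bge-m3 has an 8192-token limit; this input is far longer. Previously the
# request failed with a length error unless truncate_prompt_tokens was set;
# with the new -1 default it is truncated server-side and succeeds.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

too_long = "vLLM serves pooling models too. " * 5000
resp = client.embeddings.create(model="BAAI/bge-m3", input=too_long)
print(len(resp.data[0].embedding))  # e.g. 1024 for bge-m3, instead of a 400 error
```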
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.