[Feature]: Improve DX with respect to truncation in embedding and scoring tasks #13489

@maxdebayser

🚀 The feature, motivation and pitch

Currently the online API for embeddings allows you to pass a parameter to control truncation:

class EmbeddingCompletionRequest(OpenAIBaseModel):
    ...
    truncate_prompt_tokens: Optional[Annotated[int, Field(ge=1)]] = None

This parameter, if given, must satisfy the constraint 0 < truncate_prompt_tokens <= max_seq_len, where max_seq_len is the maximum prompt length that the model supports. This forces clients to call /v1/models beforehand to find out the maximum model length, so that they can make sure they aren't going to exceed the limit and get a 400 error. In practice, the client has two options:

  1. Call /v1/models once and store the result somewhere, which requires the client to be stateful
  2. Call /v1/models for every embedding operation and pay the price of two network round trips
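
To make the cost of option 1 concrete, here is a minimal sketch of the state a client would have to carry. The MaxLenCache class and the fetch_models callable are hypothetical names used only for illustration, and the sketch assumes that each entry in the /v1/models response has "id" and "max_model_len" fields.

```python
from typing import Callable, Dict


class MaxLenCache:
    """Hypothetical client-side cache so /v1/models is queried only once."""

    def __init__(self, fetch_models: Callable[[], dict]) -> None:
        # fetch_models is a stand-in for whatever performs GET /v1/models
        # and returns the parsed JSON body.
        self._fetch_models = fetch_models
        self._max_len: Dict[str, int] = {}

    def max_len(self, model: str) -> int:
        if model not in self._max_len:
            for card in self._fetch_models()["data"]:
                # Assumption: each model card exposes "id" and "max_model_len".
                self._max_len[card["id"]] = card["max_model_len"]
        # Raises KeyError if the server does not list the model.
        return self._max_len[model]

    def clamp(self, model: str, requested: int) -> int:
        """Return a truncate_prompt_tokens value that stays within the limit."""
        return min(requested, self.max_len(model))
```

The cached value also goes stale if the server is restarted with a different maximum length, which is part of why pushing this bookkeeping onto the client is awkward.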

Other inference frameworks such as caikit allow the users to specify -1 to truncate at the max_seq_len automatically.
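
As a rough sketch of the semantics being proposed here (not the actual implementation in either framework), the server-side resolution of the parameter could look like the following. The function name resolve_truncation is hypothetical.

```python
from typing import Optional


def resolve_truncation(truncate_prompt_tokens: Optional[int],
                       max_seq_len: int) -> Optional[int]:
    """Map the request parameter to an effective truncation length.

    None -> no truncation requested
    -1   -> truncate at the model's maximum length (proposed behavior)
    n    -> truncate at n, provided 0 < n <= max_seq_len
    """
    if truncate_prompt_tokens is None:
        return None
    if truncate_prompt_tokens == -1:
        return max_seq_len
    if 0 < truncate_prompt_tokens <= max_seq_len:
        return truncate_prompt_tokens
    raise ValueError(
        f"truncate_prompt_tokens must be -1 or in [1, {max_seq_len}], "
        f"got {truncate_prompt_tokens}")
```

With this, a client can send -1 without knowing max_seq_len, while explicit out-of-range values would still be rejected as they are today.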

For the offline API the usability is even worse because the embed method doesn't even have a truncate_prompt_tokens parameter, forcing the developer to tokenize and truncate the inputs first:

    def embed(
        self,
        prompts: Union[PromptType, Sequence[PromptType]],
        /,
        *,
        use_tqdm: bool = True,
        lora_request: Optional[Union[List[LoRARequest], LoRARequest]] = None,
        prompt_adapter_request: Optional[PromptAdapterRequest] = None,
    ) -> List[EmbeddingRequestOutput]:

The same applies to the scoring and reranking functions.
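
For reference, the manual workaround for the offline API amounts to something like the sketch below. The helper name truncate_prompts and its encode argument are hypothetical; encode stands for any function that turns a string into token IDs. The sketch keeps the first max_tokens tokens of each prompt and assumes that the offline API accepts prompts given as dictionaries with a "prompt_token_ids" key.

```python
from typing import Callable, Dict, List, Sequence


def truncate_prompts(
    texts: Sequence[str],
    encode: Callable[[str], List[int]],
    max_tokens: int,
) -> List[Dict[str, List[int]]]:
    """Tokenize each text and keep at most the first max_tokens token IDs."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    # Assumption: pre-tokenized prompts are passed as
    # {"prompt_token_ids": [...]} dictionaries.
    return [{"prompt_token_ids": encode(text)[:max_tokens]} for text in texts]
```

The developer has to obtain both the tokenizer and the model's maximum length before calling embed, which is the extra work a truncate_prompt_tokens parameter on the offline methods would remove.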

Alternatives

No response

Additional context

FYI, @gmarinho2 and I are planning to implement the suggestions in this issue.
