🚀 The feature, motivation and pitch
Currently the online API for embeddings allows you to pass a parameter to control truncation:
```python
class EmbeddingCompletionRequest(OpenAIBaseModel):
    ...
    truncate_prompt_tokens: Optional[Annotated[int, Field(ge=1)]] = None
```
This parameter, if given, must respect the constraint `0 < truncate_prompt_tokens <= max_seq_len`, where `max_seq_len` is the maximum prompt length the model supports. This forces clients to call `/v1/models` first to find out the maximum model length, so they don't exceed the limit and get a 400 error. In practice, the client has two options:
- Call `/v1/models` once and store the result somewhere, which requires the client to be stateful
- Call `/v1/models` for every embedding operation and pay the price of two network round trips (see the sketch below)
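For illustration, a rough sketch of option 2 using plain `requests` (the base URL, the model index, and the `max_model_len` field on the `/v1/models` response are assumptions for the example, not guaranteed API contracts):

```python
# Sketch of option 2: two network round trips per embedding request.
import requests

BASE_URL = "http://localhost:8000"  # assumed vLLM server address

# Round trip 1: discover the maximum model length.
models = requests.get(f"{BASE_URL}/v1/models").json()
model_card = models["data"][0]
max_model_len = model_card["max_model_len"]  # assuming this field is exposed

# Round trip 2: embed, clamping truncation to the discovered limit.
resp = requests.post(
    f"{BASE_URL}/v1/embeddings",
    json={
        "model": model_card["id"],
        "input": "a very long document ...",
        "truncate_prompt_tokens": max_model_len,
    },
)
embedding = resp.json()["data"][0]["embedding"]
```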
Other inference frameworks, such as caikit, allow users to specify -1 to truncate at `max_seq_len` automatically.
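A minimal sketch of what the proposed -1 semantics could look like (names are illustrative, not actual vLLM internals):

```python
# Illustrative only: map -1 to the model's maximum length, otherwise enforce the
# existing constraint 0 < truncate_prompt_tokens <= max_seq_len.
from typing import Optional


def resolve_truncation(truncate_prompt_tokens: Optional[int], max_seq_len: int) -> Optional[int]:
    if truncate_prompt_tokens is None:
        return None  # no truncation requested
    if truncate_prompt_tokens == -1:
        return max_seq_len  # truncate at the model's maximum automatically
    if not 0 < truncate_prompt_tokens <= max_seq_len:
        raise ValueError(
            f"truncate_prompt_tokens must be -1 or in (0, {max_seq_len}], "
            f"got {truncate_prompt_tokens}"
        )
    return truncate_prompt_tokens
```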
For the offline API, usability is even worse: the `embed` method doesn't even have a `truncate_prompt_tokens` parameter, forcing the developer to tokenize and truncate the inputs first:
```python
def embed(
    self,
    prompts: Union[PromptType, Sequence[PromptType]],
    /,
    *,
    use_tqdm: bool = True,
    lora_request: Optional[Union[List[LoRARequest], LoRARequest]] = None,
    prompt_adapter_request: Optional[PromptAdapterRequest] = None,
) -> List[EmbeddingRequestOutput]:
```
The same applies to the scoring and reranking functions.
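For reference, the current workaround looks roughly like this (a sketch only; the `task` value, model name, and the `max_model_len` attribute path are assumptions for the example rather than documented guarantees):

```python
# Sketch of today's workaround: tokenize and truncate before calling embed().
from vllm import LLM

llm = LLM(model="intfloat/e5-mistral-7b-instruct", task="embed")  # assumed embedding model/task
tokenizer = llm.get_tokenizer()
max_len = llm.llm_engine.model_config.max_model_len  # assumed location of the limit

texts = ["a very long document ..."]
token_ids = [
    tokenizer(t, truncation=True, max_length=max_len)["input_ids"] for t in texts
]

# Pass the pre-truncated token IDs as TokensPrompt dicts.
outputs = llm.embed([{"prompt_token_ids": ids} for ids in token_ids])
```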
Alternatives
No response
Additional context
FYI, @gmarinho2 and I are planning to implement the suggestions in this issue.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.