
Conversation

@maxdebayser
Contributor

@maxdebayser maxdebayser commented Oct 10, 2025

The /rerank API can be supported both by embedding models and by native reranker models. However, with reranker models the query is concatenated with each document, with a separator token in between. Therefore the number of tokens that passes through the model has to be accounted for differently in each case.

Because of these details, this PR adds a specialized random dataset that generates requests sending the expected number of tokens. So when the user sets `random-input-len`, `num-prompts` and `random-batch-size`, in both cases we will generate requests such that the total number of tokens is num-prompts * input-len, in batches of size batch-size * input-len.
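
To make the difference concrete, here is a minimal sketch of that accounting (this is not the code added by the PR; the query length and the single separator token are assumptions chosen for illustration):

```python
# Hypothetical sketch of the token accounting described above; the 32-token
# query and the single separator token are illustrative assumptions, not
# values taken from the vLLM implementation.

def per_pass_budget(input_len: int, is_reranker: bool, query_len: int = 32) -> tuple[int, int]:
    """Return (query_tokens, document_tokens) for one forward pass."""
    if not is_reranker:
        # Embedding model: each document goes through the model on its own,
        # so the whole budget belongs to the document.
        return 0, input_len
    # Reranker model: query + separator + document must together fit input_len.
    doc_len = input_len - query_len - 1  # reserve 1 token for the separator
    return query_len, doc_len

# With --random-input-len 512:
print(per_pass_budget(512, is_reranker=True))   # (32, 479): 32 + 1 + 479 = 512
print(per_pass_budget(512, is_reranker=False))  # (0, 512)
```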

Here is an example of how this works. With the server running a reranker or embedding model

vllm serve BAAI/bge-reranker-v2-m3

run a benchmark using the vllm-rerank backend and the random-rerank dataset:

vllm bench serve --model BAAI/bge-reranker-v2-m3 --tokenizer BAAI/bge-reranker-v2-m3 \
  --backend vllm-rerank --endpoint /v1/rerank --dataset-name random-rerank \
  --random-input-len 512 --num-prompts 10 --random-batch-size 5
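
With these settings the benchmark should issue 10 / 5 = 2 rerank requests of 5 documents each, so roughly 10 * 512 = 5120 tokens pass through the model in total, 5 * 512 = 2560 per request (assuming num-prompts divides evenly by the batch size; the exact request shapes come from the dataset implementation).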


Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@mergify

mergify bot commented Oct 10, 2025

Documentation preview: https://vllm--26602.org.readthedocs.build/en/26602/

@mergify mergify bot added the documentation and performance labels Oct 10, 2025
@maxdebayser
Contributor Author

cc: @noooop , @DarkLight1337 , @ZJY0516

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds valuable support for benchmarking the /rerank endpoint, including a new specialized random dataset and documentation. The implementation is well-structured, refactoring existing embedding benchmark logic into a more general _run_pooling_request function to accommodate both embeddings and reranking. However, I've identified a critical issue that can cause the benchmark to crash under specific default conditions. Please see the detailed comment for the fix.

@maxdebayser
Contributor Author

Related to #21796

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@mergify

mergify bot commented Oct 13, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @maxdebayser.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 13, 2025
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@mergify mergify bot removed the needs-rebase label Oct 13, 2025
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 14, 2025 02:31
@github-actions github-actions bot added the ready label Oct 14, 2025
@DarkLight1337 DarkLight1337 merged commit fe3edb4 into vllm-project:main Oct 14, 2025
47 checks passed
1994 pushed a commit to 1994/vllm that referenced this pull request Oct 14, 2025
…t#26602)

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: 1994 <1994@users.noreply.github.com>
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
…t#26602)

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025
…t#26602)

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: bbartels <benjamin@bartels.dev>
sducouedic pushed a commit to sducouedic/vllm that referenced this pull request Oct 16, 2025
anhuong pushed a commit to anhuong/vllm that referenced this pull request Oct 16, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…t#26602)

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…t#26602)

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…t#26602)

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…t#26602)

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Nov 12, 2025
