Add support for the /rerank endpoint in vllm bench serve #26602
Conversation
The /rerank API can be served both by embedding models and by native reranker models. However, with reranker models the query is concatenated with each document, with a separator token in between, so the number of tokens that passes through the model has to be accounted for differently in each case. Because of these details, this PR adds a specialized random dataset that generates requests sending the expected number of tokens. When the user sets `random-input-len`, `num-prompts` and `random-batch-size`, in both cases we generate requests such that the total number of tokens is num-prompts * input-len, sent in batches of batch-size * input-len tokens.

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
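To make the difference concrete, here is a minimal illustrative sketch (not code from this PR) of how the token count seen by the model differs in the two cases; the function names and example numbers are made up for illustration:

```python
# Illustrative only: why /rerank token accounting differs between an
# embedding model and a native reranker model.

def tokens_embedding_model(query_len: int, doc_lens: list[int]) -> int:
    # An embedding model embeds the query and each document separately,
    # so every sequence's tokens pass through the model exactly once.
    return query_len + sum(doc_lens)

def tokens_reranker_model(query_len: int, doc_lens: list[int]) -> int:
    # A native reranker scores (query, document) pairs: the query is
    # concatenated with each document with one separator token in between,
    # so the query tokens are processed once per document.
    return sum(query_len + 1 + doc_len for doc_len in doc_lens)

# Example: a 32-token query against 8 documents of 96 tokens each.
print(tokens_embedding_model(32, [96] * 8))  # 800 tokens
print(tokens_reranker_model(32, [96] * 8))   # 1032 tokens
```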
Documentation preview: https://vllm--26602.org.readthedocs.build/en/26602/

cc: @noooop, @DarkLight1337, @ZJY0516
Code Review
This pull request adds valuable support for benchmarking the /rerank endpoint, including a new specialized random dataset and documentation. The implementation is well-structured, refactoring existing embedding benchmark logic into a more general _run_pooling_request function to accommodate both embeddings and reranking. However, I've identified a critical issue that can cause the benchmark to crash under specific default conditions. Please see the detailed comment for the fix.
Related to #21796
This pull request has merge conflicts that must be resolved before it can be merged.
Here is an example of how this works. With the server running a reranker or embedding model, run a benchmark using the `vllm-rerank` backend and the `random-rerank` dataset:
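A sketch of the commands (the model name and the specific flag values below are illustrative assumptions, not taken from this PR):

```bash
# Serve a reranker (or embedding) model; the model name is only an example.
vllm serve BAAI/bge-reranker-v2-m3

# Benchmark the /rerank endpoint with the specialized random dataset.
vllm bench serve \
  --backend vllm-rerank \
  --dataset-name random-rerank \
  --model BAAI/bge-reranker-v2-m3 \
  --random-input-len 512 \
  --random-batch-size 8 \
  --num-prompts 128
```

With these values, the benchmark would send 128 * 512 total input tokens, grouped into rerank requests of 8 documents each.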