
Conversation

tomasruizt (Contributor) commented Sep 10, 2025

Purpose

The goal is to make it easy to profile a throughput workload. Concretely, the command vllm bench throughput --profile ... will generate a profiling file, which is useful for debugging performance gaps with the https://ui.perfetto.dev/ UI.

Test Plan

The vllm bench throughput command can take 4 different code paths. Here is how I tested them.
First, I set environment variables to enable profiling and, optionally, CUDA launch blocking so that long GPU ops show their full runtime.

export VLLM_USE_V1=1
export VLLM_TORCH_PROFILER_DIR=./profiles/
export CUDA_LAUNCH_BLOCKING=1
1. The run_vllm() path
vllm bench throughput --model=Qwen/Qwen3-1.7B --dataset-name=hf --dataset-path=likaixin/InstructCoder --max-num-seqs=100 --num-prompts=10 --input-len=1000 --output-len=10 --max-model-len=2048 --gpu-memory-utilization=0.6 --profile --enforce-eager
2. The run_vllm_async() path
vllm bench throughput --model=Qwen/Qwen3-1.7B --dataset-name=hf --dataset-path=likaixin/InstructCoder --max-num-seqs=100 --num-prompts=10 --input-len=1000 --output-len=10 --max-model-len=2048 --gpu-memory-utilization=0.6 --profile --enforce-eager --async-engine
3. The backend=vllm-chat path for multimodal models
vllm bench throughput --model=Qwen/Qwen2.5-VL-3B-Instruct --dataset-name=hf --dataset-path=lmarena-ai/VisionArena-Chat --max-num-seqs=100 --num-prompts=10 --input-len=1000 --output-len=10 --max-model-len=2048 --gpu-memory-utilization=0.6 --profile --enforce-eager --backend=vllm-chat
4. The backend=hf path
vllm bench throughput --model=Qwen/Qwen3-1.7B --dataset-name=sharegpt --max-num-seqs=100 --num-prompts=10 --input-len=1000 --output-len=10 --profile --enforce-eager --backend=hf --hf-max-batch-size=10

Test Result

All paths except backend=hf generate a profile file, like this one: file.pt.trace.json.gz. As mentioned, they can be opened with https://ui.perfetto.dev/.

The reason is that the AutoModelForCausalLM class used by backend=hf does not implement .start_profile() and .stop_profile(). Therefore, I raise a NotImplementedError so the user knows to remove the --profile flag.
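
Roughly, the guard on the hf path looks like the sketch below (illustrative names and a simplified signature, not the exact diff):

def run_hf(requests: list, model_name: str, profile: bool = False) -> float:
    if profile:
        # transformers.AutoModelForCausalLM exposes no start_profile() /
        # stop_profile(), so fail loudly instead of silently ignoring --profile.
        raise NotImplementedError(
            "Profiling is not supported with backend=hf. "
            "Please remove the --profile flag.")
    elapsed = 0.0
    # ... run generation with AutoModelForCausalLM as before, measuring elapsed ...
    return elapsed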

Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
mergify bot added the performance (Performance-related issues) label on Sep 10, 2025
gemini-code-assist bot left a comment

Code Review

This pull request adds support for profiling throughput benchmarks using the --profile flag. The implementation correctly wraps the model execution calls with start_profile() and stop_profile(). My main feedback is to ensure the profiler is always stopped, even if an error occurs during model execution. This can be achieved by using try...finally blocks, which will make the profiling logic more robust. I've added specific suggestions for this improvement.
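
Concretely, the suggested wrapping would look roughly like this (a minimal sketch with illustrative names; llm, prompts, and sampling_params stand in for the benchmark's actual objects):

from vllm import LLM, SamplingParams

def generate_with_profile(llm: LLM, prompts: list[str],
                          sampling_params: SamplingParams, profile: bool):
    # Start the torch profiler only when --profile was passed.
    if profile:
        llm.start_profile()
    try:
        return llm.generate(prompts, sampling_params)
    finally:
        # Runs whether generate() succeeds or raises, so the profiler is
        # always stopped and a trace is always dumped.
        if profile:
            llm.stop_profile()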

tomasruizt (Contributor, Author) commented Sep 10, 2025

I don't agree with Gemini's review. One problem with its suggestion that I've seen in practice: the generation fails, the profiler is gracefully stopped, and it still dumps a profile result (which is obviously very short). Based on that short profile, it then looks as if the generation had succeeded and been extremely fast.

That is why, if the generation fails, the profiler should not dump a profiling result, IMO. It should not be gracefully stopped.
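
In other words, the code keeps the profiler calls on the success path only, so a failed generation leaves no misleading trace behind (again just a sketch with the same illustrative names):

from vllm import LLM, SamplingParams

def generate_with_profile(llm: LLM, prompts: list[str],
                          sampling_params: SamplingParams, profile: bool):
    if profile:
        llm.start_profile()
    outputs = llm.generate(prompts, sampling_params)
    # If generate() raises, stop_profile() is never reached and no
    # (misleadingly short) trace file is written.
    if profile:
        llm.stop_profile()
    return outputs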

Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
DarkLight1337 enabled auto-merge (squash) on September 10, 2025
github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Sep 10, 2025
vllm-bot merged commit ee0bc5e into vllm-project:main on Sep 11, 2025
36 of 38 checks passed
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
dsxsteven pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 15, 2025
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
sducouedic pushed a commit to sducouedic/vllm that referenced this pull request Oct 16, 2025
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
