
Conversation

@DarkLight1337 (Member) commented on Oct 19, 2025

Purpose

Follow-up to #27085

  • Split up the old script into multiple files in a new directory vllm/benchmarks/sweep, abstracting away common code.
  • Add a separate plotting CLI vllm/benchmarks/sweep/plot.py (see the sketch just below this list).
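
Both entrypoints discussed in this thread are runnable as modules. As a quick orientation, a minimal sketch — the module paths are the ones named in this PR, while the exact flag sets are whatever each script's own argument parser exposes:

```bash
# Discover the available options of the new CLIs; the --help output
# comes from each script's own argument parser.
python -m vllm.benchmarks.sweep.plot --help       # plotting CLI added in this PR
python -m vllm.benchmarks.sweep.serve_sla --help  # SLA sweep entrypoint (split out during review)
```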

cc @lengrongfu

Test Plan

Test Result


@mergify (bot) commented on Oct 19, 2025

Documentation preview: https://vllm--27168.org.readthedocs.build/en/27168/

mergify bot added the documentation and performance labels on Oct 19, 2025
DarkLight1337 added the ready label on Oct 20, 2025
@ProExpertProg (Collaborator) left a comment

Better code organization than our mainline code! 😂

A couple of thoughts:

  • Should we move SLA into a separate entrypoint serve_sla.py? To me it's different enough from the serve sweep, and we're already reusing utils. But it's up to you; the current approach is fine.
  • Could you post an example plot command with its output plot? It's hard to imagine what the plots look like just from the code.
  • I'll try the script when I can and let you know if I have more thoughts; we can merge before that, though.

@DarkLight1337 (Member, Author) replied

> Should we move SLA into a separate entrypoint serve_sla.py? To me it's different enough from the serve sweep, and we're already reusing utils. But it's up to you; the current approach is fine.

Moved

> Could you post an example plot command with its output plot? It's hard to imagine what the plots look like just from the code.

```bash
python -m vllm.benchmarks.sweep.plot benchmarks/results/20251019_101029 \
    --fig-dir throughput_vs_concurrency \
    --var-x max_concurrency \
    --var-y request_throughput \
    --col-by api_server_count \
    --curve-by max_num_batched_tokens \
    --filter-by 'max_concurrency<=1024'
```

[Output plot: request_throughput vs. max_concurrency, one column of subplots per api_server_count, one curve per max_num_batched_tokens]

```diff
-python vllm/benchmarks/serve_multi.py \
+python -m vllm.benchmarks.sweep.serve_sla \
     --serve-cmd 'vllm serve meta-llama/Llama-2-7b-chat-hf' \
```
@lengrongfu (Contributor) commented on Oct 21, 2025

If we want to use an existing vllm serve API address, how should we configure that?

Maybe we should add a --serve-host param so the user can point at an already-running vLLM server; in that case the --serve-params param would be ignored.

@DarkLight1337 (Member, Author) replied

You can set the server's host via --serve-cmd. And for resetting the server cache after each benchmark run, you can use --after-bench-cmd.
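
For illustration, a minimal sketch of that setup, assuming the serve_sla entrypoint from this PR; the --host/--port flags belong to vllm serve, and the cache-reset endpoint shown is an assumption to be replaced with whatever your deployment exposes:

```bash
# Sketch: pin the server's address inside --serve-cmd, and reset
# server-side state between runs via --after-bench-cmd.
# /reset_prefix_cache is an assumed endpoint; substitute your own.
# Other sweep flags (e.g. --bench-cmd) are omitted for brevity.
python -m vllm.benchmarks.sweep.serve_sla \
    --serve-cmd 'vllm serve meta-llama/Llama-2-7b-chat-hf --host 127.0.0.1 --port 8000' \
    --after-bench-cmd 'curl -X POST http://127.0.0.1:8000/reset_prefix_cache'
```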

@DarkLight1337 (Member, Author) commented on Oct 21, 2025

If you mean that the benchmark should not be responsible for launching the server, you can just use a dummy command that sleeps infinitely and adjust --bench-cmd to access the real server. Of course, you should also set --after-bench-cmd in this case.
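
Concretely, a hedged sketch of that pattern, using only the flags discussed in this thread; the server URL is a placeholder, and --base-url support in vllm bench serve is an assumption here:

```bash
# The sweep never launches a server itself: 'sleep infinity' is a no-op
# stand-in for --serve-cmd, and --bench-cmd targets the real deployment.
# http://my-server:8000 and /reset_prefix_cache are placeholders.
python -m vllm.benchmarks.sweep.serve_sla \
    --serve-cmd 'sleep infinity' \
    --bench-cmd 'vllm bench serve --model meta-llama/Llama-2-7b-chat-hf --base-url http://my-server:8000' \
    --after-bench-cmd 'curl -X POST http://my-server:8000/reset_prefix_cache'
```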

@lengrongfu (Contributor) replied

I see; maybe I don't need to set the --serve-cmd param at all, and using the --bench-cmd param to set vllm bench serve --model meta-llama/Llama-2-7b-chat-hf --backend openai is enough.

@lengrongfu (Contributor) commented

The generated file name contains single quotes, which is weird; I'm not sure if it's just my environment.

[Screenshot: generated result file names wrapped in single quotes]

@DarkLight1337 (Member, Author) replied

> The generated file name contains single quotes, which is weird; I'm not sure if it's just my environment.

Fixed now

DarkLight1337 added this to the v0.11.1 milestone on Oct 22, 2025
vllm-bot merged commit ceacedc into vllm-project:main on Oct 22, 2025 (3 of 6 checks passed)
DarkLight1337 deleted the benchmark-sweep branch on October 22, 2025 03:30
usberkeley pushed a commit to usberkeley/vllm that referenced this pull request on Oct 23, 2025
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request on Oct 23, 2025
kingsmad pushed a commit to kingsmad/vllm that referenced this pull request on Oct 25, 2025
0xrushi pushed two commits to 0xrushi/vllm that referenced this pull request on Oct 26, 2025