Commit 61e6f00

DarkLight1337 authored and 0xrushi committed
[Benchmark] Add plot utility for parameter sweep (vllm-project#27168)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
1 parent f1d24eb commit 61e6f00

File tree

10 files changed (+1788, -1168 lines)


docs/contributing/benchmarks.md

Lines changed: 27 additions & 11 deletions
@@ -7,7 +7,7 @@ toc_depth: 4
 vLLM provides comprehensive benchmarking tools for performance testing and evaluation:

 - **[Benchmark CLI](#benchmark-cli)**: `vllm bench` CLI tools and specialized benchmark scripts for interactive performance testing
-- **[Batch Scripts](#batch-scripts)**: Run `vllm bench` against multiple configurations conveniently
+- **[Parameter sweeps](#parameter-sweeps)**: Automate `vllm bench` runs for multiple configurations
 - **[Performance benchmarks](#performance-benchmarks)**: Automated CI benchmarks for development
 - **[Nightly benchmarks](#nightly-benchmarks)**: Comparative benchmarks against alternatives

@@ -925,15 +925,13 @@ throughput numbers correctly is also adjusted.

 </details>

-## Batch Scripts
+## Parameter Sweeps

-### Batch Serving Script
+### Online Benchmark

-[`vllm/benchmarks/serve_multi.py`](../../vllm/benchmarks/serve_multi.py) automatically starts `vllm serve` and runs `vllm bench serve` over multiple configurations.
+[`vllm/benchmarks/sweep/serve.py`](../../vllm/benchmarks/sweep/serve.py) automatically starts `vllm serve` and runs `vllm bench serve` to evaluate vLLM over multiple configurations.

-#### Batch Mode
-
-The basic purpose of this script is to evaluate vLLM under different settings. Follows these steps to run the script:
+Follow these steps to run the script:

 1. Construct the base command to `vllm serve`, and pass it to the `--serve-cmd` option.
 2. Construct the base command to `vllm bench serve`, and pass it to the `--bench-cmd` option.
@@ -996,7 +994,7 @@ The basic purpose of this script is to evaluate vLLM under different settings. F
 Example command:

 ```bash
-python vllm/benchmarks/serve_multi.py \
+python -m vllm.benchmarks.sweep.serve \
     --serve-cmd 'vllm serve meta-llama/Llama-2-7b-chat-hf' \
     --bench-cmd 'vllm bench serve --model meta-llama/Llama-2-7b-chat-hf --backend vllm --endpoint /v1/completions --dataset-name sharegpt --dataset-path benchmarks/ShareGPT_V3_unfiltered_cleaned_split.json' \
     --serve-params benchmarks/serve_hparams.json \
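
Editor's note: the `benchmarks/serve_hparams.json` and `benchmarks/bench_hparams.json` files referenced in the command are not shown in this diff. As a minimal, hypothetical sketch (the actual schema is defined by `vllm/benchmarks/sweep/serve.py`; the keys below simply reuse parameter names that appear in the plot example later in this commit), each file might be a JSON list of objects mapping CLI argument names to values, one object per configuration:

```bash
# Hypothetical parameter files for the sweep; key names and values are
# placeholders, so adapt them to the schema the sweep script actually expects.
cat > benchmarks/serve_hparams.json <<'EOF'
[
    {"api_server_count": 1, "max_num_batched_tokens": 2048},
    {"api_server_count": 4, "max_num_batched_tokens": 8192}
]
EOF

cat > benchmarks/bench_hparams.json <<'EOF'
[
    {"max_concurrency": 64},
    {"max_concurrency": 256},
    {"max_concurrency": 1024}
]
EOF
```

Presumably the sweep then benchmarks each serve-side entry against each bench-side entry, which is consistent with the later note that results are keyed by the combination of `--serve-params` and `--bench-params`.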
@@ -1018,9 +1016,9 @@ python vllm/benchmarks/serve_multi.py \
 !!! tip
     You can use the `--resume` option to continue the parameter sweep if one of the runs failed.

-#### SLA Mode
+### SLA Auto-Tuner

-By passing SLA constraints via `--sla-params`, you can run this script in SLA mode, causing it to adjust either the request rate or concurrency (choose using `--sla-variable`) in order to satisfy the SLA constraints.
+[`vllm/benchmarks/sweep/serve_sla.py`](../../vllm/benchmarks/sweep/serve_sla.py) is a wrapper over [`vllm/benchmarks/sweep/serve.py`](../../vllm/benchmarks/sweep/serve.py) that tunes either the request rate or concurrency (choose using `--sla-variable`) in order to satisfy the SLA constraints given by `--sla-params`.

 For example, to ensure E2E latency within different target values for 99% of requests:

@@ -1044,7 +1042,7 @@ For example, to ensure E2E latency within different target values for 99% of req
 Example command:

 ```bash
-python vllm/benchmarks/serve_multi.py \
+python -m vllm.benchmarks.sweep.serve_sla \
     --serve-cmd 'vllm serve meta-llama/Llama-2-7b-chat-hf' \
     --bench-cmd 'vllm bench serve --model meta-llama/Llama-2-7b-chat-hf --backend vllm --endpoint /v1/completions --dataset-name sharegpt --dataset-path benchmarks/ShareGPT_V3_unfiltered_cleaned_split.json' \
     --serve-params benchmarks/serve_hparams.json \
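
Editor's note: the SLA constraints file passed via `--sla-params` is likewise not shown in this diff. Purely as an illustration of the idea described above (one target value per run for a latency percentile), and with the metric key below being an assumption rather than the script's actual schema, such a file might look like:

```bash
# Hypothetical SLA constraints file: one entry per target value for p99
# end-to-end latency. The key name is a placeholder; consult
# vllm/benchmarks/sweep/serve_sla.py for the schema it actually accepts.
cat > benchmarks/sla_hparams.json <<'EOF'
[
    {"p99_e2el_ms": 500},
    {"p99_e2el_ms": 1000},
    {"p99_e2el_ms": 2000}
]
EOF
```

For each entry, the tuner adjusts the chosen `--sla-variable` according to the adjustment algorithm described in the surrounding documentation until the constraint is satisfied.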
@@ -1066,6 +1064,24 @@ The algorithm for adjusting the SLA variable is as follows:

 For a given combination of `--serve-params` and `--bench-params`, we share the benchmark results across `--sla-params` to avoid rerunning benchmarks with the same SLA variable value.

+### Visualizer
+
+[`vllm/benchmarks/sweep/plot.py`](../../vllm/benchmarks/sweep/plot.py) can be used to plot performance curves from parameter sweep results.
+
+Example command:
+
+```bash
+python -m vllm.benchmarks.sweep.plot benchmarks/results/<timestamp> \
+    --var-x max_concurrency \
+    --row-by random_input_len \
+    --col-by random_output_len \
+    --curve-by api_server_count,max_num_batched_tokens \
+    --filter-by 'max_concurrency<=1024'
+```
+
+!!! tip
+    You can use `--dry-run` to preview the figures to be plotted.
+
 ## Performance Benchmarks

 The performance benchmarks are used for development to confirm whether new changes improve performance under various workloads. They are triggered on every commit with both the `perf-benchmarks` and `ready` labels, and when a PR is merged into vLLM.
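
Editor's note on the Visualizer added above: the example command and the `--dry-run` tip compose directly, so a first invocation might simply append the flag to preview the output (only flags that appear in the added documentation are used here):

```bash
# Preview which figures the plot utility would produce, without rendering them.
python -m vllm.benchmarks.sweep.plot benchmarks/results/<timestamp> \
    --var-x max_concurrency \
    --row-by random_input_len \
    --col-by random_output_len \
    --curve-by api_server_count,max_num_batched_tokens \
    --filter-by 'max_concurrency<=1024' \
    --dry-run
```

Going by the flag names, each row of the figure grid corresponds to a `random_input_len` value, each column to a `random_output_len` value, and each curve to a combination of `api_server_count` and `max_num_batched_tokens`, with `max_concurrency` on the x-axis.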
