You can use the `--resume` option to continue the parameter sweep if one of the runs failed.
-
####SLA Mode
+
### SLA Auto-Tuner
-
By passing SLA constraints via `--sla-params`, you can run this script in SLA mode, causing it to adjust either the request rate or concurrency (choose using `--sla-variable`) in order to satisfy the SLA constraints.
+
[`vllm/benchmarks/sweep/serve_sla.py`](../../vllm/benchmarks/sweep/serve_sla.py) is a wrapper over [`vllm/benchmarks/sweep/serve.py`](../../vllm/benchmarks/sweep/serve.py) that tunes either the request rate or concurrency (choose using `--sla-variable`) in order to satisfy the SLA constraints given by `--sla-params`.
For example, to ensure E2E latency within different target values for 99% of requests:
@@ -1044,7 +1042,7 @@ For example, to ensure E2E latency within different target values for 99% of req
@@ -1066,6 +1064,24 @@ The algorithm for adjusting the SLA variable is as follows:
For a given combination of `--serve-params` and `--bench-params`, we share the benchmark results across `--sla-params` to avoid rerunning benchmarks with the same SLA variable value.
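
The result sharing described above can be pictured as a cache keyed by the SLA variable value. The following is a minimal sketch under stated assumptions, not the implementation in `serve_sla.py`: `fake_benchmark`, `make_cached_benchmark`, `max_value_within`, the candidate list, and the latency model are all hypothetical names and values chosen for illustration, and the exhaustive scan over candidates stands in for whatever search the script actually performs.

```python
from typing import Callable, Dict, List, Optional

calls: List[int] = []  # records every real (uncached) benchmark run


def fake_benchmark(concurrency: int) -> float:
    """Hypothetical stand-in for one benchmark run.

    Pretends that p99 E2E latency (ms) grows linearly with concurrency.
    """
    calls.append(concurrency)
    return 100.0 * concurrency


def make_cached_benchmark(
    run_benchmark: Callable[[int], float],
) -> Callable[[int], float]:
    """Wrap a benchmark so each SLA variable value is measured at most once."""
    cache: Dict[int, float] = {}

    def cached(value: int) -> float:
        if value not in cache:
            cache[value] = run_benchmark(value)
        return cache[value]

    return cached


def max_value_within(
    target_ms: float,
    candidates: List[int],
    bench: Callable[[int], float],
) -> Optional[int]:
    """Return the largest candidate whose latency meets the target, or None."""
    passing = [v for v in candidates if bench(v) <= target_ms]
    return max(passing) if passing else None


# One cache per (serve params, bench params) combination, reused for every SLA target.
bench = make_cached_benchmark(fake_benchmark)
candidates = [1, 2, 4, 8]
results = {t: max_value_within(t, candidates, bench) for t in (200.0, 500.0)}
```

In this sketch the second SLA target triggers no new benchmark runs, because every candidate value was already measured while evaluating the first target.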
+
### Visualizer
+
+
[`vllm/benchmarks/sweep/plot.py`](../../vllm/benchmarks/sweep/plot.py) can be used to plot performance curves from parameter sweep results.
You can use `--dry-run` to preview the figures to be plotted.
+
## Performance Benchmarks
The performance benchmarks are used for development to confirm whether new changes improve performance under various workloads. They are triggered on every commit with both the `perf-benchmarks` and `ready` labels, and when a PR is merged into vLLM.