Commit ae0c359

namanlalitnyu authored and yewentao256 committed
[Doc] Add documentation for vLLM continuous benchmarking and profiling (#25819)
Signed-off-by: Naman Lalit <nl2688@nyu.edu>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
1 parent c692506 commit ae0c359

File tree

- docs/contributing/benchmarks.md
- docs/contributing/profiling.md

2 files changed: +40 −0 lines changed

docs/contributing/benchmarks.md

Lines changed: 24 additions & 0 deletions
@@ -823,6 +823,30 @@ The latest performance results are hosted on the public [vLLM Performance Dashbo
More information on the performance benchmarks and their parameters can be found in [Benchmark README](https://github.com/intel-ai-tce/vllm/blob/more_cpu_models/.buildkite/nightly-benchmarks/README.md) and [performance benchmark description](gh-file:.buildkite/nightly-benchmarks/performance-benchmarks-descriptions.md).
### Continuous Benchmarking
Continuous benchmarking provides automated performance monitoring for vLLM across different models and GPU devices. This helps track vLLM's performance characteristics over time and identify performance regressions or improvements.
#### How It Works
Continuous benchmarking is triggered via a [GitHub CI workflow](https://github.com/pytorch/pytorch-integration-testing/actions/workflows/vllm-benchmark.yml) in the PyTorch infrastructure repository, which runs automatically every 4 hours. The workflow executes three types of performance tests (a sketch of running them locally follows the list):
- **Serving tests**: Measure request handling and API performance
- **Throughput tests**: Evaluate token generation rates
- **Latency tests**: Assess response time characteristics
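For a rough sense of what these tests measure, the sketch below runs the same three test types locally with vLLM's `vllm bench` CLI. The model name and flag values are illustrative only, not the CI's actual per-model configuration; see `vllm bench --help` for the real options.

```python
# A minimal sketch of running the three benchmark types locally.
# MODEL and all flag values are illustrative examples, not the CI's
# actual per-model configuration.
import subprocess

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # example model

# Latency and throughput benchmarks start the engine themselves.
subprocess.run(["vllm", "bench", "latency", "--model", MODEL], check=True)
subprocess.run(["vllm", "bench", "throughput", "--model", MODEL], check=True)

# Serving benchmarks measure a running API server, so launch one first
# (e.g. `vllm serve <model>`) and point the benchmark at it.
subprocess.run(
    ["vllm", "bench", "serve", "--model", MODEL,
     "--base-url", "http://localhost:8000"],
    check=True,
)
```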
#### Benchmark Configuration
The benchmarking currently runs on a predefined set of models configured in the [vllm-benchmarks directory](https://github.com/pytorch/pytorch-integration-testing/tree/main/vllm-benchmarks/benchmarks). To add new models for benchmarking:
1. Navigate to the appropriate GPU directory in the benchmarks configuration
2. Add your model specifications to the corresponding configuration files (a sketch of one possible entry follows this list)
3. The new models will be included in the next scheduled benchmark run
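As a sketch of step 2, the entry below assumes the configuration files share the schema of vLLM's own `.buildkite` nightly-benchmark test files (a JSON list of objects with `test_name` and `parameters`); mirror an existing entry in the target GPU directory rather than the exact fields shown here.

```python
# A hypothetical new-model entry; mirror an existing entry in the target
# GPU directory, since the authoritative schema lives in those files.
import json

new_entry = {
    "test_name": "latency_llama8B_tp1",  # hypothetical test name
    "parameters": {
        "model": "meta-llama/Meta-Llama-3-8B",  # example model
        "tensor_parallel_size": 1,
        "load_format": "dummy",
        "num_iters_warmup": 5,
        "num_iters": 15,
    },
}
print(json.dumps([new_entry], indent=2))  # the files hold a JSON list
```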
#### Viewing Results
All continuous benchmarking results are automatically published to the public [vLLM Performance Dashboard](https://hud.pytorch.org/benchmark/llms?repoName=vllm-project%2Fvllm).
[](){ #nightly-benchmarks }
## Nightly Benchmarks

docs/contributing/profiling.md

Lines changed: 16 additions & 0 deletions
@@ -160,6 +160,22 @@ GUI example:
<img width="1799" alt="Screenshot 2025-03-05 at 11 48 42 AM" src="https://github.com/user-attachments/assets/c7cff1ae-6d6f-477d-a342-bd13c4fc424c" />
## Continuous Profiling
There is a [GitHub CI workflow](https://github.com/pytorch/pytorch-integration-testing/actions/workflows/vllm-profiling.yml) in the PyTorch infrastructure repository that provides continuous profiling for different models on vLLM. This automated profiling helps track performance characteristics over time and across different model configurations.
### How It Works
The workflow currently runs weekly profiling sessions for selected models, generating detailed performance traces that can be analyzed with different tools to identify performance regressions or optimization opportunities. It can also be triggered manually from the GitHub Actions UI; a sketch of triggering it through the GitHub REST API follows.
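For example, a manual run can be kicked off either from the workflow's "Run workflow" button or through the GitHub REST API, as in this minimal sketch (the token and branch are placeholders):

```python
# A minimal sketch of dispatching the profiling workflow via the GitHub
# REST API. TOKEN is a placeholder and needs `actions: write` permission.
import requests

OWNER_REPO = "pytorch/pytorch-integration-testing"
WORKFLOW = "vllm-profiling.yml"
TOKEN = "<your-github-token>"  # placeholder

resp = requests.post(
    f"https://api.github.com/repos/{OWNER_REPO}/actions/workflows/{WORKFLOW}/dispatches",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Accept": "application/vnd.github+json",
    },
    json={"ref": "main"},  # branch to run the workflow on
    timeout=30,
)
resp.raise_for_status()  # GitHub answers 204 No Content on success
```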
### Adding New Models
To extend the continuous profiling to additional models, modify the [profiling-tests.json](https://github.com/pytorch/pytorch-integration-testing/blob/main/vllm-profiling/cuda/profiling-tests.json) configuration file in the PyTorch integration testing repository. Add your model specifications to this file to include them in the automated profiling runs (a sketch follows).
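As a rough sketch, assuming the file's top level is a JSON list of per-model entries (the field names below are hypothetical; copy an existing entry for the real schema):

```python
# A sketch of appending a model to profiling-tests.json. The field names
# are hypothetical; the actual schema is defined by the existing entries.
import json

path = "vllm-profiling/cuda/profiling-tests.json"
with open(path) as f:
    tests = json.load(f)

tests.append({
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # example model
    # ...copy whatever per-model options existing entries carry...
})

with open(path, "w") as f:
    json.dump(tests, f, indent=2)
```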
### Viewing Profiling Results
The profiling traces generated by the continuous profiling workflow are publicly available on the [vLLM Performance Dashboard](https://hud.pytorch.org/benchmark/llms?repoName=vllm-project%2Fvllm). Look for the **Profiling traces** table to access and download the traces for different models and runs.
## Profiling vLLM Python Code
The Python standard library includes
