diff --git a/docs/contributing/benchmarks.md b/docs/contributing/benchmarks.md
index a97d1fa6a3a5..cf14770c01a6 100644
--- a/docs/contributing/benchmarks.md
+++ b/docs/contributing/benchmarks.md
@@ -823,6 +823,30 @@ The latest performance results are hosted on the public [vLLM Performance Dashbo
 
 More information on the performance benchmarks and their parameters can be found in [Benchmark README](https://github.com/intel-ai-tce/vllm/blob/more_cpu_models/.buildkite/nightly-benchmarks/README.md) and [performance benchmark description](gh-file:.buildkite/nightly-benchmarks/performance-benchmarks-descriptions.md).
 
+### Continuous Benchmarking
+
+Continuous benchmarking provides automated performance monitoring for vLLM across different models and GPU devices. It tracks vLLM's performance characteristics over time and helps identify performance regressions or improvements.
+
+#### How It Works
+
+Continuous benchmarking is triggered by a [GitHub CI workflow](https://github.com/pytorch/pytorch-integration-testing/actions/workflows/vllm-benchmark.yml) in the PyTorch infrastructure repository, which runs automatically every 4 hours. The workflow executes three types of performance tests:
+
+- **Serving tests**: Measure request handling and API performance
+- **Throughput tests**: Evaluate token generation rates
+- **Latency tests**: Assess response time characteristics
+
+#### Benchmark Configuration
+
+The benchmarks currently run on a predefined set of models configured in the [vllm-benchmarks directory](https://github.com/pytorch/pytorch-integration-testing/tree/main/vllm-benchmarks/benchmarks). To add new models for benchmarking:
+
+1. Navigate to the appropriate GPU directory in the benchmarks configuration
+2. Add your model specifications to the corresponding configuration files
+3. The new models will be included in the next scheduled benchmark run
+
+#### Viewing Results
+
+All continuous benchmarking results are automatically published to the public [vLLM Performance Dashboard](https://hud.pytorch.org/benchmark/llms?repoName=vllm-project%2Fvllm).
+
 [](){ #nightly-benchmarks }
 
 ## Nightly Benchmarks
diff --git a/docs/contributing/profiling.md b/docs/contributing/profiling.md
index a1b7927a95d1..b62560a58748 100644
--- a/docs/contributing/profiling.md
+++ b/docs/contributing/profiling.md
@@ -160,6 +160,22 @@ GUI example:
 
 Screenshot 2025-03-05 at 11 48 42 AM
 
+## Continuous Profiling
+
+A [GitHub CI workflow](https://github.com/pytorch/pytorch-integration-testing/actions/workflows/vllm-profiling.yml) in the PyTorch infrastructure repository provides continuous profiling for different models running on vLLM. This automated profiling helps track performance characteristics over time and across model configurations.
+
+### How It Works
+
+The workflow currently runs weekly profiling sessions for selected models, generating detailed performance traces that can be analyzed with different tools to identify performance regressions or optimization opportunities. It can also be triggered manually from its GitHub Actions page.
+
+### Adding New Models
+
+To extend continuous profiling to additional models, modify the [profiling-tests.json](https://github.com/pytorch/pytorch-integration-testing/blob/main/vllm-profiling/cuda/profiling-tests.json) configuration file in the PyTorch integration testing repository. Add your model specifications to this file to include them in the automated profiling runs.
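+
+As a rough illustration of such an entry, the hypothetical sketch below adds one model. The field names (`model`, `tensor_parallel_size`, `max_model_len`) and the model identifier are assumptions made for this example rather than the confirmed schema, so mirror the structure of the existing entries in [profiling-tests.json](https://github.com/pytorch/pytorch-integration-testing/blob/main/vllm-profiling/cuda/profiling-tests.json) instead of copying this verbatim:
+
+```json
+[
+    {
+        "model": "meta-llama/Llama-3.1-8B-Instruct",
+        "tensor_parallel_size": 1,
+        "max_model_len": 8192
+    }
+]
+```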
+
+### Viewing Profiling Results
+
+The profiling traces generated by the continuous profiling workflow are publicly available on the [vLLM Performance Dashboard](https://hud.pytorch.org/benchmark/llms?repoName=vllm-project%2Fvllm). Look for the **Profiling traces** table to access and download the traces for different models and runs.
+
 ## Profiling vLLM Python Code
 
 The Python standard library includes