
Conversation

@hhzhang16 hhzhang16 commented Aug 21, 2025

Overview:

This MR adds a comprehensive benchmarking framework for Dynamo deployments, with automated performance comparison across aggregated, disaggregated, and vanilla backend configurations. It includes both endpoint testing and full deployment lifecycle management.

Details:

  • New benchmarking framework (benchmarks/utils/) with GenAI-Perf integration, plotting, and deployment automation
  • Automated benchmark script (benchmarks/benchmark.sh) supporting both endpoint and deployment testing modes
  • Comprehensive documentation (docs/benchmarks/benchmarking.md)
  • Updated Kubernetes deployment utilities (deploy/utils/) for namespace setup, PVC management, and manifest injection
  • Refactored profiling tools - moved common utilities to shared deploy/utils/ package
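To make the GenAI-Perf integration concrete, here is a hedged sketch of how per-concurrency benchmark commands might be assembled. The function and flag names mirror the genai-perf CLI but are assumptions for illustration, not the actual `benchmarks/utils/genai.py` API.

```python
# Hypothetical sketch of assembling genai-perf invocations for a concurrency
# sweep; flag names are modeled on the genai-perf CLI but are assumptions here.
from typing import List

def build_genai_perf_cmd(model: str, url: str, concurrency: int,
                         isl: int = 200, osl: int = 200) -> List[str]:
    """Build one genai-perf invocation for a single concurrency level."""
    return [
        "genai-perf", "profile",
        "--model", model,
        "--url", url,
        "--concurrency", str(concurrency),
        "--synthetic-input-tokens-mean", str(isl),
        "--output-tokens-mean", str(osl),
    ]

def concurrency_sweep(model: str, url: str, levels: List[int]) -> List[List[str]]:
    """One command per concurrency level, e.g. [1, 2, 4, 8, ...]."""
    return [build_genai_perf_cmd(model, url, c) for c in levels]
```

Each command would then be run against the target endpoint, with results written to a per-concurrency subdirectory for later plotting.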

Where should the reviewer start?

  • docs/benchmarks/benchmarking.md: complete benchmarking guide
  • benchmarks/benchmark.sh: main benchmarking script
  • deploy/utils/README.md: Kubernetes setup utilities that support benchmarking and other Kubernetes workflows

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • End-to-end benchmarking tooling for deployments and existing endpoints, including a CLI runner, async workflow, concurrency sweeps, and automatic plot generation and summaries.
    • Vanilla backend support with a ready-to-use template and Kubernetes-managed deployment lifecycle with port-forwarding.
    • Namespace setup script and utilities to inject manifests and download PVC results.
  • Documentation

    • New Benchmarking and Pre-Deployment Profiling guides; expanded benchmarking README and links updated across docs.
  • Chores

    • Standardized Kubernetes resources (ServiceAccount, Role/RoleBinding, PVC) and profiling job updates.
    • Added dependencies manifest; package markers and licensing headers.
    • Removed deprecated profiling helper scripts.
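As a rough illustration of the "automatic plot generation and summaries" item above, the sketch below shows how per-concurrency results might be folded into a text summary. The record layout and metric names are illustrative assumptions, not the actual genai-perf export schema.

```python
# Hedged sketch of summarizing per-concurrency benchmark results, as the
# plotting/summary step might do. Field names ("concurrency", "throughput",
# "p99_ms") are illustrative assumptions, not the real export schema.
def summarize(results):
    """results: list of {"concurrency": int, "throughput": float, "p99_ms": float}"""
    lines = ["concurrency  tokens/s  p99(ms)"]
    for r in sorted(results, key=lambda r: r["concurrency"]):
        lines.append(f'{r["concurrency"]:>11}  {r["throughput"]:>8.1f}  {r["p99_ms"]:>7.1f}')
    return "\n".join(lines)
```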

copy-pr-bot bot commented Aug 21, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the feat label Aug 21, 2025
@hhzhang16 hhzhang16 self-assigned this Aug 21, 2025
@grahamking
Contributor

You're adding 26,000 lines of mostly YAML to Dynamo @hhzhang16 ? Am I reading that MR right? Maybe we don't need the .bak files. I know it's Draft.


coderabbitai bot commented Aug 26, 2025

Walkthrough

Adds end-to-end benchmarking tooling: a Bash orchestrator, Python workflow/CLI, GenAI-Perf integration, plotting, a vanilla backend client/template, and Kubernetes utilities for manifest injection and PVC downloads. Updates profiler job, refactors imports, standardizes RBAC/PVC names, and expands documentation with a comprehensive benchmarking guide and link fixes.

Changes

Cohort / File(s) — Summary of changes

  • Benchmark docs (README.md, benchmarks/README.md, docs/benchmarks/*, docs/architecture/*, components/backends/vllm/deploy/README.md, docs/guides/dynamo_deploy/sla_planner_deployment.md, tests/planner/README.md): Added benchmarking guide; updated links to pre-deployment profiling; clarified benchmarking modes and quick starts; adjusted references (PVC name to dynamo-pvc).
  • Benchmark orchestration (benchmarks/benchmark.sh, benchmarks/utils/benchmark.py, benchmarks/utils/workflow.py): Introduced Bash runner and Python CLI/workflow to validate inputs, deploy/tear down backends, run benchmarks, and summarize results.
  • GenAI-Perf and plotting (benchmarks/utils/genai.py, benchmarks/utils/plot.py): Added GenAI-Perf wrapper and concurrency sweep; implemented parsing and generation of multiple plots and a summary.
  • Vanilla backend support (benchmarks/utils/vanilla_client.py, benchmarks/utils/templates/vanilla-vllm.yaml): Added Kubernetes client to deploy/manage a vanilla vLLM service; provided a deployment/service manifest template.
  • Profiler updates/removals (benchmarks/profiler/deploy/profile_sla_job.yaml, benchmarks/profiler/profile_sla.py; deleted: benchmarks/profiler/download_pvc_results.py, benchmarks/profiler/inject_disagg_config.py, benchmarks/profiler/utils/utils.py): Switched service account/PVC names, module invocation, and config path; updated imports; removed legacy PVC tools and profiler utilities.
  • Deploy utils and K8s setup (deploy/utils/... — README, __init__.py, kubernetes.py, dynamo_deployment.py, inject_manifest.py, download_pvc_results.py, manifests/*, requirements.txt, setup_k8s_namespace.sh — plus deploy/__init__.py): Added utilities for PVC access, manifest injection, and results download; async/UX updates to DynamoDeploymentClient, including port-forwarding; standardized PVC (dynamo-pvc) and RBAC names; added setup script and requirements.
  • Package markers (benchmarks/utils/__init__.py): Added SPDX headers and package marker; no logic changes.
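The manifest-injection utility mentioned above can be pictured with the sketch below: overriding the model and the standardized PVC claim name in a deployment manifest. The manifest structure and key names here are illustrative assumptions modeled on typical Kubernetes manifests, not the real deploy/utils/inject_manifest.py behavior.

```python
# Hedged sketch of inject_manifest.py-style injection: point PVC-backed
# volumes at the standardized claim name and pass the model to the serving
# container. Key names are assumptions, not the actual file's schema.
import copy

def inject(manifest: dict, model: str, pvc_name: str = "dynamo-pvc") -> dict:
    out = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    spec = out["spec"]["template"]["spec"]
    # Rewrite every PVC-backed volume to use the standardized claim name.
    for vol in spec.get("volumes", []):
        if "persistentVolumeClaim" in vol:
            vol["persistentVolumeClaim"]["claimName"] = pvc_name
    # Hand the model to the first (serving) container via args.
    spec["containers"][0]["args"] = ["--model", model]
    return out
```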

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as User
  participant SH as benchmark.sh
  participant PY as benchmarks.utils.benchmark (CLI)
  participant WF as workflow
  participant K as Kubernetes API
  participant C as Client (Dynamo/Vanilla)
  participant G as GenAI-Perf
  participant P as Plotter
  participant FS as Results Dir

  U->>SH: Run ./benchmarks/benchmark.sh [flags]
  SH->>PY: python -m benchmarks.utils.benchmark [...args...]
  PY->>WF: run_benchmark_workflow(...)
  alt Endpoint benchmarking
    WF->>G: run_concurrency_sweep(endpoint, model, ISL/OSL/STD)
    G-->>FS: Write per-concurrency results
  else Deployment benchmarking
    loop For each provided manifest (agg/disagg/vanilla)
      WF->>C: create client (Dynamo/Vanilla)
      C->>K: create Deployment/Service
      C->>K: wait_for_deployment_ready()
      C->>C: port_forward_frontend()
      WF->>G: run_concurrency_sweep(local URL, model, ISL/OSL/STD)
      G-->>FS: Write per-concurrency results (subdir)
      C->>C: stop_port_forward()
      C->>K: delete Deployment/Service
    end
  end
  WF->>P: generate_plots(base_output_dir)
  P-->>FS: plots/, SUMMARY.txt
  PY-->>U: Exit code/status and result paths
  SH-->>U: Final summary and plot locations
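The deployment branch of the diagram above can be sketched as a deploy → wait → port-forward → sweep → teardown loop. The client class and method names below are assumptions modeled on the diagram, not the actual benchmarks/utils API; the point is that teardown always runs, even if the sweep fails.

```python
# Minimal skeleton of the deployment-benchmarking loop from the diagram.
# StubClient stands in for the Dynamo/vanilla Kubernetes clients; its method
# names are hypothetical, chosen to match the diagram's steps.
class StubClient:
    def __init__(self, name: str, log: list):
        self.name, self.log = name, log
    def deploy(self):            self.log.append(f"deploy:{self.name}")
    def wait_until_ready(self):  self.log.append(f"ready:{self.name}")
    def port_forward(self) -> str:
        self.log.append(f"forward:{self.name}")
        return "http://localhost:8000"
    def stop_port_forward(self): self.log.append(f"stop:{self.name}")
    def delete(self):            self.log.append(f"delete:{self.name}")

def run_deployment_benchmarks(manifests, make_client, sweep):
    """Deploy each backend, benchmark it, and always tear it down."""
    for name in manifests:
        client = make_client(name)
        client.deploy()
        client.wait_until_ready()
        try:
            url = client.port_forward()
            sweep(url, name)  # run the concurrency sweep against the local URL
        finally:
            client.stop_port_forward()
            client.delete()
```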

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A bunny with charts and a stopwatch in paw,
Spins up pods, then listens in awe.
Concurrency hops, plots sprout like clover,
Tokens per second—leaps over and over.
PVC trails, RBAC tails tied neat—
Benchmark burrow complete. 🥕📈


@hhzhang16 hhzhang16 marked this pull request as ready for review August 27, 2025 01:03
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
@whoisj
Collaborator

whoisj commented Aug 28, 2025

How is this a "guide" with no documentation and very few comments?

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
@biswapanda biswapanda left a comment


lgtm - let's address ux concerns Itay raised

… accordingly; remove vanilla stuff

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
…-creating-benchmark-guide-building-on-existing-profiling-work
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
@hhzhang16 hhzhang16 merged commit 699996e into main Aug 30, 2025
10 checks passed
@hhzhang16 hhzhang16 deleted the hannahz/dyn-830-creating-benchmark-guide-building-on-existing-profiling-work branch August 30, 2025 00:14
jasonqinzhou pushed a commit that referenced this pull request Aug 30, 2025
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: Jason Zhou <jasonzho@jasonzho-mlt.client.nvidia.com>
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: Krishnan Prashanth <kprashanth@nvidia.com>
dillon-cullinan pushed a commit that referenced this pull request Sep 5, 2025
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: nnshah1 <neelays@nvidia.com>