
Conversation

Contributor

@tedzhouhk tedzhouhk commented Aug 15, 2025

  • fix bugs in pre-deployment sweep
  • fix bugs in vllm_v1 planner k8s example
  • expose num_d/p to k8s metrics and update k8s metric docs

Summary by CodeRabbit

  • New Features
    • Planner now exposes Prometheus metrics on a configurable port via a new CLI flag (a minimal sketch follows this summary).
  • Bug Fixes
    • More reliable PVC access pod deployment through corrected path handling.
    • Clearer profiler output pointing to the correct config file location.
  • Chores
    • Updated vLLM runtime images across components.
    • Added deployment annotation to disable Grove.
    • Exposed planner metrics port in deployment.
  • Documentation
    • Expanded Kubernetes metrics guide with namespace templating, planner PodMonitor, apply steps, and dashboard/port-forward updates.
    • Updated profiling docs with the new config path.
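A minimal sketch of the new CLI flag mentioned under New Features, assuming an argparse-based entrypoint. BasePlannerDefaults here is an illustrative stand-in for the planner's defaults class, with prometheus_port = 0 meaning "metrics disabled"; it is not the actual planner code.

```python
import argparse


class BasePlannerDefaults:
    # Illustrative stand-in: 0 disables the Prometheus metrics server.
    prometheus_port = 0


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Planner entrypoint (sketch)")
    parser.add_argument(
        "--prometheus-port",
        type=int,
        default=BasePlannerDefaults.prometheus_port,
        help="Prometheus port for metrics server (0 to disable)",
    )
    return parser


if __name__ == "__main__":
    # Example: the port used by the disagg planner manifest in this PR.
    args = build_parser().parse_args(["--prometheus-port", "9085"])
    print(args.prometheus_port)
```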


copy-pr-bot bot commented Aug 15, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

coderabbitai bot commented Aug 15, 2025

Walkthrough

The PR updates profiler import paths and messages and adjusts a Kubernetes utility path. It modifies the vLLM planner deployment to include Prometheus metrics and new images, adds a Prometheus port default and CLI flag to the planner, integrates Prometheus gauges and async metric observation into the planner core, and updates related documentation and PodMonitor samples.

Changes

Cohort / File(s) — Summary

  • Profiler import refactor — benchmarks/profiler/profile_endpoint.py, benchmarks/profiler/profile_sla.py: Switch imports of profile_decode from benchmarks.profiler.utils to utils.profile_decode; no functional changes.
  • Profiler messaging and k8s util — benchmarks/profiler/inject_disagg_config.py, benchmarks/profiler/utils/kubernetes.py: inject_disagg_config now instructs DGD_CONFIG_FILE=/workspace/{target_path}; kubernetes.py resolves pvc-access-pod.yaml from the parent directory (…/profiler/deploy/).
  • Planner metrics enablement — components/planner/src/dynamo/planner/defaults.py, components/planner/src/dynamo/planner/planner_sla.py, components/planner/src/dynamo/planner/utils/planner_core.py: Add BasePlannerDefaults.prometheus_port=0; add a --prometheus-port CLI arg; integrate a Prometheus server and gauges; make observe_metrics async and update worker counts; the run loop awaits observe_metrics (a minimal sketch follows this list).
  • vLLM disagg planner manifest — components/backends/vllm/deploy/disagg_planner.yaml: Add annotation nvidia.com/enable-grove: "false"; update vllm-runtime images to hzhou-0814-02; expose planner metrics port 9085 and add the --prometheus-port=9085 arg.
  • Docs updates — docs/architecture/pre_deployment_profiling.md, docs/guides/deploy/k8s_metrics.md: Update the DGD_CONFIG_FILE example to the /workspace path; parameterize the namespace as $NAMESPACE; add a planner PodMonitor; update apply steps and port-forward namespaces; adjust the Grafana ConfigMap path/namespace.
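A minimal sketch of the metrics bootstrap described in the "Planner metrics enablement" item above, assuming prometheus_client is installed. The gauge names (num_p_workers, num_d_workers) follow the PR description; the wrapper class and update() method are illustrative, not the planner's actual structure.

```python
from prometheus_client import Gauge, start_http_server


class PlannerMetrics:
    """Illustrative wrapper around the planner's Prometheus metrics."""

    def __init__(self, prometheus_port: int = 0) -> None:
        # Mirror the "--prometheus-port 0 disables metrics" convention.
        if prometheus_port != 0:
            start_http_server(prometheus_port)
        self.num_p_workers_gauge = Gauge("num_p_workers", "Number of prefill workers")
        self.num_d_workers_gauge = Gauge("num_d_workers", "Number of decode workers")

    def update(self, num_prefill: int, num_decode: int) -> None:
        self.num_p_workers_gauge.set(num_prefill)
        self.num_d_workers_gauge.set(num_decode)


# Example: expose metrics on port 9085, as in the disagg_planner.yaml manifest.
metrics = PlannerMetrics(prometheus_port=9085)
metrics.update(num_prefill=1, num_decode=2)
```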

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Planner
  participant PrefillWorkers
  participant DecodeWorkers
  participant Prometheus as Prometheus Scraper

  User->>Planner: Start with --prometheus-port (0 disables)
  alt port != 0
    Planner->>Planner: start_http_server(port)
  end

  loop periodic
    Planner->>PrefillWorkers: get_workers_info()
    PrefillWorkers-->>Planner: prefill endpoints
    Planner->>DecodeWorkers: get_workers_info()
    DecodeWorkers-->>Planner: decode endpoints
    Planner->>Planner: update Gauges (num_p_workers, num_d_workers)
  end

  Prometheus-->>Planner: scrape /metrics
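To make the periodic loop in the diagram concrete, here is a minimal sketch of a run loop that awaits metric observation before making adjustments; observe_metrics and make_adjustments are placeholders, and the interval and iteration values are arbitrary.

```python
import asyncio


async def observe_metrics() -> None:
    # Placeholder for the async metric collection shown in the diagram.
    await asyncio.sleep(0)


async def make_adjustments() -> None:
    # Placeholder for the scaling decision that follows each observation.
    await asyncio.sleep(0)


async def run_loop(adjustment_interval_s: float, iterations: int) -> None:
    # Awaiting observe_metrics() before make_adjustments() keeps the
    # ordering deterministic within each adjustment interval.
    for _ in range(iterations):
        await observe_metrics()
        await make_adjustments()
        await asyncio.sleep(adjustment_interval_s)


# Run two fast iterations for illustration only.
asyncio.run(run_loop(adjustment_interval_s=0.01, iterations=2))
```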

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

A planner hums, its gauges bright,
I twitch my ears at metrics’ light.
Ports ajar at 9085,
Prometheus comes by to hive.
In pods and paths we hop along—
/workspace set, the scrape is strong. 🐇📊


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🔭 Outside diff range comments (1)
components/planner/src/dynamo/planner/utils/planner_core.py (1)

181-205: Avoid blocking the event loop when querying Prometheus; fetch concurrently

If PrometheusAPIClient uses blocking I/O, these calls will stall the loop. Fetch concurrently via asyncio.to_thread and gather for better responsiveness.

Apply this diff:

-        self.last_metrics.ttft = self.prometheus_api_client.get_avg_time_to_first_token(
-            f"{self.args.adjustment_interval}s"
-        )
-        self.last_metrics.itl = self.prometheus_api_client.get_avg_inter_token_latency(
-            f"{self.args.adjustment_interval}s"
-        )
-        self.last_metrics.num_req = self.prometheus_api_client.get_avg_request_count(
-            f"{self.args.adjustment_interval}s"
-        )
-        self.last_metrics.request_duration = (
-            self.prometheus_api_client.get_avg_request_duration(
-                f"{self.args.adjustment_interval}s"
-            )
-        )
-        self.last_metrics.isl = (
-            self.prometheus_api_client.get_avg_input_sequence_tokens(
-                f"{self.args.adjustment_interval}s"
-            )
-        )
-        self.last_metrics.osl = (
-            self.prometheus_api_client.get_avg_output_sequence_tokens(
-                f"{self.args.adjustment_interval}s"
-            )
-        )
+        window = f"{self.args.adjustment_interval}s"
+        ttft_f = asyncio.to_thread(self.prometheus_api_client.get_avg_time_to_first_token, window)
+        itl_f = asyncio.to_thread(self.prometheus_api_client.get_avg_inter_token_latency, window)
+        num_req_f = asyncio.to_thread(self.prometheus_api_client.get_avg_request_count, window)
+        req_dur_f = asyncio.to_thread(self.prometheus_api_client.get_avg_request_duration, window)
+        isl_f = asyncio.to_thread(self.prometheus_api_client.get_avg_input_sequence_tokens, window)
+        osl_f = asyncio.to_thread(self.prometheus_api_client.get_avg_output_sequence_tokens, window)
+        (
+            self.last_metrics.ttft,
+            self.last_metrics.itl,
+            self.last_metrics.num_req,
+            self.last_metrics.request_duration,
+            self.last_metrics.isl,
+            self.last_metrics.osl,
+        ) = await asyncio.gather(ttft_f, itl_f, num_req_f, req_dur_f, isl_f, osl_f, return_exceptions=False)
🧹 Nitpick comments (3)
benchmarks/profiler/inject_disagg_config.py (1)

165-165: Avoid double slash in printed DGD_CONFIG_FILE path

When target_path starts with “/”, this prints “/workspace//profiling_results/...”. Not functionally wrong, but noisy. Remove the extra slash.

-    print(f"🔧 Set DGD_CONFIG_FILE=/workspace/{args.target_path} in your profiler job")
+    print(f"🔧 Set DGD_CONFIG_FILE=/workspace{args.target_path} in your profiler job")
benchmarks/profiler/utils/kubernetes.py (1)

81-83: Path resolution change looks correct; consider resolving symlinks for robustness

Moving up one directory to reach benchmarks/profiler/deploy is correct. Minor nit: use resolve() to avoid surprises if the file is symlinked.

Apply this diff:

-    script_dir = Path(__file__).parent.parent
+    script_dir = Path(__file__).resolve().parent.parent

Optionally, allow overriding via an env var for flexibility:

-    pod_yaml_path = script_dir / "deploy" / "pvc-access-pod.yaml"
+    pod_yaml_path = Path(
+        os.environ.get("PVC_ACCESS_POD_YAML", str(script_dir / "deploy" / "pvc-access-pod.yaml"))
+    )
components/planner/src/dynamo/planner/utils/planner_core.py (1)

106-121: Prometheus server bootstrap: good guard; consider namespacing metrics

Starting the HTTP server only when port != 0 is correct. Consider namespacing metrics to avoid collisions in multi-process environments and to follow metric naming best practices.

Apply this diff to add namespace/subsystem and consolidate metric naming:

-        # Initialize Prometheus metrics
-        self.num_p_workers_gauge = Gauge("num_p_workers", "Number of prefill workers")
-        self.num_d_workers_gauge = Gauge("num_d_workers", "Number of decode workers")
+        # Initialize Prometheus metrics
+        # Use a single gauge with a role label for better cardinality control and querying
+        self.num_workers_gauge = Gauge(
+            "dynamo_planner_workers",
+            "Number of engine workers by role",
+            labelnames=("role",),
+        )

And update the setters (see observe_metrics diff below).
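If the single labeled gauge proposed above were adopted, the setters could look like this sketch (gauge name and label as in the diff; the worker counts are placeholders for len(prefill_endpoints) and len(decode_endpoints)):

```python
from prometheus_client import Gauge

num_workers_gauge = Gauge(
    "dynamo_planner_workers",
    "Number of engine workers by role",
    labelnames=("role",),
)

# Placeholder counts standing in for len(prefill_endpoints) / len(decode_endpoints).
num_workers_gauge.labels(role="prefill").set(1)
num_workers_gauge.labels(role="decode").set(2)
```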

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a3d624a and 73659fd.

📒 Files selected for processing (10)
  • benchmarks/profiler/inject_disagg_config.py (1 hunks)
  • benchmarks/profiler/profile_endpoint.py (1 hunks)
  • benchmarks/profiler/profile_sla.py (1 hunks)
  • benchmarks/profiler/utils/kubernetes.py (1 hunks)
  • components/backends/vllm/deploy/disagg_planner.yaml (7 hunks)
  • components/planner/src/dynamo/planner/defaults.py (1 hunks)
  • components/planner/src/dynamo/planner/planner_sla.py (1 hunks)
  • components/planner/src/dynamo/planner/utils/planner_core.py (5 hunks)
  • docs/architecture/pre_deployment_profiling.md (1 hunks)
  • docs/guides/deploy/k8s_metrics.md (8 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-25T22:34:11.384Z
Learnt from: nnshah1
PR: ai-dynamo/dynamo#2124
File: components/backends/vllm/deploy/disagg.yaml:54-60
Timestamp: 2025-07-25T22:34:11.384Z
Learning: In vLLM worker deployments, startup probes (with longer periods and higher failure thresholds like periodSeconds: 10, failureThreshold: 60) are used to handle the slow model loading startup phase, while liveness probes are intentionally kept aggressive (periodSeconds: 5, failureThreshold: 1) for quick failure detection once the worker is operational. This pattern separates startup concerns from operational health monitoring in GPU-heavy workloads.

Applied to files:

  • components/backends/vllm/deploy/disagg_planner.yaml
🧬 Code Graph Analysis (4)
benchmarks/profiler/profile_sla.py (1)
benchmarks/profiler/utils/profile_decode.py (1)
  • profile_decode (21-85)
components/planner/src/dynamo/planner/planner_sla.py (1)
components/planner/src/dynamo/planner/defaults.py (1)
  • SLAPlannerDefaults (64-74)
benchmarks/profiler/profile_endpoint.py (1)
benchmarks/profiler/utils/profile_decode.py (1)
  • profile_decode (21-85)
components/planner/src/dynamo/planner/utils/planner_core.py (1)
components/planner/src/dynamo/planner/defaults.py (1)
  • SLAPlannerDefaults (64-74)
🪛 markdownlint-cli2 (0.17.2)
docs/guides/deploy/k8s_metrics.md

139-139: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)

🔇 Additional comments (10)
benchmarks/profiler/profile_sla.py (1)

38-38: Import path change verified — no remaining old imports

Confirmed the new import path is correct, the module defines profile_decode, and all call sites use the updated import.

  • benchmarks/profiler/utils/profile_decode.py — defines def profile_decode(...) (around line 21)
  • benchmarks/profiler/profile_sla.py — import at line 38: from utils.profile_decode import profile_decode; call at ~line 476
  • benchmarks/profiler/profile_endpoint.py — import at line 8: from utils.profile_decode import profile_decode; call at ~line 89
components/backends/vllm/deploy/disagg_planner.yaml (2)

8-9: Grove disabled annotation acknowledged

Setting nvidia.com/enable-grove: "false" is explicit and clear for this deployment.


50-50: Image tag bumps: confirm provenance and digest pinning policy

All components now use nvcr.io/.../vllm-runtime:hzhou-0814-02. If your org policy prefers immutability, consider digest pinning to avoid tag drift. Otherwise, these bumps are fine.

To help track runtime compatibility, document the image change in the release notes and validate that the image is accessible in your cluster registry.

Also applies to: 94-94, 143-143, 193-193, 243-243

components/planner/src/dynamo/planner/planner_sla.py (1)

138-143: No action required — prometheus_port already defaults to 0

components/planner/src/dynamo/planner/defaults.py defines SLAPlannerDefaults.prometheus_port = 0 (around line 38), so the argparse default is safe and no change is needed.

docs/guides/deploy/k8s_metrics.md (3)

96-96: Namespace templating via $NAMESPACE is good

Switching to $NAMESPACE in PodMonitor resources + envsubst in the apply step improves reuse across clusters/namespaces.

Also applies to: 108-108, 118-118, 130-130


187-187: Including -n monitoring in port-forward commands is correct

This avoids relying on default namespace selection and reduces operator error.

Also applies to: 198-198


171-171: Grafana ConfigMap path updated — file present and labeled for auto-discovery

Verified: deploy/metrics/k8s/grafana-dynamo-dashboard-configmap.yaml exists and contains grafana_dashboard: "1" (lines 9–11). No changes required.

components/planner/src/dynamo/planner/utils/planner_core.py (3)

362-362: Awaiting observe_metrics in the loop is correct

Switching observe_metrics to async and awaiting it in the loop keeps sequencing deterministic before make_adjustments runs.


474-479: Same potential defaults issue here: prometheus_port may be undefined

The main block also uses SLAPlannerDefaults.prometheus_port. Ensure it exists or default to 0.

Apply this diff if needed:

-    parser.add_argument(
-        "--prometheus-port",
-        type=int,
-        default=SLAPlannerDefaults.prometheus_port,
-        help="Prometheus port for metrics server (0 to disable)",
-    )
+    parser.add_argument(
+        "--prometheus-port",
+        type=int,
+        default=0,
+        help="Prometheus port for metrics server (0 to disable)",
+    )

You can verify the attribute with the same script provided for planner_sla.py.


24-25: Verify runtime image or dependency manifests include "prometheus-client"

The file components/planner/src/dynamo/planner/utils/planner_core.py now imports prometheus_client (Gauge, start_http_server). If the runtime image or dependency manifests don't include the pip package prometheus-client, the import will fail at module import time (before the try/except around start_http_server).

  • Location to check:
    • components/planner/src/dynamo/planner/utils/planner_core.py — lines ~24–25:
      from prometheus_client import Gauge, start_http_server
      
  • What I ran: attempted to find Dockerfile and requirements.*, but neither was present in the workspace, so I could not confirm whether the image includes the package.
  • Please verify (run in repo root):
    • rg -n --hidden -S 'prometheus_client|prometheus-client' || true
    • rg -n --hidden -S 'Dockerfile|requirements|pyproject.toml|setup.cfg|Pipfile|environment.yml' || true
    • check any image build steps in .github/workflows or infra folders for pip install steps

If the package is missing, add prometheus-client to your dependency manifest or ensure the Dockerfile installs it (or catch ImportError around the import if a non-fatal absence is acceptable).
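If a non-fatal absence is acceptable, the import guard could look like this sketch (illustrative only; the PR imports prometheus_client unconditionally):

```python
try:
    from prometheus_client import Gauge, start_http_server

    PROMETHEUS_AVAILABLE = True
except ImportError:
    # prometheus-client is not installed: disable metrics instead of
    # failing at module import time.
    Gauge = None
    start_http_server = None
    PROMETHEUS_AVAILABLE = False

print(f"Prometheus metrics available: {PROMETHEUS_AVAILABLE}")
```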

@tedzhouhk tedzhouhk merged commit 922850a into main Aug 15, 2025
12 checks passed
@tedzhouhk tedzhouhk deleted the hzhou/sla-planner-bench branch August 15, 2025 21:30
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
…se num_d/p to k8s metrics (#2454)

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>