chore: bug fixes in pre-deployment sweeping and vllm_v1 planner; expose num_d/p to k8s metrics #2454
Conversation
Walkthrough

The PR updates profiler import paths and messages, adjusts a Kubernetes utility path, modifies vLLM planner deployment to include Prometheus metrics and new images, adds a Prometheus port default and CLI flag to the planner, integrates Prometheus gauges and async observation in the planner core, and updates related documentation and PodMonitor samples.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Planner
    participant PrefillWorkers
    participant DecodeWorkers
    participant Prometheus as Prometheus Scraper
    User->>Planner: Start with --prometheus-port (0 disables)
    alt port != 0
        Planner->>Planner: start_http_server(port)
    end
    loop periodic
        Planner->>PrefillWorkers: get_workers_info()
        PrefillWorkers-->>Planner: prefill endpoints
        Planner->>DecodeWorkers: get_workers_info()
        DecodeWorkers-->>Planner: decode endpoints
        Planner->>Planner: update Gauges (num_p_workers, num_d_workers)
    end
    Prometheus-->>Planner: scrape /metrics
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 7
🔭 Outside diff range comments (1)
components/planner/src/dynamo/planner/utils/planner_core.py (1)
181-205: Avoid blocking the event loop when querying Prometheus; fetch concurrently

If PrometheusAPIClient uses blocking I/O, these calls will stall the loop. Fetch concurrently via asyncio.to_thread and gather for better responsiveness.
Apply this diff:
```diff
-        self.last_metrics.ttft = self.prometheus_api_client.get_avg_time_to_first_token(
-            f"{self.args.adjustment_interval}s"
-        )
-        self.last_metrics.itl = self.prometheus_api_client.get_avg_inter_token_latency(
-            f"{self.args.adjustment_interval}s"
-        )
-        self.last_metrics.num_req = self.prometheus_api_client.get_avg_request_count(
-            f"{self.args.adjustment_interval}s"
-        )
-        self.last_metrics.request_duration = (
-            self.prometheus_api_client.get_avg_request_duration(
-                f"{self.args.adjustment_interval}s"
-            )
-        )
-        self.last_metrics.isl = (
-            self.prometheus_api_client.get_avg_input_sequence_tokens(
-                f"{self.args.adjustment_interval}s"
-            )
-        )
-        self.last_metrics.osl = (
-            self.prometheus_api_client.get_avg_output_sequence_tokens(
-                f"{self.args.adjustment_interval}s"
-            )
-        )
+        window = f"{self.args.adjustment_interval}s"
+        ttft_f = asyncio.to_thread(self.prometheus_api_client.get_avg_time_to_first_token, window)
+        itl_f = asyncio.to_thread(self.prometheus_api_client.get_avg_inter_token_latency, window)
+        num_req_f = asyncio.to_thread(self.prometheus_api_client.get_avg_request_count, window)
+        req_dur_f = asyncio.to_thread(self.prometheus_api_client.get_avg_request_duration, window)
+        isl_f = asyncio.to_thread(self.prometheus_api_client.get_avg_input_sequence_tokens, window)
+        osl_f = asyncio.to_thread(self.prometheus_api_client.get_avg_output_sequence_tokens, window)
+        (
+            self.last_metrics.ttft,
+            self.last_metrics.itl,
+            self.last_metrics.num_req,
+            self.last_metrics.request_duration,
+            self.last_metrics.isl,
+            self.last_metrics.osl,
+        ) = await asyncio.gather(ttft_f, itl_f, num_req_f, req_dur_f, isl_f, osl_f, return_exceptions=False)
```
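For readers unfamiliar with the pattern, here is a small self-contained sketch of running blocking calls via asyncio.to_thread and awaiting them with gather; the BlockingClient class and metric names are made up for illustration and are not part of the PR:

```python
import asyncio
import time


class BlockingClient:
    """Stand-in for a client whose calls block on network I/O."""

    def get_metric(self, name: str, window: str) -> float:
        time.sleep(0.1)  # simulate a blocking HTTP query
        return 42.0


async def fetch_metrics(client: BlockingClient, window: str) -> dict[str, float]:
    # Run each blocking call in a worker thread and await them together,
    # so the event loop stays free while the queries are in flight.
    names = ["ttft", "itl", "num_req"]
    values = await asyncio.gather(
        *(asyncio.to_thread(client.get_metric, n, window) for n in names)
    )
    return dict(zip(names, values))


if __name__ == "__main__":
    print(asyncio.run(fetch_metrics(BlockingClient(), "60s")))
```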
🧹 Nitpick comments (3)
benchmarks/profiler/inject_disagg_config.py (1)
165-165: Avoid double slash in printed DGD_CONFIG_FILE path

When target_path starts with “/”, this prints “/workspace//profiling_results/...”. Not functionally wrong, but noisy. Remove the extra slash.

```diff
-    print(f"🔧 Set DGD_CONFIG_FILE=/workspace/{args.target_path} in your profiler job")
+    print(f"🔧 Set DGD_CONFIG_FILE=/workspace{args.target_path} in your profiler job")
```
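An alternative to dropping the slash is to normalize the joined path; a hypothetical helper sketch (workspace_path is not a function in this repo):

```python
import posixpath


def workspace_path(target_path: str) -> str:
    # Joining after stripping any leading slash avoids "/workspace//..." output.
    return posixpath.join("/workspace", target_path.lstrip("/"))


assert workspace_path("/profiling_results/cfg.yaml") == "/workspace/profiling_results/cfg.yaml"
assert workspace_path("profiling_results/cfg.yaml") == "/workspace/profiling_results/cfg.yaml"
```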
benchmarks/profiler/utils/kubernetes.py (1)

81-83: Path resolution change looks correct; consider resolving symlinks for robustness

Moving up one directory to reach benchmarks/profiler/deploy is correct. Minor nit: use resolve() to avoid surprises if the file is symlinked.
Apply this diff:
```diff
-    script_dir = Path(__file__).parent.parent
+    script_dir = Path(__file__).resolve().parent.parent
```

Optionally, allow overriding via an env var for flexibility:
```diff
-    pod_yaml_path = script_dir / "deploy" / "pvc-access-pod.yaml"
+    pod_yaml_path = Path(
+        os.environ.get("PVC_ACCESS_POD_YAML", str(script_dir / "deploy" / "pvc-access-pod.yaml"))
+    )
```

components/planner/src/dynamo/planner/utils/planner_core.py (1)
106-121: Prometheus server bootstrap: good guard; consider namespacing metrics

Starting the HTTP server only when port != 0 is correct. Consider namespacing metrics to avoid collisions in multi-process environments and to follow metric naming best practices.
Apply this diff to add namespace/subsystem and consolidate metric naming:
```diff
-        # Initialize Prometheus metrics
-        self.num_p_workers_gauge = Gauge("num_p_workers", "Number of prefill workers")
-        self.num_d_workers_gauge = Gauge("num_d_workers", "Number of decode workers")
+        # Initialize Prometheus metrics
+        # Use a single gauge with a role label for better cardinality control and querying
+        self.num_workers_gauge = Gauge(
+            "dynamo_planner_workers",
+            "Number of engine workers by role",
+            labelnames=("role",),
+        )
```

And update the setters (see observe_metrics diff below).
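Since the observe_metrics diff referenced above is not reproduced here, this is a standalone sketch of how the setters could use the labeled gauge; the metric name follows the suggestion and the helper function and counts are illustrative:

```python
from prometheus_client import Gauge

# Single gauge with a role label, per the suggestion above.
num_workers_gauge = Gauge(
    "dynamo_planner_workers",
    "Number of engine workers by role",
    labelnames=("role",),
)


def record_worker_counts(num_prefill: int, num_decode: int) -> None:
    # One time series per role; dashboards can filter or sum over the label.
    num_workers_gauge.labels(role="prefill").set(num_prefill)
    num_workers_gauge.labels(role="decode").set(num_decode)


record_worker_counts(2, 4)
```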
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (10)
- benchmarks/profiler/inject_disagg_config.py (1 hunks)
- benchmarks/profiler/profile_endpoint.py (1 hunks)
- benchmarks/profiler/profile_sla.py (1 hunks)
- benchmarks/profiler/utils/kubernetes.py (1 hunks)
- components/backends/vllm/deploy/disagg_planner.yaml (7 hunks)
- components/planner/src/dynamo/planner/defaults.py (1 hunks)
- components/planner/src/dynamo/planner/planner_sla.py (1 hunks)
- components/planner/src/dynamo/planner/utils/planner_core.py (5 hunks)
- docs/architecture/pre_deployment_profiling.md (1 hunks)
- docs/guides/deploy/k8s_metrics.md (8 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-25T22:34:11.384Z
Learnt from: nnshah1
PR: ai-dynamo/dynamo#2124
File: components/backends/vllm/deploy/disagg.yaml:54-60
Timestamp: 2025-07-25T22:34:11.384Z
Learning: In vLLM worker deployments, startup probes (with longer periods and higher failure thresholds like periodSeconds: 10, failureThreshold: 60) are used to handle the slow model loading startup phase, while liveness probes are intentionally kept aggressive (periodSeconds: 5, failureThreshold: 1) for quick failure detection once the worker is operational. This pattern separates startup concerns from operational health monitoring in GPU-heavy workloads.
Applied to files:
components/backends/vllm/deploy/disagg_planner.yaml
🧬 Code Graph Analysis (4)
benchmarks/profiler/profile_sla.py (1)
benchmarks/profiler/utils/profile_decode.py (1)
profile_decode(21-85)
components/planner/src/dynamo/planner/planner_sla.py (1)
components/planner/src/dynamo/planner/defaults.py (1)
SLAPlannerDefaults(64-74)
benchmarks/profiler/profile_endpoint.py (1)
benchmarks/profiler/utils/profile_decode.py (1)
profile_decode(21-85)
components/planner/src/dynamo/planner/utils/planner_core.py (1)
components/planner/src/dynamo/planner/defaults.py (1)
SLAPlannerDefaults(64-74)
🪛 markdownlint-cli2 (0.17.2)
docs/guides/deploy/k8s_metrics.md
139-139: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🔇 Additional comments (10)
benchmarks/profiler/profile_sla.py (1)
38-38: Import path change verified — no remaining old imports

Confirmed the new import path is correct, the module defines profile_decode, and all call sites use the updated import.
- benchmarks/profiler/utils/profile_decode.py — defines def profile_decode(...) (around line 21)
- benchmarks/profiler/profile_sla.py — import at line 38: from utils.profile_decode import profile_decode; call at ~line 476
- benchmarks/profiler/profile_endpoint.py — import at line 8: from utils.profile_decode import profile_decode; call at ~line 89
components/backends/vllm/deploy/disagg_planner.yaml (2)
8-9: Grove disabled annotation acknowledged

Setting nvidia.com/enable-grove: "false" is explicit and clear for this deployment.
50-50: Image tag bumps: confirm provenance and digest pinning policy

All components now use nvcr.io/.../vllm-runtime:hzhou-0814-02. If your org policy prefers immutability, consider digest pinning to avoid tag drift. Otherwise, these bumps are fine.
To help track runtime compatibility, document the image change in the release notes and validate the image is accessible in your cluster registry.
Also applies to: 94-94, 143-143, 193-193, 243-243
components/planner/src/dynamo/planner/planner_sla.py (1)
138-143: No action required — prometheus_port already defaults to 0

components/planner/src/dynamo/planner/defaults.py defines SLAPlannerDefaults.prometheus_port = 0 (around line 38), so the argparse default is safe and no change is needed.

docs/guides/deploy/k8s_metrics.md (3)
96-96: Namespace templating via $NAMESPACE is good

Switching to $NAMESPACE in PodMonitor resources + envsubst in the apply step improves reuse across clusters/namespaces.
Also applies to: 108-108, 118-118, 130-130
187-187: Including -n monitoring in port-forward commands is correct

This avoids relying on default namespace selection and reduces operator error.
Also applies to: 198-198
171-171: Grafana ConfigMap path updated — file present and labeled for auto-discovery

Verified: deploy/metrics/k8s/grafana-dynamo-dashboard-configmap.yaml exists and contains grafana_dashboard: "1" (lines 9–11). No changes required.

components/planner/src/dynamo/planner/utils/planner_core.py (3)
362-362: Awaiting observe_metrics in the loop is correct

Switching observe_metrics to async and awaiting it in the loop keeps sequencing deterministic before make_adjustments runs.
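A minimal sketch of that ordering, with stub bodies standing in for the planner's real observe_metrics and make_adjustments (interval and internals are illustrative, not the planner's actual code):

```python
import asyncio


class PlannerSketch:
    async def observe_metrics(self) -> None:
        # In the real planner this gathers Prometheus metrics and worker counts.
        await asyncio.sleep(0)

    async def make_adjustments(self) -> None:
        # Scaling decisions run only after this cycle's metrics are in.
        await asyncio.sleep(0)

    async def run(self, interval_s: float, cycles: int) -> None:
        for _ in range(cycles):
            await self.observe_metrics()   # awaited first, so ordering is deterministic
            await self.make_adjustments()
            await asyncio.sleep(interval_s)


asyncio.run(PlannerSketch().run(interval_s=0.01, cycles=3))
```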
474-479: Same potential defaults issue here: prometheus_port may be undefined

The main block also uses SLAPlannerDefaults.prometheus_port. Ensure it exists or default to 0.
Apply this diff if needed:
```diff
-    parser.add_argument(
-        "--prometheus-port",
-        type=int,
-        default=SLAPlannerDefaults.prometheus_port,
-        help="Prometheus port for metrics server (0 to disable)",
-    )
+    parser.add_argument(
+        "--prometheus-port",
+        type=int,
+        default=0,
+        help="Prometheus port for metrics server (0 to disable)",
+    )
```

You can verify the attribute with the same script provided for planner_sla.py.
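For context, the flag wires into metric exposure roughly as follows; this is a sketch of the guard described in the walkthrough, not the planner's exact code:

```python
import argparse

from prometheus_client import start_http_server

parser = argparse.ArgumentParser()
parser.add_argument(
    "--prometheus-port",
    type=int,
    default=0,
    help="Prometheus port for metrics server (0 to disable)",
)
args = parser.parse_args()

if args.prometheus_port != 0:
    # Exposes /metrics on the given port; skipped entirely when the port is 0.
    start_http_server(args.prometheus_port)
```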
24-25: Verify runtime image or dependency manifests include "prometheus-client"

The file components/planner/src/dynamo/planner/utils/planner_core.py now imports prometheus_client (Gauge, start_http_server). If the runtime image or dependency manifests don't include the pip package prometheus-client, the import will fail at module import time (before the try/except around start_http_server).
- Location to check:
- components/planner/src/dynamo/planner/utils/planner_core.py — lines ~24–25:
  `from prometheus_client import Gauge, start_http_server`
- What I ran: attempted to find Dockerfile and requirements.* but both were not present in the workspace, so I could not confirm whether the image includes the package.
- Please verify (run in repo root):
- rg -n --hidden -S 'prometheus_client|prometheus-client' || true
- rg -n --hidden -S 'Dockerfile|requirements|pyproject.toml|setup.cfg|Pipfile|environment.yml' || true
- check any image build steps in .github/workflows or infra folders for pip install steps
If the package is missing, add prometheus-client to your dependency manifest or ensure the Dockerfile installs it (or catch ImportError around the import if a non-fatal absence is acceptable).
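If a non-fatal absence were acceptable, the guarded import mentioned above could look like the sketch below; whether to degrade silently this way is a project decision, not something this PR does:

```python
try:
    from prometheus_client import Gauge, start_http_server
    PROMETHEUS_AVAILABLE = True
except ImportError:
    # prometheus-client not installed: disable metrics instead of failing at import time.
    Gauge = None  # type: ignore[assignment]
    start_http_server = None  # type: ignore[assignment]
    PROMETHEUS_AVAILABLE = False
```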
…se num_d/p to k8s metrics (#2454) Signed-off-by: Hannah Zhang <hannahz@nvidia.com>