fix: small planner manifest/doc fixes #3129
Walkthrough
Updates the profiler job to use a runtime config path variable, adjusts the vLLM planner deployment env and PVC mount, adds a conditional skip for a specific manifest in the benchmarking setup script, and restructures the pre-deployment profiling docs into a simplified, single-path configuration workflow.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor U as User
participant SB as setup_benchmarking_resources.sh
participant K as kubectl
U->>SB: Run setup script
loop For each manifest
SB->>SB: If basename == "pvc-access-pod.yaml"?
alt Skip specific manifest
SB-->>U: Log "Skipping pvc-access-pod.yaml"
else Apply manifest
SB->>K: kubectl apply -f <manifest> (via envsubst if available)
K-->>SB: Apply result
end
end
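The skip logic in the diagram above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script's actual code: the function name `list_manifests_to_apply` is hypothetical, and it only prints the manifests that would be applied instead of calling kubectl.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of setup_benchmarking_resources.sh):
# print every *.yaml manifest in a directory except pvc-access-pod.yaml.
# The body runs in a subshell "( ... )" so the nullglob setting does not leak to the caller.
list_manifests_to_apply() (
  dir="$1"
  shopt -s nullglob   # an empty directory yields zero iterations, not a literal '*.yaml'
  for mf in "$dir"/*.yaml; do
    if [[ "$(basename "$mf")" == "pvc-access-pod.yaml" ]]; then
      # Log to stderr so stdout carries only the manifests to apply.
      echo "Skipping $mf (managed by inject_manifest.py)" >&2
      continue
    fi
    printf '%s\n' "$mf"
  done
)
```

Each printed path could then be passed to `kubectl apply -f`, optionally after rendering it through envsubst as the diagram describes.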
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (6)
deploy/utils/setup_benchmarking_resources.sh (1)
73-78: Fix trailing whitespace to unblock pre-commit and CI.
The pre-commit hook flagged trailing whitespace in this block.
Apply:
-  # Skip pvc-access-pod.yaml as it's managed by inject_manifest.py
-  if [[ "$(basename "$mf")" == "pvc-access-pod.yaml" ]]; then
-    log "Skipping $mf (managed by inject_manifest.py)"
-    continue
-  fi
-
+  # Skip pvc-access-pod.yaml as it's managed by inject_manifest.py
+  if [[ "$(basename "$mf")" == "pvc-access-pod.yaml" ]]; then
+    log "Skipping $mf (managed by inject_manifest.py)"
+    continue
+  fi
Optional hardening (not required):
-for mf in "$(dirname "$0")/manifests"/*.yaml; do
+shopt -s nullglob
+for mf in "$(dirname "$0")/manifests"/*.yaml; do
docs/benchmarks/pre_deployment_profiling.md (5)
141-152: Use headings instead of bold labels (markdownlint MD036).
Convert the bold “Step 3” line to a heading.
-**Step 3: Define the container image and config path**
+### Step 3: Define the container image and config path
153-157: Use headings instead of bold labels (markdownlint MD036).
-**Step 4: Run profiling (required)**
+### Step 4: Run profiling (required)
159-164: Use headings instead of bold labels (markdownlint MD036).
-**Step 5: Wait for profiling to complete**
+### Step 5: Wait for profiling to complete
145-146: Avoid a known-broken example image tag in docs.
The note says 0.4.1 is broken; using it in the example will derail users. Replace it with a known-good tag, or phrase it as “use a working tag for your cluster.”
If you want, I can update the example once you confirm the current recommended tag.
150-151: Call out required non-empty var before envsubst.
Add an explicit guard so users don’t apply a manifest with an empty --config.
 export DGD_CONFIG_FILE=/data/configs/disagg.yaml # should be the same path you set for --dest in Step 1
+test -n "$DGD_CONFIG_FILE" || { echo "DGD_CONFIG_FILE must be set"; exit 1; }
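One way to package this guard together with the envsubst step is sketched below. It is only an illustration: the function names `require_config_file` and `render_manifest` are hypothetical and do not appear in the docs or scripts, and the usage shown in the comments assumes the rendered manifest is piped into `kubectl apply`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repository).

# Fail fast when DGD_CONFIG_FILE is unset or empty,
# so envsubst never renders an empty --config value.
require_config_file() {
  if [[ -z "${DGD_CONFIG_FILE:-}" ]]; then
    echo "DGD_CONFIG_FILE must be set (same PVC path as --dest in Step 1)" >&2
    return 1
  fi
}

# Read a manifest template on stdin and substitute only ${DGD_CONFIG_FILE}.
# The variable must be exported, otherwise envsubst cannot see it.
render_manifest() {
  require_config_file || return 1
  envsubst '${DGD_CONFIG_FILE}'
}

# Assumed usage:
#   export DGD_CONFIG_FILE=/data/configs/disagg.yaml
#   render_manifest < benchmarks/profiler/deploy/profile_sla_job.yaml | kubectl apply -f -
```

Restricting envsubst to a single named variable keeps any other `$`-expressions in the manifest untouched.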
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
benchmarks/profiler/deploy/profile_sla_job.yaml (1 hunks)
components/backends/vllm/deploy/disagg_planner.yaml (2 hunks)
deploy/utils/setup_benchmarking_resources.sh (1 hunks)
docs/benchmarks/pre_deployment_profiling.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/benchmarks/pre_deployment_profiling.md
141-141: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
153-153: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
159-159: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3129/merge) by tmonty12.
deploy/utils/setup_benchmarking_resources.sh
[error] 75-75: Trailing whitespace check failed. The pre-commit hook 'trailing-whitespace' modified this file. Please commit the changes and re-run pre-commit.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
benchmarks/profiler/deploy/profile_sla_job.yaml (1)
32-32: Good switch to env-driven config; add a guard to avoid empty substitution.
Using ${DGD_CONFIG_FILE} is correct. Please ensure Step 4 fails fast if the var is unset so envsubst doesn’t inject an empty value.
Add before applying in docs:
: "${DGD_CONFIG_FILE:?Set DGD_CONFIG_FILE (PVC path) before running envsubst}"
components/backends/vllm/deploy/disagg_planner.yaml (1)
53-53: Broader PVC mount looks fine; confirm paths used elsewhere.
Mounting /data (vs /data/profiling_results) matches planner args pointing to /data/profiling_results and the injector writing under /data. Looks consistent.
Double-check any scripts assuming the previous subpath mount.
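A quick way to run that double-check is to search the repository for the old path. The sketch below is an assumption-laden illustration: `find_subpath_references` is a hypothetical helper, and every hit still needs manual review, since references such as the planner args legitimately point at /data/profiling_results.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository):
# list files under a directory that mention the old mount path.
find_subpath_references() {
  local root="$1"
  local old_path="${2:-/data/profiling_results}"
  # -r: recurse, -n: show line numbers, -F: match the path as a fixed string.
  # "|| true" keeps "no matches" from being reported as a failure.
  grep -rnF -- "$old_path" "$root" || true
}

# Assumed usage from the repository root:
#   find_subpath_references deploy/
```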
Signed-off-by: tmontfort <tmontfort@nvidia.com>
6f58daa to 4f8973f
Overview:
While going through the planner pre-deployment profiling and launching the vLLM disagg planner DGD (/components/backends/vllm/deploy/disagg_planner.yaml), I uncovered some small issues and doc inconsistencies. This PR addresses them.
Details:
- benchmarks/profiler/deploy/profile_sla_job.yaml: the config is now set from the env var DGD_CONFIG_FILE, as it depends on the --dest arg passed to the preceding inject_manifest.py step.
- components/backends/vllm/deploy/disagg_planner.yaml: fixes the PVC mount for the planner to consume. PROMETHEUS_PORT also needs to be specified, as the operator automatically creates a Service for the process running on port 8000 (the default Prometheus port is 9090; planner metric scraping will hang without this env var set). Thinking about a longer-term solution here...
- deploy/utils/setup_benchmarking_resources.sh: avoids creating the PVC access pod, as inject_manifest.py will delete and recreate it (saves ~30s).
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)