-
Notifications
You must be signed in to change notification settings - Fork 692
feat: profiling PVC updates for better UX #2402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds a Kubernetes PVC access pod manifest, two new Python CLI tools to inject a disagg config into a PVC and to download profiling results from it, and updates documentation to reflect the new access pod workflow, commands, paths, and troubleshooting notes. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant U as User
participant DL as download_pvc_results.py
participant K as kubectl
participant API as Kubernetes API
participant P as pvc-access-pod
participant PVC as profiling-pvc
U->>DL: Run CLI (namespace, output-dir, [--no-config])
DL->>K: kubectl auth/namespace check
K->>API: Validate access
DL->>K: kubectl get/apply pvc-access-pod
K->>API: Ensure pod Running
DL->>K: kubectl exec ls /profiling_results
K->>P: List files
P->>PVC: Read directory
DL->>K: kubectl cp P:/profiling_results/... -> local
K->>P: Stream files
P->>PVC: Read files
DL-->>U: Summary + README generated
sequenceDiagram
autonumber
participant U as User
participant INJ as inject_disagg_config.py
participant K as kubectl
participant API as Kubernetes API
participant P as pvc-access-pod
participant PVC as profiling-pvc
U->>INJ: Run CLI (namespace, [--disagg-config], [--target-path], [--cleanup])
INJ->>K: kubectl auth/namespace check
K->>API: Validate access
INJ->>K: kubectl get/apply pvc-access-pod
K->>API: Ensure pod Running
INJ->>K: kubectl cp local disagg.yaml -> P:/profiling_results/...
K->>P: Transfer file
P->>PVC: Write file
INJ->>K: kubectl exec ls target path
K->>P: Verify file exists
INJ-->>U: Report target path and optional cleanup
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (11)
benchmarks/profiler/deploy/pvc-access-pod.yaml (2)
31-31: Add newline at end of file- restartPolicy: Never + restartPolicy: Never +
14-16: Consider using a pre-built image with tools installedInstalling packages at runtime (
apt-get update && apt-get install) adds startup delay and network dependency. Consider using a pre-built image with curl and jq already installed.Example alternatives:
- Use
curlimages/curlimage which has curl pre-installed- Build a custom image FROM ubuntu:22.04 with RUN apt-get commands
- Use
alpinewith pre-installed tools for smaller footprintbenchmarks/profiler/inject_disagg_config.py (2)
237-237: Add newline at end of fileif __name__ == "__main__": main() +
62-70: Improve error handling in deploy_access_podThe current implementation uses a bare
except subprocess.CalledProcessErrorwithpass, which could mask other issues. The function should be more explicit about what it's catching.# Check if pod already exists - try: - result = run_command( - ["kubectl", "get", "pod", pod_name, "-n", namespace], capture_output=True - ) - print(f"✓ Access pod '{pod_name}' already exists") - return - except subprocess.CalledProcessError: - # Pod doesn't exist, deploy it - pass + result = subprocess.run( + ["kubectl", "get", "pod", pod_name, "-n", namespace, "-o", "jsonpath={.status.phase}"], + capture_output=True, text=True, check=False + ) + + if result.returncode == 0: + phase = result.stdout.strip() + if phase == "Running": + print(f"✓ Access pod '{pod_name}' already exists and is running") + return + elif phase: + print(f"Access pod exists but is in {phase} state, will redeploy...") + run_command(["kubectl", "delete", "pod", pod_name, "-n", namespace, "--wait"], capture_output=True)benchmarks/profiler/download_pvc_results.py (3)
413-413: Add newline at end of fileif __name__ == "__main__": main() +
28-32: Use consistent type hints throughout the fileThe file uses
List[str]from typing module but Python 3.9+ supports nativelist[str]. Sinceinject_disagg_config.pyuseslist[str], maintain consistency.-from typing import List +# Remove typing import if not needed for other types def run_command( - cmd: List[str], capture_output: bool = True + cmd: list[str], capture_output: bool = True ) -> subprocess.CompletedProcess:Apply similar changes to other function signatures using
List[str].
291-294: Improve exception handling specificityUsing bare
except Exceptioncan mask unexpected errors. Be more specific about what exceptions you're catching.- except Exception as e: - # File doesn't exist or failed to download, skip silently - print(f" ⚠️ Skipped {file_path.split('/')[-1]}: {e}") - pass + except subprocess.CalledProcessError as e: + # File doesn't exist or failed to download + print(f" ⚠️ Skipped {file_path.split('/')[-1]}: {e}") + except Exception as e: + # Unexpected error, log but continue + print(f" ⚠️ Unexpected error for {file_path.split('/')[-1]}: {e}")docs/architecture/pre_deployment_profiling.md (4)
79-79: Use proper heading syntax instead of emphasisMultiple lines use bold emphasis (
**text**) where heading syntax would be more appropriate for document structure.-**Step 1: Configure container image** +### Step 1: Configure container image -**Option A: Use pre-built image with custom config injection (recommended)** +#### Option A: Use pre-built image with custom config injection (recommended) -**Option B: Build custom image (only if you need code changes)** +#### Option B: Build custom image (only if you need code changes) -**Option 1: Automated Download (Recommended)** +##### Option 1: Automated Download (Recommended) -**Option 2: Manual Download** +##### Option 2: Manual DownloadAlso applies to: 83-83, 118-118, 226-226, 248-248
110-112: Use consistent list style markersThe document mixes asterisk (
*) and dash (-) for unordered lists. Maintain consistency by using asterisks throughout.This approach allows you to: -- Customize DGD configurations without rebuilding container images -- Test different model configurations easily -- Version control your DGD configs alongside your code +* Customize DGD configurations without rebuilding container images +* Test different model configurations easily +* Version control your DGD configs alongside your code The script will: -- Deploy a temporary access pod (auto-deletes after 5 minutes) -- Scan for relevant files (*.png, *.npz, *.yaml) -- Download all files maintaining directory structure -- Generate a README.md with file descriptions -- Clean up automatically +* Deploy a temporary access pod (auto-deletes after 5 minutes) +* Scan for relevant files (*.png, *.npz, *.yaml) +* Download all files maintaining directory structure +* Generate a README.md with file descriptions +* Clean up automaticallyAlso applies to: 242-246
243-243: Remove spaces inside emphasis markers-- Scan for relevant files (*.png, *.npz, *.yaml) +- Scan for relevant files (*.png, *.npz, *.yaml)
114-116: Remove blank line inside blockquoteMarkdown blockquotes should not contain blank lines between consecutive quoted lines.
> **Important**: For profiling, disagg configs should be run with Grove disabled by adding the annotation `nvidia.com/enable-grove: "false"` to avoid alpha Grove status issues. - > **Note**: The default location in the PVC is `/profiling_results/disagg.yaml`. If you don't inject a config, the profiler will fall back to the built-in config at `/workspace/components/backends/vllm/deploy/disagg.yaml`.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
benchmarks/profiler/deploy/pvc-access-pod.yaml(1 hunks)benchmarks/profiler/download_pvc_results.py(1 hunks)benchmarks/profiler/inject_disagg_config.py(1 hunks)docs/architecture/pre_deployment_profiling.md(4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
benchmarks/profiler/inject_disagg_config.py (1)
benchmarks/profiler/download_pvc_results.py (4)
check_kubectl_access(50-54)deploy_access_pod(57-135)cleanup_access_pod(297-300)main(354-408)
🪛 Checkov (3.2.334)
benchmarks/profiler/deploy/pvc-access-pod.yaml
[MEDIUM] 4-31: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[MEDIUM] 4-31: Minimize the admission of root containers
(CKV_K8S_23)
🪛 YAMLlint (1.37.1)
benchmarks/profiler/deploy/pvc-access-pod.yaml
[error] 31-31: no new line character at the end of file
(new-line-at-end-of-file)
🪛 GitHub Actions: Copyright Checks
benchmarks/profiler/download_pvc_results.py
[error] 1-1: Copyright header missing/invalid in file. Step: pwsh /workspace/.github/workflows/copyright-check.ps1.
benchmarks/profiler/inject_disagg_config.py
[error] 1-1: Copyright header missing/invalid in file. Step: pwsh /workspace/.github/workflows/copyright-check.ps1.
🪛 markdownlint-cli2 (0.17.2)
docs/architecture/pre_deployment_profiling.md
79-79: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
83-83: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
110-110: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
111-111: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
112-112: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
115-115: Blank line inside blockquote
(MD028, no-blanks-blockquote)
118-118: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
226-226: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
242-242: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
243-243: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
243-243: Spaces inside emphasis markers
(MD037, no-space-in-emphasis)
244-244: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
245-245: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
246-246: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
248-248: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
|
Could you define "PVC" in the description? |
Done, let me know if more detail is needed @grahamking |
Overview:
Use scripts to provide better result storage access and to improve user profiling UX.
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Documentation