-
Notifications
You must be signed in to change notification settings - Fork 690
feat: revamp kubernetes doc #3173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughDocumentation updates across Kubernetes guides: retargeted links to /docs/kubernetes, added disaggregated multi-node entries for several backends, introduced Helm customization guidance, clarified multinode prerequisites (Grove and Kai Scheduler), adjusted an install command, and standardized Grove terminology from PodGangSet to PodCliqueSet. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/kubernetes/installation_guide.md (1)
68-82: Standardize NAMESPACE across Kubernetes docs (choose one canonical value or a single placeholder)Multiple docs export different NAMESPACE values — consolidate to a single namespace or consistent placeholder to avoid copy/paste "not found" errors. Occurrences to update:
- docs/kubernetes/installation_guide.md — line 72: export NAMESPACE=dynamo-kubernetes; line 111: export NAMESPACE=dynamo-cloud
- docs/kubernetes/README.md — line 26: export NAMESPACE=dynamo-kubernetes; line 55: export NAMESPACE=dynamo-cloud
- docs/kubernetes/metrics.md — line 56: export NAMESPACE=dynamo
- docs/kubernetes/fluxcd.md — line 70: export NAMESPACE=
- docs/kubernetes/sla_planner_deployment.md — line 32: export NAMESPACE=your-namespace
- docs/kubernetes/gke_setup.md — line 15: export NAMESPACE=your-k8s-namespace
Choose a canonical value or a single clear placeholder and update the listed files (and top-level README) accordingly.
🧹 Nitpick comments (11)
docs/kubernetes/grove.md (2)
42-46: Minor style nit: tighten phrasing.“within a PodCliqueSet” is implied; consider trimming for brevity.
- PodCliques and PodCliqueScalingGroups allow users to specify flexible gang-scheduling requirements at multiple levels within a PodCliqueSet to prevent resource deadlocks and ensure all components of a disaggregated system start together. + PodCliques and PodCliqueScalingGroups allow flexible gang-scheduling at multiple levels in a PodCliqueSet to prevent deadlocks and ensure all components start together.
90-95: Standardize "KAI" vs "Kai" capitalization across docsMixed usage found: docs/kubernetes/grove.md uses "KAI Scheduler" (lines 90, 92, 100) while docs/kubernetes/installation_guide.md uses "Kai Scheduler" (lines 85–88). Pick one form and make all occurrences consistent — recommend "KAI Scheduler" to match the linked NVIDIA repo.
docs/kubernetes/fluxcd.md (3)
21-22: Fix grammar in Step 1 line.-First, follow to [See Install Dynamo Cloud](/docs/kubernetes/installation_guide.md). +First, follow the [Dynamo Kubernetes Platform Installation Guide](/docs/kubernetes/installation_guide.md).
3-3: Align example link path with backend docs layout.Other docs reference backend deploy READMEs under deploy/. Update for consistency.
-This section describes how to use FluxCD ... We'll use the [aggregated vLLM example](../../../components/backends/vllm/README.md) to demonstrate the workflow. +This section describes how to use FluxCD ... We'll use the [aggregated vLLM example](../../../components/backends/vllm/deploy/README.md) to demonstrate the workflow.
73-74: Use canonical kubectl resource name (lowercase plural).Kubernetes CLI commonly uses plural, lowercase resource names for CRDs.
-kubectl get dynamographdeployment llm-agg -n $NAMESPACE +kubectl get dynamographdeployments llm-agg -n $NAMESPACEdocs/kubernetes/dynamo_operator.md (1)
30-31: Likely broken anchor (#overview).installation_guide.md doesn’t define an “Overview” header. Link to the guide root or an existing section.
-[See installation steps](/docs/kubernetes/installation_guide.md#overview) +[See installation steps](/docs/kubernetes/installation_guide.md)docs/kubernetes/installation_guide.md (3)
35-47: Helm customization note: clarify multi-file and type handling.Small clarity gains: mention multiple -f files and --set-string for literal strings.
-All helm install commands could be overridden by either setting the values.yaml file or by passing in your own values.yaml: +You can customize any helm install by supplying one or more values files or inline flags: @@ -helm install ... - -f your-values.yaml +helm install ... \ + -f your-values.yaml \ # can be repeated multiple times @@ - --set "your-value=your-value" + --set "your-value=your-value" \ + --set-string "someString=00123" # preserve leading zeros, etc.
84-92: Nice: explicit flags to enable Grove and Kai. Add a full command example.Including the flags within a full helm command reduces user error.
+Example: +```bash +helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz \ + --namespace ${NAMESPACE} --create-namespace \ + --set "grove.enabled=true" \ + --set "kai-scheduler.enabled=true" +```
200-211: Tiny wording/case nit in troubleshooting.-just add the following to the helm install command: +Just add the following to the helm install command:docs/kubernetes/README.md (2)
55-56: Namespace mismatch with earlier section.Earlier you export NAMESPACE=dynamo-kubernetes; here it’s dynamo-cloud. Pick one.
-export NAMESPACE=dynamo-cloud +export NAMESPACE=dynamo-kubernetes
61-66: Use canonical kubectl resource name (lowercase plural).Mirror the suggestion made in fluxcd.md for consistency.
-kubectl get dynamoGraphDeployment -n ${NAMESPACE} +kubectl get dynamographdeployments -n ${NAMESPACE}
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
docs/kubernetes/README.md(3 hunks)docs/kubernetes/dynamo_operator.md(1 hunks)docs/kubernetes/fluxcd.md(2 hunks)docs/kubernetes/grove.md(2 hunks)docs/kubernetes/installation_guide.md(3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1308-1312
Timestamp: 2025-06-11T21:29:28.650Z
Learning: User julienmancuso expects replies in English; avoid switching languages unless explicitly requested.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and Test - dynamo
- GitHub Check: Build and Test - sglang
🔇 Additional comments (8)
docs/kubernetes/grove.md (1)
22-27: Terminology update looks consistent.PodCliqueSet description reads well and aligns with the rest of the doc.
docs/kubernetes/fluxcd.md (1)
7-7: Good: prerequisite now points to the Kubernetes install guide.docs/kubernetes/dynamo_operator.md (1)
26-27: API reference link update: LGTM.docs/kubernetes/installation_guide.md (1)
145-149: Switch to helm install: LGTM.Install path reads well and matches earlier steps.
docs/kubernetes/README.md (4)
39-39: Good cross-link to the Installation Guide.
47-50: Backend table additions: LGTM.The multi‑node options look consistent across backends.
76-85: Kubernetes docs links: LGTM.
169-176: Verify Helm charts path; both deploy/helm/README.md and deploy/cloud/helm/platform/README.md existBoth files are present in the repo; installation_guide refers to the cloud path — update docs/kubernetes/README.md to point to /deploy/cloud/helm/platform/README.md for consistency, or confirm the canonical location.
- **[Helm Charts](/deploy/helm/README.md)** - For advanced users + **[Helm Charts](/deploy/cloud/helm/platform/README.md)** - For advanced users
|
I think CI caught some broken links. also, can you grep for docs/guides/dynamo_deploy across the codebase to see if there's any lingering references? also left a comment re: bitnami otherwise, awesome changes & LGTM! i think this will help us quite a bit :) |
Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com> Signed-off-by: Julien Mancuso <161955438+julienmancuso@users.noreply.github.com>
Signed-off-by: Julien Mancuso <161955438+julienmancuso@users.noreply.github.com> Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com> Signed-off-by: Jason Zhou <jasonzho@nvidia.com>
Signed-off-by: Julien Mancuso <161955438+julienmancuso@users.noreply.github.com> Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com> Signed-off-by: Jason Zhou <jasonzho@nvidia.com>
Signed-off-by: Julien Mancuso <161955438+julienmancuso@users.noreply.github.com> Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com> Signed-off-by: Jason Zhou <jasonzho@nvidia.com>
Signed-off-by: Julien Mancuso <161955438+julienmancuso@users.noreply.github.com> Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com> Signed-off-by: Jason Zhou <jasonzho@nvidia.com>
Signed-off-by: Julien Mancuso <161955438+julienmancuso@users.noreply.github.com> Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com> Signed-off-by: Jason Zhou <jasonzho@nvidia.com>
Signed-off-by: Julien Mancuso <161955438+julienmancuso@users.noreply.github.com> Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com> Signed-off-by: Jason Zhou <jasonzho@nvidia.com>
Signed-off-by: Julien Mancuso <161955438+julienmancuso@users.noreply.github.com> Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com> Signed-off-by: Jason Zhou <jasonzho@nvidia.com>
Signed-off-by: Julien Mancuso <161955438+julienmancuso@users.noreply.github.com> Co-authored-by: hhzhang16 <54051230+hhzhang16@users.noreply.github.com> Signed-off-by: Kyle H <kylhuang@nvidia.com>
Overview:
revamp kubernetes doc
Summary by CodeRabbit