-
Notifications
You must be signed in to change notification settings - Fork 693
feat: use official Grove 0.1.0-alpha release #3030
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
WalkthroughThis PR renames Grove PodGangSet to PodCliqueSet across operator code, RBAC, Helm, and docs; updates generator and canonicalization functions and types; adjusts controller ownership and RBAC annotations; updates module dependencies and Helm chart dependency; and aligns tests and README accordingly. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor K as K8s Events
participant M as Controller Manager
participant R as DynamoGraphDeployment Reconciler
participant G as GenerateGrovePodCliqueSet
participant A as K8s API Server
K->>M: Resource change: DynamoGraphDeployment
M->>R: Reconcile(request)
Note over R: Build desired Grove resource
R->>G: GenerateGrovePodCliqueSet(ctx, dgd, cfg, secrets)
G-->>R: *grovev1alpha1.PodCliqueSet
Note over R,A: SyncResource (create/update) PodCliqueSet
R->>A: Apply PodCliqueSet
A-->>R: Status/result
R-->>M: Requeue/Done
rect rgba(227,245,255,0.5)
Note right of G: New/renamed: PodCliqueSet types/funcs
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
Pre-merge checks❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
deploy/cloud/operator/config/rbac/role.yaml (1)
113-121: Fix remaining PodGangSet references — RBAC rename incompleteRBAC change to podcliquesets is fine, but leftover PodGangSet references remain; update to PodCliqueSet / podcliquesets.
- deploy/helm/chart/templates/grove-podgangset.yaml:18 — kind: PodGangSet → PodCliqueSet.
- docs/guides/dynamo_deploy/grove.md:22, 42, 45 — update PodGangSet mentions.
Re-run: rg -nP '(?i)podgangset(s)?' to confirm no matches.
deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (1)
424-430: Controller ownership should match upstream CRD kind.If you target the official release, Owns must reference PodGangSet, not PodCliqueSet. (pkg.go.dev)
Apply this diff:
- ctrlBuilder = ctrlBuilder.Owns(&grovev1alpha1.PodCliqueSet{}, builder.WithPredicates(predicate.Funcs{ + ctrlBuilder = ctrlBuilder.Owns(&grovev1alpha1.PodGangSet{}, builder.WithPredicates(predicate.Funcs{deploy/cloud/operator/internal/dynamo/graph.go (2)
320-333: applyCliqueStartupDependencies signature diverges from upstream types.Parameter typed as *grovev1alpha1.PodCliqueSet will not match the official API, which exposes PodGangSet as the top‑level. Use *grovev1alpha1.PodGangSet here to keep compatibility. (pkg.go.dev)
Apply this diff:
-// - Sets the PodCliqueSet StartupType to Explicit if any dependencies are configured +// - Sets the PodGangSet StartupType to Explicit if any dependencies are configured func applyCliqueStartupDependencies( - gangSet *grovev1alpha1.PodCliqueSet, + gangSet *grovev1alpha1.PodGangSet,
883-991: Top‑level resource rename to PodCliqueSet breaks against Grove 0.1.0‑alpha.Official v1alpha1 still defines PodGangSet (spec/template types too). Returning/constructing PodCliqueSet here will fail unless you carry a local alias. Revert function name, return type, and canonicalizer to PodGangSet variants. (pkg.go.dev)
Apply this diff:
-// GenerateGrovePodCliqueSet generates a Grove PodCliqueSet for the given deployment, supporting both single-node and multinode cases. -func GenerateGrovePodCliqueSet( +// GenerateGrovePodGangSet generates a Grove PodGangSet for the given deployment, supporting both single-node and multinode cases. +func GenerateGrovePodGangSet( ctx context.Context, dynamoDeployment *v1alpha1.DynamoGraphDeployment, controllerConfig controller_common.Config, secretsRetriever SecretsRetriever, -) (*grovev1alpha1.PodCliqueSet, error) { - gangSet := &grovev1alpha1.PodCliqueSet{} +) (*grovev1alpha1.PodGangSet, error) { + gangSet := &grovev1alpha1.PodGangSet{} @@ - return controller_common.CanonicalizePodCliqueSet(gangSet), nil + return controller_common.CanonicalizePodGangSet(gangSet), nilNote: adjust any references to PodCliqueSetSpec/TemplateSpec back to PodGangSetSpec/TemplateSpec if they were renamed alongside. Ensure CanonicalizePodGangSet exists (it does upstream with PodGangSet). (pkg.go.dev)
deploy/cloud/operator/internal/dynamo/graph_test.go (3)
1051-1062: Tests target PodCliqueSet; upstream exposes PodGangSet.To validate against official Grove 0.1.0‑alpha, rename the test and expected types to PodGangSet. Otherwise compilation will fail. (pkg.go.dev)
Apply this diff:
-func TestGenerateGrovePodCliqueSet(t *testing.T) { +func TestGenerateGrovePodGangSet(t *testing.T) { @@ - want *grovev1alpha1.PodCliqueSet + want *grovev1alpha1.PodGangSet
1738-1765: Multinode test expectations should also use PodGangSet.Mirror the earlier fixes here: change expected type to PodGangSet and associated template/spec types.
- want: &grovev1alpha1.PodCliqueSet{ + want: &grovev1alpha1.PodGangSet{
3102-3128: Generator call uses GenerateGrovePodCliqueSet.Update to GenerateGrovePodGangSet to reflect upstream.
- got, err := GenerateGrovePodCliqueSet(tt.args.ctx, tt.args.dynamoDeployment, tt.args.controllerConfig, nil) + got, err := GenerateGrovePodGangSet(tt.args.ctx, tt.args.dynamoDeployment, tt.args.controllerConfig, nil)
🧹 Nitpick comments (5)
deploy/cloud/operator/go.mod (1)
9-9: Prefer the tagged Grove module version (if available).If github.com/NVIDIA/grove/operator/api has a v0.1.0-alpha tag, pin to it instead of the pseudo-version for clearer provenance.
Suggested change (only if the tag exists):
- github.com/NVIDIA/grove/operator/api v0.0.0-20250915162920-f19bde6c0ca8 + github.com/NVIDIA/grove/operator/api v0.1.0-alphaAfter switching, run:
go mod tidy.deploy/helm/README.md (1)
41-41: Docs updated to PodCliqueSet: LGTM; align version wording with pins.README cites “Grove v0.1.0+” while chart/go.mod currently pin a commit/pseudo-version. Please align wording or switch dependencies to the 0.1.0‑alpha tag if available.
deploy/cloud/operator/internal/controller_common/podgangset.go (1)
9-19: Nit: rename param and use stable sort for deterministic ordering.Minor polish: rename gangSet -> cliqueSet to match the type; use sort.SliceStable to preserve order for equal names.
Proposed change:
-func CanonicalizePodCliqueSet(gangSet *grovev1alpha1.PodCliqueSet) *grovev1alpha1.PodCliqueSet { +func CanonicalizePodCliqueSet(cliqueSet *grovev1alpha1.PodCliqueSet) *grovev1alpha1.PodCliqueSet { // sort cliques by name - sort.Slice(gangSet.Spec.Template.Cliques, func(i, j int) bool { - return gangSet.Spec.Template.Cliques[i].Name < gangSet.Spec.Template.Cliques[j].Name + sort.SliceStable(cliqueSet.Spec.Template.Cliques, func(i, j int) bool { + return cliqueSet.Spec.Template.Cliques[i].Name < cliqueSet.Spec.Template.Cliques[j].Name }) // sort scaling groups by name - sort.Slice(gangSet.Spec.Template.PodCliqueScalingGroupConfigs, func(i, j int) bool { - return gangSet.Spec.Template.PodCliqueScalingGroupConfigs[i].Name < gangSet.Spec.Template.PodCliqueScalingGroupConfigs[j].Name + sort.SliceStable(cliqueSet.Spec.Template.PodCliqueScalingGroupConfigs, func(i, j int) bool { + return cliqueSet.Spec.Template.PodCliqueScalingGroupConfigs[i].Name < cliqueSet.Spec.Template.PodCliqueScalingGroupConfigs[j].Name }) - return gangSet + return cliqueSet }Optional: consider renaming the file to podcliqueset.go for consistency.
deploy/cloud/operator/internal/dynamo/graph.go (1)
320-381: Minor: variable naming still uses “gangSet” while comments say PodCliqueSet.If you keep the upstream PodGangSet, naming is consistent. If you later adopt a PodCliqueSet alias, rename gangSet -> pcs for clarity.
deploy/cloud/operator/internal/dynamo/graph_test.go (1)
4237-4307: StartsAfter dependency XTests: keep type names aligned with upstream.These disabled tests construct a PodCliqueSet; switch to PodGangSet if you plan to re‑enable later.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
deploy/cloud/operator/go.sumis excluded by!**/*.sum
📒 Files selected for processing (11)
deploy/cloud/helm/platform/Chart.yaml(1 hunks)deploy/cloud/helm/platform/components/operator/templates/manager-rbac.yaml(1 hunks)deploy/cloud/operator/cmd/main.go(1 hunks)deploy/cloud/operator/config/rbac/role.yaml(1 hunks)deploy/cloud/operator/go.mod(1 hunks)deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go(3 hunks)deploy/cloud/operator/internal/controller_common/podgangset.go(1 hunks)deploy/cloud/operator/internal/controller_common/predicate.go(1 hunks)deploy/cloud/operator/internal/dynamo/graph.go(3 hunks)deploy/cloud/operator/internal/dynamo/graph_test.go(10 hunks)deploy/helm/README.md(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1308-1312
Timestamp: 2025-06-11T21:29:28.650Z
Learning: User julienmancuso expects replies in English; avoid switching languages unless explicitly requested.
🧬 Code graph analysis (3)
deploy/cloud/operator/internal/dynamo/graph.go (2)
deploy/cloud/operator/internal/controller_common/predicate.go (1)
Config(50-63)deploy/cloud/operator/internal/controller_common/podgangset.go (1)
CanonicalizePodCliqueSet(9-19)
deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (2)
deploy/cloud/operator/internal/dynamo/graph.go (2)
GenerateGrovePodCliqueSet(884-990)Config(64-74)deploy/cloud/operator/internal/controller_common/resource.go (1)
SyncResource(60-195)
deploy/cloud/operator/internal/dynamo/graph_test.go (1)
deploy/cloud/operator/internal/dynamo/graph.go (1)
GenerateGrovePodCliqueSet(884-990)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (8)
deploy/cloud/operator/go.mod (1)
11-11: Direct logr dependency: LGTM.Promoting go-logr/logr to a direct dep is appropriate if used in this module.
deploy/helm/README.md (1)
75-80: Feature table terminology: LGTM.PodCliqueSet/LWS references read correctly and match the codebase rename.
deploy/cloud/operator/cmd/main.go (1)
162-162: Help text rename: LGTM.Matches the Grove resource rename.
deploy/cloud/operator/internal/controller_common/predicate.go (1)
36-37: Doc update: LGTM.Comment now correctly references PodCliqueSets.
deploy/cloud/helm/platform/components/operator/templates/manager-rbac.yaml (1)
122-130: RBAC rename to podcliquesets in Helm: LGTM.This aligns with the CRD rename. Ensure Helm and kustomize RBAC stay in sync.
deploy/cloud/helm/platform/Chart.yaml (1)
43-43: Align Grove chart to the published v0.1.0-alpha OCI tag (if available)Chart and go.mod are pinned to a commit pseudo-version (deploy/cloud/helm/platform/Chart.yaml — version v0.0.0-f19bde6; deploy/cloud/operator/go.mod — github.com/NVIDIA/grove/operator/api v0.0.0-20250915162920-f19bde6c0ca8). Docs mention Grove v0.1.0+. If the OCI chart tag v0.1.0-alpha is published, replace the Chart.yaml version (and update the module reference) for traceability:
- version: v0.0.0-f19bde6 + version: v0.1.0-alphadeploy/cloud/operator/internal/dynamo/graph_test.go (1)
1318-1321: Env var naming still references “POD_GANG_SET”. Confirm intent.You’re asserting DYNAMO_POD_GANG_SET_REPLICAS in expectations. If you keep upstream (PodGangSet), this is fine; if you intend to rebrand to “clique set,” consider a deprecation plan.
deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (1)
184-208: Scale GVR mapping: confirmed correct.consts.PodCliqueGVR and consts.PodCliqueScalingGroupGVR are group=grove.io, version=v1alpha1, resources=podcliques and podcliquescalinggroups (deploy/cloud/operator/internal/consts/consts.go).
deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go
Show resolved
Hide resolved
deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go
Show resolved
Hide resolved
|
/ok to test 24342b7 |
Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
Signed-off-by: Julien Mancuso <jmancuso@nvidia.com> Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
Signed-off-by: Julien Mancuso <jmancuso@nvidia.com> Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Signed-off-by: Julien Mancuso <jmancuso@nvidia.com> Signed-off-by: Kristen Kelleher <kkelleher@nvidia.com>
Overview:
use official Grove 0.1.0-alpha release
Summary by CodeRabbit
New Features
Refactor
Documentation
Chores