Skip to content

Conversation

@julienmancuso
Copy link
Contributor

@julienmancuso julienmancuso commented Sep 15, 2025

Overview:

use official Grove 0.1.0-alpha release

Summary by CodeRabbit

  • New Features

    • None
  • Refactor

    • Renamed Grove resource terminology from PodGangSet to PodCliqueSet across the operator and reconciliation flows.
    • Updated permissions to target the new PodCliqueSet resource.
    • CLI help text updated to reflect PodCliqueSet.
  • Documentation

    • Updated installation and feature guidance to reference PodCliqueSet instead of PodGangSet.
  • Chores

    • Upgraded Grove operator API and related dependencies.
    • Bumped Helm chart dependency version.

Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 15, 2025

Walkthrough

This PR renames Grove PodGangSet to PodCliqueSet across operator code, RBAC, Helm, and docs; updates generator and canonicalization functions and types; adjusts controller ownership and RBAC annotations; updates module dependencies and Helm chart dependency; and aligns tests and README accordingly.

Changes

Cohort / File(s) Summary of changes
Helm platform dependency
deploy/cloud/helm/platform/Chart.yaml
Bumps grove-charts dependency version to v0.0.0-f19bde6.
Operator RBAC (Helm templates)
deploy/cloud/helm/platform/components/operator/templates/manager-rbac.yaml
RBAC resources updated: grove.io podgangsetspodcliquesets; verbs unchanged.
Operator RBAC (kustomize config)
deploy/cloud/operator/config/rbac/role.yaml
ClusterRole rules: podgangsetspodcliquesets; verbs unchanged.
Operator entrypoint
deploy/cloud/operator/cmd/main.go
Help text updated: “PodGangSets” → “PodCliqueSets”; no logic changes.
Module/dependencies
deploy/cloud/operator/go.mod
Updates github.com/NVIDIA/grove/operator/api to ...f19bde6c0ca8; promotes github.com/go-logr/logr v1.4.2 to direct dep.
Controller: DynamoGraphDeployment reconciler
deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go
All usages, RBAC annotations, owner refs, and SyncResource types switched from PodGangSet to PodCliqueSet; calls generator GenerateGrovePodCliqueSet.
Controller common utilities
deploy/cloud/operator/internal/controller_common/podgangset.go
Renames and retargets CanonicalizePodGangSetCanonicalizePodCliqueSet; operates on PodCliqueSet.
Controller predicates/docs
deploy/cloud/operator/internal/controller_common/predicate.go
Comment update: references PodCliqueSet; no code changes.
Dynamo graph generation
deploy/cloud/operator/internal/dynamo/graph.go
Public API rename: GenerateGrovePodGangSetGenerateGrovePodCliqueSet; all types and helper signatures updated to PodCliqueSet; canonicalization path updated.
Dynamo graph tests
deploy/cloud/operator/internal/dynamo/graph_test.go
Tests updated to new types and function names: PodCliqueSet, PodCliqueSetSpec, PodCliqueSetTemplateSpec, and GenerateGrovePodCliqueSet.
Docs
deploy/helm/README.md
Documentation updated to reference PodCliqueSet instead of PodGangSet.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor K as K8s Events
  participant M as Controller Manager
  participant R as DynamoGraphDeployment Reconciler
  participant G as GenerateGrovePodCliqueSet
  participant A as K8s API Server

  K->>M: Resource change: DynamoGraphDeployment
  M->>R: Reconcile(request)
  Note over R: Build desired Grove resource
  R->>G: GenerateGrovePodCliqueSet(ctx, dgd, cfg, secrets)
  G-->>R: *grovev1alpha1.PodCliqueSet
  Note over R,A: SyncResource (create/update) PodCliqueSet
  R->>A: Apply PodCliqueSet
  A-->>R: Status/result
  R-->>M: Requeue/Done

  rect rgba(227,245,255,0.5)
    Note right of G: New/renamed: PodCliqueSet types/funcs
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

A whisk of code, a clique we weave,
From gangs to cliques, we now believe.
Helm bells ring, controllers sing,
RBAC doors the rabbits spring.
Tests all nod, the charts align—
Hop, hop, shipped! Our cliques divine. 🐇✨

Pre-merge checks

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description contains only a one-line Overview ("use official Grove 0.1.0-alpha release") and does not follow the repository's required template: it is missing the "Details", "Where should the reviewer start?", and "Related Issues" sections, which is insufficient given the changes include dependency bumps and multiple API/type renames. Please expand the PR description to follow the template by adding a "Details" section that enumerates the key changes (notably the Grove dependency bump and PodGangSet → PodCliqueSet renames), a "Where should the reviewer start?" section pointing to high-impact files (for example deploy/cloud/operator/go.mod, deploy/cloud/helm/platform/Chart.yaml, deploy/cloud/operator/internal/dynamo/graph.go, and deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go), and a "Related Issues" section linking any issue/PR numbers; also include testing notes and any compatibility or migration guidance for consumers.
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title "feat: use official Grove 0.1.0-alpha release" succinctly and accurately summarizes the primary change — switching the project to the official Grove 0.1.0-alpha release — which matches the dependency updates in go.mod and Chart.yaml and is clear for a teammate scanning history.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
deploy/cloud/operator/config/rbac/role.yaml (1)

113-121: Fix remaining PodGangSet references — RBAC rename incomplete

RBAC change to podcliquesets is fine, but leftover PodGangSet references remain; update to PodCliqueSet / podcliquesets.

  • deploy/helm/chart/templates/grove-podgangset.yaml:18 — kind: PodGangSet → PodCliqueSet.
  • docs/guides/dynamo_deploy/grove.md:22, 42, 45 — update PodGangSet mentions.

Re-run: rg -nP '(?i)podgangset(s)?' to confirm no matches.

deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (1)

424-430: Controller ownership should match upstream CRD kind.

If you target the official release, Owns must reference PodGangSet, not PodCliqueSet. (pkg.go.dev)

Apply this diff:

-		ctrlBuilder = ctrlBuilder.Owns(&grovev1alpha1.PodCliqueSet{}, builder.WithPredicates(predicate.Funcs{
+		ctrlBuilder = ctrlBuilder.Owns(&grovev1alpha1.PodGangSet{}, builder.WithPredicates(predicate.Funcs{
deploy/cloud/operator/internal/dynamo/graph.go (2)

320-333: applyCliqueStartupDependencies signature diverges from upstream types.

Parameter typed as *grovev1alpha1.PodCliqueSet will not match the official API, which exposes PodGangSet as the top‑level. Use *grovev1alpha1.PodGangSet here to keep compatibility. (pkg.go.dev)

Apply this diff:

-// - Sets the PodCliqueSet StartupType to Explicit if any dependencies are configured
+// - Sets the PodGangSet StartupType to Explicit if any dependencies are configured
 func applyCliqueStartupDependencies(
-	gangSet *grovev1alpha1.PodCliqueSet,
+	gangSet *grovev1alpha1.PodGangSet,

883-991: Top‑level resource rename to PodCliqueSet breaks against Grove 0.1.0‑alpha.

Official v1alpha1 still defines PodGangSet (spec/template types too). Returning/constructing PodCliqueSet here will fail unless you carry a local alias. Revert function name, return type, and canonicalizer to PodGangSet variants. (pkg.go.dev)

Apply this diff:

-// GenerateGrovePodCliqueSet generates a Grove PodCliqueSet for the given deployment, supporting both single-node and multinode cases.
-func GenerateGrovePodCliqueSet(
+// GenerateGrovePodGangSet generates a Grove PodGangSet for the given deployment, supporting both single-node and multinode cases.
+func GenerateGrovePodGangSet(
 	ctx context.Context,
 	dynamoDeployment *v1alpha1.DynamoGraphDeployment,
 	controllerConfig controller_common.Config,
 	secretsRetriever SecretsRetriever,
-) (*grovev1alpha1.PodCliqueSet, error) {
-	gangSet := &grovev1alpha1.PodCliqueSet{}
+) (*grovev1alpha1.PodGangSet, error) {
+	gangSet := &grovev1alpha1.PodGangSet{}
@@
-	return controller_common.CanonicalizePodCliqueSet(gangSet), nil
+	return controller_common.CanonicalizePodGangSet(gangSet), nil

Note: adjust any references to PodCliqueSetSpec/TemplateSpec back to PodGangSetSpec/TemplateSpec if they were renamed alongside. Ensure CanonicalizePodGangSet exists (it does upstream with PodGangSet). (pkg.go.dev)

deploy/cloud/operator/internal/dynamo/graph_test.go (3)

1051-1062: Tests target PodCliqueSet; upstream exposes PodGangSet.

To validate against official Grove 0.1.0‑alpha, rename the test and expected types to PodGangSet. Otherwise compilation will fail. (pkg.go.dev)

Apply this diff:

-func TestGenerateGrovePodCliqueSet(t *testing.T) {
+func TestGenerateGrovePodGangSet(t *testing.T) {
@@
-		want    *grovev1alpha1.PodCliqueSet
+		want    *grovev1alpha1.PodGangSet

1738-1765: Multinode test expectations should also use PodGangSet.

Mirror the earlier fixes here: change expected type to PodGangSet and associated template/spec types.

-			want: &grovev1alpha1.PodCliqueSet{
+			want: &grovev1alpha1.PodGangSet{

3102-3128: Generator call uses GenerateGrovePodCliqueSet.

Update to GenerateGrovePodGangSet to reflect upstream.

-			got, err := GenerateGrovePodCliqueSet(tt.args.ctx, tt.args.dynamoDeployment, tt.args.controllerConfig, nil)
+			got, err := GenerateGrovePodGangSet(tt.args.ctx, tt.args.dynamoDeployment, tt.args.controllerConfig, nil)
🧹 Nitpick comments (5)
deploy/cloud/operator/go.mod (1)

9-9: Prefer the tagged Grove module version (if available).

If github.com/NVIDIA/grove/operator/api has a v0.1.0-alpha tag, pin to it instead of the pseudo-version for clearer provenance.

Suggested change (only if the tag exists):

-	github.com/NVIDIA/grove/operator/api v0.0.0-20250915162920-f19bde6c0ca8
+	github.com/NVIDIA/grove/operator/api v0.1.0-alpha

After switching, run: go mod tidy.

deploy/helm/README.md (1)

41-41: Docs updated to PodCliqueSet: LGTM; align version wording with pins.

README cites “Grove v0.1.0+” while chart/go.mod currently pin a commit/pseudo-version. Please align wording or switch dependencies to the 0.1.0‑alpha tag if available.

deploy/cloud/operator/internal/controller_common/podgangset.go (1)

9-19: Nit: rename param and use stable sort for deterministic ordering.

Minor polish: rename gangSet -> cliqueSet to match the type; use sort.SliceStable to preserve order for equal names.

Proposed change:

-func CanonicalizePodCliqueSet(gangSet *grovev1alpha1.PodCliqueSet) *grovev1alpha1.PodCliqueSet {
+func CanonicalizePodCliqueSet(cliqueSet *grovev1alpha1.PodCliqueSet) *grovev1alpha1.PodCliqueSet {
 	// sort cliques by name
-	sort.Slice(gangSet.Spec.Template.Cliques, func(i, j int) bool {
-		return gangSet.Spec.Template.Cliques[i].Name < gangSet.Spec.Template.Cliques[j].Name
+	sort.SliceStable(cliqueSet.Spec.Template.Cliques, func(i, j int) bool {
+		return cliqueSet.Spec.Template.Cliques[i].Name < cliqueSet.Spec.Template.Cliques[j].Name
 	})
 	// sort scaling groups by name
-	sort.Slice(gangSet.Spec.Template.PodCliqueScalingGroupConfigs, func(i, j int) bool {
-		return gangSet.Spec.Template.PodCliqueScalingGroupConfigs[i].Name < gangSet.Spec.Template.PodCliqueScalingGroupConfigs[j].Name
+	sort.SliceStable(cliqueSet.Spec.Template.PodCliqueScalingGroupConfigs, func(i, j int) bool {
+		return cliqueSet.Spec.Template.PodCliqueScalingGroupConfigs[i].Name < cliqueSet.Spec.Template.PodCliqueScalingGroupConfigs[j].Name
 	})
-	return gangSet
+	return cliqueSet
 }

Optional: consider renaming the file to podcliqueset.go for consistency.

deploy/cloud/operator/internal/dynamo/graph.go (1)

320-381: Minor: variable naming still uses “gangSet” while comments say PodCliqueSet.

If you keep the upstream PodGangSet, naming is consistent. If you later adopt a PodCliqueSet alias, rename gangSet -> pcs for clarity.

deploy/cloud/operator/internal/dynamo/graph_test.go (1)

4237-4307: StartsAfter dependency XTests: keep type names aligned with upstream.

These disabled tests construct a PodCliqueSet; switch to PodGangSet if you plan to re‑enable later.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd835b0 and 24342b7.

⛔ Files ignored due to path filters (1)
  • deploy/cloud/operator/go.sum is excluded by !**/*.sum
📒 Files selected for processing (11)
  • deploy/cloud/helm/platform/Chart.yaml (1 hunks)
  • deploy/cloud/helm/platform/components/operator/templates/manager-rbac.yaml (1 hunks)
  • deploy/cloud/operator/cmd/main.go (1 hunks)
  • deploy/cloud/operator/config/rbac/role.yaml (1 hunks)
  • deploy/cloud/operator/go.mod (1 hunks)
  • deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (3 hunks)
  • deploy/cloud/operator/internal/controller_common/podgangset.go (1 hunks)
  • deploy/cloud/operator/internal/controller_common/predicate.go (1 hunks)
  • deploy/cloud/operator/internal/dynamo/graph.go (3 hunks)
  • deploy/cloud/operator/internal/dynamo/graph_test.go (10 hunks)
  • deploy/helm/README.md (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1308-1312
Timestamp: 2025-06-11T21:29:28.650Z
Learning: User julienmancuso expects replies in English; avoid switching languages unless explicitly requested.
🧬 Code graph analysis (3)
deploy/cloud/operator/internal/dynamo/graph.go (2)
deploy/cloud/operator/internal/controller_common/predicate.go (1)
  • Config (50-63)
deploy/cloud/operator/internal/controller_common/podgangset.go (1)
  • CanonicalizePodCliqueSet (9-19)
deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (2)
deploy/cloud/operator/internal/dynamo/graph.go (2)
  • GenerateGrovePodCliqueSet (884-990)
  • Config (64-74)
deploy/cloud/operator/internal/controller_common/resource.go (1)
  • SyncResource (60-195)
deploy/cloud/operator/internal/dynamo/graph_test.go (1)
deploy/cloud/operator/internal/dynamo/graph.go (1)
  • GenerateGrovePodCliqueSet (884-990)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (8)
deploy/cloud/operator/go.mod (1)

11-11: Direct logr dependency: LGTM.

Promoting go-logr/logr to a direct dep is appropriate if used in this module.

deploy/helm/README.md (1)

75-80: Feature table terminology: LGTM.

PodCliqueSet/LWS references read correctly and match the codebase rename.

deploy/cloud/operator/cmd/main.go (1)

162-162: Help text rename: LGTM.

Matches the Grove resource rename.

deploy/cloud/operator/internal/controller_common/predicate.go (1)

36-37: Doc update: LGTM.

Comment now correctly references PodCliqueSets.

deploy/cloud/helm/platform/components/operator/templates/manager-rbac.yaml (1)

122-130: RBAC rename to podcliquesets in Helm: LGTM.

This aligns with the CRD rename. Ensure Helm and kustomize RBAC stay in sync.

deploy/cloud/helm/platform/Chart.yaml (1)

43-43: Align Grove chart to the published v0.1.0-alpha OCI tag (if available)

Chart and go.mod are pinned to a commit pseudo-version (deploy/cloud/helm/platform/Chart.yaml — version v0.0.0-f19bde6; deploy/cloud/operator/go.mod — github.com/NVIDIA/grove/operator/api v0.0.0-20250915162920-f19bde6c0ca8). Docs mention Grove v0.1.0+. If the OCI chart tag v0.1.0-alpha is published, replace the Chart.yaml version (and update the module reference) for traceability:

-    version: v0.0.0-f19bde6
+    version: v0.1.0-alpha
deploy/cloud/operator/internal/dynamo/graph_test.go (1)

1318-1321: Env var naming still references “POD_GANG_SET”. Confirm intent.

You’re asserting DYNAMO_POD_GANG_SET_REPLICAS in expectations. If you keep upstream (PodGangSet), this is fine; if you intend to rebrand to “clique set,” consider a deprecation plan.

deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (1)

184-208: Scale GVR mapping: confirmed correct.

consts.PodCliqueGVR and consts.PodCliqueScalingGroupGVR are group=grove.io, version=v1alpha1, resources=podcliques and podcliquescalinggroups (deploy/cloud/operator/internal/consts/consts.go).

@julienmancuso
Copy link
Contributor Author

/ok to test 24342b7

Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
@julienmancuso julienmancuso merged commit c382005 into main Sep 15, 2025
13 of 16 checks passed
@julienmancuso julienmancuso deleted the jsm/dep-381 branch September 15, 2025 20:55
julienmancuso added a commit that referenced this pull request Sep 15, 2025
Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
saturley-hall pushed a commit that referenced this pull request Sep 15, 2025
Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
zhongdaor-nv pushed a commit that referenced this pull request Sep 15, 2025
Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
Signed-off-by: zhongdaor <zhongdaor@nvidia.com>
biswapanda pushed a commit that referenced this pull request Sep 16, 2025
Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
kmkelle-nv pushed a commit that referenced this pull request Sep 17, 2025
Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
Signed-off-by: Kristen Kelleher <kkelleher@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants