@julienmancuso

@julienmancuso julienmancuso commented Jun 30, 2025

Overview:

Simplify the Kubernetes deployment.

Details:

  • Remove the image builder
  • Remove the dependency on Dynamo artifacts
  • CRDs are now the source of truth

Summary by CodeRabbit

  • New Features

    • Added support for dynamic retrieval of Docker image pull secrets per namespace and registry, improving secret management for deployments.
    • Introduced a utility for extracting hostnames from URLs.
    • Added new fields and methods for component type and image retrieval in deployment specifications.
  • Refactor

    • Simplified deployment reconciliation logic by removing dependencies on certain resources and centralizing metadata and secret retrieval.
    • Streamlined deployment generation to use service maps directly, with improved handling of namespaces and ingress specifications.
    • Removed the DynamoComponent resource and related controllers, API clients, and Helm charts, simplifying the operator and deployment architecture.
    • Cleaned up Helm charts by removing the API store component and related resources, ingress, and RBAC configurations.
    • Removed image build engine configurations and internal image references from operator settings and Helm values.
    • Updated RBAC roles to remove permissions related to removed resources.
    • Simplified operator code by removing unused constants, functions, and dependencies.
  • Bug Fixes

    • Made several spec fields optional, enhancing flexibility and compatibility with various deployment scenarios.
  • Chores

    • Updated Helm chart version for CRDs.
    • Enhanced Makefile to standardize CRD YAML files and annotations.
    • Removed redundant resource declarations and cleaned up dependencies.
  • Tests

    • Added and updated tests for Docker secret indexing and URL utilities.
    • Revised tests to align with refactored deployment logic and new secret retrieval mechanisms.

@github-actions github-actions bot added the feat label Jun 30, 2025
@pull-request-size pull-request-size bot removed the size/L label Jul 2, 2025
@julienmancuso julienmancuso changed the title feat: skip image builder if images are provided feat: simplify k8s deployment Jul 2, 2025
@julienmancuso julienmancuso marked this pull request as ready for review July 2, 2025 23:48
@coderabbitai

coderabbitai bot commented Jul 2, 2025

Walkthrough

This update removes the DynamoComponent resource, along with its controller, API, and CRD, from the operator and consolidates deployment logic around DynamoComponentDeployment and DynamoGraphDeployment. It introduces a DockerSecretIndexer for efficient Docker secret management, simplifies deployment generation, and updates the Helm charts and manifests by removing the API store and image builder components. It also adds new utility functions and tests.

Changes

| Files/Paths | Change Summary |
| --- | --- |
| deploy/cloud/helm/crds/Chart.yaml | Bumped Helm chart version from 0.1.6 to 0.1.7. |
| deploy/cloud/operator/Makefile | Enhanced manifests target to post-process CRD YAMLs: remove "name" from required fields, add NVIDIA copyright/license header, and add Helm resource-policy annotation. |
| deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go, deploy/cloud/operator/api/v1alpha1/dynamographdeployment_types.go | Made several CRD fields optional via omitempty, added ComponentType field and GetImage method, updated JSON tags. |
| deploy/cloud/operator/cmd/main.go | Added DockerSecretIndexer with periodic refresh; wired it into reconciler setup and event informer for Docker secrets. |
| deploy/cloud/operator/internal/common/url.go, url_test.go | Added GetHost function to extract host from URLs and corresponding unit tests. |
| deploy/cloud/operator/internal/consts/consts.go | Removed many constants related to image builders, API store, and internal images; added new constants for component types and planner service account. |
| deploy/cloud/operator/internal/controller/common.go | Removed getK8sName and isGoogleRegistry functions; added dockerSecretRetriever interface. |
| deploy/cloud/operator/internal/controller/dynamocomponent_controller.go | Deleted entire controller and related functions for DynamoComponent resource. |
| deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go | Removed all dependencies on DynamoComponent resource; refactored methods to use only DynamoComponentDeployment; added Docker secret retriever logic. |
| deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go | Updated tests to remove DynamoComponent references; added mock Docker secret retriever; adjusted test inputs to use ExtraPodSpec. |
| deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go | Removed fetching and syncing of DynamoComponent; reconciler now manages only DynamoComponentDeployment resources. |
| deploy/cloud/operator/internal/dynamo/graph.go, graph_test.go | Removed external API store config fetching and parsing; simplified deployment generation to use only deployment spec; updated tests accordingly. |
| deploy/cloud/operator/internal/secrets/docker.go, docker_test.go | Added DockerSecretIndexer for indexing and retrieving Docker config secrets by namespace and registry; included unit tests. |
| deploy/cloud/helm/platform/Chart.yaml | Updated chart version from 25.2.0-rc3 to 0.1.0; removed dynamo-api-store dependency; bumped dynamo-operator dependency version. |
| deploy/cloud/helm/platform/components/api-store/ (all files) | Removed entire api-store Helm chart component: Chart.yaml, templates, values, and .helmignore. |
| deploy/cloud/helm/platform/components/operator/Chart.yaml | Bumped dynamo-operator chart version from 0.1.7 to 0.1.8. |
| deploy/cloud/helm/platform/components/operator/templates/buildkit.yaml | Removed BuildKit StatefulSet, Service, and ConfigMap Helm templates. |
| deploy/cloud/helm/platform/components/operator/templates/deployment.yaml | Removed Docker config secret env var, volume mount, and volume from operator deployment template. |
| deploy/cloud/helm/platform/components/operator/templates/image-builder-serviceaccount.yaml | Removed image-builder ServiceAccount Helm template. |
| deploy/cloud/helm/platform/components/operator/templates/manager-rbac.yaml | Removed dynamocomponents resource permissions from RBAC rules. |
| deploy/cloud/helm/platform/components/operator/templates/secret-env.yaml | Removed many environment variables related to API store, Docker registry, internal images, build engine, and caching from secret-env template. |
| deploy/cloud/helm/platform/components/operator/values.yaml | Removed API store config, internal images references, image build engine, and related settings from values.yaml. |
| deploy/cloud/helm/platform/values.yaml | Removed apiStore section and related operator image build settings from platform values.yaml. |
| deploy/cloud/operator/PROJECT | Removed redundant DynamoComponent resource entry. |
| deploy/cloud/operator/api/dynamo/api_store_client/client.go | Deleted API store client code for fetching DynamoComponent and presigned URLs. |
| deploy/cloud/operator/api/dynamo/api_store_client/http.go | Deleted HTTP client wrapper for API store client. |
| deploy/cloud/operator/api/v1alpha1/dynamocomponent_types.go | Deleted DynamoComponent CRD API types, constants, methods, and registration. |
| deploy/cloud/operator/api/v1alpha1/zz_generated.deepcopy.go | Removed deepcopy functions related to DynamoComponent types. |
| deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponents.yaml | Deleted DynamoComponent CRD manifest YAML. |
| deploy/cloud/operator/config/rbac/role.yaml | Removed permissions related to dynamocomponents, pods, secrets, and serviceaccounts from ClusterRole. |
| deploy/cloud/operator/go.mod | Removed multiple dependencies related to cloud SDKs, container registries, HTTP clients, and utilities; added testify indirectly. |
| deploy/cloud/operator/internal/archive/tar.go | Deleted function to extract a file from a tar archive. |
| deploy/cloud/operator/internal/archive/tar_test.go | Deleted tests for tar extraction function. |
| deploy/cloud/operator/internal/config/config.go | Deleted configuration utilities for Docker registry, API store, internal images, and environment variable helpers. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Manager
    participant DockerSecretIndexer
    participant Reconciler

    User->>Manager: Start operator
    Manager->>DockerSecretIndexer: Initialize with client
    loop Every 60 seconds
        Manager->>DockerSecretIndexer: RefreshIndex()
    end
    Manager->>Reconciler: Setup with DockerSecretIndexer
    Reconciler->>DockerSecretIndexer: GetSecrets(namespace, registry)
    DockerSecretIndexer-->>Reconciler: Return secret names
    Reconciler->>Kubernetes: Deploy resources using secrets
sequenceDiagram
    participant Reconciler
    participant DynamoComponentDeployment
    participant DockerSecretIndexer

    Reconciler->>DynamoComponentDeployment: Read spec (image, metadata)
    Reconciler->>DockerSecretIndexer: GetSecrets(namespace, registry)
    DockerSecretIndexer-->>Reconciler: Return image pull secrets
    Reconciler->>Kubernetes: Create/Update Deployment

Poem

🐰
Refactored fields and secrets anew,
Controllers simpler, tests that grew.
Docker secrets found with speed,
No more configs for us to heed!
With every hop and every run,
Deployments shine—our work is done.
🥕✨


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (5)
deploy/cloud/operator/internal/secrets/docker_test.go (1)

13-95: Consider adding test cases for edge cases and error scenarios.

While the happy path is well tested, consider adding test coverage for:

  1. Malformed docker config JSON
  2. Secrets that are not of type DockerConfigJson (should be ignored)
  3. Empty auths in docker config
  4. Query for non-existent namespace or registry

Example additional test case:

// Add to mockSecrets slice
{
    ObjectMeta: metav1.ObjectMeta{
        Name:      "non-docker-secret",
        Namespace: "default",
    },
    Type: corev1.SecretTypeOpaque, // Should be ignored
    Data: map[string][]byte{
        "key": []byte("value"),
    },
},
{
    ObjectMeta: metav1.ObjectMeta{
        Name:      "malformed-secret",
        Namespace: "default",  
    },
    Type: corev1.SecretTypeDockerConfigJson,
    Data: map[string][]byte{
        ".dockerconfigjson": []byte(`invalid json`),
    },
},

// After RefreshIndex, add tests:
// Test non-existent registry
secrets, err = i.GetSecrets("default", "non-existent.registry.com")
if err != nil {
    t.Errorf("GetSecrets() unexpected error = %v", err)
}
if len(secrets) != 0 {
    t.Errorf("Expected 0 secrets for non-existent registry, got %d", len(secrets))
}
deploy/cloud/operator/Makefile (1)

58-59: Consider making the copyright year dynamic.

The hard-coded copyright year "2024-2025" might need manual updates. Consider using a dynamic approach or making it configurable.

-				'# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.' \
+				"# SPDX-FileCopyrightText: Copyright (c) 2024-$$(date +%Y) NVIDIA CORPORATION & AFFILIATES. All rights reserved." \
deploy/cloud/operator/internal/secrets/docker.go (1)

35-36: Consider incremental updates for better performance.

The current implementation clears and rebuilds the entire index on each refresh. For large clusters with many secrets, consider implementing incremental updates using watch events.
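
As a rough illustration of the incremental alternative, the index could expose per-secret `upsert` and `remove` operations that a watch or informer event handler calls, instead of rebuilding everything. This is only a sketch: `secretIndex` and its methods are hypothetical names, and the namespace → registry → secret-names layout is assumed from the review's description of the index.

```go
package main

import (
	"fmt"
	"sync"
)

// secretIndex maps namespace -> registry host -> secret names (assumed layout).
type secretIndex struct {
	mu    sync.RWMutex
	index map[string]map[string][]string
}

func newSecretIndex() *secretIndex {
	return &secretIndex{index: map[string]map[string][]string{}}
}

// upsert handles an add or update event: it drops any stale entries for the
// secret, then records the registries it currently holds credentials for.
func (s *secretIndex) upsert(namespace, name string, registries []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.removeLocked(namespace, name)
	byRegistry, ok := s.index[namespace]
	if !ok {
		byRegistry = map[string][]string{}
		s.index[namespace] = byRegistry
	}
	for _, registry := range registries {
		byRegistry[registry] = append(byRegistry[registry], name)
	}
}

// remove handles a delete event.
func (s *secretIndex) remove(namespace, name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.removeLocked(namespace, name)
}

// removeLocked filters the secret out of every registry entry in the
// namespace. The caller must hold the write lock.
func (s *secretIndex) removeLocked(namespace, name string) {
	byRegistry := s.index[namespace]
	for registry, names := range byRegistry {
		kept := names[:0]
		for _, n := range names {
			if n != name {
				kept = append(kept, n)
			}
		}
		if len(kept) == 0 {
			delete(byRegistry, registry)
		} else {
			byRegistry[registry] = kept
		}
	}
}

// get returns a copy so callers cannot mutate the index.
func (s *secretIndex) get(namespace, registry string) []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return append([]string(nil), s.index[namespace][registry]...)
}

func main() {
	idx := newSecretIndex()
	idx.upsert("default", "regcred", []string{"nvcr.io", "docker.io"})
	idx.upsert("default", "regcred", []string{"docker.io"}) // update drops nvcr.io
	fmt.Println(idx.get("default", "nvcr.io"), idx.get("default", "docker.io"))
}
```

The trade-off is extra bookkeeping per event in exchange for not re-listing and re-parsing every secret on each refresh.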

deploy/cloud/operator/internal/dynamo/graph.go (1)

280-338: Consider decomposing this function to reduce complexity.

The // nolint:gocyclo comment indicates high cyclomatic complexity. Consider extracting the component deployment creation logic into a separate function.

func createComponentDeployment(
    ctx context.Context,
    parentDynamoGraphDeployment *v1alpha1.DynamoGraphDeployment,
    componentName string,
    component v1alpha1.DynamoComponentSpec,
    graphDynamoNamespace string,
    ingressSpec *v1alpha1.IngressSpec,
) (*v1alpha1.DynamoComponentDeployment, error) {
    // Extract lines 286-329 into this function
}
deploy/cloud/operator/internal/dynamo/graph_test.go (1)

442-442: Fix inconsistent JSON formatting in test data.

The JSON string has inconsistent spacing after colons. Consider using consistent formatting.

-				Value: `{"service1":{"port":8080,"ServiceArgs":{"Workers":2, "Resources":{"CPU":"2", "Memory":"2Gi", "GPU":"2"}}}}`,
+				Value: `{"service1":{"port":8080,"ServiceArgs":{"Workers":2,"Resources":{"CPU":"2","Memory":"2Gi","GPU":"2"}}}}`,
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2bed47e and 54c7985.

📒 Files selected for processing (17)
  • deploy/cloud/helm/crds/Chart.yaml (1 hunks)
  • deploy/cloud/operator/Makefile (1 hunks)
  • deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (4 hunks)
  • deploy/cloud/operator/api/v1alpha1/dynamographdeployment_types.go (1 hunks)
  • deploy/cloud/operator/cmd/main.go (3 hunks)
  • deploy/cloud/operator/internal/common/url.go (1 hunks)
  • deploy/cloud/operator/internal/common/url_test.go (1 hunks)
  • deploy/cloud/operator/internal/consts/consts.go (1 hunks)
  • deploy/cloud/operator/internal/controller/common.go (1 hunks)
  • deploy/cloud/operator/internal/controller/dynamocomponent_controller.go (2 hunks)
  • deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (14 hunks)
  • deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go (7 hunks)
  • deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (1 hunks)
  • deploy/cloud/operator/internal/dynamo/graph.go (3 hunks)
  • deploy/cloud/operator/internal/dynamo/graph_test.go (12 hunks)
  • deploy/cloud/operator/internal/secrets/docker.go (1 hunks)
  • deploy/cloud/operator/internal/secrets/docker_test.go (1 hunks)
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1337
File: deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml:0-0
Timestamp: 2025-06-03T15:26:55.732Z
Learning: The image-builder ServiceAccount in deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml does not need imagePullSecrets, unlike the component ServiceAccount.
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The `DYN_DEPLOYMENT_CONFIG` environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1308-1312
Timestamp: 2025-06-11T21:29:28.650Z
Learning: User julienmancuso expects replies in English; avoid switching languages unless explicitly requested.
deploy/cloud/operator/internal/controller/common.go (2)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1337
File: deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml:0-0
Timestamp: 2025-06-03T15:26:55.732Z
Learning: The image-builder ServiceAccount in deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml does not need imagePullSecrets, unlike the component ServiceAccount.
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1623
File: deploy/cloud/helm/platform/values.yaml:48-52
Timestamp: 2025-06-24T21:58:17.757Z
Learning: In published Helm charts, it's a best practice to set `useKubernetesSecret: true` as the default for docker registry configuration while leaving server, username, and password fields empty. This ensures users must explicitly provide registry credentials at deployment time via `--set` flags or custom values files, rather than having sensitive data hardcoded in the chart repository.
deploy/cloud/operator/internal/consts/consts.go (1)
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1266
File: deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go:85-85
Timestamp: 2025-05-29T16:29:45.152Z
Learning: In the Dynamo codebase, ComponentTypePlanner constants with different cases ("Planner" vs "planner") are intentional and serve different purposes: component type in config vs component label. These should not be made consistent as they handle different contexts.
deploy/cloud/operator/cmd/main.go (1)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1337
File: deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml:0-0
Timestamp: 2025-06-03T15:26:55.732Z
Learning: The image-builder ServiceAccount in deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml does not need imagePullSecrets, unlike the component ServiceAccount.
deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (1)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The `DYN_DEPLOYMENT_CONFIG` environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go (3)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The `DYN_DEPLOYMENT_CONFIG` environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1337
File: deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml:0-0
Timestamp: 2025-06-03T15:26:55.732Z
Learning: The image-builder ServiceAccount in deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml does not need imagePullSecrets, unlike the component ServiceAccount.
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1302-1306
Timestamp: 2025-06-11T21:18:00.425Z
Learning: In the Dynamo operator, the project’s preferred security posture is to set a Pod-level `PodSecurityContext` with `runAsUser`, `runAsGroup`, and `fsGroup` all set to `1000`, and then selectively override the user at the individual container level (e.g., `RunAsUser: 0` for Kaniko) when root is required.
deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (1)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The `DYN_DEPLOYMENT_CONFIG` environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
deploy/cloud/operator/internal/controller/dynamocomponent_controller.go (1)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1337
File: deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml:0-0
Timestamp: 2025-06-03T15:26:55.732Z
Learning: The image-builder ServiceAccount in deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml does not need imagePullSecrets, unlike the component ServiceAccount.
deploy/cloud/operator/internal/dynamo/graph.go (2)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The `DYN_DEPLOYMENT_CONFIG` environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1266
File: deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go:85-85
Timestamp: 2025-05-29T16:29:45.152Z
Learning: In the Dynamo codebase, ComponentTypePlanner constants with different cases ("Planner" vs "planner") are intentional and serve different purposes: component type in config vs component label. These should not be made consistent as they handle different contexts.
deploy/cloud/operator/internal/dynamo/graph_test.go (2)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The `DYN_DEPLOYMENT_CONFIG` environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1266
File: deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go:85-85
Timestamp: 2025-05-29T16:29:45.152Z
Learning: In the Dynamo codebase, ComponentTypePlanner constants with different cases ("Planner" vs "planner") are intentional and serve different purposes: component type in config vs component label. These should not be made consistent as they handle different contexts.
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (2)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The `DYN_DEPLOYMENT_CONFIG` environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1337
File: deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml:0-0
Timestamp: 2025-06-03T15:26:55.732Z
Learning: The image-builder ServiceAccount in deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml does not need imagePullSecrets, unlike the component ServiceAccount.
🧬 Code Graph Analysis (6)
deploy/cloud/operator/internal/common/url_test.go (1)
deploy/cloud/operator/internal/common/url.go (1)
  • GetHost (9-22)
deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (1)
deploy/cloud/operator/internal/dynamo/graph.go (1)
  • GenerateDynamoComponentsDeployments (282-338)
deploy/cloud/operator/internal/secrets/docker_test.go (1)
deploy/cloud/operator/internal/secrets/docker.go (1)
  • NewDockerSecretIndexer (20-25)
deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (2)
deploy/cloud/operator/internal/consts/consts.go (1)
  • ComponentTypeMain (77-77)
deploy/cloud/operator/api/dynamo/common/common.go (1)
  • ExtraPodSpec (56-66)
deploy/cloud/operator/internal/secrets/docker.go (1)
deploy/cloud/operator/internal/common/url.go (1)
  • GetHost (9-22)
deploy/cloud/operator/internal/dynamo/graph.go (6)
deploy/cloud/operator/api/v1alpha1/dynamographdeployment_types.go (1)
  • DynamoGraphDeployment (57-63)
deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (3)
  • IngressSpec (100-111)
  • DynamoComponentDeployment (131-137)
  • DynamoComponentDeploymentSharedSpec (52-85)
deploy/cloud/operator/internal/consts/consts.go (2)
  • ComponentTypePlanner (76-76)
  • PlannerServiceAccountName (78-78)
deploy/sdk/src/dynamo/sdk/core/protocol/interface.py (1)
  • ComponentType (43-46)
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (1)
  • ComponentTypePlanner (83-83)
deploy/cloud/operator/api/dynamo/common/common.go (1)
  • ExtraPodSpec (56-66)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (20)
deploy/cloud/operator/internal/common/url.go (1)

9-22: Well-designed URL host extraction utility.

The implementation correctly handles URLs with and without schemes using the dummy scheme approach, which is a common pattern for this use case. Error handling is appropriate and covers both parsing failures and missing host scenarios.
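
For readers unfamiliar with the dummy-scheme approach, here is a minimal sketch of the idea. It is not the code from url.go: `getHost` is a hypothetical lower-case stand-in, and the exact error messages and scheme string are assumptions. `net/url` only populates `Host` when the input has a scheme, so a placeholder scheme is prepended when none is present.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// getHost extracts the host (including any port) from a registry reference
// that may or may not carry a scheme.
func getHost(raw string) (string, error) {
	if !strings.Contains(raw, "://") {
		// Without a scheme, url.Parse treats the input as a path and leaves
		// Host empty, so a placeholder scheme is added first.
		raw = "dummy://" + raw
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("parse %q: %w", raw, err)
	}
	if u.Host == "" {
		return "", fmt.Errorf("no host found in %q", raw)
	}
	return u.Host, nil
}

func main() {
	for _, in := range []string{
		"registry.example.com:5005",
		"https://nvcr.io/nvidia/app",
		"nvcr.io/nvidia/app:latest",
		"",
	} {
		host, err := getHost(in)
		fmt.Printf("%q -> host=%q err=%v\n", in, host, err)
	}
}
```

The parsing is purely string-based, so no network access to the named registry is needed.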

deploy/cloud/operator/internal/consts/consts.go (1)

76-78: Good addition of standardized component type constants.

These constants provide centralized definitions for component types and service account naming, which improves maintainability and consistency across the codebase.

deploy/cloud/helm/crds/Chart.yaml (1)

19-19: Appropriate version bump for CRD changes.

The patch-level version increment from 0.1.6 to 0.1.7 is appropriate for the CRD updates and refactoring changes in this PR.

deploy/cloud/operator/api/v1alpha1/dynamographdeployment_types.go (1)

35-35: Good addition of omitempty to support refactoring goals.

Adding omitempty to the DynamoGraph field JSON tag makes it optional during serialization, which aligns with the refactoring to make DynamoGraphDeployment more self-contained and decouple from legacy config dependencies.
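
The effect of `omitempty` on serialization can be seen in a small standalone example. The two struct types and the `dynamoGraph` JSON key are illustrative assumptions, not the actual CRD types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// specWithoutOmitEmpty always emits the field, even when it is empty.
type specWithoutOmitEmpty struct {
	DynamoGraph string `json:"dynamoGraph"`
}

// specWithOmitEmpty drops the field from the output when it is empty.
type specWithOmitEmpty struct {
	DynamoGraph string `json:"dynamoGraph,omitempty"`
}

// marshal returns the JSON encoding of v, or the error text on failure.
func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return fmt.Sprintf("error: %v", err)
	}
	return string(b)
}

func main() {
	fmt.Println(marshal(specWithoutOmitEmpty{})) // prints {"dynamoGraph":""}
	fmt.Println(marshal(specWithOmitEmpty{}))    // prints {}
}
```

Note that `omitempty` only affects encoding. Whether the API server treats the field as required is governed by the generated CRD schema.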

deploy/cloud/operator/internal/controller/common.go (1)

74-77: Well-designed interface for dynamic secret retrieval.

The dockerSecretRetriever interface provides a clean abstraction for retrieving docker secrets dynamically based on namespace and registry. This supports better testability through dependency injection and replaces static secret name usage as described in the refactoring goals.
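
A sketch of how such an interface enables dependency injection is shown below. The method signature is inferred from the `GetSecrets(namespace, registry)` calls elsewhere in this review and the "Return secret names" step in the walkthrough diagram, so treat it as an assumption. `staticRetriever` and `imagePullSecretNames` are hypothetical helpers.

```go
package main

import "fmt"

// dockerSecretRetriever mirrors the interface named in the review; the exact
// signature is assumed.
type dockerSecretRetriever interface {
	GetSecrets(namespace, registry string) ([]string, error)
}

// staticRetriever is a test double keyed by "namespace/registry".
type staticRetriever map[string][]string

func (s staticRetriever) GetSecrets(namespace, registry string) ([]string, error) {
	return s[namespace+"/"+registry], nil
}

// imagePullSecretNames depends only on the interface, so a reconciler can be
// given the real indexer in production and a static map in unit tests.
func imagePullSecretNames(r dockerSecretRetriever, namespace, registry string) ([]string, error) {
	names, err := r.GetSecrets(namespace, registry)
	if err != nil {
		return nil, fmt.Errorf("get docker secrets for %s in %s: %w", registry, namespace, err)
	}
	return names, nil
}

func main() {
	r := staticRetriever{"default/nvcr.io": {"regcred"}}
	names, err := imagePullSecretNames(r, "default", "nvcr.io")
	fmt.Println(names, err)
}
```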

deploy/cloud/operator/internal/common/url_test.go (1)

1-76: LGTM! Comprehensive test coverage for GetHost function.

The test cases properly cover all the expected scenarios including plain hostnames, URLs with ports, schemes, and paths, as well as error handling for empty input. The table-driven test pattern is well-structured.

deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (1)

121-126: LGTM! Simplified deployment generation aligns with PR objectives.

The removal of the dynamoGraphConfig parameter and the elimination of DynamoComponent resource creation/syncing successfully simplifies the deployment process as intended.

deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (3)

41-43: Good addition of omitempty tags for optional fields.

Making DynamoComponent and DynamoTag optional aligns well with the removal of DynamoComponent resource dependency.


169-171: Excellent backward compatibility in IsMainComponent method.

The updated logic maintains backward compatibility by checking both the legacy DynamoTag suffix pattern and the new explicit ComponentType field. This ensures smooth migration.
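
The dual check might look roughly like the sketch below. The constant values are placeholders: neither the legacy tag suffix nor the value of `ComponentTypeMain` appears in this conversation, so `legacyMainTagSuffix` and `componentTypeMain` are hypothetical, as is the simplified `componentSpec` type.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	componentTypeMain   = "main"  // assumed value of the new component type
	legacyMainTagSuffix = ":main" // hypothetical legacy DynamoTag suffix
)

// componentSpec is a simplified stand-in for the real spec type.
type componentSpec struct {
	DynamoTag     string
	ComponentType string
}

// isMainComponent accepts either the legacy tag convention or the explicit
// component type, so specs written before the change keep working.
func (c componentSpec) isMainComponent() bool {
	return strings.HasSuffix(c.DynamoTag, legacyMainTagSuffix) ||
		c.ComponentType == componentTypeMain
}

func main() {
	fmt.Println(componentSpec{ComponentType: componentTypeMain}.isMainComponent())
	fmt.Println(componentSpec{DynamoTag: "graph" + legacyMainTagSuffix}.isMainComponent())
	fmt.Println(componentSpec{DynamoTag: "graph:worker"}.isMainComponent())
}
```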


195-201: Clean implementation of GetImage method.

The method provides a safe way to extract the image from the nested structure with proper nil checks.
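
A nil-safe accessor of this kind can be sketched as follows. The types are simplified stand-ins (the real spec uses `ExtraPodSpec` from the common package), and `resolveImage` is a hypothetical caller-side helper showing how an empty result can be turned into an error.

```go
package main

import "fmt"

// Simplified stand-ins for the nested spec types; field names are assumptions.
type container struct{ Image string }

type extraPodSpec struct{ MainContainer *container }

type sharedSpec struct{ ExtraPodSpec *extraPodSpec }

// getImage walks the nested pointers and returns "" if any level is nil,
// so callers never dereference a missing struct.
func (s *sharedSpec) getImage() string {
	if s == nil || s.ExtraPodSpec == nil || s.ExtraPodSpec.MainContainer == nil {
		return ""
	}
	return s.ExtraPodSpec.MainContainer.Image
}

// resolveImage reports a missing image as an error for the caller to handle.
func resolveImage(s *sharedSpec) (string, error) {
	image := s.getImage()
	if image == "" {
		return "", fmt.Errorf("image is not set in the deployment spec")
	}
	return image, nil
}

func main() {
	empty := &sharedSpec{}
	full := &sharedSpec{ExtraPodSpec: &extraPodSpec{MainContainer: &container{Image: "nvcr.io/example/app:1.0"}}}
	fmt.Println(resolveImage(empty))
	fmt.Println(resolveImage(full))
}
```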

deploy/cloud/operator/internal/secrets/docker.go (1)

13-18: Well-structured thread-safe implementation.

The DockerSecretIndexer is well-designed with proper thread safety using RWMutex. The nested map structure efficiently indexes secrets by namespace and registry.
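
To make the nested-map design concrete, here is a simplified in-memory sketch. It is not the code from docker.go: `indexer`, `dockerSecret`, `refresh`, and `getSecrets` are hypothetical names, the Kubernetes client is replaced by a slice of inputs, and registry keys are used as written rather than normalized through GetHost. The `auths` key is the standard top-level field of a `.dockerconfigjson` payload.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// dockerSecret is a stand-in for a kubernetes.io/dockerconfigjson Secret.
type dockerSecret struct {
	Namespace, Name string
	ConfigJSON      []byte // contents of the ".dockerconfigjson" key
}

// dockerConfig captures only the registry names under "auths".
type dockerConfig struct {
	Auths map[string]json.RawMessage `json:"auths"`
}

// indexer maps namespace -> registry -> secret names behind an RWMutex.
type indexer struct {
	mu    sync.RWMutex
	index map[string]map[string][]string
}

// refresh builds a fresh index outside the lock, then swaps it in, so
// readers are blocked only for the pointer assignment.
func (i *indexer) refresh(secrets []dockerSecret) {
	next := map[string]map[string][]string{}
	for _, s := range secrets {
		var cfg dockerConfig
		if err := json.Unmarshal(s.ConfigJSON, &cfg); err != nil {
			continue // skip malformed docker configs
		}
		for registry := range cfg.Auths {
			if next[s.Namespace] == nil {
				next[s.Namespace] = map[string][]string{}
			}
			next[s.Namespace][registry] = append(next[s.Namespace][registry], s.Name)
		}
	}
	i.mu.Lock()
	i.index = next
	i.mu.Unlock()
}

// getSecrets returns a copy of the secret names for a namespace and registry.
func (i *indexer) getSecrets(namespace, registry string) []string {
	i.mu.RLock()
	defer i.mu.RUnlock()
	return append([]string(nil), i.index[namespace][registry]...)
}

func main() {
	var idx indexer
	idx.refresh([]dockerSecret{
		{Namespace: "default", Name: "regcred", ConfigJSON: []byte(`{"auths":{"nvcr.io":{"auth":"eA=="}}}`)},
		{Namespace: "default", Name: "broken", ConfigJSON: []byte(`invalid json`)},
	})
	fmt.Println(idx.getSecrets("default", "nvcr.io"))
}
```

Concurrent reconcilers take only the read lock in `getSecrets`, which is the main benefit of an RWMutex over a plain mutex here.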

deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go (1)

777-784: Clean mock implementation for Docker secret retrieval.

The mockDockerSecretRetriever provides a simple and effective way to test the secret retrieval functionality without external dependencies.

deploy/cloud/operator/internal/controller/dynamocomponent_controller.go (1)

633-654: Good simplification of image name generation.

The removal of the dynamoComponentRepositoryName parameter simplifies the function signature while maintaining the same functionality by using dynamoComponent.Name directly.

deploy/cloud/operator/internal/dynamo/graph.go (2)

280-282: Excellent refactoring to remove external dependencies.

The removal of the DynamoGraphConfig parameter and direct use of parentDynamoGraphDeployment.Spec.Services simplifies the function and aligns with the broader architectural changes.


307-311: Good use of centralized constants.

The use of commonconsts.ComponentTypePlanner and commonconsts.PlannerServiceAccountName from the internal consts package improves maintainability.

deploy/cloud/operator/internal/dynamo/graph_test.go (1)

686-686: LGTM!

The function call correctly matches the updated signature without the config parameter.

deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (4)

86-96: LGTM!

The addition of DockerSecretRetriever interface properly replaces the static secret retrieval mechanism.


1091-1096: LGTM!

The method correctly retrieves labels from the deployment and handles the nil case appropriately.


1466-1469: LGTM!

The image retrieval correctly uses the new GetImage() method with proper error handling for missing image configuration.


1838-1840: LGTM!

The service name generation correctly uses the updated method signatures.

@hhzhang16

@biswapanda is there a corresponding CI MR?

@julienmancuso julienmancuso enabled auto-merge (squash) July 7, 2025 23:46
@julienmancuso julienmancuso merged commit 7a341f8 into main Jul 8, 2025
8 of 10 checks passed
@julienmancuso julienmancuso deleted the jsm/dyn-618 branch July 8, 2025 00:29
wantErr: false,
},
{
name: "gitlab-master.nvidia.com:5005",

@julienmancuso @hhzhang16 Do we want internal NVIDIA URLs in here? Does it mean no one else will be able to run this test?

7 participants