-
Notifications
You must be signed in to change notification settings - Fork 686
feat: simplify k8s deployment #1708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis update removes the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Manager
participant DockerSecretIndexer
participant Reconciler
User->>Manager: Start operator
Manager->>DockerSecretIndexer: Initialize with client
loop Every 60 seconds
Manager->>DockerSecretIndexer: RefreshIndex()
end
Manager->>Reconciler: Setup with DockerSecretIndexer
Reconciler->>DockerSecretIndexer: GetSecrets(namespace, registry)
DockerSecretIndexer-->>Reconciler: Return secret names
Reconciler->>Kubernetes: Deploy resources using secrets
sequenceDiagram
participant Reconciler
participant DynamoComponentDeployment
participant DockerSecretIndexer
Reconciler->>DynamoComponentDeployment: Read spec (image, metadata)
Reconciler->>DockerSecretIndexer: GetSecrets(namespace, registry)
DockerSecretIndexer-->>Reconciler: Return image pull secrets
Reconciler->>Kubernetes: Create/Update Deployment
Possibly related PRs
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (5)
deploy/cloud/operator/internal/secrets/docker_test.go (1)
13-95: Consider adding test cases for edge cases and error scenarios.While the happy path is well tested, consider adding test coverage for:
- Malformed docker config JSON
- Secrets that are not of type
DockerConfigJson(should be ignored)- Empty
authsin docker config- Query for non-existent namespace or registry
Example additional test case:
// Add to mockSecrets slice { ObjectMeta: metav1.ObjectMeta{ Name: "non-docker-secret", Namespace: "default", }, Type: corev1.SecretTypeOpaque, // Should be ignored Data: map[string][]byte{ "key": []byte("value"), }, }, { ObjectMeta: metav1.ObjectMeta{ Name: "malformed-secret", Namespace: "default", }, Type: corev1.SecretTypeDockerConfigJson, Data: map[string][]byte{ ".dockerconfigjson": []byte(`invalid json`), }, }, // After RefreshIndex, add tests: // Test non-existent registry secrets, err = i.GetSecrets("default", "non-existent.registry.com") if err != nil { t.Errorf("GetSecrets() unexpected error = %v", err) } if len(secrets) != 0 { t.Errorf("Expected 0 secrets for non-existent registry, got %d", len(secrets)) }deploy/cloud/operator/Makefile (1)
58-59: Consider making the copyright year dynamic.The hard-coded copyright year "2024-2025" might need manual updates. Consider using a dynamic approach or making it configurable.
- '# SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.' \ + "# SPDX-FileCopyrightText: Copyright (c) 2024-$$(date +%Y) NVIDIA CORPORATION & AFFILIATES. All rights reserved." \deploy/cloud/operator/internal/secrets/docker.go (1)
35-36: Consider incremental updates for better performance.The current implementation clears and rebuilds the entire index on each refresh. For large clusters with many secrets, consider implementing incremental updates using watch events.
deploy/cloud/operator/internal/dynamo/graph.go (1)
280-338: Consider decomposing this function to reduce complexity.The
// nolint:gocyclocomment indicates high cyclomatic complexity. Consider extracting the component deployment creation logic into a separate function.func createComponentDeployment( ctx context.Context, parentDynamoGraphDeployment *v1alpha1.DynamoGraphDeployment, componentName string, component v1alpha1.DynamoComponentSpec, graphDynamoNamespace string, ingressSpec *v1alpha1.IngressSpec, ) (*v1alpha1.DynamoComponentDeployment, error) { // Extract lines 286-329 into this function }deploy/cloud/operator/internal/dynamo/graph_test.go (1)
442-442: Fix inconsistent JSON formatting in test data.The JSON string has inconsistent spacing after colons. Consider using consistent formatting.
- Value: `{"service1":{"port":8080,"ServiceArgs":{"Workers":2, "Resources":{"CPU":"2", "Memory":"2Gi", "GPU":"2"}}}}`, + Value: `{"service1":{"port":8080,"ServiceArgs":{"Workers":2,"Resources":{"CPU":"2","Memory":"2Gi","GPU":"2"}}}}`,
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
deploy/cloud/helm/crds/Chart.yaml(1 hunks)deploy/cloud/operator/Makefile(1 hunks)deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go(4 hunks)deploy/cloud/operator/api/v1alpha1/dynamographdeployment_types.go(1 hunks)deploy/cloud/operator/cmd/main.go(3 hunks)deploy/cloud/operator/internal/common/url.go(1 hunks)deploy/cloud/operator/internal/common/url_test.go(1 hunks)deploy/cloud/operator/internal/consts/consts.go(1 hunks)deploy/cloud/operator/internal/controller/common.go(1 hunks)deploy/cloud/operator/internal/controller/dynamocomponent_controller.go(2 hunks)deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go(14 hunks)deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go(7 hunks)deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go(1 hunks)deploy/cloud/operator/internal/dynamo/graph.go(3 hunks)deploy/cloud/operator/internal/dynamo/graph_test.go(12 hunks)deploy/cloud/operator/internal/secrets/docker.go(1 hunks)deploy/cloud/operator/internal/secrets/docker_test.go(1 hunks)
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1337
File: deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml:0-0
Timestamp: 2025-06-03T15:26:55.732Z
Learning: The image-builder ServiceAccount in deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml does not need imagePullSecrets, unlike the component ServiceAccount.
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The `DYN_DEPLOYMENT_CONFIG` environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1308-1312
Timestamp: 2025-06-11T21:29:28.650Z
Learning: User julienmancuso expects replies in English; avoid switching languages unless explicitly requested.
deploy/cloud/operator/internal/controller/common.go (2)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1337
File: deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml:0-0
Timestamp: 2025-06-03T15:26:55.732Z
Learning: The image-builder ServiceAccount in deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml does not need imagePullSecrets, unlike the component ServiceAccount.
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1623
File: deploy/cloud/helm/platform/values.yaml:48-52
Timestamp: 2025-06-24T21:58:17.757Z
Learning: In published Helm charts, it's a best practice to set `useKubernetesSecret: true` as the default for docker registry configuration while leaving server, username, and password fields empty. This ensures users must explicitly provide registry credentials at deployment time via `--set` flags or custom values files, rather than having sensitive data hardcoded in the chart repository.
deploy/cloud/operator/internal/consts/consts.go (1)
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1266
File: deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go:85-85
Timestamp: 2025-05-29T16:29:45.152Z
Learning: In the Dynamo codebase, ComponentTypePlanner constants with different cases ("Planner" vs "planner") are intentional and serve different purposes: component type in config vs component label. These should not be made consistent as they handle different contexts.
deploy/cloud/operator/cmd/main.go (1)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1337
File: deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml:0-0
Timestamp: 2025-06-03T15:26:55.732Z
Learning: The image-builder ServiceAccount in deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml does not need imagePullSecrets, unlike the component ServiceAccount.
deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (1)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The `DYN_DEPLOYMENT_CONFIG` environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go (3)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The `DYN_DEPLOYMENT_CONFIG` environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1337
File: deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml:0-0
Timestamp: 2025-06-03T15:26:55.732Z
Learning: The image-builder ServiceAccount in deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml does not need imagePullSecrets, unlike the component ServiceAccount.
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1302-1306
Timestamp: 2025-06-11T21:18:00.425Z
Learning: In the Dynamo operator, the project’s preferred security posture is to set a Pod-level `PodSecurityContext` with `runAsUser`, `runAsGroup`, and `fsGroup` all set to `1000`, and then selectively override the user at the individual container level (e.g., `RunAsUser: 0` for Kaniko) when root is required.
deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (1)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The `DYN_DEPLOYMENT_CONFIG` environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
deploy/cloud/operator/internal/controller/dynamocomponent_controller.go (1)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1337
File: deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml:0-0
Timestamp: 2025-06-03T15:26:55.732Z
Learning: The image-builder ServiceAccount in deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml does not need imagePullSecrets, unlike the component ServiceAccount.
deploy/cloud/operator/internal/dynamo/graph.go (2)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The `DYN_DEPLOYMENT_CONFIG` environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1266
File: deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go:85-85
Timestamp: 2025-05-29T16:29:45.152Z
Learning: In the Dynamo codebase, ComponentTypePlanner constants with different cases ("Planner" vs "planner") are intentional and serve different purposes: component type in config vs component label. These should not be made consistent as they handle different contexts.
deploy/cloud/operator/internal/dynamo/graph_test.go (2)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The `DYN_DEPLOYMENT_CONFIG` environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1266
File: deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go:85-85
Timestamp: 2025-05-29T16:29:45.152Z
Learning: In the Dynamo codebase, ComponentTypePlanner constants with different cases ("Planner" vs "planner") are intentional and serve different purposes: component type in config vs component label. These should not be made consistent as they handle different contexts.
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (2)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The `DYN_DEPLOYMENT_CONFIG` environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1337
File: deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml:0-0
Timestamp: 2025-06-03T15:26:55.732Z
Learning: The image-builder ServiceAccount in deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml does not need imagePullSecrets, unlike the component ServiceAccount.
🧬 Code Graph Analysis (6)
deploy/cloud/operator/internal/common/url_test.go (1)
deploy/cloud/operator/internal/common/url.go (1)
GetHost(9-22)
deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (1)
deploy/cloud/operator/internal/dynamo/graph.go (1)
GenerateDynamoComponentsDeployments(282-338)
deploy/cloud/operator/internal/secrets/docker_test.go (1)
deploy/cloud/operator/internal/secrets/docker.go (1)
NewDockerSecretIndexer(20-25)
deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (2)
deploy/cloud/operator/internal/consts/consts.go (1)
ComponentTypeMain(77-77)deploy/cloud/operator/api/dynamo/common/common.go (1)
ExtraPodSpec(56-66)
deploy/cloud/operator/internal/secrets/docker.go (1)
deploy/cloud/operator/internal/common/url.go (1)
GetHost(9-22)
deploy/cloud/operator/internal/dynamo/graph.go (6)
deploy/cloud/operator/api/v1alpha1/dynamographdeployment_types.go (1)
DynamoGraphDeployment(57-63)deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (3)
IngressSpec(100-111)DynamoComponentDeployment(131-137)DynamoComponentDeploymentSharedSpec(52-85)deploy/cloud/operator/internal/consts/consts.go (2)
ComponentTypePlanner(76-76)PlannerServiceAccountName(78-78)deploy/sdk/src/dynamo/sdk/core/protocol/interface.py (1)
ComponentType(43-46)deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (1)
ComponentTypePlanner(83-83)deploy/cloud/operator/api/dynamo/common/common.go (1)
ExtraPodSpec(56-66)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (20)
deploy/cloud/operator/internal/common/url.go (1)
9-22: Well-designed URL host extraction utility.The implementation correctly handles URLs with and without schemes using the dummy scheme approach, which is a common pattern for this use case. Error handling is appropriate and covers both parsing failures and missing host scenarios.
deploy/cloud/operator/internal/consts/consts.go (1)
76-78: Good addition of standardized component type constants.These constants provide centralized definitions for component types and service account naming, which improves maintainability and consistency across the codebase.
deploy/cloud/helm/crds/Chart.yaml (1)
19-19: Appropriate version bump for CRD changes.The patch-level version increment from 0.1.6 to 0.1.7 is appropriate for the CRD updates and refactoring changes in this PR.
deploy/cloud/operator/api/v1alpha1/dynamographdeployment_types.go (1)
35-35: Good addition of omitempty to support refactoring goals.Adding
omitemptyto the DynamoGraph field JSON tag makes it optional during serialization, which aligns with the refactoring to make DynamoGraphDeployment more self-contained and decouple from legacy config dependencies.deploy/cloud/operator/internal/controller/common.go (1)
74-77: Well-designed interface for dynamic secret retrieval.The
dockerSecretRetrieverinterface provides a clean abstraction for retrieving docker secrets dynamically based on namespace and registry. This supports better testability through dependency injection and replaces static secret name usage as described in the refactoring goals.deploy/cloud/operator/internal/common/url_test.go (1)
1-76: LGTM! Comprehensive test coverage forGetHostfunction.The test cases properly cover all the expected scenarios including plain hostnames, URLs with ports, schemes, and paths, as well as error handling for empty input. The table-driven test pattern is well-structured.
deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (1)
121-126: LGTM! Simplified deployment generation aligns with PR objectives.The removal of the
dynamoGraphConfigparameter and the elimination ofDynamoComponentresource creation/syncing successfully simplifies the deployment process as intended.deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (3)
41-43: Good addition ofomitemptytags for optional fields.Making
DynamoComponentandDynamoTagoptional aligns well with the removal ofDynamoComponentresource dependency.
169-171: Excellent backward compatibility inIsMainComponentmethod.The updated logic maintains backward compatibility by checking both the legacy
DynamoTagsuffix pattern and the new explicitComponentTypefield. This ensures smooth migration.
195-201: Clean implementation ofGetImagemethod.The method provides a safe way to extract the image from the nested structure with proper nil checks.
deploy/cloud/operator/internal/secrets/docker.go (1)
13-18: Well-structured thread-safe implementation.The DockerSecretIndexer is well-designed with proper thread safety using RWMutex. The nested map structure efficiently indexes secrets by namespace and registry.
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go (1)
777-784: Clean mock implementation for Docker secret retrieval.The mockDockerSecretRetriever provides a simple and effective way to test the secret retrieval functionality without external dependencies.
deploy/cloud/operator/internal/controller/dynamocomponent_controller.go (1)
633-654: Good simplification of image name generation.The removal of the
dynamoComponentRepositoryNameparameter simplifies the function signature while maintaining the same functionality by usingdynamoComponent.Namedirectly.deploy/cloud/operator/internal/dynamo/graph.go (2)
280-282: Excellent refactoring to remove external dependencies.The removal of the
DynamoGraphConfigparameter and direct use ofparentDynamoGraphDeployment.Spec.Servicessimplifies the function and aligns with the broader architectural changes.
307-311: Good use of centralized constants.The use of
commonconsts.ComponentTypePlannerandcommonconsts.PlannerServiceAccountNamefrom the internal consts package improves maintainability.deploy/cloud/operator/internal/dynamo/graph_test.go (1)
686-686: LGTM!The function call correctly matches the updated signature without the config parameter.
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (4)
86-96: LGTM!The addition of
DockerSecretRetrieverinterface properly replaces the static secret retrieval mechanism.
1091-1096: LGTM!The method correctly retrieves labels from the deployment and handles the nil case appropriately.
1466-1469: LGTM!The image retrieval correctly uses the new
GetImage()method with proper error handling for missing image configuration.
1838-1840: LGTM!The service name generation correctly uses the updated method signatures.
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go
Show resolved
Hide resolved
|
@biswapanda is there a corresponding CI MR? |
| wantErr: false, | ||
| }, | ||
| { | ||
| name: "gitlab-master.nvidia.com:5005", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@julienmancuso @hhzhang16 Do we want internal NVIDIA urls in here? Does it mean no-one else will be able to run this test?
Overview:
simplify k8s deployment
Details:
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Chores
Tests