Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix minikube start failure #2771

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

kmrmt
Copy link
Contributor

@kmrmt kmrmt commented Dec 10, 2024

Description

Fix minikube start failure at readreplica e2e test.

Related Issue

Versions

  • Vald Version: v1.7.14
  • Go Version: v1.23.3
  • Rust Version: v1.82.0
  • Docker Version: v27.3.1
  • Kubernetes Version: v1.31.2
  • Helm Version: v3.16.2
  • NGT Version: v2.3.4
  • Faiss Version: v1.9.0

Checklist

Special notes for your reviewer

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced Kubernetes cluster management with new commands for volume snapshots.
    • Added configuration for a new read replica in the agent setup.
    • Introduced a new YAML file for Kind cluster setup.
    • New input parameter for GitHub Actions to specify Kind environment requirements.
    • New phony targets for managing Kind volume snapshots.
    • New command added for Docker CLI installation in the build process.
  • Improvements

    • Updated Docker installation process for better integration with the build environment.
    • Enhanced logging options for Minikube startup.
    • Increased timeout and sleep duration for pod readiness checks.
    • Improved specificity in .gitignore for Rust-related files.
    • Adjusted deployment commands for Vald read replicas.
    • Enhanced configuration for health checks and logging in the read replica setup.
  • Bug Fixes

    • Improved readiness checks for Kubernetes pods.
    • Commented out readiness checks for pod deployments in the Vald read replica workflow.
    • Updated logging level for better debugging in the read replica configuration.

Copy link
Contributor

coderabbitai bot commented Dec 10, 2024

📝 Walkthrough
<details>
<summary>📝 Walkthrough</summary>

## Walkthrough

This pull request introduces significant updates to the development and testing infrastructure for a Kubernetes-based project. The modifications enhance cluster management, particularly for Kind (Kubernetes in Docker) environments. Key changes include the installation of the Docker CLI, configuration for volume snapshot support, updates to GitHub Actions workflows, and preparations for read replica deployments. The changes aim to streamline the setup and testing processes across various configuration files and build scripts.

## Changes

| File | Change Summary |
|------|----------------|
| `Makefile.d/e2e.mk` | Updated readiness checks for metrics-server and agent, replaced `minikube/restart` with `kind/vs/restart`. |
| `Makefile.d/minikube.mk` | Added Docker binary prerequisite, updated Minikube start command with Docker driver and logging options, enabled metrics-server addon. |
| `Makefile.d/tools.mk` | Added new `docker-cli/install` phony target for Docker CLI installation. |
| `dockers/ci/base/Dockerfile` | Added `sudo` package and Docker CLI installation step. |
| `dockers/dev/Dockerfile` | Integrated Docker CLI installation step. |
| `Makefile.d/kind.mk` | Added volume snapshot support, new targets for Kind cluster management, Docker binary dependency. |
| `.github/actions/setup-e2e/action.yaml` | Added `require_kind` input parameter and Kind environment setup step. |
| `.github/workflows/e2e.yaml` | Updated E2E job to use Kind, added read replica deployment step. |
| `.gitignore` | Refined Rust-related ignored files and added new ignore rules. |
| `k8s/debug/kind/e2e.yaml` | New Kubernetes Kind cluster configuration file. |
| `Makefile.d/k8s.mk` | Increased sleep duration before checking read replica pod readiness. |
| `.github/helm/values/values-readreplica.yaml` | Added read replica specific name and volume configuration. |
| `.github/actions/e2e-deploy-vald-readreplica/action.yaml` | Commented out readiness checks for pod deployment. |
| `.github/actions/e2e-deploy-vald/action.yaml` | Updated conditions for waiting on pod readiness. |

## Possibly related PRs

- **#2702**: This PR implements a feature for deleting expired indexes using a Kubernetes Job, which is directly related to the changes in the main PR that also involve index deletion functionality.
- **#2722**: This PR backports the implementation of the delete expired index job to the release version, aligning with the main PR's focus on the same feature.
- **#2707**: This PR introduces an automated workflow for updating dependency versions, which may indirectly support the infrastructure needed for the changes in the main PR related to index deletion.
- **#2775**: This PR updates various library dependencies, which could be relevant to ensuring that the index deletion feature operates with the latest libraries and tools.

## Suggested labels

`size/S`, `area/internal`

## Suggested reviewers

- kpango

</details>

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 84c941b and 3358169.

📒 Files selected for processing (1)
  • .github/actions/e2e-deploy-vald-readreplica/action.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/actions/e2e-deploy-vald-readreplica/action.yaml

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@vdaas-ci
Copy link
Collaborator

[CHATOPS:HELP] ChatOps commands.

  • 🙆‍♀️ /approve - approve
  • 🍱 /format - format codes and add licenses
  • /gen-test - generate test codes
  • 🏷️ /label - add labels
  • 🔚 2️⃣ 🔚 /label actions/e2e-deploy - run E2E deploy & integration test

Copy link

cloudflare-workers-and-pages bot commented Dec 11, 2024

Deploying vald with  Cloudflare Pages  Cloudflare Pages

Latest commit: 3358169
Status: ✅  Deploy successful!
Preview URL: https://b66465ef.vald.pages.dev
Branch Preview URL: https://bugfix-e2e-readreplica-minik.vald.pages.dev

View logs

@kmrmt kmrmt changed the title [WIP] fix minikube start failure fix minikube start failure Dec 11, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e06133d and 0473919.

⛔ Files ignored due to path filters (1)
  • hack/docker/gen/main.go is excluded by !**/gen/**
📒 Files selected for processing (6)
  • .github/workflows/e2e.yaml (1 hunks)
  • Makefile.d/e2e.mk (1 hunks)
  • Makefile.d/minikube.mk (2 hunks)
  • Makefile.d/tools.mk (1 hunks)
  • dockers/ci/base/Dockerfile (2 hunks)
  • dockers/dev/Dockerfile (1 hunks)
🔇 Additional comments (7)
Makefile.d/minikube.mk (2)

28-30: LGTM: Explicit Docker driver configuration

The changes properly ensure Docker CLI availability and explicitly specify the Docker driver, which helps prevent driver selection issues during Minikube startup.


35-35: Verify metrics-server requirement

The addition of metrics-server addon might be related to the readreplica e2e test requirements.

✅ Verification successful

Metrics-server requirement is verified for readreplica e2e tests

Based on the analysis of the e2e test files, specifically in Makefile.d/e2e.mk, the metrics-server is indeed required for the readreplica e2e tests. The e2e/actions/run/readreplica target explicitly waits for the metrics-server to be available and ready before proceeding with the readreplica tests:

kubectl wait -n kube-system --for=condition=Available deployment/metrics-server --timeout=$(E2E_WAIT_FOR_START_TIMEOUT)
kubectl wait -n kube-system --for=condition=Ready pod -l k8s-app=metrics-server --timeout=$(E2E_WAIT_FOR_START_TIMEOUT)
kubectl wait -n kube-system --for=condition=ContainersReady pod -l k8s-app=metrics-server --timeout=$(E2E_WAIT_FOR_START_TIMEOUT)

Additionally, the readreplica configuration in k8s/readreplica/configmap.yaml shows metrics-related configurations, indicating that metrics monitoring is an integral part of the readreplica functionality.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if readreplica tests depend on metrics-server

# Search for metrics-server usage in e2e tests
rg -l "metrics-server|metrics\.k8s\.io" --type yaml --type go | grep -i "e2e\|test"

# Check if readreplica manifests have resource requirements
rg -l "readreplica.*resources:" --type yaml

Length of output: 131


Script:

#!/bin/bash
# Let's try a broader search to understand the context better

# Search for metrics-server references in all files
rg -l "metrics-server"

# Search for any metrics-related configurations in test files
rg -l "metrics" test/ e2e/

# Look for resource requirements or monitoring in test files
rg -A 5 "resources:|monitoring" test/ e2e/

# Check if there are any readreplica test files
fd readreplica

Length of output: 732


Script:

#!/bin/bash
# Let's check the content of relevant files to understand the metrics-server requirement

# Check Makefile.d/e2e.mk for metrics-server usage context
cat Makefile.d/e2e.mk

# Check readreplica job manifests for resource requirements
rg -A 5 "resources:" charts/vald/templates/index/job/readreplica/
rg -A 5 "resources:" charts/vald-readreplica/

# Check if metrics-server is mentioned in readreplica related files
rg "metrics" k8s/readreplica/
rg "metrics" pkg/index/job/readreplica/

Length of output: 10509

dockers/ci/base/Dockerfile (2)

126-126: LGTM: Docker CLI installation

Adding docker-cli installation aligns with the Minikube prerequisites and supports the fix for Minikube start failure.


75-75: Verify sudo requirement

The addition of sudo package might be related to Docker operations. However, since we're running as root (USER root:root), verify if sudo is actually needed.

dockers/dev/Dockerfile (1)

140-140: LGTM: Consistent Docker CLI installation

Adding docker-cli installation in the dev environment maintains consistency with the CI environment and ensures Minikube works in both contexts.

Makefile.d/tools.mk (1)

264-272: LGTM: Docker CLI installation implementation

The implementation is well-structured and follows the established pattern for tool installations:

  • Uses official Docker repository
  • Handles architecture detection
  • Includes proper cleanup
  • Follows the same pattern as other tool installations
.github/workflows/e2e.yaml (1)

Line range hint 312-316: Verify the necessity of disabling k3d

The change to use Minikube exclusively (by disabling k3d) for this test might indicate an underlying issue that should be investigated.

Let's check if there are any k3d-specific configurations that might be causing issues:

✅ Verification successful

Let me gather more information about the usage of k3d and minikube in the e2e workflow.


Let me check the actual setup-k3d and setup-minikube actions to understand their configurations and potential differences.


Let me check the e2e workflow file to understand the specific test that's being modified and its requirements.


Let me check the specific test configuration and Makefile to understand the minikube setup.


The switch to Minikube for read replica test is justified and necessary

The read replica test specifically requires CSI Driver and Volume Snapshots support for testing the read replica functionality, as evidenced by the configuration in values-readreplica.yaml which uses:

  • storageClass: csi-hostpath-sc
  • snapshot_classname: "csi-hostpath-snapclass"
  • readreplica.enabled: true

Minikube is explicitly configured in minikube.mk to support these features with the commands:

minikube addons enable volumesnapshots
minikube addons enable csi-hostpath-driver

The comment in minikube.mk also confirms this: "Only use this for development related to Volume Snapshots. Usually k3d is faster."

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for k3d-specific configurations
rg -A 5 "k3d" .github/helm/values/
rg -A 5 "k3d" .github/actions/setup-e2e/

Length of output: 1407


Script:

#!/bin/bash
# Check for k3d and minikube usage in e2e workflow
rg -A 5 "setup-e2e" .github/workflows/e2e.yaml

# Check for any k3d or minikube related issues or discussions
rg -i "k3d|minikube" .github/ISSUE_TEMPLATE/
rg -i "k3d|minikube" .github/pull_request_template.md

Length of output: 2255


Script:

#!/bin/bash
# Check k3d and minikube setup actions
cat .github/actions/setup-k3d/action.yaml
cat .github/actions/setup-minikube/action.yaml

# Check for any related test failures or issues
rg -i "failed.*k3d|k3d.*failed" .github/workflows/
rg -i "failed.*minikube|minikube.*failed" .github/workflows/

Length of output: 4113


Script:

#!/bin/bash
# Check the specific e2e test configuration and its context
rg -B 10 -A 10 'require_k3d: "false"' .github/workflows/e2e.yaml

# Check if there are any minikube-specific configurations in the repository
rg -l "minikube" .

Length of output: 1049


Script:

#!/bin/bash
# Check the read replica test configuration
cat .github/helm/values/values-readreplica.yaml

# Check minikube configuration in Makefile
cat Makefile.d/minikube.mk

# Check if this is the only test using minikube
rg "require_minikube: \"true\"" .github/workflows/

Length of output: 3809

Makefile.d/e2e.mk Outdated Show resolved Hide resolved
.github/workflows/e2e.yaml Outdated Show resolved Hide resolved
@vdaas-ci
Copy link
Collaborator

Profile Report

typevald-agent-ngtvald-lb-gatewayvald-discoverervald-manager-index
cpu
heap
other images

Copy link
Contributor

coderabbitai[bot]
coderabbitai bot previously approved these changes Dec 11, 2024
@vdaas-ci
Copy link
Collaborator

Profile Report

typevald-agent-ngtvald-lb-gatewayvald-discoverervald-manager-index
cpu
heap
other images

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
.github/helm/values/values-readreplica.yaml (3)

19-19: Consider reverting debug logging after troubleshooting

While debug logging is helpful for troubleshooting the minikube start failure, it can impact performance and generate excessive logs. Consider adding a TODO to revert this to info level after the issues are resolved.


42-60: Optimize init containers configuration

The init containers configuration could be optimized:

  1. Consider using IfNotPresent instead of Always for the pull policy of init containers
  2. The configuration is repetitive and could be templated

Consider refactoring to reduce duplication, for example:

initContainers:
  defaults: &initContainerDefaults
    type: wait-for
    image: busybox:stable
    imagePullPolicy: IfNotPresent
    sleepDuration: 2
  containers:
    - <<: *initContainerDefaults
      name: wait-for-discoverer
      target: discoverer
    - <<: *initContainerDefaults
      name: wait-for-agent
      target: agent
    - <<: *initContainerDefaults
      name: wait-for-agent-readreplica
      target: agent-readreplica

Line range hint 19-90: Overall changes look well-structured

The modifications form a cohesive solution for the minikube start failure by:

  1. Enabling detailed logging for troubleshooting
  2. Adding appropriate health checks
  3. Ensuring proper startup order through init containers
  4. Configuring read replica volumes correctly

Consider documenting these startup dependencies and timing requirements in the project's operational documentation to help with future troubleshooting and configuration tuning.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d872637 and ce899a4.

📒 Files selected for processing (1)
  • .github/helm/values/values-readreplica.yaml (3 hunks)
🔇 Additional comments (2)
.github/helm/values/values-readreplica.yaml (2)

89-90: Verify volume configuration consistency

The read replica volume configuration looks correct, but let's ensure:

  1. The PVC name matches any existing claims
  2. The configuration aligns with the snapshot class configuration
✅ Verification successful

Volume configuration is consistent with project standards

The volume configuration is correctly structured and aligns with the project's patterns:

  • The volume name vald-agent-readreplica-pvc follows the established naming convention, similar to other volume configurations like vald-agent-ngt-readreplica-pvc
  • The configuration is properly integrated with the snapshot system, as evidenced by the snapshot templates in charts/vald-readreplica/templates/
  • PVC and volume snapshot configurations are consistently defined across the Helm charts
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check volume and snapshot configurations

# Search for related PVC configurations
rg -g '*.yaml' -g '*.yml' 'vald-agent-readreplica-pvc'

# Check for snapshot class references
rg -g '*.yaml' -g '*.yml' 'csi-hostpath-snapclass'

Length of output: 105


Script:

#!/bin/bash
# Let's broaden the search to find any volume-related configurations

# Search for PVC patterns in yaml files
rg -g '*.yaml' -g '*.yml' 'persistentVolumeClaim|volumeClaimTemplates'

# Search for volume name patterns
rg -g '*.yaml' -g '*.yml' 'volume_name:|volumeName:'

# Search for snapshot-related configurations
rg -g '*.yaml' -g '*.yml' 'snapshot|VolumeSnapshot'

Length of output: 2659


22-29: Verify if 120s initial delay is necessary

The 120-second initial delay for both liveness and readiness probes seems quite long. This could significantly impact:

  • Pod startup time
  • Recovery time during failures
  • Overall deployment time

Let's analyze the startup times to determine if this delay is justified:

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce899a4 and a2ca0c2.

📒 Files selected for processing (5)
  • .github/helm/values/values-readreplica.yaml (3 hunks)
  • .github/workflows/e2e.yaml (2 hunks)
  • Makefile.d/e2e.mk (2 hunks)
  • Makefile.d/k8s.mk (5 hunks)
  • Makefile.d/minikube.mk (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • Makefile.d/minikube.mk
  • Makefile.d/e2e.mk
  • Makefile.d/k8s.mk
  • .github/helm/values/values-readreplica.yaml
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/e2e.yaml

343-343: property "deploy_vald" is not defined in object type {deploy_vald_readreplica: {conclusion: string; outcome: string; outputs: {pod_name: string}}; setup_e2e: {conclusion: string; outcome: string; outputs: {helm_extra_options: string; image_tags: string}}}

(expression)

🔇 Additional comments (1)
.github/workflows/e2e.yaml (1)

318-318: LGTM: Infrastructure change from Minikube to Kind

The switch to Kind (Kubernetes in Docker) is a good solution to address the Minikube start failure mentioned in the PR objectives.

@@ -347,7 +340,7 @@ jobs:
E2E_TARGET_NAMESPACE=default \
e2e/readreplica
env:
POD_NAME: ${{ steps.deploy_vald_readreplica.outputs.POD_NAME }}
POD_NAME: ${{ steps.deploy_vald.outputs.POD_NAME }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Critical: Invalid step reference in POD_NAME environment variable

The POD_NAME environment variable references steps.deploy_vald.outputs.POD_NAME, but there is no deploy_vald step in this job. This will cause the workflow to fail.

Apply this diff to fix the reference:

-          POD_NAME: ${{ steps.deploy_vald.outputs.POD_NAME }}
+          POD_NAME: ${{ steps.deploy_vald_readreplica.outputs.POD_NAME }}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
POD_NAME: ${{ steps.deploy_vald.outputs.POD_NAME }}
POD_NAME: ${{ steps.deploy_vald_readreplica.outputs.POD_NAME }}
🧰 Tools
🪛 actionlint (1.7.4)

343-343: property "deploy_vald" is not defined in object type {deploy_vald_readreplica: {conclusion: string; outcome: string; outputs: {pod_name: string}}; setup_e2e: {conclusion: string; outcome: string; outputs: {helm_extra_options: string; image_tags: string}}}

(expression)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
.github/workflows/e2e.yaml (1)

343-343: ⚠️ Potential issue

Critical: Invalid step reference in POD_NAME environment variable

The POD_NAME environment variable references steps.deploy_vald.outputs.POD_NAME, but there is no deploy_vald step in this job. This will cause the workflow to fail.

Apply this diff to fix the reference:

-          POD_NAME: ${{ steps.deploy_vald.outputs.POD_NAME }}
+          POD_NAME: ${{ steps.deploy_vald_readreplica.outputs.POD_NAME }}
🧰 Tools
🪛 actionlint (1.7.4)

343-343: property "deploy_vald" is not defined in object type {deploy_vald_readreplica: {conclusion: string; outcome: string; outputs: {pod_name: string}}; setup_e2e: {conclusion: string; outcome: string; outputs: {helm_extra_options: string; image_tags: string}}}

(expression)

🧹 Nitpick comments (3)
.github/workflows/e2e.yaml (1)

344-347: Consider enhancing failure logging.

The failure logging step could be improved by:

  1. Including more context (e.g., pod status, events)
  2. Setting a retention period for the logs

Apply this diff to enhance the logging:

       - name: output logs on failure
         if: failure()
         run: |
-          cat ./gateway.log
+          echo "=== Pod Status ==="
+          kubectl get pods -o wide
+          echo "=== Recent Events ==="
+          kubectl get events --sort-by='.lastTimestamp'
+          echo "=== Gateway Logs ==="
+          cat ./gateway.log
Makefile.d/k8s.mk (2)

Line range hint 135-155: Consider adding validation before deployment.

The deployment target could benefit from pre-deployment validation to ensure manifest correctness.

Add this before the kubectl apply commands:

 k8s/vald/deploy: k8s/vald/manifests
+	@echo "Validating manifests..."
+	@find $(TEMP_DIR)/vald/templates -type f -name "*.yaml" -exec kubectl apply --dry-run=client -f {} \;
 	@echo "Permitting error because there's some cases nothing to apply"

Line range hint 239-270: Enhance pod readiness checks.

The read replica deployment includes good pod readiness checks but could be improved with timeout handling and status reporting.

Apply this diff to enhance the readiness checks:

 	kubectl wait --for=condition=ready pod -l app=vald-agent --timeout=600s
 	kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=vald-readreplica --timeout=600s
+	@echo "=== Deployment Status ==="
+	@kubectl get pods -l app=vald-agent -o wide
+	@kubectl get pods -l app.kubernetes.io/name=vald-readreplica -o wide
+	@if [ $$? -ne 0 ]; then \
+		echo "Deployment failed. Check pod events:"; \
+		kubectl get events --sort-by='.lastTimestamp' | grep -E "vald-agent|vald-readreplica"; \
+		exit 1; \
+	fi
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2ca0c2 and a0c9435.

📒 Files selected for processing (3)
  • .github/actions/e2e-deploy-vald-readreplica/action.yaml (1 hunks)
  • .github/workflows/e2e.yaml (2 hunks)
  • Makefile.d/k8s.mk (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/actions/e2e-deploy-vald-readreplica/action.yaml
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/e2e.yaml

343-343: property "deploy_vald" is not defined in object type {deploy_vald_readreplica: {conclusion: string; outcome: string; outputs: {pod_name: string}}; setup_e2e: {conclusion: string; outcome: string; outputs: {helm_extra_options: string; image_tags: string}}}

(expression)

🔇 Additional comments (2)
.github/workflows/e2e.yaml (1)

318-318: Verify Kind cluster configuration for read replica tests.

The change from require_minikube to require_kind needs proper Kind cluster configuration to support the read replica functionality.

✅ Verification successful

Based on the review of the setup-e2e action and helm values configuration, I can now provide a final response:

Kind cluster configuration is properly set up for read replica functionality

The change from require_minikube to require_kind is valid as:

  1. The setup-e2e action includes proper Kind cluster setup via make kind/vs/start
  2. The values-readreplica.yaml configures necessary components:
    • CSI hostpath storage class for persistent volumes
    • Snapshot support via csi-hostpath-snapclass
    • Read replica configuration with proper volume management
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify Kind cluster configuration for read replica support.

# Check if Kind cluster configuration exists
fd -t f "kind.*\.yaml" -x cat {} \; | grep -A 5 "extraMounts\|extraPortMappings"

# Check if Kind cluster has the necessary features enabled
kubectl get nodes -o json | jq -r '.items[].spec.podCIDR'

Length of output: 243


Script:

#!/bin/bash
# Let's check the workflow configuration and related files to understand the Kind setup

# Check the workflow file for Kind setup
cat .github/workflows/e2e.yaml

# Look for any Kind configuration files in the repository
fd -t f "kind|k8s|cluster" -x cat {} \;

# Search for any documentation about Kind setup
fd -t f "README|CONTRIBUTING" -x cat {} \;

Length of output: 115445


Script:

#!/bin/bash
# Let's check the setup-e2e action and helm values to understand the Kind configuration

# Check setup-e2e action
cat .github/actions/setup-e2e/action.yaml

# Check helm values for read replica setup
cat .github/helm/values/values-readreplica.yaml

Length of output: 6583

Makefile.d/k8s.mk (1)

Line range hint 114-134: LGTM: Well-structured manifest generation.

The manifest generation target is well-organized and includes all necessary image repository configurations.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
pkg/agent/core/ngt/service/ngt.go (1)

890-892: Return a closed channel instead of nil
Returning nil can be risky if any upstream code waits to receive from the channel. Consider returning a closed channel to avoid blocking or upstream panics.

Example refactor:

890     func (n *ngt) Start(ctx context.Context) <-chan error {
891         if n.isReadReplica {
-            return nil
+            c := make(chan error)
+            close(c) // signals no operations needed
+            return c
         }
         ...
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee564f5 and 84c941b.

📒 Files selected for processing (1)
  • pkg/agent/core/ngt/service/ngt.go (1 hunks)

Copy link

codecov bot commented Dec 23, 2024

Codecov Report

Attention: Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

Project coverage is 23.93%. Comparing base (d99ad22) to head (3358169).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
pkg/agent/core/ngt/service/ngt.go 0.00% 2 Missing and 1 partial ⚠️
hack/docker/gen/main.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2771      +/-   ##
==========================================
+ Coverage   23.92%   23.93%   +0.01%     
==========================================
  Files         546      546              
  Lines       54544    54547       +3     
==========================================
+ Hits        13050    13057       +7     
+ Misses      40709    40704       -5     
- Partials      785      786       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants