OCPBUGS-45182: add startupProbe to csi-driver #334

Open
wants to merge 1 commit into
base: master

Conversation

EmilienM
Member

@EmilienM EmilienM commented Nov 28, 2024

The csi-driver container requires an NFS socket (handled by csi-driver-nfs) to be created before it can operate.
If the socket is not available at startup, the pod will restart unnecessarily due to initialization failure.

Using a postStart hook to check for the socket is not reliable, as Kubernetes does not guarantee the hook will execute before the container's entrypoint starts.

This patch introduces a startupProbe to verify the existence of the NFS socket.
The probe will run for up to one minute, allowing sufficient time for the socket to be created and avoiding premature pod restarts.
The existing livenessProbe remains responsible for ongoing health checks.
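
For illustration only (this is not the diff from this PR), a startupProbe of the kind described above could look roughly like the sketch below. The socket path and the split of the one-minute budget into `periodSeconds` and `failureThreshold` are assumptions, and the sketch also assumes the container image ships `sh` and `test`.

```yaml
# Illustrative sketch only; not the manifest changed by this PR.
containers:
  - name: csi-driver
    startupProbe:
      exec:
        command:
          - sh
          - -c
          # Hypothetical socket path; succeeds only once a Unix socket exists there.
          - test -S /plugin/csi-nfs.sock
      periodSeconds: 5       # check every 5 seconds
      failureThreshold: 12   # 12 x 5s = 60s before the kubelet restarts the container
```

With values like these, the kubelet holds off the livenessProbe until the startupProbe succeeds, and restarts the container only if the socket has still not appeared after roughly 60 seconds.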

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Nov 28, 2024
@openshift-ci-robot

@EmilienM: This pull request references Jira Issue OCPBUGS-45182, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (itbrown@redhat.com), skipping review request.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

The csi-driver-nfs container needs to start before the csi-driver container;
otherwise the socket might not be available when the driver looks
for it.

So in this PR we move the container into initContainers so that it runs
before the other containers, which should avoid the single pod restart at startup.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@EmilienM
Member Author

/cherry-pick release-4.18

@openshift-ci openshift-ci bot requested review from bertinatto and mandre November 28, 2024 21:11
Contributor

openshift-ci bot commented Nov 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: EmilienM

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-cherrypick-robot

@EmilienM: once the present PR merges, I will cherry-pick it on top of release-4.18 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.18

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 28, 2024
Contributor

openshift-ci bot commented Nov 28, 2024

@EmilienM, testwith: could not generate prow job. ERROR:

no ref for requested test included in command

@EmilienM
Member Author

/testwith openshift/hypershift/main/e2e-openstack

@EmilienM
Member Author

/testwith openshift/hypershift/main/e2e-openstack-csi-manila

@EmilienM EmilienM changed the title OCPBUGS-45182: move csi-driver-nfs under initContainers` OCPBUGS-45182: move csi-driver-nfs under initContainers` Nov 28, 2024
@EmilienM EmilienM changed the title from "OCPBUGS-45182: move csi-driver-nfs under initContainers`" to "OCPBUGS-45182: move csi-driver-nfs under initContainers" Nov 28, 2024
@EmilienM
Member Author

/testwith openshift/hypershift/main/e2e-openstack

@EmilienM
Member Author

/testwith openshift/hypershift/main/e2e-openstack openshift/hypershift#5202

@EmilienM
Member Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 29, 2024
The csi-driver-nfs container requires an NFS socket to be created
before the csi-driver container can operate.
If the socket is not available at startup, the pod will restart
unnecessarily due to initialization failure.

Using a postStart hook to check for the socket is not reliable,
as Kubernetes does not guarantee the hook will execute before
the container's entrypoint starts.

This patch introduces a startupProbe to verify the existence of the NFS socket.
The probe will run for up to one minute, allowing sufficient time for the socket
to be created and avoiding premature pod restarts.
The existing livenessProbe remains responsible for ongoing health checks.
@EmilienM EmilienM changed the title from "OCPBUGS-45182: move csi-driver-nfs under initContainers" to "OCPBUGS-45182: add startupProbe to csi-driver" Nov 29, 2024
Contributor

openshift-ci bot commented Dec 3, 2024

@EmilienM: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-openstack-manila-csi | 73c6924 | link | true | /test e2e-openstack-manila-csi |
| ci/prow/e2e-openstack | 73c6924 | link | false | /test e2e-openstack |
| ci/prow/e2e-azurestack-csi | 73c6924 | link | false | /test e2e-azurestack-csi |
| ci/prow/e2e-azure-file-csi | 73c6924 | link | true | /test e2e-azure-file-csi |
| ci/prow/hypershift-aws-e2e-external | 73c6924 | link | true | /test hypershift-aws-e2e-external |
| ci/prow/smb-operator-e2e-extended | 73c6924 | link | false | /test smb-operator-e2e-extended |
| ci/prow/e2e-azure-ovn-upgrade | 73c6924 | link | true | /test e2e-azure-ovn-upgrade |
| ci/prow/e2e-azure-csi | 73c6924 | link | true | /test e2e-azure-csi |
| ci/prow/hypershift-e2e-openstack-csi-manila | 73c6924 | link | true | /test hypershift-e2e-openstack-csi-manila |
| ci/prow/hypershift-e2e-openstack-csi-cinder | 73c6924 | link | true | /test hypershift-e2e-openstack-csi-cinder |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

3 participants