
Liveness Probe Failure #336

Closed
reddy-s opened this issue Feb 11, 2021 · 13 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@reddy-s

reddy-s commented Feb 11, 2021

/kind bug

Issue
All pods that are part of the efs-csi-node daemon set log the following:

efs-csi-node-76fhh liveness-probe I0211 14:22:19.953274       1 connection.go:180] GRPC call: /csi.v1.Identity/Probe
efs-csi-node-76fhh liveness-probe I0211 14:22:19.953279       1 connection.go:181] GRPC request: {}
efs-csi-node-76fhh liveness-probe I0211 14:22:19.953980       1 connection.go:183] GRPC response: {}
efs-csi-node-76fhh liveness-probe I0211 14:22:19.954350       1 connection.go:184] GRPC error: <nil>
efs-csi-node-txznc liveness-probe I0211 14:22:20.084964       1 connection.go:180] GRPC call: /csi.v1.Identity/Probe
efs-csi-node-txznc liveness-probe I0211 14:22:20.084968       1 connection.go:181] GRPC request: {}
efs-csi-node-txznc liveness-probe I0211 14:22:20.085787       1 connection.go:183] GRPC response: {}
efs-csi-node-txznc liveness-probe I0211 14:22:20.086315       1 connection.go:184] GRPC error: <nil>

I tried ignoring this and went ahead and mounted an EFS volume to one of the pods, but the mount failed.

Reproducing the issue

  • Deploy the helm chart using
helm upgrade --install aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver
  • Default values used for the helm chart

Environment

  • Kubernetes version: 1.18
  • Chart version: aws-efs-csi-driver-1.1.1
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 11, 2021
@anaclaudiar

anaclaudiar commented Feb 18, 2021

I have kind of the same problem: the mount doesn't fail, but it keeps triggering monitoring alerts because of "Liveness probe failed: HTTP probe failed with statuscode: 500".
Kubernetes version: 1.16
Chart version: aws-efs-csi-driver-1.1.1

@timrcoulson

I had a similar issue and increased the initial delay on the probe, and things have settled. It might be worth making some of that configurable through the Helm chart.
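For reference, a sketch of the kind of change being described. The field names below are standard Kubernetes probe settings, but the specific values are illustrative guesses at what "increase the initial delay" means here, and the chart may not have exposed them as values at the time:

```yaml
# Sketch only: a relaxed liveness probe on the efs-csi-node daemon set.
# Field names are standard Kubernetes; the values are illustrative, not
# the chart's defaults.
livenessProbe:
  httpGet:
    path: /healthz
    port: healthz
  initialDelaySeconds: 30   # give the driver longer to start serving /healthz
  timeoutSeconds: 10
  periodSeconds: 10
  failureThreshold: 5
```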

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 10, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 9, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@joegraviton

I had a similar issue and increased the initial delay on the probe, and things have settled. It might be worth making some of that configurable through the Helm chart.

I get the same issue; the probe only fails sometimes.
However, all my containers have been running for a week and have never restarted.
I don't understand why the probe is failing with a 500.
Also, increasing the delay won't help in my case since my containers don't restart, right?

@manoelhc

manoelhc commented Mar 17, 2023

Just got the same issue. Any ideas? k8s 1.25

@micolun

micolun commented Sep 13, 2023

Same issue with the AWS EFS CSI Driver add-on.
Kubernetes version 1.26
Addon version v1.5.8-eksbuild.1

restarted 6 times

W0913 15:04:08.076212       1 connection.go:173] Still connecting to unix:///csi/csi.sock
E0913 15:06:35.272240       1 main.go:74] health check failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
E0913 15:08:17.266731       1 main.go:64] failed to establish connection to CSI driver: context deadline exceeded
E0913 15:08:21.565558       1 main.go:64] failed to establish connection to CSI driver: context deadline exceeded

I got the above error messages in the liveness-probe container of both the controller and daemon set pods.

Not sure if it's important, but this is the liveness probe setting. I don't know if it's backwards compatible, but it looks like the image version is for 1.27, not 1.26:

    - name: liveness-probe
      image: >-
        602401143452.dkr.ecr.eu-west-2.amazonaws.com/eks/livenessprobe:v2.10.0-eks-1-27-3
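For context, the liveness-probe sidecar probes the driver over the CSI socket, so "Still connecting to unix:///csi/csi.sock" means the efs-plugin container is not answering on that socket yet. A rough sketch of the sidecar wiring, assuming the upstream livenessprobe flags (`--csi-address`, `--health-port`); the volume name here is hypothetical:

```yaml
- name: liveness-probe
  image: >-
    602401143452.dkr.ecr.eu-west-2.amazonaws.com/eks/livenessprobe:v2.10.0-eks-1-27-3
  args:
    - --csi-address=/csi/csi.sock   # socket shared with the efs-plugin container
    - --health-port=9809            # port the kubelet's HTTP probe hits
  volumeMounts:
    - name: plugin-dir              # hypothetical volume name, for illustration
      mountPath: /csi
```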

@aleonsan

aleonsan commented Sep 21, 2023

I had the same issue.
Kubernetes version: 1.24 (yes, I know ...)
Chart version: aws-efs-csi-driver-2.4.9 (appVersion 1.6.0)

Doing some reverse engineering, I managed to see that there are images for each EKS version family for livenessprobe, node-driver-registrar, and external-provisioner. Just go to the public ECR gallery https://gallery.ecr.aws/ and you will find them.


They also provide -latest tags for each EKS family (e.g. eks-distro/kubernetes-csi/livenessprobe:v2.10.0-eks-1-24-latest), which is quite handy to avoid having to update the image each time there is a patch.

In my case, the original problem persists:

Failed to establish connection to CSI driver: context deadline exceeded

Hope this helps.
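If you want to try one of those -latest family tags, the image line would look something like this (registry path as shown in the public gallery; an example, not a recommendation):

```yaml
- name: liveness-probe
  image: public.ecr.aws/eks-distro/kubernetes-csi/livenessprobe:v2.10.0-eks-1-24-latest
```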

@quantori-pokidovea

Same issue here.
Kubernetes version 1.28
aws-efs-csi-driver v2.0.3-eksbuild.1

E0608 15:59:57.937486       1 main.go:67] "Failed to establish connection to CSI driver" err="context deadline exceeded"
E0608 16:00:01.881657       1 main.go:77] "Health check failed" err="rpc error: code = Canceled desc = context canceled"
E0608 16:00:05.061272       1 main.go:77] "Health check failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"

@Affan-7

Affan-7 commented Jun 28, 2024

/reopen

Facing the same issue.
Kubernetes version 1.29
aws-efs-csi-driver v2.0.3-eksbuild.1

E0628 06:55:45.994174       1 main.go:77] "Health check failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
E0628 06:56:01.404779       1 main.go:67] "Failed to establish connection to CSI driver" err="context deadline exceeded"
E0628 06:56:08.620869       1 main.go:77] "Health check failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
E0628 06:56:11.561200       1 main.go:77] "Health check failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
E0628 06:56:14.657515       1 main.go:77] "Health check failed" err="rpc error: code = Canceled desc = context canceled"
E0628 06:57:13.948817       1 main.go:77] "Health check failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
E0628 06:58:11.133036       1 main.go:67] "Failed to establish connection to CSI driver" err="context deadline exceeded"
E0628 06:58:17.548904       1 main.go:77] "Health check failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"

@k8s-ci-robot
Contributor

@Affan-7: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Facing the same issue.
Kubernetes version 1.29
aws-efs-csi-driver v2.0.3-eksbuild.1

E0628 06:55:45.994174       1 main.go:77] "Health check failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
E0628 06:56:01.404779       1 main.go:67] "Failed to establish connection to CSI driver" err="context deadline exceeded"
E0628 06:56:08.620869       1 main.go:77] "Health check failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
E0628 06:56:11.561200       1 main.go:77] "Health check failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
E0628 06:56:14.657515       1 main.go:77] "Health check failed" err="rpc error: code = Canceled desc = context canceled"
E0628 06:57:13.948817       1 main.go:77] "Health check failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
E0628 06:58:11.133036       1 main.go:67] "Failed to establish connection to CSI driver" err="context deadline exceeded"
E0628 06:58:17.548904       1 main.go:77] "Health check failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded"

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
