What happened:
I'm running a Kubernetes cluster on AWS EKS with spot instances for the node groups. Randomly, and not on all clusters, the pod that runs the CSI NFS controller goes into CrashLoopBackOff and reports these logs:
csi-snapshotter E1029 09:35:37.115611 1 leaderelection.go:340] Failed to update lock optimitically: Operation cannot be fulfilled on leases.coordination.k8s.io "external-snapshotter-leader-nfs-csi-k8s-io": the object has been modified; please apply your changes to the latest version and try again, falling back to slow path
If I delete the pod, everything starts again without any issue.
It seems that every time (or almost every time) an EC2 instance is retired and replaced with another one, csi-nfs-controller gets stuck on a leader-election lock that can only be cleared by forcibly deleting the pod.
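For context, a minimal way to inspect the leader-election state named in the log and to apply the workaround described above; the kube-system namespace and the app=csi-nfs-controller label are assumptions based on the upstream csi-driver-nfs Helm chart defaults and may differ in your deployment:

```sh
# Inspect the Lease from the error message; spec.holderIdentity shows
# which controller pod currently owns the csi-snapshotter leader election.
kubectl get lease external-snapshotter-leader-nfs-csi-k8s-io \
  -n kube-system -o yaml

# Current workaround: delete the controller pod so it restarts and
# re-acquires the lease cleanly.
kubectl delete pod -n kube-system -l app=csi-nfs-controller
```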
What you expected to happen:
No CrashLoopBackOff status on the controller pod.
How to reproduce it:
Deploy a cluster with spot instances, install csi-driver-nfs, and watch whether and when the crash loop appears; it is not deterministic (a rough way to mimic the node replacement is sketched below).
Anything else we need to know?:
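A rough sketch of forcing the node replacement that seems to correlate with the crash loop. NODE_NAME is a placeholder, the namespace and label again assume the upstream Helm chart defaults, and draining only mimics a spot retirement rather than reproducing it exactly:

```sh
# Pick the node currently hosting the controller pod, then drain it to
# mimic a spot instance being retired (NODE_NAME is a placeholder).
kubectl drain NODE_NAME --ignore-daemonsets --delete-emptydir-data

# Watch the controller pods for CrashLoopBackOff after they reschedule.
kubectl get pods -n kube-system -l app=csi-nfs-controller -w
```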
Environment:
CSI Driver version: 4.9.0
Kubernetes version (use kubectl version): 1.31
OS (e.g. from /etc/os-release): AWS Bottlerocket
Install tools: Terraform + Helm
Others: