Hi, thanks for this CSI driver — very useful indeed!
We were troubleshooting a transient problem with S3 mounts inside pods.
The problem is that sometimes, after a pod has started, the directory inside the pod that is expected to be an S3 mount actually points somewhere else.
We noticed some patterns in how the issue shows up and believe there is a bug in the geesefs mounter here (it returns without calling waitForMount).
Chain of events to trigger the bug:

1. A pod is scheduled to a node.
2. Kubelet tries to create a mountpoint through csi-s3.
3. csi-s3 creates a systemd unit with the geesefs process and starts it.
4. geesefs is slow to create the mount for some reason (although the process is running).
5. csi-s3 waits for the mountpoint to appear and fails with "GRPC error: Timeout waiting for mount".
6. Kubelet retries the process.
7. csi-s3 detects a running systemd unit with the proper arguments and concludes that the mountpoint exists (the bug; see the sketch after this list).
8. Kubelet assumes that csi-s3 has done its job and creates the pod.
9. Docker starts a container with a 'mount' volume that points to the wrong place: a directory on the disk where /var/lib/kubelet/pods/ is located instead of the geesefs mountpoint, because the original geesefs process is slow to create the mount and the mountpoint does not exist yet.
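To illustrate step 7, here is a minimal Go sketch of the suspected control flow. This is not the driver's real code; publishVolume, isMountPoint, and unitActive are hypothetical names. The point is that "the systemd unit is active" is not proof that the FUSE mount exists, and one cheap way to verify the mount is to compare device IDs of the target directory and its parent:

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"syscall"
)

// isMountPoint reports whether target sits on a different device than its
// parent directory, i.e. whether something (such as geesefs) is mounted there.
func isMountPoint(target string) (bool, error) {
	var st, parent syscall.Stat_t
	if err := syscall.Stat(target, &st); err != nil {
		return false, err
	}
	if err := syscall.Stat(filepath.Dir(target), &parent); err != nil {
		return false, err
	}
	return st.Dev != parent.Dev, nil
}

// publishVolume models the kubelet retry path: unitActive says whether a
// matching geesefs systemd unit is already running.
func publishVolume(target string, unitActive bool) error {
	if unitActive {
		// Buggy shortcut: returning nil right here tells kubelet the volume
		// is ready even though geesefs may still be starting up.
		//
		// Safer: verify the mountpoint before reporting success.
		mounted, err := isMountPoint(target)
		if err != nil {
			return err
		}
		if !mounted {
			return errors.New("unit is running but mount has not appeared yet")
		}
		return nil
	}
	// ... otherwise start the systemd unit and wait for the mount ...
	return errors.New("not implemented in this sketch")
}

func main() {
	// "/tmp" is just a stand-in path for the kubelet target directory.
	fmt.Println(publishVolume("/tmp", true))
}
```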
The timeout for waitForMount is 10 seconds, while a slow geesefs start can take as long as 2 minutes:
Apr 04 15:28:03 host123 systemd[1]: Started GeeseFS mount for Kubernetes volume test/pvc-d0b3ec1f-2c7b-4e66-cc1b-2f1c5db11d87.
Apr 04 15:30:03 host123 geesefs[1946684]: 2024/04/04 15:30:03.670881 fuse.DEBUG Beginning the mounting kickoff process
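For comparison, here is a minimal sketch of a waitForMount-style poll with a timeout long enough to cover a slow start like the one in the log above. The function name, the /proc/mounts approach, and the 3-minute value are assumptions for illustration, not the driver's actual implementation:

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"time"
)

// waitForMount polls /proc/mounts until target is listed as a mount target
// or the timeout elapses.
func waitForMount(target string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		data, err := os.ReadFile("/proc/mounts")
		if err == nil {
			for _, line := range strings.Split(string(data), "\n") {
				fields := strings.Fields(line)
				if len(fields) >= 2 && fields[1] == target {
					return nil // the mount has appeared
				}
			}
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timeout waiting for mount at %s", target)
		}
		time.Sleep(500 * time.Millisecond)
	}
}

func main() {
	// With geesefs sometimes taking ~2 minutes to mount, a 10-second limit
	// is guaranteed to fail; 3 minutes comfortably covers the slow case.
	if err := waitForMount("/var/lib/kubelet/pods/example/volume", 3*time.Minute); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("mount is ready")
}
```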