
Timeout from geesefs leads to a wrong mount into a pod #107

Closed
berlic opened this issue Apr 5, 2024 · 3 comments

Comments


berlic commented Apr 5, 2024

Hi, thanks for this CSI driver — very useful indeed!

We were troubleshooting a transient problem with s3 mounts inside pods.
The problem is that sometimes, after a pod has started, the directory inside the pod that is expected to be an s3 mount actually points somewhere else.

We noticed some patterns in the issue and believe that there is a bug in the geesefs mounter here (it returns without calling waitForMount).

Chain of events to trigger the bug:

  • A pod is scheduled to a node
  • Kubelet tries to create a mountpoint through csi-s3
  • csi-s3 creates SystemD unit with geesefs process and starts it
  • geesefs is slow to create a mount for some reason (although the process is running)
  • csi-s3 waits for the mountpoint to appear and fails with "GRPC error: Timeout waiting for mount"
  • Kubelet retries the process
  • csi-s3 detects a running SystemD unit with the proper arguments and assumes that the mountpoint exists (this is the bug; see the sketch after this list)
  • Kubelet assumes that csi-s3 has done its job and creates a pod
  • Docker starts the container with the volume mount, which points to the wrong place (a directory on the disk where /var/lib/kubelet/pods/ is located instead of the geesefs mountpoint, because the original geesefs process is still slow to create the mount and the mountpoint does not exist yet)
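
For illustration, here is a minimal Go sketch of the suspected control flow; all function names are hypothetical stand-ins, not the actual csi-s3 code:

```go
package mounter

import "time"

// Hypothetical stand-ins for the real systemd/CSI plumbing; the actual
// csi-s3 code differs, this only illustrates the control flow.
func systemdUnitIsActive(unit string) bool            { return false }
func startSystemdUnit(unit string) error              { return nil }
func waitForMount(path string, d time.Duration) error { return nil }

// Suspected buggy flow: on kubelet's retry, the geesefs systemd unit from
// the previous (timed-out) attempt is still running, so the check below
// short-circuits and reports success even though the FUSE mount may not
// exist yet.
func ensureMounted(unit, target string) error {
	if systemdUnitIsActive(unit) {
		return nil // BUG: unit is active, but the mountpoint may not be ready
	}
	if err := startSystemdUnit(unit); err != nil {
		return err
	}
	return waitForMount(target, 10*time.Second)
}

// A fix in the spirit of the one later committed upstream: always verify
// the mountpoint, even when the unit was already active.
func ensureMountedFixed(unit, target string) error {
	if !systemdUnitIsActive(unit) {
		if err := startSystemdUnit(unit); err != nil {
			return err
		}
	}
	return waitForMount(target, 10*time.Second)
}
```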

The timeout for waitForMount is 10 seconds, while a slow geesefs start can take as long as 2 minutes:

```
Apr 04 15:28:03 host123 systemd[1]: Started GeeseFS mount for Kubernetes volume test/pvc-d0b3ec1f-2c7b-4e66-cc1b-2f1c5db11d87.
Apr 04 15:30:03 host123 geesefs[1946684]: 2024/04/04 15:30:03.670881 fuse.DEBUG Beginning the mounting kickoff process
```
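
To make the timing mismatch concrete, here is a sketch of what a waitForMount-style poll typically looks like (hypothetical code, not the driver's actual implementation); with a 10-second deadline it gives up long before a 2-minute geesefs startup completes:

```go
package mounter

import (
	"fmt"
	"os"
	"strings"
	"time"
)

// waitForMountSketch polls /proc/mounts until target appears as a
// mountpoint or the deadline passes.
func waitForMountSketch(target string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		data, err := os.ReadFile("/proc/mounts")
		if err != nil {
			return err
		}
		for _, line := range strings.Split(string(data), "\n") {
			fields := strings.Fields(line)
			if len(fields) >= 2 && fields[1] == target {
				return nil // mountpoint is live
			}
		}
		time.Sleep(100 * time.Millisecond)
	}
	return fmt.Errorf("timeout waiting for mount at %s", target)
}
```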

vitalif commented Apr 22, 2024

Thanks, I committed a fix to master.
That said, I think GeeseFS should never start that slowly; it doesn't do anything slow when it starts...


vitalif commented Apr 22, 2024

Released 0.40.8, please check


vitalif commented Apr 22, 2024

Please note that if you're not using Helm, you should delete the resources from the old provisioner.yaml when upgrading:
https://github.com/yandex-cloud/k8s-csi-s3/?tab=readme-ov-file#upgrading
