
Mounting several volumes at once fails very frequently #125

Closed
tsmetana opened this issue Mar 23, 2022 · 14 comments

Comments

@tsmetana
Contributor

We're encountering problems when trying to mount multiple EFS volumes at once. The mount process gets stuck; when trying to debug the RPC traffic there are occasional nfs: server 127.0.0.1 not responding, timed out errors in the log (not sure if those are related -- mount.efs should retry AFAIK). The stunnel processes serving the mount RPC connections seem to be just waiting for a connection, but nothing happens.

This problem has been observed only with CentOS 8 (or CentOS Stream 8) running stunnel-5.56-5.el8_3 and openssl-libs-1.1.1k-5.el8_5. When trying Amazon Linux 2 with stunnel-4.56-6.amzn2.0.3 and openssl-libs-1.0.2k-19.amzn2.2.0.10, everything works OK. I suspected a race in stunnel, so I recompiled stunnel-5.56-5 and installed it on Amazon Linux 2, but the issue is again not reproducible there, so it's not stunnel (or not stunnel by itself).

The issue also seems to be quite timing-sensitive: increasing the log level or changing stunnel options affects the probability of the problem showing up. I've tried removing the PID file creation (since issue #112 looked quite similar) but it doesn't seem to help -- I can still see the pending mounts. I also suspected issue #105, but even with that fixed (I hope -- PR #119) the mounts still get stuck.

I wonder if the problem in issue #114 is related: we're mostly encountering this problem through the efs-csi-driver on Kubernetes clusters when creating and removing multiple EFS volumes in the cluster in one shot.

I'm curious whether somebody has more insight or has encountered the problem: it looks like a combination of multiple factors causes this, and I've failed to find any interesting debugging clues.

@Cappuccinuo
Contributor

Cappuccinuo commented Mar 23, 2022

Hey @tsmetana ,

Are you trying to mount multiple different file systems or the same file system? Can you try kill -9 on the hanging stunnel process and see if the issue goes away?

Also if you have the stunnel debug log, feel free to paste it here.

Thanks.

@tsmetana
Contributor Author

Hi. Thanks for the quick response. It was EC2. I don't have the debug logs, since raising the debug level makes the problem hard to reproduce (I may try...). And it didn't occur to me to try to kill the stunnel process. IIRC re-running mount.nfs manually helped. But none of that would be a viable workaround... As mentioned, this is originally coming from the EFS CSI driver, so just creating several EFS volumes (PVs) and deleting them has a high chance of failing (since deleting them means mounting them on the Kubernetes master). I have straces of the waiting stunnel and RPC logs, but there is nothing obvious in those.

@Cappuccinuo
Contributor

Thanks. kvifern@ provided a workaround that launches a unit to monitor the file system in kubernetes-sigs/aws-efs-csi-driver#616 (comment).

If you kill the stunnel process, the watchdog will relaunch a new stunnel which will reconnect to the server. You can try that and verify whether it works. Meanwhile, we are actively looking into this kind of issue right now.

@Cappuccinuo
Contributor

Cappuccinuo commented Mar 25, 2022

Also, I wonder if you can upgrade your stunnel version to 5.63. I left a comment in kubernetes-sigs/aws-efs-csi-driver#616 (comment); based on our findings so far, the issue could be related to stunnel itself. We are trying to reproduce on our side as well with various versions. Let me know, thanks.

@tsmetana
Contributor Author

I forgot to mention: I have also tried stunnel-5.62 (rebuilt from the latest Fedora package) with CentOS 8 and it didn't help: the mounts still hung. By the way, the problem in kubernetes-sigs/aws-efs-csi-driver#616 is slightly different from what I'm observing: I don't even get to mount the filesystem. Though I also suspect they might be related.

@Cappuccinuo
Contributor

Next time you encounter the problem, can you paste the stunnel log here if you have it? Thanks.

@tsmetana
Contributor Author

Hi. Sorry for the delayed responses. I finally got to retesting this. I used only an EC2 VM with CentOS Stream 8 (stunnel-5.56-5.el8_3.x86_64, openssl-libs-1.1.1k-6.el8.x86_64) and the latest efs-utils from git HEAD (amazon-efs-utils-1.32.1-1.el8.noarch built with make rpm).

If I try to mount 30 subdirectories of an EFS volume at once, some of them fail to mount almost every time. The error from /var/log/amazon/efs/mount.log looks like this:

2022-04-12 13:50:04 UTC - ERROR - Failed to start TLS tunnel (errno=1). stdout="b''" stderr="b'[ ] Clients allowed=500\n[.] stunnel 5.56 on x86_64-redhat-linux-gnu platform\n[.] Compiled with OpenSSL 1.1.1g FIPS  21 Apr 2020\n[.] Running  with OpenSSL 1.1.1k  FIPS 25 Mar 2021\n[.] Threading:PTHREAD Sockets:POLL,IPv6 TLS:ENGINE,FIPS,OCSP,PSK,SNI\n[ ] errno: (*__errno_location ())\n[.] Reading configuration from file /run/efs/stunnel-config.fs-095e0e35d06f84c38.mnt.mnt-030.20370\n[.] UTF-8 byte order mark not detected\n[.] FIPS mode disabled\n[ ] Compression disabled\n[ ] No PRNG seeding was required\n[ ] Initializing service [efs]\n[ ] Ciphers: PROFILE=SYSTEM\n[ ] TLSv1.3 ciphersuites: TLS_CHACHA20_POLY1305_SHA256:TLS_AES_256_GCM_SHA384:TLS_AES_128_GCM_SHA256\n[ ] TLS options: 0x02100004 (+0x00000000, -0x00000000)\n[ ] Loading certificate from file: /var/run/efs/fs-095e0e35d06f84c38.mnt.mnt-030.20370+/certificate.pem\n[ ] Certificate loaded from file: /var/run/efs/fs-095e0e35d06f84c38.mnt.mnt-030.20370+/certificate.pem\n[ ] Loading private key from file: /etc/amazon/efs/privateKey.pem\n[ ] Private key loaded from file: /etc/amazon/efs/privateKey.pem\n[ ] Private key check succeeded\n[.] Configuration successful\n[ ] Binding service [efs]\n[ ] Listening file descriptor created (FD=9)\n[ ] Setting accept socket options (FD=9)\n[ ] Option SO_REUSEADDR set on accept socket\n[ ] Option SO_BINDTODEVICE set on accept socket\n[.] Binding service [efs] to 127.0.0.1:20370: Address already in use (98)\n[!] Binding service [efs] failed\n[ ] Deallocating section defaults\n[ ] Unbinding service [efs]\n[ ] Service [efs] closed\n[ ] Deallocating section [efs]'"

So the culprit seems to be Address already in use. Looking at the code that selects the binding port, I suspect it's racy: https://github.com/aws/efs-utils/blob/master/src/mount_efs/__init__.py#L961-L973

It goes through the suitable ports in sequence and tries to bind each one; if that succeeds, it closes the socket and records that port for the stunnel configuration file, which then gets written. If several mount.efs processes run in parallel, there's quite a lot of time between the port being closed and the configuration being written, so another mount.efs can choose the same port because it's free for a moment (I found another mount from the same "batch" using the selected port). Then it seems like it never recovers from the failed stunnel startup.
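To illustrate the pattern (a simplified sketch, not the actual efs-utils code; find_free_tls_port is just an illustrative name):

```python
import socket

def find_free_tls_port(lower, upper):
    """Scan a port range and return the first port that can be bound on localhost right now."""
    for port in range(lower, upper):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind(("localhost", port))
            sock.close()
            # The port is free *now*, but it is released immediately; another
            # mount.efs process can pick the same port before this one writes
            # its stunnel config and actually starts stunnel.
            return port
        except socket.error:
            sock.close()
    raise RuntimeError("no free TLS port found in range")
```

Two mount.efs processes racing through a loop like this can both return the same port, which matches the Address already in use failure in the log above.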

@Cappuccinuo
Contributor

Thanks @tsmetana for the logs! What you described is valid. I think the port could already be taken, since there is a gap between launching stunnel and verifying that the tlsport is in use, so it is possible for two mount commands to use the same tlsport and for one of them to time out because the TLS tunnel cannot be launched. efs-utils is an ad-hoc command line tool, so internally it does not handle this kind of concurrent mount well.

Are you trying to mount them all at once from /etc/fstab? Or is it an application launching 30 threads and mounting those directories? We have previously suggested that customers mount EFS file systems sequentially in /etc/fstab: https://docs.aws.amazon.com/efs/latest/ug/troubleshooting-efs-mounting.html#automount-fix-multiple-fs

Another thought is to increase the port range in the config file. Currently there are 400 ports; you can bump it to 4000 ports, which will decrease the probability of two mounts picking the same tlsport.
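For illustration, the change would look roughly like this in /etc/amazon/efs/efs-utils.conf (assuming the stock port_range_lower_bound / port_range_upper_bound options under the [mount] section; the 24049 upper bound is just an example giving about 4000 ports):

```ini
[mount]
# the default range covers 400 ports; widening it reduces the chance
# that two parallel mount.efs processes pick the same port
port_range_lower_bound = 20049
port_range_upper_bound = 24049
```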

Will engage someone to take a look at this kind of issue.

@tsmetana
Contributor Author

Hi. The test case is basically this:

  1. Create 30 mount points somewhere (/mnt is a good candidate): for i in $(seq -w 30); do mkdir /mnt/efs-$i; done
  2. Mount the EFS export. I had 30 subdirectories created on the EFS volume (01--30) and mounted each of them: for i in $(seq -w 30); do mount -t efs fs-12345678:/$i /mnt/efs-$i & done. (Note the mounts are started in the background so they run in parallel.) The subdirectories should not be required for the problem to show; mounting / 30 times should go through the same code path.

Please note that parallel mounting is what Kubernetes does (by default) in the CSI driver / provisioner, so even though there is a documented workaround for people using EFS mounts in fstab, we still have quite unreliable (or slow-to-mount) EFS in Kubernetes.

As for the fix, I had several ideas, but I think the most reliable one would be to account for this error, choose a different port and restart stunnel. It may also be possible to close the socket only after the config file is written, just before stunnel starts. Since the port is included in the stunnel config file name, we could also check for the existence of such a config when selecting the port number and not attempt to use a port for which another stunnel has already been configured (see the sketch below). Not sure how many changes it would take though... :) I may try it myself eventually.
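A rough sketch of that last idea (simplified and hypothetical; port_already_configured and the /run/efs path are illustrative, based on the config file names visible in the log above):

```python
import os
import socket

STATE_DIR = "/run/efs"  # directory where the per-mount stunnel configs are written

def port_already_configured(port):
    """Return True if another mount has already written a stunnel config
    ending in '.<port>', even if its stunnel hasn't bound the port yet."""
    suffix = ".%d" % port
    return any(name.startswith("stunnel-config.") and name.endswith(suffix)
               for name in os.listdir(STATE_DIR))

def find_free_unconfigured_port(lower, upper):
    for port in range(lower, upper):
        if port_already_configured(port):
            continue
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind(("localhost", port))
            sock.close()
            return port
        except socket.error:
            sock.close()
    raise RuntimeError("no usable port found in range")
```

This only narrows the window; a complete fix would still need to hold the bound socket until stunnel starts, or retry with a different port on the Address already in use error.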

@tsmetana
Contributor Author

Still broken in v1.33.1.

@acdha

acdha commented Jun 23, 2022

I've noticed this still hitting the race condition with v1.33.1 (note the duplicate “binding 20417” below). What's interesting is that while an error was logged, the mount did not fail. If it had, our automated retries would have picked it up. What instead happened was a very confusing situation where the filesystem actually mounted was the other one, so the mount appeared to be successful but all of the contents were wrong. Trapping the stunnel error and either retrying or failing the mount attempt would have made this a problem lasting only a few seconds.

2022-06-21 18:12:30 UTC - INFO - version=1.33.1 options={'ro': None, 'nodev': None, 'nosuid': None, 'noexec': None, 'noatime': None, 'tls': None, '_netdev': None}
2022-06-21 18:12:30 UTC - INFO - version=1.33.1 options={'rw': None, 'nodev': None, 'nosuid': None, 'noexec': None, 'noatime': None, 'tls': None, '_netdev': None}
2022-06-21 18:12:30 UTC - INFO - version=1.33.1 options={'rw': None, 'nodev': None, 'nosuid': None, 'noexec': None, 'noatime': None, 'tls': None, '_netdev': None}
2022-06-21 18:12:30 UTC - INFO - version=1.33.1 options={'ro': None, 'nodev': None, 'nosuid': None, 'noexec': None, 'noatime': None, 'tls': None, 'actimeo': '3600', '_netdev': None}
2022-06-21 18:12:30 UTC - INFO - version=1.33.1 options={'ro': None, 'nodev': None, 'nosuid': None, 'noexec': None, 'noatime': None, 'tls': None, '_netdev': None}
2022-06-21 18:12:30 UTC - INFO - binding 20137
2022-06-21 18:12:30 UTC - INFO - binding 20229
2022-06-21 18:12:30 UTC - INFO - binding 20205
2022-06-21 18:12:30 UTC - INFO - binding 20417
2022-06-21 18:12:30 UTC - INFO - binding 20417
2022-06-21 18:12:30 UTC - INFO - Starting TLS tunnel: "/usr/bin/stunnel /var/run/efs/stunnel-config.fs-7de23688.srv.cronjobs.20417"
2022-06-21 18:12:30 UTC - INFO - Starting TLS tunnel: "/usr/bin/stunnel /var/run/efs/stunnel-config.fs-72e23687.srv.sysadmin.20205"
2022-06-21 18:12:30 UTC - INFO - Starting TLS tunnel: "/usr/bin/stunnel /var/run/efs/stunnel-config.fs-13474a90.srv.project.20417"
2022-06-21 18:12:30 UTC - INFO - Started TLS tunnel, pid: 3308
2022-06-21 18:12:30 UTC - INFO - Starting TLS tunnel: "/usr/bin/stunnel /var/run/efs/stunnel-config.fs-c947f13c.srv.projectrsync.20137"
2022-06-21 18:12:30 UTC - INFO - Started TLS tunnel, pid: 3310
2022-06-21 18:12:30 UTC - INFO - Started TLS tunnel, pid: 3311
2022-06-21 18:12:30 UTC - INFO - Started TLS tunnel, pid: 3313
2022-06-21 18:12:30 UTC - INFO - Starting TLS tunnel: "/usr/bin/stunnel /var/run/efs/stunnel-config.fs-50e5b8a5.srv.projecticons.20229"
2022-06-21 18:12:30 UTC - INFO - Started TLS tunnel, pid: 3316
2022-06-21 18:12:30 UTC - INFO - Executing: "/sbin/mount.nfs4 127.0.0.1:/ /srv/project -o ro,nodev,nosuid,noexec,noatime,_netdev,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,port=20417" with 15 sec time limit.
2022-06-21 18:12:30 UTC - INFO - Executing: "/sbin/mount.nfs4 127.0.0.1:/ /srv/cronjobs -o rw,nodev,nosuid,noexec,noatime,_netdev,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,port=20417" with 15 sec time limit.
2022-06-21 18:12:30 UTC - ERROR - Failed to start TLS tunnel (errno=1). stdout="b''" stderr="b'Clients allowed=500\nstunnel 4.56 on x86_64-koji-linux-gnu platform\nCompiled/running with OpenSSL 1.0.2k-fips  26 Jan 2017\nThreading:PTHREAD Sockets:POLL,IPv6 SSL:ENGINE,OCSP,FIPS Auth:LIBWRAP\nReading configuration from file /var/run/efs/stunnel-config.fs-7de23688.srv.cronjobs.20417\nFIPS mode is disabled\nCompression not enabled\nSnagged 64 random bytes from /dev/urandom\nPRNG seeded successfully\nInitializing service [efs]\nCertificate: /var/run/efs/fs-7de23688.srv.cronjobs.20417+/certificate.pem\nCertificate loaded\nKey file: /etc/amazon/efs/privateKey.pem\nPrivate key loaded\nLoaded verify certificates from /etc/amazon/efs/efs-utils.crt\nLoaded /etc/amazon/efs/efs-utils.crt revocation lookup file\nSSL options set: 0x03000004\nConfiguration successful\nlisten: Address already in use (98)\nClosing service [efs]\nService [efs] closed (FD=7)\nSessions cached before flush: 0\nSessions cached after flush: 0\nService [efs] closed\nstr_stats: 15 block(s), 1132 data byte(s), 870 control byte(s)'"
2022-06-21 18:12:30 UTC - INFO - Executing: "/sbin/mount.nfs4 127.0.0.1:/ /srv/projectrsync -o ro,nodev,nosuid,noexec,noatime,actimeo=3600,_netdev,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,port=20137" with 15 sec time limit.
2022-06-21 18:12:30 UTC - INFO - Executing: "/sbin/mount.nfs4 127.0.0.1:/ /srv/sysadmin -o ro,nodev,nosuid,noexec,noatime,_netdev,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,port=20205" with 15 sec time limit.
2022-06-21 18:12:30 UTC - INFO - Executing: "/sbin/mount.nfs4 127.0.0.1:/ /srv/projecticons -o rw,nodev,nosuid,noexec,noatime,_netdev,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,port=20229" with 15 sec time limit.
2022-06-21 18:12:30 UTC - INFO - Successfully mounted fs-13474a90.efs.us-east-1.amazonaws.com at /srv/project
2022-06-21 18:12:30 UTC - INFO - Successfully mounted fs-72e23687.efs.us-east-1.amazonaws.com at /srv/sysadmin
2022-06-21 18:12:31 UTC - INFO - Successfully mounted fs-c947f13c.efs.us-east-1.amazonaws.com at /srv/projectrsync
2022-06-21 18:12:31 UTC - INFO - Successfully mounted fs-50e5b8a5.efs.us-east-1.amazonaws.com at /srv/projecticons
2022-06-23 13:39:32 UTC - INFO - version=1.33.1 options={'ro': None, 'nodev': None, 'nosuid': None, 'noexec': None, 'noatime': None, 'tls': None, '_netdev': None}

@Cappuccinuo
Contributor

@acdha

In the logs you are posting, I can see that the one mounting to /cronjobs is failing due to the address already being taken, so that's why you only see four Successfully mounted messages. I think the mount did indeed fail, because the stunnel was not even established.

  1. What is your df output before your mount retry?
  2. Which mount do you think is swapped in this case?

@acdha

acdha commented Jun 27, 2022

In the logs you are posting, I can see that the one mounting to /cronjobs is failing due to the address already being taken, so that's why you only see four Successfully mounted messages. I think the mount did indeed fail, because the stunnel was not even established.

This is partially correct: stunnel fails to start but that error doesn't appear to be handled other than by logging it — the mount operation continues and ultimately reports success.

1. What is your `df` output before your mount retry?

df will show the space used for the other filesystem – it's otherwise highly similar since the server is always 127.0.0.1:/.

2. Which mount do you think is `swapped` in this case?

In the example above, cronjobs.20417 won the race so the same contents were visible at both its intended mountpoint and the project.20417 mountpoint.
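A simplified sketch of the "trap the stunnel error" idea from above (not efs-utils code; the command lists and grace period are illustrative): only run mount.nfs4 if the tunnel just started for this mount is still alive, otherwise fail so a retry can pick a new port.

```python
import subprocess
import time

def mount_via_tls(stunnel_cmd, mount_nfs4_cmd, startup_grace=1.0):
    """Illustrative guard: refuse to mount through a port that this
    mount's own stunnel failed to bind (e.g. Address already in use)."""
    tunnel = subprocess.Popen(stunnel_cmd,
                              stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    time.sleep(startup_grace)  # give stunnel a moment to bind or bail out
    if tunnel.poll() is not None:
        # stunnel exited already; another mount's tunnel may own the port,
        # so running mount.nfs4 now could silently mount the wrong file system
        _, stderr = tunnel.communicate()
        raise RuntimeError("TLS tunnel failed to start: %s" % stderr.decode())
    subprocess.check_call(mount_nfs4_cmd)  # e.g. the /sbin/mount.nfs4 command line from the log
    return tunnel
```

The caller would catch the error, pick a fresh port and try again, which would have turned the silent swap described here into a failure lasting only a few seconds.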

@RyanStan
Member

Closing as we've resolved this issue with the v1.34.4 release.
