Stuck at open_by_handle_at when zoom flatpak is closed #11

Closed
awerlang opened this issue Jun 25, 2022 · 3 comments

@awerlang

  1. /mnt is mounted from the root subvolume in a btrfs filesystem
  2. fatrace is launched as cd /mnt; sudo fatrace --current-mount -f W
  3. Open zoom flatpak
  4. Close zoom flatpak
  5. At this point fatrace becomes unresponsive
  6. Hitting Ctrl+C prints an error message
  7. Hit Ctrl+C again to kill the process

Last few logs collected:

cat(18074): W /mnt/@/home/andre/.zoom/logs/zoom_stdout_stderr.log
cat(18073): W /mnt/@/home/andre/.zoom/logs/zoom_stdout_stderr.log
zoom.real(18069): CW /mnt/@/home/andre/.var/app/us.zoom.Zoom/cache/nvidia/GLCache/b8c712fd1f468bbad362238068d2f69a/d86578c586d70762/19f16eef6e5808a1.toc
zoom.real(18069): CW /mnt/@/home/andre/.var/app/us.zoom.Zoom/cache/nvidia/GLCache/b8c712fd1f468bbad362238068d2f69a/d86578c586d70762/19f16eef6e5808a1.bin
^Cfatrace: open_by_handle_at: Interrupted system call
zoom.real(18069): CW (deleted)

State at the time of the hang:

$ ps -weo pid,stat,wchan:32,args | grep [f]atrace
14490 S+   -                                sudo fatrace --current-mount -f W
14491 S+   -                                fatrace --current-mount -f W
$ sudo lsof -p 14491
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
      Output information may be incomplete.
lsof: WARNING: can't stat() fuse.portal file system /run/user/1000/doc
      Output information may be incomplete.
COMMAND   PID USER   FD      TYPE DEVICE SIZE/OFF    NODE NAME
fatrace 14491 root  cwd       DIR   0,31        2     256 /mnt
fatrace 14491 root  rtd       DIR   0,34      154     256 /
fatrace 14491 root  txt       REG   0,34    22992 5662434 /usr/sbin/fatrace
fatrace 14491 root  mem       REG   0,34  2563400 5593346 /usr/lib64/libc.so.6
fatrace 14491 root  mem       REG   0,34   264856 5593343 /usr/lib64/ld-linux-x86-64.so.2
fatrace 14491 root    0u      CHR  136,3      0t0       6 /dev/pts/3
fatrace 14491 root    1u      CHR  136,3      0t0       6 /dev/pts/3
fatrace 14491 root    2u      CHR  136,3      0t0       6 /dev/pts/3
fatrace 14491 root    3u  a_inode   0,14        0   11348 [fanotify]
$ lsof +D /mnt/@/home/andre/.var/app/us.zoom.Zoom/
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
      Output information may be incomplete.

Other flatpaks don't exhibit this behavior.
Native apps don't exhibit this behavior either.

@martinpitt
Owner

Thanks for your report -- fanotify() generally cannot handle btrfs subvolumes, see issue #3. But here it seems you found an actual kernel bug: open_by_handle_at() is not supposed to block. It should either succeed or fail with an error.

It could possibly help to open the handle in non-blocking mode:

--- fatrace.c
+++ fatrace.c
@@ -149,7 +149,7 @@ get_fid_event_fd (const struct fanotify_event_metadata *data)
 
     /* get affected file fd from fanotify_event_info_fid */
     fd = open_by_handle_at (get_mount_id ((const fsid_t *) &fid->fsid),
-                            (struct file_handle *) fid->handle, O_RDONLY);
+                            (struct file_handle *) fid->handle, O_RDONLY|O_NONBLOCK);
     /* ignore ESTALE for deleted fds between the notification and handling it */
     if (fd < 0 && errno != ESTALE)
         warn ("open_by_handle_at");

Would you be able to test that? fatrace-nonblock.tar.gz is a compiled x86_64 binary. If you use another architecture, can you try to compile it yourself? If not, tell me the architecture and I can probably build it for you.

Thanks!
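
For readers who want to see the proposed change outside the diff context, here is a minimal sketch of the same call pattern. The helper name `open_handle_nonblock` is hypothetical and not part of fatrace, and the warning is printed with `fprintf` rather than fatrace's `warn()`; only the `O_RDONLY|O_NONBLOCK` flags and the ESTALE handling come from the patch above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of fatrace): open a kernel file handle
 * with O_NONBLOCK, as in the patch. Returns an fd on success, or a
 * negative value on failure. */
int
open_handle_nonblock (int mount_fd, struct file_handle *handle)
{
    int fd;

    assert (handle != NULL);
    fd = open_by_handle_at (mount_fd, handle, O_RDONLY | O_NONBLOCK);

    /* ESTALE means the file was deleted between the notification and
     * the open, so it is ignored; any other error is reported. */
    if (fd < 0 && errno != ESTALE)
        fprintf (stderr, "open_by_handle_at: %s\n", strerror (errno));
    return fd;
}
```

Note that `open_by_handle_at()` normally requires the `CAP_DAC_READ_SEARCH` capability, so a call from an unprivileged process is expected to fail rather than return a usable fd.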

@martinpitt added the "question" label (Further information is requested) on Jun 26, 2022
@awerlang
Author

@martinpitt hi -- this patch is effective against the hang. Thanks for the quick fix!

These are the lines following the point where it would hang.

zoom.real(25222): CWO /mnt/@/home/andre/.var/app/us.zoom.Zoom/config/Unknown Organization/zoom.real .conf
zoom.real(25222): W /mnt/@/home/andre/.var/app/us.zoom.Zoom/config/Unknown Organization/zoom.real .conf.lock
zoom.real(25222): W /mnt/@/home/andre/.var/app/us.zoom.Zoom/config/Unknown Organization/zoom.real .conf.XgdzNq
zoom.real(25222): CW /mnt/@/home/andre/.var/app/us.zoom.Zoom/config/Unknown Organization/zoom.real .conf
zoom.real(25222): CW (deleted)

martinpitt added a commit that referenced this issue Jun 26, 2022
This seems to work around a kernel bug with btrfs, where
open_by_handle_at() hangs indefinitely by default. We never actually
read anything from that fd, so we don't care about the blocking mode.

Fixes #11
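
The commit message's reasoning (the fd is never read from, so its blocking mode is irrelevant) can be illustrated with a small sketch. It assumes the fd is only used to look up the file's path through `/proc/self/fd`, which the thread does not spell out; the helper name `fd_to_path` is hypothetical and not taken from fatrace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: resolve an open fd to a path by reading the
 * /proc/self/fd/<fd> symlink. It never calls read() on the fd itself,
 * so O_NONBLOCK on the fd does not affect the result.
 * Returns the path length, or -1 on error. */
ssize_t
fd_to_path (int fd, char *buf, size_t size)
{
    char link[64];
    ssize_t len;

    assert (buf != NULL && size > 0);
    snprintf (link, sizeof (link), "/proc/self/fd/%d", fd);

    /* readlink() does not NUL-terminate, so leave room and do it here */
    len = readlink (link, buf, size - 1);
    if (len >= 0)
        buf[len] = '\0';
    return len;
}
```

For example, on a system with `/proc` mounted, opening `/` with `O_RDONLY | O_NONBLOCK` and passing the resulting fd to `fd_to_path` should fill the buffer with `/`.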
@martinpitt
Owner

Nice! This was a shot in the dark, but sometimes you hit something 😉 I committed this workaround.

@martinpitt removed the "question" label (Further information is requested) on Jun 26, 2022