zfs-2.0.3-1 on 5.10.16 syscall hangs indefinitely on directory listing. #11621

Closed
majiru opened this issue Feb 20, 2021 · 5 comments
Labels: Status: Triage Needed (new issue which needs to be triaged), Type: Defect (incorrect behavior, e.g. crash, hang)

Comments


majiru commented Feb 20, 2021

System information

Type                    Version/Name
Distribution Name       Artix
Distribution Version    Rolling
Linux Kernel            5.10.16-artix1-1
Architecture            amd64
ZFS Version             2.0.3-1
SPL Version             2.0.3-1

Describe the problem you're observing

Running ls within a specific directory causes a syscall to hang indefinitely. I ran a scrub after noticing this, and it reported no errors. The call stack from dmesg is below.

Describe how to reproduce the problem

$ cd to/specific/dir && ls

Include any warning/errors/backtraces from the system logs

[ 1351.456695] INFO: task find:4141 blocked for more than 1228 seconds.
[ 1351.456699]       Tainted: P           OE     5.10.16-artix1-1 #1
[ 1351.456701] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1351.456703] task:find            state:D stack:    0 pid: 4141 ppid:     1 flags:0x00000004
[ 1351.456706] Call Trace:
[ 1351.456714]  __schedule+0x295/0x810
[ 1351.456717]  schedule+0x5b/0xc0
[ 1351.456724]  vcmn_err.cold+0x7e/0x80 [spl]
[ 1351.456797]  zfs_panic_recover+0x75/0x90 [zfs]
[ 1351.456860]  zfs_znode_alloc+0x6d8/0x740 [zfs]
[ 1351.456927]  zfs_zget+0x270/0x2b0 [zfs]
[ 1351.456985]  zfs_dirent_lock+0x36c/0x6c0 [zfs]
[ 1351.457040]  zfs_dirlook+0xad/0x2d0 [zfs]
[ 1351.457095]  ? zfs_zaccess+0x127/0x490 [zfs]
[ 1351.457153]  zfs_lookup+0x1e6/0x3d0 [zfs]
[ 1351.457208]  zpl_lookup+0xf2/0x210 [zfs]
[ 1351.457212]  __lookup_slow+0x85/0x140
[ 1351.457215]  walk_component+0x141/0x1b0
[ 1351.457217]  path_lookupat+0x5b/0x190
[ 1351.457219]  filename_lookup+0xbe/0x1d0
[ 1351.457223]  vfs_statx+0x86/0x140
[ 1351.457225]  __do_sys_newfstatat+0x46/0x80
[ 1351.457229]  do_syscall_64+0x33/0x40
[ 1351.457231]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1351.457233] RIP: 0033:0x7f50a735babf
[ 1351.457235] RSP: 002b:00007ffd0cf37ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000106
[ 1351.457237] RAX: ffffffffffffffda RBX: 00007ffd0cf37fa0 RCX: 00007f50a735babf
[ 1351.457238] RDX: 00007ffd0cf37fa0 RSI: 000055677deb91a0 RDI: 0000000000000008
[ 1351.457239] RBP: 00007ffd0cf37fa0 R08: 0000000000000100 R09: 0000000000000001
[ 1351.457240] R10: 0000000000000100 R11: 0000000000000246 R12: 000055677deb91a0
[ 1351.457240] R13: 000055677de7f610 R14: 000000000000000a R15: 0000000000000000
majiru added the Status: Triage Needed and Type: Defect labels Feb 20, 2021
Contributor

aerusso commented Feb 20, 2021

Hello @majiru,

Did you run 2.0.1 or 2.0.2 (or any version of ZFS including 3d40b65, but not 8829ba1)?

And, are there any lines like PANIC: inode has invalid mode in your dmesg logs?

If so, this may be due to (correctable) metadata corruption: please take a look at #11474, and specifically #11474 (comment).
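For anyone checking their own logs, here is a minimal sketch of how the affected inodes might be pulled out of the kernel log. It assumes the message format quoted elsewhere in this thread (`PANIC: inode <number> has invalid mode: 0x<hex>`); the function name `invalid_mode_inodes` is made up for illustration.

```shell
#!/bin/sh
# Sketch: read kernel log text on stdin and print one "inode mode" pair
# per matching line. The pattern assumes the message format quoted in
# this thread; adjust it if your log lines look different.
invalid_mode_inodes() {
    sed -n 's/.*inode \([0-9][0-9]*\) has invalid mode: \(0x[0-9a-fA-F]*\).*/\1 \2/p'
}
```

Typical use would be `dmesg | invalid_mode_inodes`, which may need root depending on how the kernel log is restricted on your system. No output means no matching lines were found in the log that is currently buffered.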

Author

majiru commented Feb 20, 2021

I had indeed run those versions. I just attempted the fix mentioned in that link, and it seems to have fixed the error I was hitting. Thank you so much. I apologize for the duplicate issue; I'll close this one. Is there anything else I should do to correct this (besides setting zfs_recover)?

majiru closed this as completed Feb 20, 2021
Contributor

aerusso commented Feb 20, 2021

@majiru If you are indeed experiencing that bug, you should reset the permissions (chmod the offending file), then disable zfs_recover. This should fully correct the metadata, but not in any snapshots that contain the error; the easiest solution is to just remove any corrupted snapshots.

I would not leave zfs_recover enabled; it might mask other problems.
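The sequence described above (enable zfs_recover, chmod the offending file, disable zfs_recover again) could be sketched roughly as follows. This is an illustration under assumptions, not a tested recovery procedure: the function name `repair_mode` and the default mode `0644` are hypothetical, and the parameter path is the one quoted in this thread. Writing to it requires root.

```shell
#!/bin/sh
# Sketch only: toggles zfs_recover around a chmod of the affected file.
# PARAM defaults to the module parameter path quoted in this thread;
# it can be overridden (e.g. with a scratch file) for a dry run.
PARAM="${PARAM:-/sys/module/zfs/parameters/zfs_recover}"

repair_mode() {
    target="$1"         # path of the offending file
    mode="${2:-0644}"   # ASSUMED replacement mode; choose what fits the file

    # Enable recovery so the lookup does not hang on the bad inode.
    echo 1 > "$PARAM" || return 1

    chmod "$mode" "$target"
    status=$?

    # Always switch zfs_recover back off, even if chmod failed,
    # so it does not stay enabled and mask other problems.
    echo 0 > "$PARAM"
    return "$status"
}
```

A call would look like `repair_mode path/to/offending/file 0644`. The point of the structure is that zfs_recover is reset to 0 on every path through the function.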

ipaqmaster commented

Got this on Arch Linux with ZFS 2.0.3 during rm, ls, or any other file-based interaction with a Windows game's directory that was run via Wine. Running echo 1 > /sys/module/zfs/parameters/zfs_recover let me delete the contents without hanging on node ###### has invalid mode: 0x0, and move on. I set it back to 0 afterwards.


AlexandreBonneau commented Feb 24, 2021

Same as @ipaqmaster: the problem happened under 2.0.2 as well as 2.0.3-1 (on Debian testing), with error messages like: PANIC: inode 395550 has invalid mode: 0x7100.
This does indeed happen in a Windows game's directory created by Steam and run via Proton/Wine. Specifically, it happens with the file drive_c/ProgramData/Origin/SelfUpdate/Staged/OriginThinSetupInternal.dxvk-cache.
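
As a rough illustration of why a mode like 0x7100 (or 0x0) would be rejected, the sketch below masks out the file-type bits and compares them against the standard Linux file types. The mask and type values are the usual S_IFMT constants from the Linux stat definitions; the helper name `mode_type` is made up here, and the idea that the panic comes from an unrecognized file type is an assumption based on the wording of the message.

```shell
#!/bin/sh
# Sketch: classify the file-type bits of a mode value.
# 0xF000 is the S_IFMT mask; the case labels are the decimal values
# of the standard S_IF* constants.
mode_type() {
    type_bits=$(( $1 & 0xF000 ))
    case "$type_bits" in
        4096)  echo fifo ;;       # 0x1000 S_IFIFO
        8192)  echo chardev ;;    # 0x2000 S_IFCHR
        16384) echo directory ;;  # 0x4000 S_IFDIR
        24576) echo blockdev ;;   # 0x6000 S_IFBLK
        32768) echo regular ;;    # 0x8000 S_IFREG
        40960) echo symlink ;;    # 0xA000 S_IFLNK
        49152) echo socket ;;     # 0xC000 S_IFSOCK
        *)     echo invalid ;;    # no known file type
    esac
}
```

With this helper, `mode_type 0x7100` and `mode_type 0x0` both print `invalid`, while a normal regular-file mode such as `0x81a4` prints `regular`.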
