
ZFS 0.8: kworker sucking up 100% CPU, various messages in logs mentioning ZFS, system eventually hangs. #9430

Closed
ReimuHakurei opened this issue Oct 8, 2019 · 34 comments
Labels
Status: Stale No recent activity for issue

Comments

@ReimuHakurei

ReimuHakurei commented Oct 8, 2019

System information

Type Version/Name
Distribution Name Ubuntu Server
Distribution Version 19.10 Eoan Ermine
Linux Kernel 5.3.0-13-generic #14-Ubuntu SMP Tue Sep 24 02:46:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Architecture amd64
ZFS Version 0.8.1-1ubuntu11

Describe the problem you're observing

After some hours of system uptime, a kworker process starts sucking up 100% of a core, and the system load averages climb constantly. This kworker process appears to be ZFS-related (as of this writing, the current CPU hog is named kworker/u33:3+flush-zfs-19).

This system was originally on Ubuntu Server 18.04 with ZFS 0.8.1 that I had compiled some time before. It had been up for several months with no issues when, out of nowhere, this issue came up.

Things that have been tried:

  • Rebooting several times.
  • Upgrading to the Ubuntu HWE kernel.
  • Updating to the development release of Ubuntu 19.10 (which comes with ZFS 0.8) to rule out problems with the version I compiled.

I've tried everything I can think of, and since various ZFS-related terms come up in the kworker name and messages printed to dmesg, I am left to assume it is a problem in ZFS.

Describe how to reproduce the problem

Unknown. The system this is occurring on has been stable with minimal changes for about a year. The system hosts various LXC containers. The root filesystem is ZFS, on a single 500GB Crucial SSD. All data for the LXC containers is stored on an encrypted storage pool. Output of zpool status:

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:02:31 with 0 errors on Mon Oct  7 22:35:32 2019
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          sda1      ONLINE       0     0     0

errors: No known data errors

  pool: storage
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:55:30 with 0 errors on Mon Oct  7 23:35:51 2019
config:

        NAME                        STATE     READ WRITE CKSUM
        storage                     ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            scsi-35000cca24300d8f4  ONLINE       0     0     0
            scsi-35000cca243014464  ONLINE       0     0     0
          mirror-1                  ONLINE       0     0     0
            scsi-35000cca243015924  ONLINE       0     0     0
            scsi-35000cca24400a9a4  ONLINE       0     0     0

errors: No known data errors

Include any warning/errors/backtraces from the system logs

Not sure what would be helpful for debugging, but here's some stuff that was printed to dmesg:

[47488.869573] INFO: task pip:19670 blocked for more than 120 seconds.
[47488.869627]       Tainted: P           O      5.3.0-13-generic #14-Ubuntu
[47488.869670] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47488.869718] pip             D    0 19670  19606 0x00000120
[47488.869722] Call Trace:
[47488.869732]  __schedule+0x2b9/0x6c0
[47488.869736]  ? bit_wait_io+0x50/0x50
[47488.869738]  schedule+0x42/0xb0
[47488.869741]  bit_wait+0x11/0x50
[47488.869743]  __wait_on_bit+0x33/0xa0
[47488.869749]  __inode_wait_for_writeback+0xae/0xe0
[47488.869753]  ? var_wake_function+0x30/0x30
[47488.869756]  inode_wait_for_writeback+0x27/0x40
[47488.869759]  evict+0xbd/0x1b0
[47488.869761]  iput+0x141/0x1f0
[47488.869764]  do_unlinkat+0x1c5/0x2d0
[47488.869767]  __x64_sys_unlinkat+0x38/0x60
[47488.869770]  do_syscall_64+0x5a/0x130
[47488.869774]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47488.869777] RIP: 0033:0x7f7a4ac39d77
[47488.869784] Code: Bad RIP value.
[47488.869786] RSP: 002b:00007fff4c816858 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[47488.869788] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f7a4ac39d77
[47488.869790] RDX: 0000000000000000 RSI: 00007f7a44c4bda0 RDI: 0000000000000005
[47488.869791] RBP: 0000555a9b59b240 R08: 0000000000000000 R09: 00007fff4c816510
[47488.869792] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7a4aad01b0
[47488.869793] R13: 00007f7a4aac8e08 R14: 0000555a9a878240 R15: 0000555a9b9eb460
[47488.869797] INFO: task sync:20009 blocked for more than 120 seconds.
[47488.869840]       Tainted: P           O      5.3.0-13-generic #14-Ubuntu
[47488.869883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47488.869933] sync            D    0 20009  20008 0x00004120
[47488.869936] Call Trace:
[47488.869940]  __schedule+0x2b9/0x6c0
[47488.869943]  schedule+0x42/0xb0
[47488.869945]  wb_wait_for_completion+0x64/0x90
[47488.869948]  ? wait_woken+0x80/0x80
[47488.869950]  sync_inodes_sb+0xd7/0x290
[47488.869954]  sync_inodes_one_sb+0x15/0x20
[47488.869957]  iterate_supers+0xa3/0x110
[47488.869959]  ? page_cache_pipe_buf_steal.cold+0x1c/0x1c
[47488.869962]  ksys_sync+0x42/0xb0
[47488.869964]  __ia32_sys_sync+0xe/0x20
[47488.869967]  do_syscall_64+0x5a/0x130
[47488.869970]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47488.869972] RIP: 0033:0x7f56f6bd7287
[47488.869976] Code: Bad RIP value.
[47488.869977] RSP: 002b:00007fff2c486078 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
[47488.869979] RAX: ffffffffffffffda RBX: 00007fff2c4861a8 RCX: 00007f56f6bd7287
[47488.869981] RDX: 00007f56f6eae001 RSI: 0000000000000000 RDI: 00007f56f6c754b0
[47488.869982] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[47488.869984] R10: 000055d597318010 R11: 0000000000000246 R12: 0000000000000001
[47488.869985] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[47609.702068] INFO: task pip:19670 blocked for more than 241 seconds.
[47609.702114]       Tainted: P           O      5.3.0-13-generic #14-Ubuntu
[47609.702150] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47609.702191] pip             D    0 19670  19606 0x00000120
[47609.702194] Call Trace:
[47609.702203]  __schedule+0x2b9/0x6c0
[47609.702206]  ? bit_wait_io+0x50/0x50
[47609.702209]  schedule+0x42/0xb0
[47609.702210]  bit_wait+0x11/0x50
[47609.702212]  __wait_on_bit+0x33/0xa0
[47609.702218]  __inode_wait_for_writeback+0xae/0xe0
[47609.702221]  ? var_wake_function+0x30/0x30
[47609.702223]  inode_wait_for_writeback+0x27/0x40
[47609.702226]  evict+0xbd/0x1b0
[47609.702227]  iput+0x141/0x1f0
[47609.702230]  do_unlinkat+0x1c5/0x2d0
[47609.702232]  __x64_sys_unlinkat+0x38/0x60
[47609.702235]  do_syscall_64+0x5a/0x130
[47609.702238]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47609.702240] RIP: 0033:0x7f7a4ac39d77
[47609.702247] Code: Bad RIP value.
[47609.702248] RSP: 002b:00007fff4c816858 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[47609.702251] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f7a4ac39d77
[47609.702252] RDX: 0000000000000000 RSI: 00007f7a44c4bda0 RDI: 0000000000000005
[47609.702253] RBP: 0000555a9b59b240 R08: 0000000000000000 R09: 00007fff4c816510
[47609.702254] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7a4aad01b0
[47609.702255] R13: 00007f7a4aac8e08 R14: 0000555a9a878240 R15: 0000555a9b9eb460
[47609.702258] INFO: task sync:20009 blocked for more than 241 seconds.
[47609.702293]       Tainted: P           O      5.3.0-13-generic #14-Ubuntu
[47609.702328] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47609.702368] sync            D    0 20009  20008 0x00004120
[47609.702370] Call Trace:
[47609.702373]  __schedule+0x2b9/0x6c0
[47609.702376]  schedule+0x42/0xb0
[47609.702378]  wb_wait_for_completion+0x64/0x90
[47609.702379]  ? wait_woken+0x80/0x80
[47609.702381]  sync_inodes_sb+0xd7/0x290
[47609.702384]  sync_inodes_one_sb+0x15/0x20
[47609.702386]  iterate_supers+0xa3/0x110
[47609.702388]  ? page_cache_pipe_buf_steal.cold+0x1c/0x1c
[47609.702391]  ksys_sync+0x42/0xb0
[47609.702393]  __ia32_sys_sync+0xe/0x20
[47609.702395]  do_syscall_64+0x5a/0x130
[47609.702398]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47609.702399] RIP: 0033:0x7f56f6bd7287
[47609.702402] Code: Bad RIP value.
[47609.702403] RSP: 002b:00007fff2c486078 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
[47609.702405] RAX: ffffffffffffffda RBX: 00007fff2c4861a8 RCX: 00007f56f6bd7287
[47609.702406] RDX: 00007f56f6eae001 RSI: 0000000000000000 RDI: 00007f56f6c754b0
[47609.702407] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[47609.702408] R10: 000055d597318010 R11: 0000000000000246 R12: 0000000000000001
[47609.702409] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[47730.534430] INFO: task journal-offline:32058 blocked for more than 120 seconds.
[47730.534481]       Tainted: P           O      5.3.0-13-generic #14-Ubuntu
[47730.534516] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47730.534558] journal-offline D    0 32058  27128 0x00000120
[47730.534561] Call Trace:
[47730.534570]  __schedule+0x2b9/0x6c0
[47730.534573]  schedule+0x42/0xb0
[47730.534575]  schedule_preempt_disabled+0xe/0x10
[47730.534578]  __mutex_lock.isra.0+0x182/0x4f0
[47730.534581]  __mutex_lock_slowpath+0x13/0x20
[47730.534583]  mutex_lock+0x2e/0x40
[47730.534698]  zil_commit_impl+0x148/0xd60 [zfs]
[47730.534766]  ? zpl_write_common_iovec+0x100/0x100 [zfs]
[47730.534769]  ? _cond_resched+0x19/0x30
[47730.534771]  ? mutex_lock+0x13/0x40
[47730.534840]  zil_commit+0x40/0x60 [zfs]
[47730.534908]  zpl_writepages+0xec/0x180 [zfs]
[47730.534912]  do_writepages+0x43/0xd0
[47730.534916]  __filemap_fdatawrite_range+0xc4/0x100
[47730.534919]  filemap_write_and_wait_range+0x36/0x90
[47730.534987]  zpl_fsync+0x3d/0xa0 [zfs]
[47730.534991]  vfs_fsync_range+0x49/0x80
[47730.534993]  ? __fget_light+0x57/0x70
[47730.534995]  do_fsync+0x3d/0x70
[47730.534997]  __x64_sys_fsync+0x14/0x20
[47730.535001]  do_syscall_64+0x5a/0x130
[47730.535003]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47730.535006] RIP: 0033:0x7f872e0af237
[47730.535012] Code: Bad RIP value.
[47730.535014] RSP: 002b:00007f8729766cb0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[47730.535016] RAX: ffffffffffffffda RBX: 0000000000000014 RCX: 00007f872e0af237
[47730.535017] RDX: 0000000000000000 RSI: 00007f872dcb762c RDI: 0000000000000014
[47730.535018] RBP: 00007f872dcb8da0 R08: 00007f8729767700 R09: 00007f8729767700
[47730.535019] R10: 000000000000062c R11: 0000000000000293 R12: 0000000000000002
[47730.535020] R13: 0000000000000000 R14: 00005647819af200 R15: 00007ffce0398f10
[47730.535120] INFO: task pip:19670 blocked for more than 362 seconds.
[47730.535155]       Tainted: P           O      5.3.0-13-generic #14-Ubuntu
[47730.535190] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47730.535230] pip             D    0 19670  19606 0x00000120
[47730.535232] Call Trace:
[47730.535235]  __schedule+0x2b9/0x6c0
[47730.535238]  ? bit_wait_io+0x50/0x50
[47730.535240]  schedule+0x42/0xb0
[47730.535242]  bit_wait+0x11/0x50
[47730.535243]  __wait_on_bit+0x33/0xa0
[47730.535246]  __inode_wait_for_writeback+0xae/0xe0
[47730.535249]  ? var_wake_function+0x30/0x30
[47730.535252]  inode_wait_for_writeback+0x27/0x40
[47730.535254]  evict+0xbd/0x1b0
[47730.535256]  iput+0x141/0x1f0
[47730.535258]  do_unlinkat+0x1c5/0x2d0
[47730.535260]  __x64_sys_unlinkat+0x38/0x60
[47730.535262]  do_syscall_64+0x5a/0x130
[47730.535265]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47730.535266] RIP: 0033:0x7f7a4ac39d77
[47730.535270] Code: Bad RIP value.
[47730.535271] RSP: 002b:00007fff4c816858 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[47730.535273] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f7a4ac39d77
[47730.535274] RDX: 0000000000000000 RSI: 00007f7a44c4bda0 RDI: 0000000000000005
[47730.535275] RBP: 0000555a9b59b240 R08: 0000000000000000 R09: 00007fff4c816510
[47730.535276] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7a4aad01b0
[47730.535277] R13: 00007f7a4aac8e08 R14: 0000555a9a878240 R15: 0000555a9b9eb460
[47730.535280] INFO: task sync:20009 blocked for more than 362 seconds.
[47730.535314]       Tainted: P           O      5.3.0-13-generic #14-Ubuntu
[47730.535349] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47730.535389] sync            D    0 20009  20008 0x00004120
[47730.535391] Call Trace:
[47730.535394]  __schedule+0x2b9/0x6c0
[47730.535396]  schedule+0x42/0xb0
[47730.535398]  wb_wait_for_completion+0x64/0x90
[47730.535400]  ? wait_woken+0x80/0x80
[47730.535402]  sync_inodes_sb+0xd7/0x290
[47730.535405]  sync_inodes_one_sb+0x15/0x20
[47730.535407]  iterate_supers+0xa3/0x110
[47730.535409]  ? page_cache_pipe_buf_steal.cold+0x1c/0x1c
[47730.535411]  ksys_sync+0x42/0xb0
[47730.535413]  __ia32_sys_sync+0xe/0x20
[47730.535416]  do_syscall_64+0x5a/0x130
[47730.535418]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47730.535420] RIP: 0033:0x7f56f6bd7287
[47730.535423] Code: Bad RIP value.
[47730.535424] RSP: 002b:00007fff2c486078 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
[47730.535425] RAX: ffffffffffffffda RBX: 00007fff2c4861a8 RCX: 00007f56f6bd7287
[47730.535427] RDX: 00007f56f6eae001 RSI: 0000000000000000 RDI: 00007f56f6c754b0
[47730.535428] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[47730.535429] R10: 000055d597318010 R11: 0000000000000246 R12: 0000000000000001
[47730.535429] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[47851.366840] INFO: task journal-offline:32058 blocked for more than 241 seconds.
[47851.366900]       Tainted: P           O      5.3.0-13-generic #14-Ubuntu
[47851.366944] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47851.366994] journal-offline D    0 32058  27128 0x00000124
[47851.366998] Call Trace:
[47851.367007]  __schedule+0x2b9/0x6c0
[47851.367011]  schedule+0x42/0xb0
[47851.367014]  schedule_preempt_disabled+0xe/0x10
[47851.367017]  __mutex_lock.isra.0+0x182/0x4f0
[47851.367021]  __mutex_lock_slowpath+0x13/0x20
[47851.367023]  mutex_lock+0x2e/0x40
[47851.367145]  zil_commit_impl+0x148/0xd60 [zfs]
[47851.367224]  ? zpl_write_common_iovec+0x100/0x100 [zfs]
[47851.367227]  ? _cond_resched+0x19/0x30
[47851.367230]  ? mutex_lock+0x13/0x40
[47851.367309]  zil_commit+0x40/0x60 [zfs]
[47851.367387]  zpl_writepages+0xec/0x180 [zfs]
[47851.367392]  do_writepages+0x43/0xd0
[47851.367396]  __filemap_fdatawrite_range+0xc4/0x100
[47851.367400]  filemap_write_and_wait_range+0x36/0x90
[47851.367478]  zpl_fsync+0x3d/0xa0 [zfs]
[47851.367483]  vfs_fsync_range+0x49/0x80
[47851.367486]  ? __fget_light+0x57/0x70
[47851.367489]  do_fsync+0x3d/0x70
[47851.367491]  __x64_sys_fsync+0x14/0x20
[47851.367495]  do_syscall_64+0x5a/0x130
[47851.367498]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47851.367501] RIP: 0033:0x7f872e0af237
[47851.367508] Code: Bad RIP value.
[47851.367510] RSP: 002b:00007f8729766cb0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[47851.367512] RAX: ffffffffffffffda RBX: 0000000000000014 RCX: 00007f872e0af237
[47851.367514] RDX: 0000000000000000 RSI: 00007f872dcb762c RDI: 0000000000000014
[47851.367515] RBP: 00007f872dcb8da0 R08: 00007f8729767700 R09: 00007f8729767700
[47851.367516] R10: 000000000000062c R11: 0000000000000293 R12: 0000000000000002
[47851.367518] R13: 0000000000000000 R14: 00005647819af200 R15: 00007ffce0398f10
[47851.367618] INFO: task pip:19670 blocked for more than 483 seconds.
[47851.367660]       Tainted: P           O      5.3.0-13-generic #14-Ubuntu
[47851.367704] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47851.367753] pip             D    0 19670  19606 0x00000120
[47851.367756] Call Trace:
[47851.367760]  __schedule+0x2b9/0x6c0
[47851.367764]  ? bit_wait_io+0x50/0x50
[47851.367766]  schedule+0x42/0xb0
[47851.367769]  bit_wait+0x11/0x50
[47851.367771]  __wait_on_bit+0x33/0xa0
[47851.367775]  __inode_wait_for_writeback+0xae/0xe0
[47851.367779]  ? var_wake_function+0x30/0x30
[47851.367782]  inode_wait_for_writeback+0x27/0x40
[47851.367785]  evict+0xbd/0x1b0
[47851.367787]  iput+0x141/0x1f0
[47851.367790]  do_unlinkat+0x1c5/0x2d0
[47851.367793]  __x64_sys_unlinkat+0x38/0x60
[47851.367795]  do_syscall_64+0x5a/0x130
[47851.367799]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47851.367801] RIP: 0033:0x7f7a4ac39d77
[47851.367804] Code: Bad RIP value.
[47851.367806] RSP: 002b:00007fff4c816858 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[47851.367808] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f7a4ac39d77
[47851.367809] RDX: 0000000000000000 RSI: 00007f7a44c4bda0 RDI: 0000000000000005
[47851.367811] RBP: 0000555a9b59b240 R08: 0000000000000000 R09: 00007fff4c816510
[47851.367812] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7a4aad01b0
[47851.367814] R13: 00007f7a4aac8e08 R14: 0000555a9a878240 R15: 0000555a9b9eb460
[47851.367817] INFO: task sync:20009 blocked for more than 483 seconds.
[47851.367859]       Tainted: P           O      5.3.0-13-generic #14-Ubuntu
[47851.367902] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47851.367952] sync            D    0 20009  20008 0x00004120
[47851.367954] Call Trace:
[47851.367958]  __schedule+0x2b9/0x6c0
[47851.367961]  schedule+0x42/0xb0
[47851.367963]  wb_wait_for_completion+0x64/0x90
[47851.367966]  ? wait_woken+0x80/0x80
[47851.367968]  sync_inodes_sb+0xd7/0x290
[47851.367972]  sync_inodes_one_sb+0x15/0x20
[47851.367975]  iterate_supers+0xa3/0x110
[47851.367977]  ? page_cache_pipe_buf_steal.cold+0x1c/0x1c
[47851.367980]  ksys_sync+0x42/0xb0
[47851.367983]  __ia32_sys_sync+0xe/0x20
[47851.367985]  do_syscall_64+0x5a/0x130
[47851.367989]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[47851.367991] RIP: 0033:0x7f56f6bd7287
[47851.367994] Code: Bad RIP value.
[47851.367995] RSP: 002b:00007fff2c486078 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
[47851.367998] RAX: ffffffffffffffda RBX: 00007fff2c4861a8 RCX: 00007f56f6bd7287
[47851.367999] RDX: 00007f56f6eae001 RSI: 0000000000000000 RDI: 00007f56f6c754b0
[47851.368000] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[47851.368002] R10: 000055d597318010 R11: 0000000000000246 R12: 0000000000000001
[47851.368003] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

I can grab any other debugging information that would be helpful, just let me know what commands to run to grab it.

EDIT: Oh, and I suspect it may have SOMETHING to do with one particular LXC container. The container is stored in a ZFS dataset. Inside the container, the only thing running is Nextcloud, installed from Snapcraft.

@devZer0

devZer0 commented Oct 9, 2019

Check SMART info for all disks to see whether any of them are about to fail, and also watch out for SCSI/disk errors in dmesg; maybe this is an error occurring after some disk hiccup. Also make sure there is enough free space on both pools.
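
For example, something like this (device names are placeholders; repeat for each disk):

$ smartctl -a /dev/sdX      # look at the overall-health result and reallocated/pending sector counts
$ dmesg | grep -i -E 'i/o error|blk_update_request|scsi'
$ zpool list                # the CAP column shows how full each pool is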

@ReimuHakurei
Author

Both pools are nowhere near full. SMART data on all disks is fine. If the problem were caused by the disks, I would expect to see high iowait; however, there is no iowait.
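
For reference, this is roughly how I'm checking that (iostat comes from the sysstat package):

$ vmstat 1 5        # the "wa" column stays at ~0
$ iostat -x 1 5     # per-device %util and await stay low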

@alek-p
Contributor

alek-p commented Oct 9, 2019

I would suggest trying v0.8.2, since a kworker/rollback deadlock was fixed in that release.
Your posted stacks don't confirm it's the same issue, but from the description it sounds like it could be.
See #9203 and related issues for details on the deadlock.

@ReimuHakurei
Author

Alright, I'll give that a try. I see #7038 was referenced there, and it looks to me like this may be the same issue as that.

@ReimuHakurei
Author

I've updated the system to 0.8.2. I think I've pinpointed what triggers the problem (snapfuse in an LXC container spinning, doing lots of I/O, and sucking up an entire core), since that seems to have popped up around the same time, but I'll hold off on fixing it for a few days to confirm whether 0.8.2 fixes the problem.

@ReimuHakurei
Author

Alright. Prior to updating to 0.8.2, it took 12-24 hours for this issue to crop up, and it's been 2 days with no sign of it, so I'm inclined to believe that this is now fixed.

Thanks!

@Ninpo

Ninpo commented Oct 13, 2019

I've run into this with 0.8.2.

@ReimuHakurei
Author

Actually, yep, it looks like it occurred again last night. Based on my sample size of 1, 0.8.2 seems to have improved things, since it took 4 days to occur rather than less than one day, but the problem is back.

The system is currently still responsive. Before it completely locks up and forces me to reboot, and before I just disable the VM that I am fairly sure is causing the problem, is there any debugging info I can gather to assist in determining the root cause of the problem?

@ReimuHakurei ReimuHakurei reopened this Oct 13, 2019
@Ninpo

Ninpo commented Oct 13, 2019

Perhaps attach zpool status and zfs get all output?

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:01:18 with 0 errors on Fri Oct 11 01:09:10 2019
config:

        NAME           STATE     READ WRITE CKSUM
        rpool          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme0n1p4  ONLINE       0     0     0
            nvme1n1p4  ONLINE       0     0     0

errors: No known data errors

lz4 compression is enabled, defaults otherwise.

@Ninpo

Ninpo commented Oct 14, 2019

This has just happened again:

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                 
120819 root      20   0       0      0      0 R 100.0   0.0   9:46.79 kworker/u256:2+flush-zfs-1

While attempting to compile libcap.

[188621.756418] INFO: task emerge:9987 blocked for more than 614 seconds.
[188621.756419]       Tainted: P           OE     5.2.16-gentoo #4
[188621.756419] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[188621.756420] emerge          D    0  9987      1 0x00004004
[188621.756421] Call Trace:
[188621.756423]  ? __schedule+0x250/0x660
[188621.756426]  ? trace_hardirqs_on+0x31/0xd0
[188621.756427]  schedule+0x39/0xa0
[188621.756428]  wb_wait_for_completion+0x5e/0x90
[188621.756430]  ? wait_woken+0x70/0x70
[188621.756431]  sync_inodes_sb+0xd8/0x280
[188621.756433]  sync_filesystem+0x5e/0x90
[188621.756434]  __se_sys_syncfs+0x38/0x70
[188621.756435]  do_syscall_64+0x5a/0x130
[188621.756436]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[188621.756437] RIP: 0033:0x7f6f9f5e1d97
[188621.756439] Code: Bad RIP value.
[188621.756439] RSP: 002b:00007ffd7c463e38 EFLAGS: 00000246 ORIG_RAX: 0000000000000132
[188621.756440] RAX: ffffffffffffffda RBX: 00007ffd7c463e40 RCX: 00007f6f9f5e1d97
[188621.756441] RDX: 00007f6f9f4b0b80 RSI: 0000000073ffa500 RDI: 000000000000000d
[188621.756442] RBP: 00007ffd7c463e40 R08: 00007f6f9eb73cc8 R09: 000055643a62ef68
[188621.756442] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd7c463f88
[188621.756443] R13: 0000000000000000 R14: 00007ffd7c463ef0 R15: 00007ffd7c463f68

I'm tempted to go back to using git master. When I was on zfs-9999 just prior to the 0.8.2 release I didn't see this (I was using a git build because 0.8.x didn't build against 5.2.x kernels at the time), and I did note that when I switched to stable 0.8.2, I had unsupported "future" features on my pool (I rebuilt it so I had a clean 0.8.2 pool).

@ReimuHakurei
Author

Might it be worth comparing some system configurations?

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:02:31 with 0 errors on Mon Oct  7 22:35:32 2019
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          sda1      ONLINE       0     0     0

errors: No known data errors

  pool: storage
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:55:30 with 0 errors on Mon Oct  7 23:35:51 2019
config:

        NAME                        STATE     READ WRITE CKSUM
        storage                     ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            scsi-35000cca24300d8f4  ONLINE       0     0     0
            scsi-35000cca243014464  ONLINE       0     0     0
          mirror-1                  ONLINE       0     0     0
            scsi-35000cca243015924  ONLINE       0     0     0
            scsi-35000cca24400a9a4  ONLINE       0     0     0

errors: No known data errors

The disks are connected to a generic LSI SAS2 HBA. The rpool is a single Crucial MX500 SSD connected to SAS port 0 on the HBA. The entire storage pool is encrypted. The disks in the storage pool are Hitachi Ultrastar 7K4000 SAS HDDs. They are connected to ports 4-7.

The system is a Dell PowerEdge R415 with two AMD Opteron 4365 EE processors and 80 GB of RAM (4x4GB + 4x16GB).

I've narrowed down what I believe to be the trigger: a broken snap package inside an LXC container (the container lives on the storage pool), whose snapfuse process is sucking up an entire core and reading from disk at a constant roughly 6 MB/sec. I have avoided fixing it, both to verify whether this issue was fixed (it was not) and to assist in diagnosing the root cause.

The next time this system completely locks up and I have to hard-reboot it, I'm fixing the problem that I believe is triggering it, so if there are any commands I can run or logs I can fetch that would assist the ZFS developers in determining the root cause, I need to know what those are sooner rather than later.
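
In the meantime, here is a minimal sketch of what I plan to capture while the kworker is still spinning (run as root; <PID> is the spinning kworker's PID from top):

$ echo 1 > /proc/sys/kernel/sysrq       # make sure sysrq is enabled
$ cat /proc/<PID>/stack                 # kernel stack of the spinning kworker
$ echo w > /proc/sysrq-trigger          # dump all blocked (D state) tasks to dmesg
$ echo l > /proc/sysrq-trigger          # dump backtraces of all active CPUs to dmesg
$ dmesg > dmesg-after-sysrq.txt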

@Ninpo

Ninpo commented Oct 15, 2019

What brought you to the snap package? I ask because I have one on my entire system that, as it happens, is active when the issue occurs. At this point I'm sure it's a coincidence, buuut...

$ zpool status
  pool: misc
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:01:06 with 0 errors on Fri Oct 11 01:09:02 2019
config:

        NAME                                                                 STATE     READ WRITE CKSUM
        misc                                                                 ONLINE       0     0     0
          raidz1-0                                                           ONLINE       0     0     0
            ata-WDC_WD1001FALS-00J7B1_WD-WMATV2740378                        ONLINE       0     0     0
            ata-WDC_WD1001FALS-00J7B1_WD-WMATV2432116                        ONLINE       0     0     0
            ata-WDC_WD1001FALS-00J7B1_WD-WMATV2432663                        ONLINE       0     0     0
        logs
          mirror-1                                                           ONLINE       0     0     0
            ata-LITEONIT_LCS-256M6S_2.5_7mm_256GB-part1TW0XFJWX5508539D1600  ONLINE       0     0     0
            ata-KINGSTON_SH103S3240G_50026B72390010E4-part1                  ONLINE       0     0     0
        cache
          ata-LITEONIT_LCS-256M6S_2.5_7mm_256GB-part2TW0XFJWX5508539D1600    ONLINE       0     0     0
          ata-KINGSTON_SH103S3240G_50026B72390010E4-part2                    ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:01:18 with 0 errors on Fri Oct 11 01:09:10 2019
config:

        NAME           STATE     READ WRITE CKSUM
        rpool          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme1n1p4  ONLINE       0     0     0
            nvme0n1p4  ONLINE       0     0     0

errors: No known data errors

The system is a Threadripper 1950X, 64GB RAM, ASRock mobo. Samsung NVMe disks in rpool, a ten-year-old WD Black and a couple of SSDs in misc. Kernel 5.2.16.

It's rpool that's active and hangs when the issue occurs.

@ReimuHakurei
Author

The snapfuse process associated with that snap package (Nextcloud) has been sucking up an entire core and doing constant I/O ever since an auto-update of that snap package occurred. The date of that auto-update lines up very closely with the date the issue started appearing.

@devZer0

devZer0 commented Oct 15, 2019

Could you describe what snapfuse is/does? There doesn't seem to be much information about it around on the net... I can't find a manpage or a git repo...

@ReimuHakurei
Author

Snap packages are distributed as squashfs images. Normally those are just mounted as loop devices using the kernel's native squashfs support, but in my case I have snap running inside an LXC container, and you can't use loop devices inside LXC. As a workaround, snap ships snapfuse, a userspace squashfs implementation built on FUSE, which does work inside LXC containers.
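
Roughly, the difference looks like this (paths and revision numbers are illustrative, and since snapfuse is essentially a rebuilt squashfuse the exact invocation may differ):

$ # on the host, snapd loop-mounts the image with the in-kernel squashfs driver:
$ mount -t squashfs -o ro,loop /var/lib/snapd/snaps/nextcloud_1234.snap /snap/nextcloud/1234
$ # inside an LXC container there are no loop devices, so snapd falls back to FUSE:
$ snapfuse /var/lib/snapd/snaps/nextcloud_1234.snap /snap/nextcloud/1234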

@Ninpo

Ninpo commented Oct 15, 2019

Hmm, fuse you say...

franz-5.3.3-x86_64.AppImage on /tmp/.mount_franz-gpQL6L type fuse.franz-5.3.3-x86_64.AppImage (ro,nosuid,nodev,relatime,user_id=1000,group_id=1000)

@Ninpo Ninpo mentioned this issue Oct 15, 2019
@ReimuHakurei
Author

ReimuHakurei commented Oct 15, 2019

It looks to me like the only things actually hanging here are sync commands. Is it the same for you? I think this is why my load averages spike the way they do.

# ps axu | grep " D " 
165536     302  0.0  0.0   6172   732 ?        D    02:00   0:00 sync
165536     307  0.0  0.0   6172   736 ?        D    10:15   0:00 sync
165536     647  0.0  0.0   6172   736 ?        D    Oct14   0:00 sync
165536     745  0.0  0.0   6172   628 ?        D    00:20   0:00 sync
165536     760  0.0  0.0   6172   736 ?        D    Oct14   0:00 sync
[snip]
165536   32052  0.0  0.0   6172   736 ?        D    Oct14   0:00 sync
165536   32084  0.0  0.0   6172   624 ?        D    Oct13   0:00 sync
165536   32104  0.0  0.0   6172   732 ?        D    Oct13   0:00 sync
165536   32273  0.0  0.0   6172   732 ?        D    Oct14   0:00 sync
165536   32307  0.0  0.0   6172   624 ?        D    03:35   0:00 sync
165536   32320  0.0  0.0   6172   728 ?        D    Oct13   0:00 sync
165536   32337  0.0  0.0   6172   628 ?        D    Oct14   0:00 sync
165536   32364  0.0  0.0   6172   628 ?        D    Oct13   0:00 sync
165536   32365  0.0  0.0   6172   732 ?        D    05:20   0:00 sync
165536   32435  0.0  0.0   6172   732 ?        D    Oct14   0:00 sync
165536   32474  0.0  0.0   6172   628 ?        D    Oct14   0:00 sync
165536   32477  0.0  0.0   6172   628 ?        D    Oct13   0:00 sync
165536   32603  0.0  0.0   6172   628 ?        D    Oct14   0:00 sync
165536   32647  0.0  0.0   6172   624 ?        D    08:30   0:00 sync
165536   32710  0.0  0.0   6172   628 ?        D    05:10   0:00 sync

@ReimuHakurei
Author

I'm going to replace sync with a copy of true in the container in question to see if this fixes my load average issues and makes things more manageable.
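
Something like this inside the container (paths are what I expect on Ubuntu; keeping a backup so the real sync can be restored later):

$ mv /bin/sync /bin/sync.real
$ cp /bin/true /bin/sync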

@Ninpo

Ninpo commented Oct 15, 2019

I'm unsure; I only saw emerge and gzip "hang" last time, but I forgot to grab a full process list before rebooting (I needed the system back ASAP). When it happens again I'll check.

As far as ZFS is concerned, does it matter if sync is just replaced with true or something?

@ReimuHakurei
Author

I'm not sure. It's certainly not optimal but I would assume it can't really get any worse than the process just hanging in uninterruptible sleep.

@devZer0

devZer0 commented Oct 16, 2019

Hmm, fuse you say...

franz-5.3.3-x86_64.AppImage on /tmp/.mount_franz-gpQL6L type fuse.franz-5.3.3-x86_64.AppImage (ro,nosuid,nodev,relatime,user_id=1000,group_id=1000)

Mind that Franz has severe memory leaks.

@devZer0

devZer0 commented Oct 16, 2019

The system load averages will climb constantly

I think this is normal: with every hanging sync there is one more process stuck waiting for I/O, and on Linux every process in uninterruptible sleep counts toward the load average (it does not need to burn CPU). You can easily demonstrate this with stale NFS mounts.
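
A quick way to sanity-check this is to count the tasks stuck in uninterruptible sleep and compare against the load average, for example:

$ ps -eo stat,pid,comm | awk '$1 ~ /^D/'    # list D-state tasks
$ ps -eo stat | grep -c '^D'                # count them
$ uptime                                    # load average roughly tracks runnable + D-state tasks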

@devZer0

devZer0 commented Oct 16, 2019

Instead of rebooting, please try stopping all the LXC stuff and unmounting all the FUSE mounts; if a normal unmount doesn't work, try -f (force).

Then check whether the sync processes go away, or try to kill them.

Maybe sync is hanging because of FUSE...
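
Something along these lines (container name and mountpoints are placeholders; use lxc stop <container> instead if you are on LXD):

$ lxc-stop -n <container>
$ mount | grep fuse                  # list the remaining FUSE mounts
$ umount /path/to/fuse/mount         # try a normal unmount first
$ umount -f /path/to/fuse/mount      # force it if the normal unmount hangs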

@ReimuHakurei
Author

When sync hangs in D state inside the container, you cannot stop the LXC container. Any attempt to do so will just freeze.

Anyway, I will say that 0.8.2 has at the very least made this problem slightly better. On 0.8.1 it was consistently occurring on a daily basis; on 0.8.2 it took about 4 days the first time, and I'm now at almost 5 days since then and it's still working fine. I'll update the issue when it next occurs.

@devZer0

devZer0 commented Oct 22, 2019

@ReimuHakurei
Author

Nope.

By definition, when a task is in D state (uninterruptible sleep), absolutely nothing (short of a hard reboot) will end it. Because restarting an LXC container requires killing all processes, and you cannot kill a process in D state, you cannot restart an LXC container with any child processes in D state.

@Ninpo

Ninpo commented Jan 10, 2020

This has hit me again.

Anyone experiencing this on kernels <5.0.0?

@ukd1

ukd1 commented Feb 1, 2020

I'm experiencing this on Linux storagelols 5.0.0-16-generic #17~18.04.1-Ubuntu SMP, but I have no real way to debug it.

@alexanderhaensch

We were hit by the problem on 4.19.
It looks like the problem accumulates deferred threads after some time; some threads can still continue.

The relevant thread shows up after a sysrq trigger:

[6756902.750815] CPU: 12 PID: 37541 Comm: kworker/u49:4 Tainted: P W O 4.19.86-gentoo #1
[6756902.750816] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.0a 07/31/2013
[6756902.750817] Workqueue: writeback wb_workfn (flush-zfs-30)
[6756902.750818] RIP: 0010:dbuf_verify+0x158/0x990 [zfs]
[6756902.750819] Code: 4d 85 e4 74 15 49 3b 5c 24 20 0f 85 86 01 00 00 4d 8b 64 24 28 4d 85 e4 75 eb 80 7b 50 00 75 0a 48 83 3b 00 0f 84 36 02 00 00 <4c> 8b 43 48 4d 85 c0 74 6f 48 8b 43 30 49 3b 45 68 0f 84 0e 03 00
[6756902.750819] RSP: 0018:ffffb11b55b57728 EFLAGS: 00000246
[6756902.750820] RAX: 0000000000000000 RBX: ffff9f3d298d93f8 RCX: 0000000000000000
[6756902.750821] RDX: 0000000000000001 RSI: ffffffffc0c688f8 RDI: ffff9f3aa7ea6448
[6756902.750821] RBP: ffffb11b55b57750 R08: 0000000001bec000 R09: 0000000001bec000
[6756902.750822] R10: ffff9f09229a86c0 R11: 00000000000011ce R12: 0000000000000000
[6756902.750822] R13: ffff9f3aa7ea5fa0 R14: 00000000000006fb R15: ffff9f3aa7ea5fa0
[6756902.750823] FS: 0000000000000000(0000) GS:ffff9f217f980000(0000) knlGS:0000000000000000
[6756902.750823] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[6756902.750824] CR2: 00007fc31495601e CR3: 0000003e7240c005 CR4: 00000000001626e0
[6756902.750824] Call Trace:
[6756902.750824] dbuf_hold_impl_arg+0x12e/0xb70 [zfs]
[6756902.750825] ? spl_kmem_alloc+0xf0/0x170 [spl]
[6756902.750825] dbuf_hold_impl+0x95/0xc0 [zfs]
[6756902.750826] dbuf_hold+0x33/0x60 [zfs]
[6756902.750826] dnode_hold_impl+0x17a/0x10c0 [zfs]
[6756902.750827] ? dbuf_rele_and_unlock+0x336/0x800 [zfs]
[6756902.750827] ? zfs_refcount_remove_many+0x215/0x290 [zfs]
[6756902.750827] ? zfs_refcount_destroy_many+0x159/0x200 [zfs]
[6756902.750828] dnode_hold+0x1b/0x20 [zfs]
[6756902.750828] dmu_bonus_hold+0x36/0x90 [zfs]
[6756902.750829] sa_buf_hold+0xe/0x10 [zfs]
[6756902.750829] zfs_zget+0x10d/0x310 [zfs]
[6756902.750829] zfs_get_data+0x7e/0x520 [zfs]
[6756902.750830] zil_commit_impl+0x1235/0x1c90 [zfs]
[6756902.750830] zil_commit+0x59/0x160 [zfs]
[6756902.750830] zpl_writepages+0xec/0x180 [zfs]
[6756902.750831] do_writepages+0x41/0xd0
[6756902.750831] ? tracing_record_taskinfo_skip+0x3f/0x50
[6756902.750832] ? tracing_record_taskinfo+0x13/0xb0
[6756902.750833] ? fprop_fraction_percpu+0x2f/0x80
[6756902.750834] __writeback_single_inode+0x4d/0x390
[6756902.750834] writeback_sb_inodes+0x23c/0x570
[6756902.750835] wb_writeback+0xff/0x320
[6756902.750835] wb_workfn+0xf3/0x450
[6756902.750836] ? _raw_spin_unlock_irq+0x13/0x20
[6756902.750836] ? finish_task_switch+0x83/0x290
[6756902.750836] process_one_work+0x18a/0x3a0
[6756902.750837] worker_thread+0x50/0x3b0
[6756902.750837] kthread+0x106/0x140
[6756902.750837] ? process_one_work+0x3a0/0x3a0
[6756902.750838] ? kthread_park+0x80/0x80
[6756902.750838] ret_from_fork+0x24/0x50

@alexanderhaensch

Screenshot_2020-05-07 Node - ZFS - Grafana

Looks like the kworker thread is spinning and reading metadata while not committing to the ZIL. The ZIL commit count has been slightly reduced since the event at 2:20 A.M. on 5/3. After a while, more and more programs hang behind this kworker thread, as one can see in the load graph.

The gap in the graphs is not related to this particular system; there was a reconfiguration of the monitoring database.
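
If anyone wants to confirm the same behaviour, the ZIL commit counters can be watched directly (assuming the zil kstat is exposed by your build under /proc/spl/kstat/zfs):

$ cat /proc/spl/kstat/zfs/zil
$ watch -d -n 5 cat /proc/spl/kstat/zfs/zil    # check whether zil_commit_count keeps advancing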

@InsanePrawn
Contributor

Mhm, maybe I should've posted here instead of #11754 (comment)

Did everyone else see this resolved somehow? @alexanderhaensch your reported setup was very similar to mine.

@WhittlesJr

WhittlesJr commented Mar 31, 2022

Encountered this today. Very similar traceback to the ones reported above.

> zfs --version
zfs-2.1.4-1
zfs-kmod-2.1.4-1

> cat /proc/version 
Linux version 5.10.108 (nixbld@localhost) (gcc (GCC) 10.3.0, GNU ld (GNU Binutils) 2.35.2) #1-NixOS SMP Wed Mar 23 08:13:30 UTC 2022

> cat /etc/issue 
<<< Welcome to NixOS 21.11.336755.efea022d6fe (\m) - \l >>>

log: https://gist.github.com/WhittlesJr/e15487bf5b2a9835249c655ca424faf0

top:

USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                     
root      20   0       0      0      0 R 100.0   0.0 162:02.59 kworker/u64:0+flush-zfs-2                                   

@stale

stale bot commented Apr 2, 2023

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Apr 2, 2023
@stale stale bot closed this as completed Aug 12, 2023
@devZer0

devZer0 commented Aug 14, 2023

This does not look like it has been resolved. Please reopen.
