task z_zvol blocked for more than 120 seconds. #6890
Comments
Seems very similar to #6888, though on a newer kernel.
USB is slow; this might be normal.
Even USB 3.0? I don't know, this is my first encounter with a USB 3.0 HDD dock... I almost feel this is close to a duplicate of #6330. I wonder why it didn't show up in my searches; feeling a bit ashamed now. Anyway, I did try the same with a slightly older kernel, 4.13.12 (or 13, not sure), and got OOPSes instead of timeouts, but back then I just assumed it was a faulty drive and/or dock, so I don't have any useful logs from that time.
Granted, USB 3.0 is good for 5 Gbit/s (625 MB/s) when plugged into a USB 3 port (the ports with the blue connector), but it's also up to the drive and the controller whether it can actually run that fast. I haven't seen this myself, but I don't use my externals very often. You might want to see how fast the drive can actually run with something like dd, as sketched below. NVMe is really fast, and the receive end might simply not be able to keep up. P.S.: don't feel bad, only one way to find out. 😄
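A quick way to sanity-check the dock's real throughput is a raw sequential read with dd. This is only a rough sketch: /dev/sdX is a placeholder for whatever device node the dock actually exposes, and the block size and count are arbitrary.

    # Rough sequential-read test of the external disk. O_DIRECT bypasses the
    # page cache so the result reflects the drive/dock path, not RAM.
    sudo dd if=/dev/sdX of=/dev/null bs=1M count=4096 iflag=direct status=progress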
So this is still a thing: slow destinations definitely make it worse. I'm replacing a 2 TB 5400 RPM 2.5" drive with a 4 TB of the same kind, and getting this gem just during znapzend one-shot ops:
Shot in the dark, but how "aware" are zpools of the underlying performance characteristics of their vdevs? Moreover, how "aware" are they when these vdevs are actually dm-crypt volumes? "Aware", in this case, referring to "the amount of burst and sustained IOPS delineated by characteristics such as reads, seeks, and writes, stratified by the number of bytes for each." @behlendorf: Any chance you and Santa have zvol stability (and performance, if we've been good) in a burlap sack? Or at least the elf who knows how to get there... ;-)
The good news is @klkblake provided us with a nice reproducer in
This is the kind of information which ZFS has but only takes maximum advantage of in a few specific circumstances, for example reads to a mirror vdev. If there are individual devices which are much slower, they can drag down the performance of the entire pool.
So both #6989 and #6926 (merged to master) have the potential to improve zvol performance. And for some reason which isn't yet clear, disabling the dynamic taskqs has been reported to improve stability and may help with performance.
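For reference, one way to try disabling the dynamic taskqs mentioned above is via the SPL module parameter spl_taskq_thread_dynamic. This is only a sketch: the modprobe.d file name is arbitrary, and the setting takes effect the next time the spl module is loaded (typically after a reboot).

    # Persist the setting so it applies at the next module load / reboot.
    echo "options spl spl_taskq_thread_dynamic=0" | sudo tee /etc/modprobe.d/spl.conf
    # Verify after reloading the module or rebooting; 0 means dynamic taskqs are off.
    cat /sys/module/spl/parameters/spl_taskq_thread_dynamic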
During a receive operation zvol_create_minors_impl() can wait needlessly
for the prefetch thread because both share the same tasks queue. This
results in hung tasks:

    <3>INFO: task z_zvol:5541 blocked for more than 120 seconds.
    <3>      Tainted: P O 3.16.0-4-amd64
    <3>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

The first z_zvol:5541 (zvol_task_cb) is waiting for the long running
traverse_prefetch_thread:260

    root@linux:~# cat /proc/spl/taskq
    taskq                  act  nthr  spwn  maxt  pri  mina
    spl_system_taskq/0       1     2     0    64  100     1
        active: [260]traverse_prefetch_thread [zfs](0xffff88003347ae40)
        wait: 5541
    spl_delay_taskq/0        0     1     0     4  100     1
        delay: spa_deadman [zfs](0xffff880039924000)
    z_zvol/1                 1     1     0     1  120     1
        active: [5541]zvol_task_cb [zfs](0xffff88001fde6400)
        pend: zvol_task_cb [zfs](0xffff88001fde6800)

This change adds a dedicated, per-pool, prefetch taskq to prevent the
traverse code from monopolizing the global (and limited) system_taskq by
inappropriately scheduling long running tasks on it.

Reviewed-by: Albert Lee <trisk@forkgnu.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6330
Closes #6890
Closes #7343
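To observe the contention described in that commit message while a receive is running, the taskq state can be sampled directly. A minimal sketch follows; the threshold tweak assumes the standard kernel.hung_task_timeout_secs sysctl and is optional.

    # Sample the SPL taskq state every 5 seconds; a stuck z_zvol task shows up
    # as "wait:" behind the long-running traverse_prefetch_thread.
    watch -n 5 cat /proc/spl/taskq
    # Optionally raise the hung-task warning threshold from its 120 s default
    # while testing (echo 0 disables the warning entirely).
    echo 600 | sudo tee /proc/sys/kernel/hung_task_timeout_secs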
System information
Describe the problem you're observing
ZFS hangs on a local zfs send | zfs receive combo from a LUKS-encrypted NVMe partition (tank) to a LUKS-encrypted SATA HDD connected via a USB 3.0 dock (dozer). The HDD is possibly faulty; however, badblocks run before the send|receive combo did not report any problems.
Describe how to reproduce the problem
Pool tank has about 300 snapshots of 10 (out of 18) datasets, created by zfs-auto-snapshot; a minimal reproduction sketch is shown below.
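A hypothetical minimal reproduction along the lines of the reported setup (the snapshot name and target dataset below are placeholders; the real pools are tank on the NVMe partition and dozer on the USB-attached HDD, both on LUKS):

    # Create a recursive snapshot and replicate it to the slow destination pool.
    sudo zfs snapshot -r tank@repro
    sudo zfs send -R tank@repro | sudo zfs receive -F dozer/backup
    # Watch dmesg for "task z_zvol ... blocked for more than 120 seconds"
    # while the receive is in progress.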
Include any warning/errors/backtraces from the system logs
dmesg:
zdb (anonymised):
Happy to provide any additional info; just tell me what you want to know.