
task z_zvol blocked for more than 120 seconds. #6890

Closed
jficz opened this issue Nov 21, 2017 · 7 comments · Fixed by #7343
Labels
Component: ZVOL ZFS Volumes
Milestone
0.8.0

Comments

@jficz

jficz commented Nov 21, 2017

System information

Type Version/Name
Distribution Name Arch Linux
Distribution Version latest
Linux Kernel linux-ck 4.13.14.1-ck
Architecture x86_64
ZFS Version 0.7.3-1 (AUR, dkms)
SPL Version 0.7.3-1 (AUR, dkms)

Describe the problem you're observing

ZFS hangs on a zfs send | zfs receive combo on the local machine, from a LUKS-encrypted NVMe partition (tank) to a LUKS-encrypted SATA HDD connected via a USB 3.0 dock (dozer). The HDD is possibly faulty; however, a badblocks run performed before the send|receive combo did not report any problems.

Describe how to reproduce the problem

Pool tank has about 300 snapshots across 10 (out of 18) datasets, made by zfs-auto-snapshot.

zfs send -RLc tank@send | zfs receive -Fusvd dozer
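
For completeness, a rough reproduction sketch, assuming tank@send is a recursive snapshot created just before the transfer (the snapshot name is taken from the command above):

zfs snapshot -r tank@send
zfs send -RLc tank@send | zfs receive -Fusvd dozer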

Include any warning/errors/backtraces from the system logs

dmesg:

[  204.780233] usb 2-4: new SuperSpeed USB device number 2 using xhci_hcd
[  204.802181] usb-storage 2-4:1.0: USB Mass Storage device detected
[  204.802222] usb-storage 2-4:1.0: Quirks match for vid 174c pid 55aa: 400000
[  204.802236] scsi host3: usb-storage 2-4:1.0
[  205.830501] scsi 3:0:0:0: Direct-Access     ASMedia  ASM1153          0    PQ: 0 ANSI: 6
[  205.881441] sd 3:0:0:0: [sdb] 625134827 512-byte logical blocks: (320 GB/298 GiB)
[  205.881675] sd 3:0:0:0: [sdb] Write Protect is off
[  205.881677] sd 3:0:0:0: [sdb] Mode Sense: 43 00 00 00
[  205.881876] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  205.930655]  sdb: sdb1
[  205.931489] sd 3:0:0:0: [sdb] Attached SCSI disk
[  858.070057] INFO: task z_zvol:17045 blocked for more than 120 seconds.
[  858.070061]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[  858.070062] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  858.070063] z_zvol          D    0 17045      2 0x00000000
[  858.070065] Call Trace:
[  858.070070]  __schedule+0x6eb/0xcd0
[  858.070072]  schedule+0x3d/0xd0
[  858.070076]  taskq_wait_outstanding+0x87/0xd0 [spl]
[  858.070078]  ? wait_woken+0x80/0x80
[  858.070094]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[  858.070097]  taskq_thread+0x254/0x480 [spl]
[  858.070100]  ? wake_up_q+0x80/0x80
[  858.070102]  kthread+0x124/0x140
[  858.070104]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[  858.070106]  ? kthread_create_on_node+0x70/0x70
[  858.070108]  ret_from_fork+0x25/0x30
[  980.960061] INFO: task z_zvol:17045 blocked for more than 120 seconds.
[  980.960066]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[  980.960067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  980.960068] z_zvol          D    0 17045      2 0x00000000
[  980.960070] Call Trace:
[  980.960076]  __schedule+0x6eb/0xcd0
[  980.960079]  schedule+0x3d/0xd0
[  980.960083]  taskq_wait_outstanding+0x87/0xd0 [spl]
[  980.960086]  ? wait_woken+0x80/0x80
[  980.960102]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[  980.960105]  taskq_thread+0x254/0x480 [spl]
[  980.960108]  ? wake_up_q+0x80/0x80
[  980.960111]  kthread+0x124/0x140
[  980.960114]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[  980.960115]  ? kthread_create_on_node+0x70/0x70
[  980.960118]  ret_from_fork+0x25/0x30
[ 1103.830057] INFO: task z_zvol:17045 blocked for more than 120 seconds.
[ 1103.830061]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[ 1103.830062] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1103.830063] z_zvol          D    0 17045      2 0x00000000
[ 1103.830066] Call Trace:
[ 1103.830071]  __schedule+0x6eb/0xcd0
[ 1103.830073]  schedule+0x3d/0xd0
[ 1103.830077]  taskq_wait_outstanding+0x87/0xd0 [spl]
[ 1103.830080]  ? wait_woken+0x80/0x80
[ 1103.830095]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[ 1103.830099]  taskq_thread+0x254/0x480 [spl]
[ 1103.830102]  ? wake_up_q+0x80/0x80
[ 1103.830104]  kthread+0x124/0x140
[ 1103.830106]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[ 1103.830108]  ? kthread_create_on_node+0x70/0x70
[ 1103.830110]  ret_from_fork+0x25/0x30
[ 1226.710103] INFO: task z_zvol:17045 blocked for more than 120 seconds.
[ 1226.710106]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[ 1226.710107] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1226.710108] z_zvol          D    0 17045      2 0x00000000
[ 1226.710110] Call Trace:
[ 1226.710115]  __schedule+0x6eb/0xcd0
[ 1226.710117]  schedule+0x3d/0xd0
[ 1226.710121]  taskq_wait_outstanding+0x87/0xd0 [spl]
[ 1226.710123]  ? wait_woken+0x80/0x80
[ 1226.710138]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[ 1226.710141]  taskq_thread+0x254/0x480 [spl]
[ 1226.710143]  ? wake_up_q+0x80/0x80
[ 1226.710145]  kthread+0x124/0x140
[ 1226.710147]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[ 1226.710149]  ? kthread_create_on_node+0x70/0x70
[ 1226.710151]  ret_from_fork+0x25/0x30
[ 1349.590140] INFO: task z_zvol:17045 blocked for more than 120 seconds.
[ 1349.590144]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[ 1349.590145] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1349.590146] z_zvol          D    0 17045      2 0x00000000
[ 1349.590148] Call Trace:
[ 1349.590153]  __schedule+0x6eb/0xcd0
[ 1349.590155]  schedule+0x3d/0xd0
[ 1349.590159]  taskq_wait_outstanding+0x87/0xd0 [spl]
[ 1349.590162]  ? wait_woken+0x80/0x80
[ 1349.590178]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[ 1349.590180]  taskq_thread+0x254/0x480 [spl]
[ 1349.590183]  ? wake_up_q+0x80/0x80
[ 1349.590185]  kthread+0x124/0x140
[ 1349.590187]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[ 1349.590189]  ? kthread_create_on_node+0x70/0x70
[ 1349.590191]  ret_from_fork+0x25/0x30
[ 1436.501551] zfs[27076]: segfault at 10 ip 00007f0bd40474b1 sp 00007fff83d42008 error 4 in libc-2.26.so[7f0bd3ef0000+1ae000]
[ 1472.470050] INFO: task z_zvol:17045 blocked for more than 120 seconds.
[ 1472.470053]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[ 1472.470054] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1472.470056] z_zvol          D    0 17045      2 0x00000000
[ 1472.470058] Call Trace:
[ 1472.470063]  __schedule+0x6eb/0xcd0
[ 1472.470065]  schedule+0x3d/0xd0
[ 1472.470069]  taskq_wait_outstanding+0x87/0xd0 [spl]
[ 1472.470071]  ? wait_woken+0x80/0x80
[ 1472.470087]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[ 1472.470090]  taskq_thread+0x254/0x480 [spl]
[ 1472.470092]  ? wake_up_q+0x80/0x80
[ 1472.470095]  kthread+0x124/0x140
[ 1472.470097]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[ 1472.470098]  ? kthread_create_on_node+0x70/0x70
[ 1472.470100]  ret_from_fork+0x25/0x30
[ 1595.350031] INFO: task z_zvol:389 blocked for more than 120 seconds.
[ 1595.350034]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[ 1595.350035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1595.350036] z_zvol          D    0   389      2 0x00000000
[ 1595.350039] Call Trace:
[ 1595.350044]  __schedule+0x6eb/0xcd0
[ 1595.350046]  schedule+0x3d/0xd0
[ 1595.350050]  taskq_wait_outstanding+0x87/0xd0 [spl]
[ 1595.350052]  ? wait_woken+0x80/0x80
[ 1595.350068]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[ 1595.350071]  taskq_thread+0x254/0x480 [spl]
[ 1595.350073]  ? wake_up_q+0x80/0x80
[ 1595.350076]  kthread+0x124/0x140
[ 1595.350078]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[ 1595.350079]  ? kthread_create_on_node+0x70/0x70
[ 1595.350081]  ? kthread_create_on_node+0x70/0x70
[ 1595.350082]  ret_from_fork+0x25/0x30
[ 1595.350106] INFO: task z_zvol:17045 blocked for more than 120 seconds.
[ 1595.350107]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[ 1595.350108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1595.350109] z_zvol          D    0 17045      2 0x00000000
[ 1595.350110] Call Trace:
[ 1595.350112]  __schedule+0x6eb/0xcd0
[ 1595.350113]  schedule+0x3d/0xd0
[ 1595.350116]  taskq_wait_outstanding+0x87/0xd0 [spl]
[ 1595.350117]  ? wait_woken+0x80/0x80
[ 1595.350130]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[ 1595.350133]  taskq_thread+0x254/0x480 [spl]
[ 1595.350135]  ? wake_up_q+0x80/0x80
[ 1595.350137]  kthread+0x124/0x140
[ 1595.350139]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[ 1595.350140]  ? kthread_create_on_node+0x70/0x70
[ 1595.350142]  ret_from_fork+0x25/0x30
[ 1718.230091] INFO: task z_zvol:389 blocked for more than 120 seconds.
[ 1718.230094]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[ 1718.230095] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1718.230097] z_zvol          D    0   389      2 0x00000000
[ 1718.230099] Call Trace:
[ 1718.230103]  __schedule+0x6eb/0xcd0
[ 1718.230105]  schedule+0x3d/0xd0
[ 1718.230109]  taskq_wait_outstanding+0x87/0xd0 [spl]
[ 1718.230112]  ? wait_woken+0x80/0x80
[ 1718.230127]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[ 1718.230130]  taskq_thread+0x254/0x480 [spl]
[ 1718.230133]  ? wake_up_q+0x80/0x80
[ 1718.230135]  kthread+0x124/0x140
[ 1718.230137]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[ 1718.230139]  ? kthread_create_on_node+0x70/0x70
[ 1718.230140]  ? kthread_create_on_node+0x70/0x70
[ 1718.230142]  ret_from_fork+0x25/0x30
[ 1718.230164] INFO: task z_zvol:17045 blocked for more than 120 seconds.
[ 1718.230166]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[ 1718.230167] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1718.230167] z_zvol          D    0 17045      2 0x00000000
[ 1718.230169] Call Trace:
[ 1718.230170]  __schedule+0x6eb/0xcd0
[ 1718.230172]  schedule+0x3d/0xd0
[ 1718.230175]  taskq_wait_outstanding+0x87/0xd0 [spl]
[ 1718.230176]  ? wait_woken+0x80/0x80
[ 1718.230190]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[ 1718.230192]  taskq_thread+0x254/0x480 [spl]
[ 1718.230195]  ? wake_up_q+0x80/0x80
[ 1718.230196]  kthread+0x124/0x140
[ 1718.230198]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[ 1718.230200]  ? kthread_create_on_node+0x70/0x70
[ 1718.230201]  ret_from_fork+0x25/0x30
[ 2342.854530] zfs[9384]: segfault at 10 ip 00007f43a3ebb4b1 sp 00007ffd2f425268 error 4 in libc-2.26.so[7f43a3d64000+1ae000]
[ 2343.138402] zfs[9544]: segfault at 10 ip 00007f505f1434b1 sp 00007ffdcb1d1e48 error 4 in libc-2.26.so[7f505efec000+1ae000]
[ 3239.855885] zfs[14132]: segfault at 10 ip 00007f75c5b7b4b1 sp 00007ffec4e010d8 error 4 in libc-2.26.so[7f75c5a24000+1ae000]

zdb (anonymised):

root@g ~ # zdb 
dozer:
    version: 5000
    name: 'dozer'
    state: 0
    txg: 4
    pool_guid: <number>
    errata: 0
    hostid: <number>
    hostname: 'g'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: <number>
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: <number>
            path: '/dev/mapper/g'
            devid: 'dm-uuid-CRYPT-LUKS1-<uuid>-g'
            whole_disk: 0
            metaslab_array: 256
            metaslab_shift: 31
            ashift: 9
            asize: 320062095360
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_leaf: 129
            com.delphix:vdev_zap_top: 130
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
tank:
    version: 5000
    name: 'tank'
    state: 0
    txg: 893048
    pool_guid: <number>
    errata: 0
    hostid: <number>
    hostname: 'glock'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: <number>
        children[0]:
            type: 'disk'
            id: 0
            guid: <number>
            path: '/dev/mapper/h'
            devid: 'dm-uuid-CRYPT-LUKS1-<uuid>-h'
            whole_disk: 0
            metaslab_array: 256
            metaslab_shift: 30
            ashift: 12
            asize: 254978818048
            is_log: 0
            DTL: 120
            create_txg: 4
            com.delphix:vdev_zap_leaf: 58
            com.delphix:vdev_zap_top: 59
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data

Happy to provide any additional info, just tell me what you want to know.

@sempervictus
Contributor

Seems very similar to #6888... newer kernel though.
All the different stack traces seem to include zvol_task_cb, taskq_thread, and some wait on the taskqs.
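
One way to get more context when these hung-task messages appear is the kernel's SysRq 'w' trigger, which dumps the stacks of all uninterruptible tasks to dmesg and can show what the taskq threads are actually waiting on (this assumes sysrq is enabled on the box):

echo 1 > /proc/sys/kernel/sysrq
echo w > /proc/sysrq-trigger
dmesg | tail -n 200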

@bunder2015
Contributor

USB is slow; this might be normal.

@jficz
Author

jficz commented Nov 22, 2017

Even USB 3.0? I don't know, this is my first encounter with a USB 3.0 HDD dock...

I almost feel this is close to a duplicate of #6330. I wonder why it didn't show up in my searches; feeling a bit ashamed now.

Anyway, I did try the same with a slightly older kernel, 4.13.12 (or 13, not sure), and got OOPSes instead of timeouts, but back then I just assumed it was a faulty drive and/or dock, so I don't have any useful logs from that time.

@bunder2015
Contributor

bunder2015 commented Nov 22, 2017

Granted, USB 3.0 is good for 5 Gbit/s (625 MB/s) when plugged into a USB 3 port (the ports with the blue connector), but it's also up to the drive and the controller whether they can actually run that fast. I haven't seen this myself, but I don't use my externals very often. You might want to see how fast the drive can actually run with something like dd. NVMe is really fast, and the send side might simply be too fast for the recv end to keep up with. PS: don't feel bad, only one way to find out. 😄
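
A minimal sketch of such a check, assuming /dev/sdb is the dock-attached disk from the dmesg output above (iflag=direct bypasses the page cache so the number reflects the device itself):

dd if=/dev/sdb of=/dev/null bs=1M count=4096 iflag=direct status=progress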

@sempervictus
Contributor

So this is still a thing: slow destinations definitely make it worse. I'm replacing a 2T 5400 RPM 2.5" drive with a 4T of the same kind, and getting this gem just during znapzend one-shot ops:

[Thu Dec 21 14:32:54 2017] INFO: task z_zvol:1370 blocked for more than 120 seconds.
[Thu Dec 21 14:32:54 2017]       Tainted: P           OE   4.9.68 #1
[Thu Dec 21 14:32:54 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Dec 21 14:32:54 2017] z_zvol          D    0  1370      2 0x00080000
[Thu Dec 21 14:32:54 2017]  ffffffff810a8b00 0000000000000000 ffff880831c16000 ffff88082961ee00
[Thu Dec 21 14:32:54 2017]  ffff88083f2d6b00 ffffc9000a4b7c48 ffffffff81834dcd 0000000000000282
[Thu Dec 21 14:32:54 2017]  ffff88083ac52c00 0000000000000000 ffff88082961ee00 000000000000017b
[Thu Dec 21 14:32:54 2017] Call Trace:
[Thu Dec 21 14:32:54 2017]  [<ffffffff810a8b00>] ? switched_to_idle+0x20/0x20
[Thu Dec 21 14:32:54 2017]  [<ffffffff81834dcd>] ? __schedule+0x24d/0x6e0
[Thu Dec 21 14:32:54 2017]  [<ffffffff818352a4>] schedule+0x44/0x90
[Thu Dec 21 14:32:54 2017]  [<ffffffffa0fe7edd>] taskq_wait_outstanding+0x8d/0xd0 [spl]
[Thu Dec 21 14:32:54 2017]  [<ffffffff810bcba0>] ? prepare_to_wait_event+0x110/0x110
[Thu Dec 21 14:32:54 2017]  [<ffffffffa13c1581>] zvol_task_cb+0x201/0x5a0 [zfs]
[Thu Dec 21 14:32:54 2017]  [<ffffffff810a8b00>] ? switched_to_idle+0x20/0x20
[Thu Dec 21 14:32:54 2017]  [<ffffffffa0fe8ca1>] taskq_thread+0x2b1/0x4d0 [spl]
[Thu Dec 21 14:32:54 2017]  [<ffffffff810a0310>] ? wake_up_q+0x90/0x90
[Thu Dec 21 14:32:54 2017]  [<ffffffffa0fe89f0>] ? task_done+0xa0/0xa0 [spl]
[Thu Dec 21 14:32:54 2017]  [<ffffffff81095e0d>] kthread+0xfd/0x120
[Thu Dec 21 14:32:54 2017]  [<ffffffff81095d10>] ? kthread_parkme+0x40/0x40
[Thu Dec 21 14:32:54 2017]  [<ffffffff8183a6e7>] ret_from_fork+0x37/0x50

Shot in the dark, but how "aware" are zpools of the underlying performance characteristics of their vdevs? Moreover, how "aware" are they when those vdevs are actually dm-crypt volumes? "Aware", in this case, meaning "the amount of burst and sustained IOPS, broken down by reads, seeks, and writes, and stratified by the number of bytes for each."
Without such "awareness," I'm pretty sure the zvol is getting some sort of access when it's initialized by the recv operation, and whatever IOs are required during the mapping of a block device (partition reads, what have you) are not being serviced in time, because the pool keeps striving to consume the send stream, not knowing it doesn't actually have the capacity to dispatch those IOs.
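
One way to sanity-check that hypothesis is to watch per-vdev I/O on the receive side while the transfer runs; zpool iostat is standard, and the one-second interval is just a suggestion:

zpool iostat -v dozer 1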

@behlendorf: Any chance you and Santa have zvol stability (and performance, if we've been good) in a burlap sack? Or at least the elf who knows how to get there... ;-)

@behlendorf behlendorf added the Component: ZVOL ZFS Volumes label Dec 21, 2017
@behlendorf behlendorf added this to the 0.8.0 milestone Dec 21, 2017
@behlendorf
Contributor

slow destinations definitely make it worse.

The good news is @klkblake provided us a nice reproducer in #6330 (comment), which should make it easier to get to the root cause.

how "aware" are zpools of the underlying performance characteristics of their vdevs

This is the kind of information which ZFS has but only takes full advantage of in a few specific circumstances, for example reads from a mirror vdev. If there are individual devices which are much slower, they can drag down the performance of the entire pool.

have zvol stability (and performance)

So both #6989 and #6926 (merged to master) have the potential to improve zvol performance. And for a reason which isn't yet clear, disabling the dynamic taskqs has been reported to improve stability and may help with performance.
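
A minimal sketch of the tuning referred to above, assuming the knob in question is the SPL module parameter spl_taskq_thread_dynamic (present in 0.7.x):

# persist across reboots
echo "options spl spl_taskq_thread_dynamic=0" > /etc/modprobe.d/spl.conf
# or, if the parameter is writable at runtime on your build
echo 0 > /sys/module/spl/parameters/spl_taskq_thread_dynamic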

behlendorf pushed a commit that referenced this issue Mar 30, 2018
During a receive operation zvol_create_minors_impl() can wait
needlessly for the prefetch thread because both share the same tasks
queue.  This results in hung tasks:

<3>INFO: task z_zvol:5541 blocked for more than 120 seconds.
<3>      Tainted: P           O  3.16.0-4-amd64
<3>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

The first z_zvol:5541 (zvol_task_cb) is waiting for the long running
traverse_prefetch_thread:260

root@linux:~# cat /proc/spl/taskq
taskq                       act  nthr  spwn  maxt   pri  mina
spl_system_taskq/0            1     2     0    64   100     1
	active: [260]traverse_prefetch_thread [zfs](0xffff88003347ae40)
	wait: 5541
spl_delay_taskq/0             0     1     0     4   100     1
	delay: spa_deadman [zfs](0xffff880039924000)
z_zvol/1                      1     1     0     1   120     1
	active: [5541]zvol_task_cb [zfs](0xffff88001fde6400)
	pend: zvol_task_cb [zfs](0xffff88001fde6800)

This change adds a dedicated, per-pool, prefetch taskq to prevent the
traverse code from monopolizing the global (and limited) system_taskq by
inappropriately scheduling long running tasks on it.

Reviewed-by: Albert Lee <trisk@forkgnu.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6330 
Closes #6890 
Closes #7343
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Apr 16, 2018
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue May 4, 2018
tonyhutter pushed a commit that referenced this issue May 10, 2018