
task z_zvol blocked for more than 120 seconds. #6890

Closed
jficz opened this issue Nov 21, 2017 · 7 comments · Fixed by #7343
Labels
Component: ZVOL ZFS Volumes
Milestone
0.8.0

Comments

@jficz

jficz commented Nov 21, 2017

System information

Type Version/Name
Distribution Name Arch Linux
Distribution Version latest
Linux Kernel linux-ck 4.13.14.1-ck
Architecture x86_64
ZFS Version 0.7.3-1 (AUR, dkms)
SPL Version 0.7.3-1 (AUR, dkms)

Describe the problem you're observing

ZFS hangs on a zfs send | zfs receive combo on the local machine, from a LUKS-encrypted NVMe partition (tank) to a LUKS-encrypted SATA HDD connected via a USB 3.0 dock (dozer). The HDD is possibly faulty; however, a badblocks run performed before the send|receive combo did not report any problems.

Describe how to reproduce the problem

Pool tank has about 300 snapshots across 10 (out of 18) datasets, made by zfs-auto-snapshot.

zfs send -RLc tank@send | zfs receive -Fusvd dozer
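
For completeness, a rough reproduction sketch, assuming tank@send is a recursive snapshot created just before the transfer (the snapshot name is taken from the command above):

zfs snapshot -r tank@send
zfs send -RLc tank@send | zfs receive -Fusvd dozer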

Include any warning/errors/backtraces from the system logs

dmesg:

[  204.780233] usb 2-4: new SuperSpeed USB device number 2 using xhci_hcd
[  204.802181] usb-storage 2-4:1.0: USB Mass Storage device detected
[  204.802222] usb-storage 2-4:1.0: Quirks match for vid 174c pid 55aa: 400000
[  204.802236] scsi host3: usb-storage 2-4:1.0
[  205.830501] scsi 3:0:0:0: Direct-Access     ASMedia  ASM1153          0    PQ: 0 ANSI: 6
[  205.881441] sd 3:0:0:0: [sdb] 625134827 512-byte logical blocks: (320 GB/298 GiB)
[  205.881675] sd 3:0:0:0: [sdb] Write Protect is off
[  205.881677] sd 3:0:0:0: [sdb] Mode Sense: 43 00 00 00
[  205.881876] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  205.930655]  sdb: sdb1
[  205.931489] sd 3:0:0:0: [sdb] Attached SCSI disk
[  858.070057] INFO: task z_zvol:17045 blocked for more than 120 seconds.
[  858.070061]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[  858.070062] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  858.070063] z_zvol          D    0 17045      2 0x00000000
[  858.070065] Call Trace:
[  858.070070]  __schedule+0x6eb/0xcd0
[  858.070072]  schedule+0x3d/0xd0
[  858.070076]  taskq_wait_outstanding+0x87/0xd0 [spl]
[  858.070078]  ? wait_woken+0x80/0x80
[  858.070094]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[  858.070097]  taskq_thread+0x254/0x480 [spl]
[  858.070100]  ? wake_up_q+0x80/0x80
[  858.070102]  kthread+0x124/0x140
[  858.070104]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[  858.070106]  ? kthread_create_on_node+0x70/0x70
[  858.070108]  ret_from_fork+0x25/0x30
[  980.960061] INFO: task z_zvol:17045 blocked for more than 120 seconds.
[  980.960066]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[  980.960067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  980.960068] z_zvol          D    0 17045      2 0x00000000
[  980.960070] Call Trace:
[  980.960076]  __schedule+0x6eb/0xcd0
[  980.960079]  schedule+0x3d/0xd0
[  980.960083]  taskq_wait_outstanding+0x87/0xd0 [spl]
[  980.960086]  ? wait_woken+0x80/0x80
[  980.960102]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[  980.960105]  taskq_thread+0x254/0x480 [spl]
[  980.960108]  ? wake_up_q+0x80/0x80
[  980.960111]  kthread+0x124/0x140
[  980.960114]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[  980.960115]  ? kthread_create_on_node+0x70/0x70
[  980.960118]  ret_from_fork+0x25/0x30
[ 1103.830057] INFO: task z_zvol:17045 blocked for more than 120 seconds.
[ 1103.830061]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[ 1103.830062] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1103.830063] z_zvol          D    0 17045      2 0x00000000
[ 1103.830066] Call Trace:
[ 1103.830071]  __schedule+0x6eb/0xcd0
[ 1103.830073]  schedule+0x3d/0xd0
[ 1103.830077]  taskq_wait_outstanding+0x87/0xd0 [spl]
[ 1103.830080]  ? wait_woken+0x80/0x80
[ 1103.830095]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[ 1103.830099]  taskq_thread+0x254/0x480 [spl]
[ 1103.830102]  ? wake_up_q+0x80/0x80
[ 1103.830104]  kthread+0x124/0x140
[ 1103.830106]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[ 1103.830108]  ? kthread_create_on_node+0x70/0x70
[ 1103.830110]  ret_from_fork+0x25/0x30
[ 1226.710103] INFO: task z_zvol:17045 blocked for more than 120 seconds.
[ 1226.710106]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[ 1226.710107] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1226.710108] z_zvol          D    0 17045      2 0x00000000
[ 1226.710110] Call Trace:
[ 1226.710115]  __schedule+0x6eb/0xcd0
[ 1226.710117]  schedule+0x3d/0xd0
[ 1226.710121]  taskq_wait_outstanding+0x87/0xd0 [spl]
[ 1226.710123]  ? wait_woken+0x80/0x80
[ 1226.710138]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[ 1226.710141]  taskq_thread+0x254/0x480 [spl]
[ 1226.710143]  ? wake_up_q+0x80/0x80
[ 1226.710145]  kthread+0x124/0x140
[ 1226.710147]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[ 1226.710149]  ? kthread_create_on_node+0x70/0x70
[ 1226.710151]  ret_from_fork+0x25/0x30
[ 1349.590140] INFO: task z_zvol:17045 blocked for more than 120 seconds.
[ 1349.590144]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[ 1349.590145] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1349.590146] z_zvol          D    0 17045      2 0x00000000
[ 1349.590148] Call Trace:
[ 1349.590153]  __schedule+0x6eb/0xcd0
[ 1349.590155]  schedule+0x3d/0xd0
[ 1349.590159]  taskq_wait_outstanding+0x87/0xd0 [spl]
[ 1349.590162]  ? wait_woken+0x80/0x80
[ 1349.590178]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[ 1349.590180]  taskq_thread+0x254/0x480 [spl]
[ 1349.590183]  ? wake_up_q+0x80/0x80
[ 1349.590185]  kthread+0x124/0x140
[ 1349.590187]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[ 1349.590189]  ? kthread_create_on_node+0x70/0x70
[ 1349.590191]  ret_from_fork+0x25/0x30
[ 1436.501551] zfs[27076]: segfault at 10 ip 00007f0bd40474b1 sp 00007fff83d42008 error 4 in libc-2.26.so[7f0bd3ef0000+1ae000]
[ 1472.470050] INFO: task z_zvol:17045 blocked for more than 120 seconds.
[ 1472.470053]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[ 1472.470054] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1472.470056] z_zvol          D    0 17045      2 0x00000000
[ 1472.470058] Call Trace:
[ 1472.470063]  __schedule+0x6eb/0xcd0
[ 1472.470065]  schedule+0x3d/0xd0
[ 1472.470069]  taskq_wait_outstanding+0x87/0xd0 [spl]
[ 1472.470071]  ? wait_woken+0x80/0x80
[ 1472.470087]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[ 1472.470090]  taskq_thread+0x254/0x480 [spl]
[ 1472.470092]  ? wake_up_q+0x80/0x80
[ 1472.470095]  kthread+0x124/0x140
[ 1472.470097]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[ 1472.470098]  ? kthread_create_on_node+0x70/0x70
[ 1472.470100]  ret_from_fork+0x25/0x30
[ 1595.350031] INFO: task z_zvol:389 blocked for more than 120 seconds.
[ 1595.350034]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[ 1595.350035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1595.350036] z_zvol          D    0   389      2 0x00000000
[ 1595.350039] Call Trace:
[ 1595.350044]  __schedule+0x6eb/0xcd0
[ 1595.350046]  schedule+0x3d/0xd0
[ 1595.350050]  taskq_wait_outstanding+0x87/0xd0 [spl]
[ 1595.350052]  ? wait_woken+0x80/0x80
[ 1595.350068]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[ 1595.350071]  taskq_thread+0x254/0x480 [spl]
[ 1595.350073]  ? wake_up_q+0x80/0x80
[ 1595.350076]  kthread+0x124/0x140
[ 1595.350078]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[ 1595.350079]  ? kthread_create_on_node+0x70/0x70
[ 1595.350081]  ? kthread_create_on_node+0x70/0x70
[ 1595.350082]  ret_from_fork+0x25/0x30
[ 1595.350106] INFO: task z_zvol:17045 blocked for more than 120 seconds.
[ 1595.350107]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[ 1595.350108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1595.350109] z_zvol          D    0 17045      2 0x00000000
[ 1595.350110] Call Trace:
[ 1595.350112]  __schedule+0x6eb/0xcd0
[ 1595.350113]  schedule+0x3d/0xd0
[ 1595.350116]  taskq_wait_outstanding+0x87/0xd0 [spl]
[ 1595.350117]  ? wait_woken+0x80/0x80
[ 1595.350130]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[ 1595.350133]  taskq_thread+0x254/0x480 [spl]
[ 1595.350135]  ? wake_up_q+0x80/0x80
[ 1595.350137]  kthread+0x124/0x140
[ 1595.350139]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[ 1595.350140]  ? kthread_create_on_node+0x70/0x70
[ 1595.350142]  ret_from_fork+0x25/0x30
[ 1718.230091] INFO: task z_zvol:389 blocked for more than 120 seconds.
[ 1718.230094]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[ 1718.230095] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1718.230097] z_zvol          D    0   389      2 0x00000000
[ 1718.230099] Call Trace:
[ 1718.230103]  __schedule+0x6eb/0xcd0
[ 1718.230105]  schedule+0x3d/0xd0
[ 1718.230109]  taskq_wait_outstanding+0x87/0xd0 [spl]
[ 1718.230112]  ? wait_woken+0x80/0x80
[ 1718.230127]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[ 1718.230130]  taskq_thread+0x254/0x480 [spl]
[ 1718.230133]  ? wake_up_q+0x80/0x80
[ 1718.230135]  kthread+0x124/0x140
[ 1718.230137]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[ 1718.230139]  ? kthread_create_on_node+0x70/0x70
[ 1718.230140]  ? kthread_create_on_node+0x70/0x70
[ 1718.230142]  ret_from_fork+0x25/0x30
[ 1718.230164] INFO: task z_zvol:17045 blocked for more than 120 seconds.
[ 1718.230166]       Tainted: P     U     O    4.13.14-1-ck-skylake #1
[ 1718.230167] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1718.230167] z_zvol          D    0 17045      2 0x00000000
[ 1718.230169] Call Trace:
[ 1718.230170]  __schedule+0x6eb/0xcd0
[ 1718.230172]  schedule+0x3d/0xd0
[ 1718.230175]  taskq_wait_outstanding+0x87/0xd0 [spl]
[ 1718.230176]  ? wait_woken+0x80/0x80
[ 1718.230190]  zvol_task_cb+0x1fa/0x5a0 [zfs]
[ 1718.230192]  taskq_thread+0x254/0x480 [spl]
[ 1718.230195]  ? wake_up_q+0x80/0x80
[ 1718.230196]  kthread+0x124/0x140
[ 1718.230198]  ? taskq_thread_should_stop+0x70/0x70 [spl]
[ 1718.230200]  ? kthread_create_on_node+0x70/0x70
[ 1718.230201]  ret_from_fork+0x25/0x30
[ 2342.854530] zfs[9384]: segfault at 10 ip 00007f43a3ebb4b1 sp 00007ffd2f425268 error 4 in libc-2.26.so[7f43a3d64000+1ae000]
[ 2343.138402] zfs[9544]: segfault at 10 ip 00007f505f1434b1 sp 00007ffdcb1d1e48 error 4 in libc-2.26.so[7f505efec000+1ae000]
[ 3239.855885] zfs[14132]: segfault at 10 ip 00007f75c5b7b4b1 sp 00007ffec4e010d8 error 4 in libc-2.26.so[7f75c5a24000+1ae000]

zdb (anonymised):

root@g ~ # zdb 
dozer:
    version: 5000
    name: 'dozer'
    state: 0
    txg: 4
    pool_guid: <number>
    errata: 0
    hostid: <number>
    hostname: 'g'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: <number>
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: <number>
            path: '/dev/mapper/g'
            devid: 'dm-uuid-CRYPT-LUKS1-<uuid>-g'
            whole_disk: 0
            metaslab_array: 256
            metaslab_shift: 31
            ashift: 9
            asize: 320062095360
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_leaf: 129
            com.delphix:vdev_zap_top: 130
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
tank:
    version: 5000
    name: 'tank'
    state: 0
    txg: 893048
    pool_guid: <number>
    errata: 0
    hostid: <number>
    hostname: 'glock'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: <number>
        children[0]:
            type: 'disk'
            id: 0
            guid: <number>
            path: '/dev/mapper/h'
            devid: 'dm-uuid-CRYPT-LUKS1-<uuid>-h'
            whole_disk: 0
            metaslab_array: 256
            metaslab_shift: 30
            ashift: 12
            asize: 254978818048
            is_log: 0
            DTL: 120
            create_txg: 4
            com.delphix:vdev_zap_leaf: 58
            com.delphix:vdev_zap_top: 59
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data

Happy to provide any additional info, just tell me what you want to know.

@sempervictus
Contributor

Seems very similar to #6888... newer kernel though.
All the different stack traces seem to include zvol_task_cb, taskq_thread, and some wait on the taskqs.
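
One way to get more context when these hung-task messages appear is the kernel's SysRq 'w' trigger, which dumps the stacks of all uninterruptible tasks to dmesg and can show what the taskq threads are actually waiting on (this assumes sysrq is enabled on the box):

echo 1 > /proc/sys/kernel/sysrq
echo w > /proc/sysrq-trigger
dmesg | tail -n 200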

@bunder2015
Contributor

USB is slow; this might be normal.

@jficz
Author

jficz commented Nov 22, 2017

Even USB 3.0? I don't know, this is my first encounter with a USB 3.0 HDD dock...

I almost feel this is close to a duplicate of #6330. I wonder why it didn't show up in my searches; feeling a bit ashamed now.

Anyway, I did try the same with a slightly older kernel, 4.13.12 (or 13, not sure), and got OOPSes instead of timeouts, but back then I just assumed it was a faulty drive and/or dock, so I don't have any useful logs from that time.

@bunder2015
Contributor

bunder2015 commented Nov 22, 2017

Granted, USB 3.0 is good for 5 Gbit/s (625 MB/s) when plugged into a USB 3 port (the ports with the blue connector), but it's also up to the drive and the controller whether they can actually run that fast. I haven't seen this myself, but I don't use my externals very often. You might want to see how fast the drive can actually run with something like dd. NVMe is really fast, and the send side might simply be too fast for the recv end to keep up with. PS: don't feel bad, only one way to find out. 😄
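
A minimal sketch of such a check, assuming /dev/sdb is the dock-attached disk from the dmesg output above (iflag=direct bypasses the page cache so the number reflects the device itself):

dd if=/dev/sdb of=/dev/null bs=1M count=4096 iflag=direct status=progress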

@sempervictus
Contributor

So this is still a thing: slow destinations definitely make it worse. I'm replacing a 2T 5400 RPM 2.5" drive with a 4T of the same kind, and getting this gem just during znapzend one-shot ops:

[Thu Dec 21 14:32:54 2017] INFO: task z_zvol:1370 blocked for more than 120 seconds.
[Thu Dec 21 14:32:54 2017]       Tainted: P           OE   4.9.68 #1
[Thu Dec 21 14:32:54 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Dec 21 14:32:54 2017] z_zvol          D    0  1370      2 0x00080000
[Thu Dec 21 14:32:54 2017]  ffffffff810a8b00 0000000000000000 ffff880831c16000 ffff88082961ee00
[Thu Dec 21 14:32:54 2017]  ffff88083f2d6b00 ffffc9000a4b7c48 ffffffff81834dcd 0000000000000282
[Thu Dec 21 14:32:54 2017]  ffff88083ac52c00 0000000000000000 ffff88082961ee00 000000000000017b
[Thu Dec 21 14:32:54 2017] Call Trace:
[Thu Dec 21 14:32:54 2017]  [<ffffffff810a8b00>] ? switched_to_idle+0x20/0x20
[Thu Dec 21 14:32:54 2017]  [<ffffffff81834dcd>] ? __schedule+0x24d/0x6e0
[Thu Dec 21 14:32:54 2017]  [<ffffffff818352a4>] schedule+0x44/0x90
[Thu Dec 21 14:32:54 2017]  [<ffffffffa0fe7edd>] taskq_wait_outstanding+0x8d/0xd0 [spl]
[Thu Dec 21 14:32:54 2017]  [<ffffffff810bcba0>] ? prepare_to_wait_event+0x110/0x110
[Thu Dec 21 14:32:54 2017]  [<ffffffffa13c1581>] zvol_task_cb+0x201/0x5a0 [zfs]
[Thu Dec 21 14:32:54 2017]  [<ffffffff810a8b00>] ? switched_to_idle+0x20/0x20
[Thu Dec 21 14:32:54 2017]  [<ffffffffa0fe8ca1>] taskq_thread+0x2b1/0x4d0 [spl]
[Thu Dec 21 14:32:54 2017]  [<ffffffff810a0310>] ? wake_up_q+0x90/0x90
[Thu Dec 21 14:32:54 2017]  [<ffffffffa0fe89f0>] ? task_done+0xa0/0xa0 [spl]
[Thu Dec 21 14:32:54 2017]  [<ffffffff81095e0d>] kthread+0xfd/0x120
[Thu Dec 21 14:32:54 2017]  [<ffffffff81095d10>] ? kthread_parkme+0x40/0x40
[Thu Dec 21 14:32:54 2017]  [<ffffffff8183a6e7>] ret_from_fork+0x37/0x50

Shot in the dark, but how "aware" are zpools of the underlying performance characteristics of their vdevs? Moreover, how "aware" are they when those vdevs are actually dm-crypt volumes? "Aware", in this case, meaning "the amount of burst and sustained IOPS, broken down by reads, seeks, and writes, and stratified by the number of bytes for each."
Without such "awareness," I'm pretty sure the zvol is getting some sort of access when it's initialized by the recv operation, and whatever IOs are required during the mapping of a block device (partition reads, what have you) are not being serviced in time, because the pool keeps striving to consume the send stream, not knowing it doesn't actually have the capacity to dispatch those IOs.
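
One way to sanity-check that hypothesis is to watch per-vdev I/O on the receive side while the transfer runs; zpool iostat is standard, and the one-second interval is just a suggestion:

zpool iostat -v dozer 1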

@behlendorf: Any chance you and Santa have zvol stability (and performance, if we've been good) in a burlap sack? Or at least the elf who knows how to get there... ;-)

@behlendorf behlendorf added the Component: ZVOL ZFS Volumes label Dec 21, 2017
@behlendorf behlendorf added this to the 0.8.0 milestone Dec 21, 2017
@behlendorf
Contributor

slow destinations definitely make it worse.

The good news is @klkblake provided us a nice reproducer in #6330 (comment), which should make it easier to get to the root cause.

how "aware" are zpools of the underlying performance characteristics of their vdevs

This is the kind of information which ZFS has but only takes full advantage of in a few specific circumstances, for example reads from a mirror vdev. If there are individual devices which are much slower, they can drag down the performance of the entire pool.

have zvol stability (and performance)

So both #6989 and #6926 (merged to master) have the potential to improve zvol performance. And for a reason which isn't yet clear, disabling the dynamic taskqs has been reported to improve stability and may help with performance.
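
A minimal sketch of the tuning referred to above, assuming the knob in question is the SPL module parameter spl_taskq_thread_dynamic (present in 0.7.x):

# persist across reboots
echo "options spl spl_taskq_thread_dynamic=0" > /etc/modprobe.d/spl.conf
# or, if the parameter is writable at runtime on your build
echo 0 > /sys/module/spl/parameters/spl_taskq_thread_dynamic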

behlendorf pushed a commit that referenced this issue Mar 30, 2018
During a receive operation zvol_create_minors_impl() can wait
needlessly for the prefetch thread because both share the same tasks
queue.  This results in hung tasks:

<3>INFO: task z_zvol:5541 blocked for more than 120 seconds.
<3>      Tainted: P           O  3.16.0-4-amd64
<3>"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

The first z_zvol:5541 (zvol_task_cb) is waiting for the long running
traverse_prefetch_thread:260

root@linux:~# cat /proc/spl/taskq
taskq                       act  nthr  spwn  maxt   pri  mina
spl_system_taskq/0            1     2     0    64   100     1
	active: [260]traverse_prefetch_thread [zfs](0xffff88003347ae40)
	wait: 5541
spl_delay_taskq/0             0     1     0     4   100     1
	delay: spa_deadman [zfs](0xffff880039924000)
z_zvol/1                      1     1     0     1   120     1
	active: [5541]zvol_task_cb [zfs](0xffff88001fde6400)
	pend: zvol_task_cb [zfs](0xffff88001fde6800)

This change adds a dedicated, per-pool, prefetch taskq to prevent the
traverse code from monopolizing the global (and limited) system_taskq by
inappropriately scheduling long running tasks on it.

Reviewed-by: Albert Lee <trisk@forkgnu.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes #6330 
Closes #6890 
Closes #7343
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Apr 16, 2018
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue May 4, 2018
tonyhutter pushed a commit that referenced this issue May 10, 2018