Unloading zfs stack causes a BUG #828

Closed
Ukko-Ylijumala opened this issue Jul 13, 2012 · 8 comments
@Ukko-Ylijumala

While testing the slow zvol issue #824, I often encounter this bug while doing a modprobe -r zfs zcommon znvpair zavl zunicode spl. Stack trace follows:

storagetest kernel: Pid: 11046, comm: modprobe Tainted: P           O 3.2.13-damicon-scst #2 Supermicro X8DT6/X8DT6
storagetest kernel: RIP: 0010:[]  [] spl_kmem_cache_free+0x74/0x130 [spl]
storagetest kernel: RSP: 0018:ffff8807baa03d18  EFLAGS: 00000202
storagetest kernel: RAX: 0000000000000081 RBX: 0000000000000020 RCX: 0000000000000000
storagetest kernel: RDX: 0000000000000080 RSI: ffff880c167f2890 RDI: ffff880c167f2490
storagetest kernel: RBP: ffff8807baa03d48 R08: 0000000000000000 R09: dead000000200200
storagetest kernel: R10: ffff880c167f2488 R11: ffff8802a2f2c178 R12: 000000018100006e
storagetest kernel: R13: 00000008073be6a0 R14: 0000000180800043 R15: 0000000000000001
storagetest kernel: FS:  00007f490853e700(0000) GS:ffff880c3fc00000(0000) knlGS:0000000000000000
storagetest kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
storagetest kernel: CR2: 00007f84f3e562a0 CR3: 00000003b39b3000 CR4: 00000000000006e0
storagetest kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
storagetest kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
storagetest kernel: Process modprobe (pid: 11046, threadinfo ffff8807baa02000, task ffff880bf0010000)
storagetest kernel: Stack:
storagetest kernel:  ffffffffa061d380 ffff88075f050ee0 0000000000000000 ffff88075f050ee0
storagetest kernel:  ffffffffa061f630 ffff8805f15c2e50 ffff8807baa03d78 ffffffffa05538fd
storagetest kernel:  ffff88034a44a3e8 ffff88075f050ee0 ffff88034a44a3e8 ffffffffa061f630
storagetest kernel: Call Trace:
storagetest kernel:  [] arc_hdr_destroy+0x1bd/0x250 [zfs]
storagetest kernel:  [] l2arc_evict+0x2bb/0x310 [zfs]
storagetest kernel:  [] l2arc_remove_vdev+0xc7/0x140 [zfs]
storagetest kernel:  [] spa_l2cache_drop+0x78/0xa0 [zfs]
storagetest kernel:  [] spa_unload+0x9f/0x210 [zfs]
storagetest kernel:  [] spa_evict_all+0x6f/0xb0 [zfs]
storagetest kernel:  [] spa_fini+0x13/0x130 [zfs]
storagetest kernel:  [] _fini+0x73/0x100 [zfs]
storagetest kernel:  [] spl__fini+0x13/0x40 [zfs]
storagetest kernel:  [] sys_delete_module+0x19e/0x270
storagetest kernel:  [] ? do_munmap+0x2ae/0x370
storagetest kernel:  [] system_call_fastpath+0x16/0x1b
storagetest kernel: Code: ca 00 00 48 98 4c 8b 6c c3 10 41 8b 45 04 41 3b 45 08 0f 83 ab 00 00 00 89 c2 ff c0 4d 89 a4 d5 90 00 00 00 41 89 45 04 41 56 9d  ff 8b 60 01 00 00 f6 05 16 0f 01 00 01 74 09 f6 05 06 0f 01
storagetest kernel: Call Trace:
storagetest kernel:  [] arc_hdr_destroy+0x1bd/0x250 [zfs]
storagetest kernel:  [] l2arc_evict+0x2bb/0x310 [zfs]
storagetest kernel:  [] l2arc_remove_vdev+0xc7/0x140 [zfs]
storagetest kernel:  [] spa_l2cache_drop+0x78/0xa0 [zfs]
storagetest kernel:  [] spa_unload+0x9f/0x210 [zfs]
storagetest kernel:  [] spa_evict_all+0x6f/0xb0 [zfs]
storagetest kernel:  [] spa_fini+0x13/0x130 [zfs]
storagetest kernel:  [] _fini+0x73/0x100 [zfs]
storagetest kernel:  [] spl__fini+0x13/0x40 [zfs]
storagetest kernel:  [] sys_delete_module+0x19e/0x270
storagetest kernel:  [] ? do_munmap+0x2ae/0x370
storagetest kernel:  [] system_call_fastpath+0x16/0x1b
storagetest kernel: ZFS: Unloaded module v0.6.0-rc9
storagetest kernel: SPL: Unloaded module v0.6.0-rc9

However, the system stays responsive, and loading the ZFS stack again doesn't cause any issues.

This bug only manifests itself after stressing volblocksize=8k zvols with 8k I/O-size writes for a few hours. After that stress test, every zfs/zpool command is slow (slab fragmentation?), taking several seconds to complete with just 10 ZFS filesystems. echo 3 > /proc/sys/vm/drop_caches doesn't restore responsiveness; only modprobe -r followed by reloading the ZFS stack returns things to normal.
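
For reference, a rough way to sanity-check the slab-fragmentation guess while the system is in the slow state (assuming the stock SPL/ZFS 0.6.0 proc interfaces are present; the exact paths may differ on other builds):

cat /proc/spl/kmem/slab            # SPL slab caches: per-cache object and memory usage
cat /proc/spl/kstat/zfs/arcstats   # ARC size, hit/miss and L2ARC counters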

@behlendorf
Contributor

Thanks for filing the bug. Can you include the exact spl/zfs versions, kernel version, and architecture you're seeing the failures on?

@Ukko-Ylijumala
Author

Whoops, sorry about forgetting that. Here it is:

Linux storagetest 3.2.13-damicon-scst #2 SMP Mon Mar 26 16:11:22 EEST 2012 x86_64 GNU/Linux

Vanilla sources + scst (http://scst.sourceforge.net/) patches.

spl at openzfs/spl@33f507c
zfs at ba9b542

@Ukko-Ylijumala
Author

Got another one after stressing ZoL with more 8k testing, this time with reads. Four hours of stress seems sufficient to trigger this bug around 50% of the time.

Oh, both SPL and ZFS are now at the latest git master as of July 17.

[4865699.739735] Pid: 2267, comm: modprobe Tainted: P           O 3.2.13-damicon-scst #2 Supermicro X8DT6/X8DT6
[4865699.739788] RIP: 0010:[]  [] l2arc_evict+0x11c/0x310 [zfs]
[4865699.739865] RSP: 0018:ffff8804bca0fd88  EFLAGS: 00000202
[4865699.739891] RAX: 0000000000000220 RBX: ffffffffa02684cb RCX: 7f29d8ef13f8b391
[4865699.739934] RDX: 0000000000000064 RSI: 00000000001c8217 RDI: ffffffffa061f290
[4865699.739977] RBP: ffff8804bca0fdd8 R08: ffff88007b9b7640 R09: 0000000000000001
[4865699.740020] R10: ffff88013c826400 R11: ffff88003bf0e8c8 R12: ffffffffa061fed0
[4865699.740062] R13: 0000000000000020 R14: ffff8800488b0260 R15: ffff8803544b9410
[4865699.740105] FS:  00007f07145ec700(0000) GS:ffff880627c60000(0000) knlGS:0000000000000000
[4865699.740150] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4865699.740177] CR2: 00007fd9d030f000 CR3: 0000000ab0d76000 CR4: 00000000000006e0
[4865699.740220] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[4865699.740263] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[4865699.740306] Process modprobe (pid: 2267, threadinfo ffff8804bca0e000, task ffff880614a2ada0)
[4865699.740350] Stack:
[4865699.740370]  0000000000000064 0000000c2d139200 ffff8804cf4a6ba0 0000000100000009
[4865699.740423]  ffff8809717db000 ffff8804cf4a6ba0 ffff8805f540f000 ffff8805f540f000
[4865699.740474]  ffff8804bca0fe10 00000000015e0170 ffff8804bca0fdf8 ffffffffa0553d67
[4865699.740525] Call Trace:
[4865699.740558]  [] l2arc_remove_vdev+0xc7/0x140 [zfs]
[4865699.740608]  [] spa_l2cache_drop+0x78/0xa0 [zfs]
[4865699.740655]  [] spa_unload+0x9f/0x210 [zfs]
[4865699.740701]  [] spa_evict_all+0x6f/0xb0 [zfs]
[4865699.740749]  [] spa_fini+0x13/0x130 [zfs]
[4865699.740797]  [] _fini+0x73/0x100 [zfs]
[4865699.740843]  [] spl__fini+0x13/0x40 [zfs]
[4865699.740873]  [] sys_delete_module+0x19e/0x270
[4865699.740903]  [] ? do_munmap+0x2ae/0x370
[4865699.740932]  [] system_call_fastpath+0x16/0x1b
[4865699.740959] Code: 21 fa 48 89 d0 48 89 55 b0 48 c1 e0 06 4c 8d a8 90 d9 61 a0 4c 89 ef e8 33 23 00 e1 48 8b 55 b0 85 c0 0f 84 b9 01 00 00 8b 43 60  00 00 04 00 0f 85 f9 00 00 00 44 8b 4d cc 45 85 c9 75 28 48
[4865699.741163] Call Trace:
[4865699.741194]  [] l2arc_remove_vdev+0xc7/0x140 [zfs]
[4865699.741241]  [] spa_l2cache_drop+0x78/0xa0 [zfs]
[4865699.741287]  [] spa_unload+0x9f/0x210 [zfs]
[4865699.741332]  [] spa_evict_all+0x6f/0xb0 [zfs]
[4865699.741379]  [] spa_fini+0x13/0x130 [zfs]
[4865699.741425]  [] _fini+0x73/0x100 [zfs]
[4865699.741471]  [] spl__fini+0x13/0x40 [zfs]
[4865699.741498]  [] sys_delete_module+0x19e/0x270
[4865699.741527]  [] ? do_munmap+0x2ae/0x370
[4865699.741554]  [] system_call_fastpath+0x16/0x1b
[4865713.462927] ZFS: Unloaded module v0.6.0-rc9
[4865714.430720] SPL: Unloaded module v0.6.0-rc9

This time the pool was configured with 2x 120GB SSDs. I did a test earlier with just one SSD, and unloading the ZoL stack that time didn't trigger the bug. Maybe more than one SSD in the pool is needed to hit it like this, I don't know.

@behlendorf
Contributor

It seems like the ARC hdr_cache is somehow being destroyed before everything gets evicted from the L2ARC. How exactly this can happen isn't clear.

@Ukko-Ylijumala
Author

Another one, this time after stressing SLOG writes (sync). 2x 120GB SSDs in the pool as cache devices; spl+zfs from the July 19 git master.

So far I've tried to reproduce this bug today and yesterday with zvols of other block sizes, but only an 8k volblocksize zvol with 8k writes/reads causes enough stress to reliably reproduce it. @behlendorf - anything you want me to try to narrow this down? So far it hasn't outright crashed the machine, but it's annoying.

Also, 8k zvols are horribly slow. It looks like 8k zvols (and reads/writes of 8k and below) really stress some part of the ZoL implementation. Even a simple zfs list takes around 7 seconds after an 8k stress test, while right after loading the ZFS stack it's pretty much instantaneous.

[4955958.710255] Pid: 30347, comm: modprobe Tainted: P           O 3.2.13-damicon-scst #2 Supermicro X8DT6/X8DT6
[4955958.710313] RIP: 0010:[]  [] l2arc_evict+0x11c/0x310 [zfs]
[4955958.710394] RSP: 0018:ffff880a87dffd88  EFLAGS: 00000202
[4955958.710422] RAX: 0000000000000220 RBX: ffffffffa02684cb RCX: 34cf6a2b93ddb7a3
[4955958.710468] RDX: 00000000000000ee RSI: 00000000001d7b75 RDI: ffffffffa06214f0
[4955958.710514] RBP: ffff880a87dffdd8 R08: ffff8802704c23f8 R09: 0000000000000000
[4955958.710559] R10: ffff880956468400 R11: ffff8805b3946178 R12: ffffffffa061e5b0
[4955958.710605] R13: 0000000000000020 R14: ffff880258c8d7e0 R15: ffff8807ec904000
[4955958.710652] FS:  00007f6f0da63700(0000) GS:ffff880c3fc20000(0000) knlGS:0000000000000000
[4955958.710700] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4955958.710729] CR2: 00007f84f32a5730 CR3: 00000004c7a51000 CR4: 00000000000006e0
[4955958.710776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[4955958.710822] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[4955958.710868] Process modprobe (pid: 30347, threadinfo ffff880a87dfe000, task ffff8807effb16d0)
[4955958.710916] Stack:
[4955958.710938]  00000000000000ee 00000006e9172400 ffff8801796a3de0 0000000100000009
[4955958.710996]  ffff8807eb8bb000 ffff8801796a3de0 ffff8803f7103000 ffff8803f7103000
[4955958.711052]  ffff880a87dffe10 000000000218d170 ffff880a87dffdf8 ffffffffa0553d67
[4955958.711109] Call Trace:
[4955958.711151]  [] l2arc_remove_vdev+0xc7/0x140 [zfs]
[4955958.711216]  [] spa_l2cache_drop+0x78/0xa0 [zfs]
[4955958.711281]  [] spa_unload+0x9f/0x210 [zfs]
[4955958.711344]  [] spa_evict_all+0x6f/0xb0 [zfs]
[4955958.711409]  [] spa_fini+0x13/0x130 [zfs]
[4955958.711473]  [] _fini+0x73/0x100 [zfs]
[4955958.711537]  [] spl__fini+0x13/0x40 [zfs]
[4955958.711571]  [] sys_delete_module+0x19e/0x270
[4955958.711605]  [] ? do_munmap+0x2ae/0x370
[4955958.711640]  [] system_call_fastpath+0x16/0x1b
[4955958.711669] Code: 21 fa 48 89 d0 48 89 55 b0 48 c1 e0 06 4c 8d a8 70 d9 61 a0 4c 89 ef e8 33 23 00 e1 48 8b 55 b0 85 c0 0f 84 b9 01 00 00 8b 43 60  00 00 04 00 0f 85 f9 00 00 00 44 8b 4d cc 45 85 c9 75 28 48
[4955958.711904] Call Trace:
[4955958.711945]  [] l2arc_remove_vdev+0xc7/0x140 [zfs]
[4955958.712009]  [] spa_l2cache_drop+0x78/0xa0 [zfs]
[4955958.712073]  [] spa_unload+0x9f/0x210 [zfs]
[4955958.712136]  [] spa_evict_all+0x6f/0xb0 [zfs]
[4955958.712200]  [] spa_fini+0x13/0x130 [zfs]
[4955958.712265]  [] _fini+0x73/0x100 [zfs]
[4955958.712329]  [] spl__fini+0x13/0x40 [zfs]
[4955958.712360]  [] sys_delete_module+0x19e/0x270
[4955958.712392]  [] ? do_munmap+0x2ae/0x370
[4955958.712422]  [] system_call_fastpath+0x16/0x1b
[4955969.733825] ZFS: Unloaded module v0.6.0-rc9
[4955970.541101] SPL: Unloaded module v0.6.0-rc9

@behlendorf
Contributor

As a temporary workaround you might try removing the l2arc device from the pool before unloading the modules.
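
For example (the pool and device names below are placeholders only; substitute your own):

zpool remove tank sdc                                 # drop the cache (L2ARC) device from the pool
modprobe -r zfs zcommon znvpair zavl zunicode spl     # then unload the stack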

@Ukko-Ylijumala
Author

@behlendorf - got another one, this time while removing the L2ARC device from the pool after a stress test. Here's the stack trace:

[5608739.177785] Pid: 30117, comm: zpool Tainted: P           O 3.2.13-damicon-scst #2 Supermicro X8DT6/X8DT6
[5608739.177843] RIP: 0010:[]  [] arc_change_state+0xee/0x2f0 [zfs]
[5608739.177930] RSP: 0018:ffff880177affc68  EFLAGS: 00000207
[5608739.177958] RAX: 0000000000000000 RBX: ffffffffa018f4cb RCX: ffffffffffffe000
[5608739.178004] RDX: 0000000000000001 RSI: dead000000200200 RDI: dead000000100100
[5608739.178049] RBP: ffff880177affca8 R08: ffff880b56ce5c58 R09: 0000000000000001
[5608739.178095] R10: ffff8803ab505400 R11: ffff880be54bd8a0 R12: ffff8803b3199600
[5608739.178140] R13: ffff880177affc38 R14: ffff8803ab505400 R15: ffff880177affc78
[5608739.178186] FS:  00007fd5e6fc3be0(0000) GS:ffff880627c00000(0000) knlGS:0000000000000000
[5608739.178234] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[5608739.178263] CR2: 00007f022b9a24f8 CR3: 0000000af7344000 CR4: 00000000000006e0
[5608739.178309] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[5608739.178354] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[5608739.178400] Process zpool (pid: 30117, threadinfo ffff880177afe000, task ffff880616ed0000)
[5608739.178448] Stack:
[5608739.178469]  ffffffff00000001 0000000000000000 ffff880177affca8 ffff880b56ce5c48
[5608739.178527]  ffff8808fc9308c8 ffffffffa0726570 ffff8801f4e97a70 ffff8801f4e97a60
[5608739.178583]  ffff880177affd08 ffffffffa065ac43 0000000000000070 000000184c7de600
[5608739.178639] Call Trace:
[5608739.178682]  [] l2arc_evict+0x2b3/0x310 [zfs]
[5608739.178730]  [] l2arc_remove_vdev+0xc7/0x140 [zfs]
[5608739.178797]  [] spa_load_l2cache+0x1c3/0x4d0 [zfs]
[5608739.178863]  [] spa_vdev_remove+0x335/0x3a0 [zfs]
[5608739.178932]  [] zfs_ioc_vdev_remove+0x37/0x60 [zfs]
[5608739.179000]  [] ? pool_status_check+0x62/0xa0 [zfs]
[5608739.179068]  [] zfsdev_ioctl+0xfd/0x1c0 [zfs]
[5608739.179101]  [] do_vfs_ioctl+0x9b/0x4f0
[5608739.179132]  [] ? call_rcu_sched+0x15/0x20
[5608739.179162]  [] sys_ioctl+0xa1/0xb0
[5608739.179194]  [] system_call_fastpath+0x16/0x1b
[5608739.179223] Code: 00 00 4c 3b 2d 1c b1 0c 00 0f 84 66 01 00 00 4c 3b 2d 17 b1 0c 00 0f 84 59 01 00 00 4d 89 f7 4c 89 f9 48 f7 d9 f0 49 01 4c c5 40 <85> d2 0f 85 c2 01 00 00 48 8b 05 cb b0 0c 00 48 39 c3 0f 84 f2 
[5608739.179456] Call Trace:
[5608739.179498]  [] l2arc_evict+0x2b3/0x310 [zfs]
[5608739.179546]  [] l2arc_remove_vdev+0xc7/0x140 [zfs]
[5608739.179610]  [] spa_load_l2cache+0x1c3/0x4d0 [zfs]
[5608739.179675]  [] spa_vdev_remove+0x335/0x3a0 [zfs]
[5608739.179743]  [] zfs_ioc_vdev_remove+0x37/0x60 [zfs]
[5608739.179811]  [] ? pool_status_check+0x62/0xa0 [zfs]
[5608739.179878]  [] zfsdev_ioctl+0xfd/0x1c0 [zfs]
[5608739.179908]  [] do_vfs_ioctl+0x9b/0x4f0
[5608739.179938]  [] ? call_rcu_sched+0x15/0x20
[5608739.179967]  [] sys_ioctl+0xa1/0xb0
[5608739.179997]  [] system_call_fastpath+0x16/0x1b

@Ukko-Ylijumala
Author

I haven't been able to replicate this issue any more, so I guess this can be closed as well.

pcd1193182 pushed a commit to pcd1193182/zfs that referenced this issue Sep 26, 2023
The zettacache aggregates contiguous reads, meaning that we do one large
`pread()` syscall instead of several small, contiguous `pread()`'s. This
increases performance by decreasing the iops load. However, if there are
small gaps between the blocks that we need to read, aggregation is not
performed, which decreases performance both because the iops load
increases and because the implementation issues subsequent
almost-contiguous i/os (up to `DISK_AGG_CHUNK`) serially (i.e. one after
another, rather than in parallel).

This commit improves performance by allowing small gaps (up to
`DISK_READ_GAP_LIMIT`=8KB) to be spanned over to create a larger
aggregate read.

On a workload of 1MB reads from 4 files (with recordsize=8k), which were
written concurrently, performance doubles (on disks that can do 15K
IOPS, we go from 14K IOPS to 33K IOPS).