Introduce module parameters for handling IO timeouts (WIP) #2387
Conversation
The low level formatting of magnetic drives contains ECC that permits error correction. When a drive reads a bad sector, it will keep trying to read it, in the hope that a different read attempt will allow the ECC to reconstruct the bad data, until an internal timeout is reached. This is useful on filesystems that cannot recover from drive errors, but it is unnecessary as ZFS can easily self-heal when errors occur.

We introduce a tunable called zfs_vdev_disk_timeout_ms that allows us to tell the block layer to send errors to ZFS earlier than normal. We also introduce zfs_vdev_disk_timeout_ticks, the maximum number of consecutive timeouts before a disk is marked as bad.

Signed-off-by: Richard Yao <ryao@gentoo.org>
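For context on the mechanism under discussion: since 2.6.28 (as noted later in this thread), the Linux block layer lets a driver shorten a queue's request timeout and install a callback that runs when a request expires. Below is a minimal sketch of that approach, not the actual patch; the `_sketch` function names, the default values, and the module-global counter are illustrative assumptions.

```c
#include <linux/blkdev.h>
#include <linux/atomic.h>
#include <linux/jiffies.h>
#include <linux/printk.h>

static unsigned int zfs_vdev_disk_timeout_ms = 10000;	/* placeholder default */
static unsigned int zfs_vdev_disk_timeout_ticks = 4;	/* placeholder default */
static atomic_t vdev_disk_timeout_count = ATOMIC_INIT(0);

/* Invoked by the block layer when a request exceeds q->rq_timeout. */
static enum blk_eh_timer_return
vdev_disk_timed_out_sketch(struct request *req)
{
	/* Count consecutive expirations; a completed IO would reset this. */
	if (atomic_inc_return(&vdev_disk_timeout_count) >=
	    (int)zfs_vdev_disk_timeout_ticks) {
		/* Too many in a row: this is where the vdev would be faulted. */
		printk(KERN_WARNING "vdev_disk: too many ticks\n");
	} else {
		printk(KERN_WARNING "vdev_disk: timeout tick\n");
	}

	/* Run the block layer's normal error handling immediately. */
	return (BLK_EH_NOT_HANDLED);
}

/* Apply the shortened timeout when the vdev opens its block device.
 * Note this replaces the handler the SCSI midlayer installed; the real
 * patch has to cooperate with that handler, which is exactly the
 * difficulty discussed below. */
static void
vdev_disk_install_timeout_sketch(struct block_device *bdev)
{
	struct request_queue *q = bdev_get_queue(bdev);

	blk_queue_rq_timeout(q, msecs_to_jiffies(zfs_vdev_disk_timeout_ms));
	blk_queue_rq_timed_out(q, vdev_disk_timed_out_sketch);
}
```

Returning BLK_EH_NOT_HANDLED makes the block layer run its normal error handling right away, which is what surfaces the error to ZFS sooner than the drive's internal recovery would.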
@ryao This sounds like something that could be useful in all cases - have ZFS deal with errors/problems instead of the disk?
I disagree with calling the number of failures "ticks" as that implies something time-related and is often associated with the system clock or the HZ value.
@ryao I like the idea of failing drives more aggressively and using the block layer error handlers to do it. However, I agree with the above comments about using "ticks" in the name of the failure counter; that typically implies scheduler timing ticks, which is misleading in this context.

To test your change you may want to look at the scsi_debug module. It provides a handy way to introduce delays to a simulated scsi device. That should allow you to trigger this code at will and make sure it works properly.
This is a really good idea. I'm actually amazed that this wasn't implemented in zfs from the start. I suspect that the assumption from the beginning was that everyone would use raid-edition storage devices, which have a rather short timeout (~7s IIRC). However, after having had to debug io-hang issues on several systems with expensive raid-edition HDDs I can say for sure that you can't rely on the timeout to actually work when shit hits the fan. In most cases there was a flaky HDD with buggy firmware, so instead of failing the request after ~7s it did recovery and eventually succeeded.

So, I fully approve of this idea. It would make the world a better place, etc :-)

However, to be perfect you would also have to handle the case of a storage device repeatedly doing error recovery that returns success rather quickly. Such a device would cause an IO slowdown, but not severe enough for the system to appear hung. Failing a drive because of this might be a bit harsh, but perhaps a pre/soft-fail state or just a separate slow-io counter that increases just like the read/write/checksum error counters? Detection is also a bit of an issue; instead of hard-coded timeouts you might want to do some light statistics instead...
@ZNikke If the error recovery is being done on a root pool, the system will have terrible responsiveness issues. In that case, the drive should be considered defective. That being said, we need a reliable way to set CCTL/TLER/ERC on drives that support it when ZFS has the whole disk. That is something that I would like to explicitly add to Linux's block layer when I find time.
A few days ago @ryao and I were discussing how to actually test this. I suggested an iSCSI target followed by yanking the network cable (figuratively). ryao suggested a device mapper target which could be suspended.

And then this happened (zfs not present in backtrace):

```
connection8:0: detected conn error (1021)
connection8:0: detected conn error (1002)
------------[ cut here ]------------
kernel BUG at block/blk-softirq.c:112!
invalid opcode: 0000 [#1] SMP
Modules linked in: sd_mod crc_t10dif autofs4 ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi zfs(PO) zcommon(PO) znvpair(PO) zavl(PO) zunicode(PO) spl(O) zlib_deflate ppdev microcode pcspkr parport_pc parport virtio_net i2c_piix4 i2c_core virtio_blk(F) dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F) [last unloaded: floppy]
CPU: 1 PID: 822 Comm: iscsid Tainted: PF O 3.10.53 #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff88003cea90c0 ti: ffff88003daf6000 task.ti: ffff88003daf6000
RIP: 0010:[<ffffffff812882fc>] [<ffffffff812882fc>] __blk_complete_request+0x11c/0x130
RSP: 0018:ffff88003daf7888 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff880039d7cba8 RCX: 0000000000000000
RDX: ffff88003c8dc7f8 RSI: ffff88003daf7908 RDI: ffff88003c8ea1d0
RBP: ffff88003daf78a8 R08: ffff880020408dc0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff880020408dc0
R13: ffff88003c8ea1d0 R14: 0000000000000005 R15: 000000000000000e
FS: 00007ffa6b473700(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004e8001 CR3: 000000003dae9000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff880039d7cba8 ffff880020408dc0 ffff88003d5182d0 0000000000000005
 ffff88003daf78b8 ffffffff81288335 ffff88003daf78d8 ffffffff813b0a76
 ffff880039d7cba8 ffff880020408dc0 ffff88003daf7928 ffffffffa02c6052
Call Trace:
 [<ffffffff81288335>] blk_complete_request+0x25/0x30
 [<ffffffff813b0a76>] scsi_done+0x26/0x60
 [<ffffffffa02c6052>] __iscsi_put_task+0xb2/0x110 [libiscsi]
 [<ffffffff8105cb14>] ? wake_up_worker+0x24/0x30
 [<ffffffffa02c614c>] iscsi_complete_task+0x9c/0x140 [libiscsi]
 [<ffffffff8105ed60>] ? __queue_work+0x160/0x360
 [<ffffffffa02c626c>] fail_scsi_task+0x7c/0xb0 [libiscsi]
 [<ffffffffa02c632e>] fail_scsi_tasks+0x8e/0xe0 [libiscsi]
 [<ffffffffa02c7bac>] iscsi_conn_stop+0x11c/0x2c0 [libiscsi]
 [<ffffffffa02dcc18>] iscsi_sw_tcp_conn_stop+0x78/0x90 [iscsi_tcp]
 [<ffffffffa02b1a12>] iscsi_if_rx+0x342/0x11e0 [scsi_transport_iscsi]
 [<ffffffff8114e96c>] ? __kmalloc_node_track_caller+0x3c/0x50
 [<ffffffff8146089b>] ? __kmalloc_reserve+0x3b/0xa0
 [<ffffffff8149b4d9>] netlink_unicast+0x169/0x1e0
 [<ffffffff8149c33b>] netlink_sendmsg+0x23b/0x340
 [<ffffffff814587e6>] sock_sendmsg+0xa6/0xd0
 [<ffffffff8145667e>] ? move_addr_to_kernel+0x4e/0x90
 [<ffffffff814655fd>] ? verify_iovec+0x8d/0x110
 [<ffffffff81459b41>] ___sys_sendmsg+0x421/0x440
 [<ffffffff8117b8d9>] ? mntput_no_expire+0x49/0x150
 [<ffffffff8117ba03>] ? mntput+0x23/0x40
 [<ffffffff8115f001>] ? __fput+0x191/0x270
 [<ffffffff8102fd3f>] ? kvm_clock_read+0x1f/0x30
 [<ffffffff8102fd59>] ? kvm_clock_get_cycles+0x9/0x10
 [<ffffffff81459d79>] __sys_sendmsg+0x49/0x90
 [<ffffffff81459dd9>] SyS_sendmsg+0x19/0x20
 [<ffffffff815475d9>] system_call_fastpath+0x16/0x1b
Code: af 66 0f 1f 44 00 00 89 de 44 89 f7 e8 5e 6c de ff e9 65 ff ff ff 66 0f 1f 84 00 00 00 00 00 bf 04 00 00 00 e8 f6 10 dc ff eb 85 <0f> 0b 66 90 eb fc 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55
RIP [<ffffffff812882fc>] __blk_complete_request+0x11c/0x130
 RSP <ffff88003daf7888>
---[ end trace 0fa9bfdd11a09fb5 ]---
Kernel panic - not syncing: Fatal exception in interrupt
```

I don't know if my testing method is flawed or if the patch isn't working. Regardless, I am having trouble validating that this works.
Sure, now it's working...

```
# cp randomdata2 /dev/null
cp: overwrite `/dev/null'? y
vdev_disk: timeout tick
vdev_disk: timeout tick
vdev_disk: timeout tick
vdev_disk: timeout tick
vdev_disk: too many ticks
vdev_disk: too many ticks
vdev_disk: too many ticks
vdev_disk: too many ticks
connection1:0: detected conn error (1021)
```

The vdev_disk errors are just printk() calls I added to the patch. I've had more stack dumps but it looks like it's iSCSI bugs. :/
@DeHackEd I've used the scsi_debug kernel module in the past for this sort of testing. It allows you to easily simulate a misbehaving device by doing things like acknowledging reads/writes as good and then throwing out the data. Thanks for taking the time to look into testing this. If @ryao gets a chance to refresh the patch we'll be able to get the build results for various kernels and see how portable this is. A quick look at the kernel source suggests the core functionality goes all the way back to 2.6.28.
```
# zpool import
   pool: mailstore00
     id: 13039248741391054330
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        mailstore00  ONLINE
          sda        ONLINE
```

This is a real (bare metal) machine and this disk is a piece of crap that hangs when it feels like it. It happened while I was scrubbing.

```
# zpool events
TIME                           CLASS
Aug 27 2014 15:08:44.116844970 resource.fs.zfs.statechange
Aug 27 2014 15:08:44.478534312 ereport.fs.zfs.config.sync
Aug 27 2014 15:08:44.621353165 ereport.fs.zfs.config.sync
Aug 27 2014 15:50:44.353613618 ereport.fs.zfs.scrub.start
Aug 27 2014 15:51:56.539063380 ereport.fs.zfs.scrub.finish
Aug 29 2014 08:11:53.960893686 ereport.fs.zfs.scrub.start
Aug 29 2014 08:14:14.630557396 ereport.fs.zfs.io
Aug 29 2014 08:14:14.630588655 ereport.fs.zfs.io
Aug 29 2014 08:14:14.630633113 ereport.fs.zfs.io
Aug 29 2014 08:14:14.630659189 ereport.fs.zfs.io
Aug 29 2014 08:30:51.875099140 ereport.fs.zfs.delay
Aug 29 2014 08:47:35.395208849 ereport.fs.zfs.delay
```

Other info:

```
# echo w > /proc/sysrq-trigger
# dmesg
...
sd 0:0:20:0: [sda] Sense Key : No Sense [current]
sd 0:0:20:0: [sda] Add. Sense: No additional sense information
sd 0:0:20:0: [sda] Sense Key : No Sense [current]
sd 0:0:20:0: [sda] Add. Sense: No additional sense information
sd 0:0:20:0: [sda] Sense Key : No Sense [current]
sd 0:0:20:0: [sda] Add. Sense: No additional sense information
sd 0:0:20:0: [sda] Sense Key : No Sense [current]
sd 0:0:20:0: [sda] Add. Sense: No additional sense information
SysRq : Show Blocked State
 task                        PC stack   pid father
txg_sync        D ffff88047fc31940     0  6749      2 0x00000080
 ffff8804695d4ad0 0000000000000046 ffff8804374bc280 0000000000011940
 0000000000011940 ffff8800736b2010 0000000000000202 ffff8800735f8790
 ffff8800736b3c18 ffff8800735f8760 0000000000000002 0000000000000000
Call Trace:
 [] ? cv_wait_common+0x125/0x1b0 [spl]
 [] ? bit_waitqueue+0xc0/0xc0
 [] ? ktime_get_ts+0x3d/0xe0
 [] ? dsl_scan_sync+0x5d3/0x760 [zfs]
 [] ? spa_sync+0x398/0x840 [zfs]
 [] ? __wake_up+0x43/0x70
 [] ? ktime_get_ts+0x3d/0xe0
 [] ? txg_sync_thread+0x35b/0x4c0 [zfs]
 [] ? set_user_nice+0xfb/0x190
 [] ? txg_dispatch_callbacks+0x150/0x150 [zfs]
 [] ? __thread_create+0x360/0x360 [spl]
 [] ? thread_generic_wrapper+0x75/0x90 [spl]
 [] ? kthread+0xce/0xf0
 [] ? kthread_freezable_should_stop+0x60/0x60
 [] ? ret_from_fork+0x7c/0xb0
 [] ? kthread_freezable_should_stop+0x60/0x60
# echo b > /proc/sysrq-trigger
```

Is scrub/resilver I/O somehow exempt from this? I've added some printk calls. Who needs debugging software when you have real defective hardware? 👎
@ryao what state is this pull request in?
@ryao Just tried compiling this on a 3.2.0-4-amd64 kernel and I get:
Works just fine on my production 3.17.0-rc5... |
```
BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
IP: [<ffffffffa02452d8>] vdev_disk_time_out_handler+0x8/0x60 [zfs]
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: netconsole bridge stp llc configfs cpufreq_ondemand nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc dm_mirror dm_region_hash dm_log iTCO_wdt iTCO_vendor_support microcode serio_raw zfs(PO) zcommon(PO) znvpair(PO) zavl(PO) zunicode(PO) spl(O) sr_mod cdrom i2c_i801 lpc_ich mfd_core sg tpm_tis tpm ioatdma i7core_edac edac_core igb dca i2c_algo_bit i2c_core ptp pps_core hwmon acpi_cpufreq usb_storage ext3 jbd megaraid_sas [last unloaded: netconsole]
CPU: 7 PID: 0 Comm: swapper/7 Tainted: P O 3.14.19 #1
Hardware name: Supermicro X8DTU/X8DTU, BIOS 2.0a 09/28/10
task: ffff88042953f710 ti: ffff880429576000 task.ti: ffff880429576000
RIP: 0010:[<ffffffffa02452d8>] [<ffffffffa02452d8>] vdev_disk_time_out_handler+0x8/0x60 [zfs]
RSP: 0018:ffff88043fdc3e28 EFLAGS: 00010096
RAX: 0000000000000000 RBX: ffff8803f688cc00 RCX: ffff880427fa9f70
RDX: ffff880427fa9f70 RSI: ffff88043fdc3e58 RDI: ffff8803f688cc00
RBP: ffff880427fa9a80 R08: 0000000000000000 R09: 0000000000000100
R10: ffffffff814b0365 R11: 0000000000000010 R12: ffff880427fa9e20
R13: ffff880427fa9f70 R14: ffff8804295f1428 R15: ffff88043fdc3e64
FS: 0000000000000000(0000) GS:ffff88043fdc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000050 CR3: 0000000001619000 CR4: 00000000000007e0
Stack:
 0000000000000007 ffffffff81234cd3 ffff88043fdd1040 ffff880427fa9a80
 0000000000000292 ffffffff81234ece 0000000000000000 000000002f00ab10
 ffff88043fdc3e68 0000000000000100 ffffffff81234e60 00000000000006b0
Call Trace:
 <IRQ>
 [<ffffffff81234cd3>] ? blk_rq_timed_out+0x23/0xe0
 [<ffffffff81234ece>] ? blk_rq_timed_out_timer+0x6e/0xd0
 [<ffffffff81234e60>] ? blk_rq_check_expired+0x80/0x80
 [<ffffffff81062644>] ? call_timer_fn+0x24/0x90
 [<ffffffff81062881>] ? run_timer_softirq+0x1d1/0x1e0
 [<ffffffff8105cb9d>] ? __do_softirq+0xed/0x210
 [<ffffffff8105cdbd>] ? irq_exit+0xbd/0xd0
 [<ffffffff8102e22b>] ? smp_apic_timer_interrupt+0x3b/0x50
 [<ffffffff814a104a>] ? apic_timer_interrupt+0x6a/0x70
 <EOI>
 [<ffffffff813b2ec8>] ? cpuidle_enter_state+0x48/0xd0
 [<ffffffff813b2ec4>] ? cpuidle_enter_state+0x44/0xd0
 [<ffffffff813b2fff>] ? cpuidle_idle_call+0xaf/0x130
 [<ffffffff8100bec9>] ? arch_cpu_idle+0x9/0x30
 [<ffffffff81098492>] ? cpu_idle_loop+0x92/0x170
Code: 89 ee 4c 89 e7 e8 89 f8 ff ff 48 83 c4 08 4c 89 ef 5b 5d 41 5c 41 5d e9 f7 83 25 e1 90 90 90 90 90 90 90 48 83 ec 08 48 8b 47 70 <48> 8b 40 50 48 8b 50 28 48 8b 92 10 02 00 00 48 8b 52 60 c7 40
RIP [<ffffffffa02452d8>] vdev_disk_time_out_handler+0x8/0x60 [zfs]
 RSP <ffff88043fdc3e28>
CR2: 0000000000000050
---[ end trace 2c2b459c672893c1 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
```

My tree: https://github.com/DeHackEd/zfs/commits/dehacked

Events leading up to the crash:

```
# zpool status
  pool: testsan
 state: DEGRADED
status: One or more devices could not be used because the label is
        missing or invalid. Sufficient replicas exist for the pool to
        continue functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h16m with 0 errors on Fri Aug 1 14:21:16 2014
config:

        NAME                                        STATE     READ WRITE CKSUM
        testsan                                     DEGRADED     0     0     0
          mirror-0                                  DEGRADED     0     0     0
            scsi-36003048000848ed01b17aafb0a2e1708  ONLINE       0     0     0
            scsi-1IET_00010001                      UNAVAIL      0     0     0
          mirror-1                                  DEGRADED     0     0     0
            scsi-36003048000848ed01b17aafb0a346114  ONLINE       0     0     0
            scsi-1IET_00010002                      UNAVAIL      0     0     0
          mirror-2                                  DEGRADED     0     0     0
            scsi-36003048000848ed01b17aafc0a3b56b3  ONLINE       0     0     0
            scsi-1IET_00010003                      UNAVAIL      0     0     0
          mirror-3                                  DEGRADED     0     0     0
            scsi-36003048000848ed01b17aafc0a429e92  ONLINE       0     0     0
            scsi-1IET_00010004                      UNAVAIL      0     0     0
          mirror-4                                  DEGRADED     0     0     0
            scsi-36003048000848ed01b17aafd0a4a8100  ONLINE       0     0     0
            scsi-1IET_00010005                      UNAVAIL      0     0     0
          mirror-5                                  DEGRADED     0     0     0
            scsi-36003048000848ed01b17aafd0a530c4c  ONLINE       0     0     0
            scsi-1IET_00010006                      UNAVAIL      0     0     0

errors: No known data errors

# zpool online testsan scsi-1IET_00010001
# zpool online testsan scsi-1IET_00010002
[crash]
```
I added some VERIFY assertions:

```
# zpool import testsan
VERIFY(req->bio != NULL) failed
SPLError: 0:0:(vdev_disk.c:115:vdev_disk_time_out_handler()) SPL PANIC
Kernel panic - not syncing: SPL PANIC in interrupt.
CPU: 7 PID: 0 Comm: swapper/7 Tainted: P O 3.14.19 #1
Hardware name: Supermicro X8DTU/X8DTU, BIOS 2.0a 09/28/10
 ffffffffa05fa370 ffffffffa060ee78 ffffffff8149b943 ffffffffa0147c32
 ffffffff8149b6f0 0000000000000008 ffff88043fdc3c68 ffff88043fdc3c08
 0000000000000073 ffffffffa05fa370 0000000000000082 0000000000000007
Call Trace:
 <IRQ>
 [<ffffffff8149b943>] ? dump_stack+0x41/0x56
 [<ffffffff8149b6f0>] ? panic+0xbc/0x1c4
 [<ffffffffa0133f55>] ? spl_debug_bug+0xb5/0xd0 [spl]
 [<ffffffffa0140481>] ? spl_PANIC+0xb1/0xe0 [spl]
 [<ffffffff8108bf51>] ? enqueue_entity+0x1c1/0x8c0
 [<ffffffff81088624>] ? update_rq_runnable_avg+0xd4/0x1d0
 [<ffffffff8108c993>] ? enqueue_task_fair+0x343/0x450
 [<ffffffff8106dedc>] ? wq_worker_waking_up+0xc/0x60
 [<ffffffff8107d7f5>] ? check_preempt_curr+0x65/0x90
 [<ffffffff8107d831>] ? ttwu_do_wakeup+0x11/0x80
 [<ffffffffa05a14e6>] ? vdev_disk_time_out_handler+0xf6/0x130 [zfs]
 [<ffffffff81234cd3>] ? blk_rq_timed_out+0x23/0xe0
 [<ffffffff81234ece>] ? blk_rq_timed_out_timer+0x6e/0xd0
 [<ffffffff81234e60>] ? blk_rq_check_expired+0x80/0x80
 [<ffffffff81062644>] ? call_timer_fn+0x24/0x90
 [<ffffffff81062881>] ? run_timer_softirq+0x1d1/0x1e0
 [<ffffffff8105cb9d>] ? __do_softirq+0xed/0x210
 [<ffffffff8105cdbd>] ? irq_exit+0xbd/0xd0
 [<ffffffff8102e22b>] ? smp_apic_timer_interrupt+0x3b/0x50
 [<ffffffff814a104a>] ? apic_timer_interrupt+0x6a/0x70
 <EOI>
 [<ffffffff813b2ec8>] ? cpuidle_enter_state+0x48/0xd0
 [<ffffffff813b2ec4>] ? cpuidle_enter_state+0x44/0xd0
 [<ffffffff813b2fff>] ? cpuidle_idle_call+0xaf/0x130
 [<ffffffff8100bec9>] ? arch_cpu_idle+0x9/0x30
 [<ffffffff81098492>] ? cpu_idle_loop+0x92/0x170
 [<ffffffff81098360>] ? SyS_syslog+0x20/0x20
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
```
Yes, that makes sense. It's entirely possible for requests not to have bios associated with them. The patch will need to be updated to handle this case.
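For reference, a minimal sketch of that guard, reusing the handler name from the traces above; everything except the req->bio check is elided here and the body is an assumption, not the patch's actual code:

```c
/*
 * Sketch of the fix discussed above: a request that reaches the timeout
 * handler may have no bio attached, in which case there is no ZFS state
 * to update and the block layer's default handling should simply run.
 */
static enum blk_eh_timer_return
vdev_disk_time_out_handler(struct request *req)
{
	if (req->bio == NULL)
		return (BLK_EH_NOT_HANDLED);

	/* ... the existing per-vdev timeout accounting would go here ... */
	return (BLK_EH_NOT_HANDLED);
}
```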
Do you know how that could be done? I'm kinda pressed to get this feature added, even if it's in a private repository. There doesn't seem to be an obvious workaround since I don't have access to the associated dio_request_t. Would making a hard time limit of 10 seconds and eliminating the "tick" system work? I'll be playing with this tomorrow but any insight would be appreciated before I start.
@DeHackEd my suggestion would be to move this state tracking into the private dio_request_t structure.
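A sketch of what that could look like, assuming the dio_request_t that vdev_disk.c allocates for each IO it submits; dr_zio is an existing field, while the timeout bookkeeping fields are hypothetical additions:

```c
typedef struct dio_request {
	zio_t		*dr_zio;		/* existing: parent ZIO */
	/* hypothetical additions for timeout tracking: */
	unsigned long	dr_expire_jiffies;	/* deadline for this IO */
	int		dr_timeout_ticks;	/* expirations observed */
	/* ... remaining existing fields (refcount, bio array, ...) ... */
} dio_request_t;
```

Keeping the state with the IO that ZFS itself issued means the timeout path never has to dereference driver-owned request state, which is what the NULL dereference and the req->bio VERIFY above were tripping over.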
@dweeezil it occurs to me that if you're chasing unexplained disk timeouts this patch may be of use to you. It still needs a little work, but it does allow quickly timing out otherwise hung IO.
@behlendorf Thanks for the heads-up. I had forgotten about this.
I had as well. Thanks for commenting here.
While hijacking the block device's timeout callback to notify ZFS of a problem looks desirable on the surface, in practice implementing this in a safe, generic, reliable manner has proven to be problematic. The reality is we have minimal control over the SCSI layer's error handling. If that changes we can revisit doing something better. In practice, production deployments should be using TLER drives anyway, so this is less of an issue.
The low level formatting of magnetic drives contains ECC that permits
error correction. When a drive reads a bad sector, it will keep trying
to read it, in the hope that a different read attempt will allow the
ECC to reconstruct the bad data, until an internal timeout is reached.
This is useful on filesystems that cannot recover from drive errors,
but it is unnecessary as ZFS can easily self-heal when errors occur.
We introduce a tunable called zfs_vdev_disk_timeout_ms that allows us to
tell the block layer to send errors to ZFS earlier than normal. We also
introduce zfs_vdev_disk_timeout_ticks, the maximum number of consecutive
timeouts before a disk is marked as bad.
Signed-off-by: Richard Yao <ryao@gentoo.org>
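For completeness, a sketch of how the two tunables named above would typically be exported as module parameters; the defaults shown are placeholders, not the patch's actual values:

```c
#include <linux/module.h>

static unsigned int zfs_vdev_disk_timeout_ms = 10000;
module_param(zfs_vdev_disk_timeout_ms, uint, 0644);
MODULE_PARM_DESC(zfs_vdev_disk_timeout_ms,
	"Milliseconds before the block layer times out an outstanding IO");

static unsigned int zfs_vdev_disk_timeout_ticks = 4;
module_param(zfs_vdev_disk_timeout_ticks, uint, 0644);
MODULE_PARM_DESC(zfs_vdev_disk_timeout_ticks,
	"Consecutive timeouts before a disk is marked as bad");
```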