
the return of "Unaligned write command" errors #10094

Open
dd1dd1 opened this issue Mar 2, 2020 · 113 comments

dd1dd1 commented Mar 2, 2020

Reporting an unusual situation. I have a ZFS mirror pool across two 1 TB SSDs, and it regularly spews "Unaligned write command" errors. From reading reports here and elsewhere, this problem used to exist but was fixed years ago and is not supposed to happen today. So, a puzzle.

It turns out that the two SSDs report different physical sector size, one reports 512 bytes, one reports 4096 bytes. Same vendor, same model, same firmware. (WTH?!?)

zpool reports the default ashift of 0 (autodetect).
zdb reports ashift 12 (correct for 4096-byte sectors).

So everything seems to be correct, but the errors are there.
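For context, ashift is the base-2 logarithm of the sector size ZFS aligns I/O to, so ashift 12 corresponds to 4096-byte sectors; as I understand it, autodetection picks the largest physical sector size among the vdev's disks, so ashift 12 is the expected result for this mixed 512/4096 mirror. A minimal sketch of the mapping (the helper name is mine, not a ZFS API):

```python
def ashift_for(sector_size: int) -> int:
    """ashift is log2 of the sector size: 512 -> 9, 4096 -> 12."""
    # Sector sizes are powers of two, so bit_length() - 1 is an exact log2.
    assert sector_size > 0 and sector_size & (sector_size - 1) == 0
    return sector_size.bit_length() - 1

print(ashift_for(512), ashift_for(4096))  # 9 12
```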

The "unaligned write command" errors only come from the "4096 bytes" SSD. After these write errors, "zpool scrub" runs without errors ("repaired 0B"). Two other ZFS mirror pools on the same machine run without errors (2x10 TB and 2x6 TB disks, all reporting a 4096-byte physical sector size).

K.O.


dd1dd1 commented Mar 2, 2020

Some details. OS is CentOS-7, kernel 3.10.0-1062.12.1.el7.x86_64, zfs-0.8.3-1. Problem disks:

# ./smart-status.perl 
      Disk                  model               serial     temperature  realloc  pending   uncorr  CRC err     RRER
  /dev/sda WDC  WDS100T2B0A-00SM50         195004A00B9C              26        .        ?        ?        .        ?
  /dev/sdg WDC  WDS100T2B0A-00SM50         195008A008F8              26        .        ?        ?        .        ?
# more /sys/class/block/sda/queue/physical_block_size 
512
# more /sys/class/block/sdg/queue/physical_block_size 
4096
# zpool status zssd1tb
  pool: zssd1tb
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:05:55 with 0 errors on Mon Mar  2 13:24:15 2020
config:

        NAME                                         STATE     READ WRITE CKSUM
        zssd1tb                                      ONLINE       0     0     0
          mirror-0                                   ONLINE       0     0     0
            ata-WDC_WDS100T2B0A-00SM50_195004A00B9C  ONLINE       0     0     0
            ata-WDC_WDS100T2B0A-00SM50_195008A008F8  ONLINE       0     0     0

errors: No known data errors

Typical error from "dmesg"

[1974990.399004] ata8.00: exception Emask 0x0 SAct 0x8000000 SErr 0x0 action 0x6 frozen
[1974990.399009] ata8.00: failed command: WRITE FPDMA QUEUED
[1974990.399013] ata8.00: cmd 61/08:d8:e0:27:70/00:00:74:00:00/40 tag 27 ncq 4096 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[1974990.399015] ata8.00: status: { DRDY }
[1974990.399018] ata8: hard resetting link
[1974990.707014] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[1974990.710637] ata8.00: configured for UDMA/133
[1974990.710690] sd 7:0:0:0: [sdg] tag#27 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[1974990.710693] sd 7:0:0:0: [sdg] tag#27 Sense Key : Illegal Request [current] [descriptor] 
[1974990.710695] sd 7:0:0:0: [sdg] tag#27 Add. Sense: Unaligned write command
[1974990.710698] sd 7:0:0:0: [sdg] tag#27 CDB: Write(10) 2a 00 74 70 27 e0 00 00 08 00
[1974990.710699] blk_update_request: I/O error, dev sdg, sector 1953507296
[1974990.710703] zio pool=zssd1tb vdev=/dev/disk/by-id/ata-WDC_WDS100T2B0A-00SM50_195008A008F8-part1 error=5 type=2 offset=1000194686976 size=4096 flags=180ac0
[1974990.710708] ata8: EH complete
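One way to sanity-check the "unaligned" claim is to decode the CDB the kernel prints. In a SCSI Read(10)/Write(10) CDB, bytes 2-5 hold the big-endian LBA and bytes 7-8 the transfer length in logical (here 512-byte) blocks; a small hypothetical decoder:

```python
def decode_rw10_cdb(cdb_hex: str):
    """Decode a SCSI Read(10)/Write(10) CDB into (LBA, transfer length).

    Bytes 2-5 are the big-endian LBA; bytes 7-8 are the transfer
    length in logical blocks.
    """
    b = bytes.fromhex(cdb_hex.replace(" ", ""))
    assert b[0] in (0x28, 0x2A), "Read(10)=0x28 or Write(10)=0x2a"
    return int.from_bytes(b[2:6], "big"), int.from_bytes(b[7:9], "big")

# The failing Write(10) from the dmesg excerpt above:
lba, nblocks = decode_rw10_cdb("2a 00 74 70 27 e0 00 00 08 00")
print(lba, nblocks)  # 1953507296 8
```

Note that 1953507296 matches the sector in the blk_update_request line, and 1953507296 % 8 == 0: the 8-sector (4096-byte) write starts on a 4 KiB boundary, so the "Unaligned write command" sense data looks misleading for this request.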

@shodanshok
Contributor

Such errors almost always indicate a failing SSD/HDD. The system even tried to reset the SATA link, with the write still failing.


dd1dd1 commented Mar 6, 2020

@shodanshok - failing HDD: no (this is an SSD). Failing SSD: unlikely: (a) the SSD is brand new; (b) failing devices usually throw read/write I/O errors, not "illegal request - unaligned write command"; (c) bum SSD firmware (hello, Intel!): unlikely, both SSDs have the same firmware and only one throws the errors; (d) overheated SSD malfunction: no, the SSD is at room temperature and cool to the touch. K.O.


dd1dd1 commented Mar 6, 2020

Additional information: I moved the two problem SSDs to different SATA ports, and now both report a physical sector size of 512 (one was reporting 4096). Now let's see if the errors go away. K.O.

@shodanshok
Contributor

@dd1dd1 I had an identical problem with a Crucial MX500 SSD. After some days, the SSD controller failed catastrophically (the SSD was no longer detected on the BIOS screen). So I really doubt it is a ZFS problem. The fact that simply swapping SATA ports changed the reported sector size is really suspicious.

@c0d3z3r0
Contributor

I'm experiencing the same issues. These errors appeared out of nowhere. I highly doubt that all four disks are dying at exactly the same time.

[  122.404043] ata6.00: exception Emask 0x10 SAct 0x20080 SErr 0x4050000 action 0xe frozen
[  122.412047] ata6.00: irq_stat 0x00000040, connection status changed
[  122.418332] ata6: SError: { PHYRdyChg CommWake DevExch }
[  122.423651] ata6.00: failed command: READ FPDMA QUEUED
[  122.428792] ata6.00: cmd 60/08:38:d8:99:76/00:00:31:00:00/40 tag 7 ncq dma 4096 in
                        res 50/00:18:30:90:6b/00:00:31:00:00/40 Emask 0x10 (ATA bus error)
[  122.444332] ata6.00: status: { DRDY }
[  122.447995] ata6.00: failed command: READ FPDMA QUEUED
[  122.453136] ata6.00: cmd 60/18:88:30:90:6b/00:00:31:00:00/40 tag 17 ncq dma 12288 in
                        res 50/00:18:30:90:6b/00:00:31:00:00/40 Emask 0x10 (ATA bus error)
[  122.468849] ata6.00: status: { DRDY }
[  122.472515] ata6: hard resetting link
[  123.184748] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  123.216709] ata6.00: configured for UDMA/133
[  123.218739] sd 5:0:0:0: [sdd] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  123.218741] sd 5:0:0:0: [sdd] tag#7 Sense Key : Illegal Request [current] 
[  123.218742] sd 5:0:0:0: [sdd] tag#7 Add. Sense: Unaligned write command
[  123.218744] sd 5:0:0:0: [sdd] tag#7 CDB: Read(10) 28 00 31 76 99 d8 00 00 08 00
[  123.218745] print_req_error: I/O error, dev sdd, sector 829856216 flags 700
[  123.225719] zio pool=dpool vdev=/dev/mapper/slot4crypt error=5 type=1 offset=424882188288 size=4096 flags=1808a0
[  123.225726] sd 5:0:0:0: [sdd] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  123.225728] sd 5:0:0:0: [sdd] tag#17 Sense Key : Illegal Request [current] 
[  123.225729] sd 5:0:0:0: [sdd] tag#17 Add. Sense: Unaligned write command
[  123.225730] sd 5:0:0:0: [sdd] tag#17 CDB: Read(10) 28 00 31 6b 90 30 00 00 18 00
[  123.225731] print_req_error: I/O error, dev sdd, sector 829132848 flags 700
[  123.232717] zio pool=dpool vdev=/dev/mapper/slot4crypt error=5 type=1 offset=424511823872 size=12288 flags=1808a0
[  123.232719] ata6: EH complete
[  156.953173] ata6.00: exception Emask 0x0 SAct 0x3 SErr 0x40000 action 0x6 frozen
[  156.960574] ata6: SError: { CommWake }
[  156.964329] ata6.00: failed command: READ FPDMA QUEUED
[  156.969472] ata6.00: cmd 60/20:00:98:08:03/00:00:06:00:00/40 tag 0 ncq dma 16384 in
                        res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  156.984493] ata6.00: status: { DRDY }
[  156.988158] ata6.00: failed command: READ FPDMA QUEUED
[  156.993299] ata6.00: cmd 60/40:08:b8:48:03/00:00:06:00:00/40 tag 1 ncq dma 32768 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  157.008319] ata6.00: status: { DRDY }
[  157.011984] ata6: hard resetting link
[  157.327868] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  157.901907] ata6.00: configured for UDMA/133
[  157.902071] sd 5:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  157.902073] sd 5:0:0:0: [sdd] tag#0 Sense Key : Illegal Request [current] 
[  157.902086] sd 5:0:0:0: [sdd] tag#0 Add. Sense: Unaligned write command
[  157.902087] sd 5:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 06 03 08 98 00 00 20 00
[  157.902089] print_req_error: I/O error, dev sdd, sector 100862104 flags 700
[  157.909063] zio pool=dpool vdev=/dev/mapper/slot4crypt error=5 type=1 offset=51637202944 size=16384 flags=1808a0
[  157.909086] sd 5:0:0:0: [sdd] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  157.909087] sd 5:0:0:0: [sdd] tag#1 Sense Key : Illegal Request [current] 
[  157.909087] sd 5:0:0:0: [sdd] tag#1 Add. Sense: Unaligned write command
[  157.909088] sd 5:0:0:0: [sdd] tag#1 CDB: Read(10) 28 00 06 03 48 b8 00 00 40 00
[  157.909089] print_req_error: I/O error, dev sdd, sector 100878520 flags 700
[  157.916051] zio pool=dpool vdev=/dev/mapper/slot4crypt error=5 type=1 offset=51645607936 size=32768 flags=40080ca0
[  157.916053] ata6: EH complete
[  158.945000] ata6.00: exception Emask 0x10 SAct 0x40000001 SErr 0x4040000 action 0xe frozen
[  158.953264] ata6.00: irq_stat 0x00000040, connection status changed
[  158.959531] ata6: SError: { CommWake DevExch }
[  158.963983] ata6.00: failed command: READ FPDMA QUEUED
[  158.969123] ata6.00: cmd 60/18:00:a8:48:00/00:00:16:00:00/40 tag 0 ncq dma 12288 in
                        res 50/00:18:a8:48:00/00:00:16:00:00/40 Emask 0x10 (ATA bus error)
[  158.984749] ata6.00: status: { DRDY }
[  158.988416] ata6.00: failed command: READ FPDMA QUEUED
[  158.993557] ata6.00: cmd 60/18:f0:68:41:00/00:00:16:00:00/40 tag 30 ncq dma 12288 in
                        res 50/00:18:a8:48:00/00:00:16:00:00/40 Emask 0x10 (ATA bus error)
[  159.009267] ata6.00: status: { DRDY }
[  159.012935] ata6: hard resetting link
[  159.729269] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  159.745053] ata6.00: configured for UDMA/133
[  159.746879] sd 5:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  159.746880] sd 5:0:0:0: [sdd] tag#0 Sense Key : Illegal Request [current] 
[  159.746881] sd 5:0:0:0: [sdd] tag#0 Add. Sense: Unaligned write command
[  159.746882] sd 5:0:0:0: [sdd] tag#0 CDB: Read(10) 28 00 16 00 48 a8 00 00 18 00
[  159.746884] print_req_error: I/O error, dev sdd, sector 369117352 flags 700
[  159.753859] zio pool=dpool vdev=/dev/mapper/slot4crypt error=5 type=1 offset=188983889920 size=12288 flags=1808a0
[  159.753882] sd 5:0:0:0: [sdd] tag#30 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  159.753882] sd 5:0:0:0: [sdd] tag#30 Sense Key : Illegal Request [current] 
[  159.753883] sd 5:0:0:0: [sdd] tag#30 Add. Sense: Unaligned write command
[  159.753884] sd 5:0:0:0: [sdd] tag#30 CDB: Read(10) 28 00 16 00 41 68 00 00 18 00
[  159.753885] print_req_error: I/O error, dev sdd, sector 369115496 flags 700
[  159.760845] zio pool=dpool vdev=/dev/mapper/slot4crypt error=5 type=1 offset=188982939648 size=12288 flags=1808a0
[  159.760848] ata6: EH complete
[  163.273140] ata8.00: exception Emask 0x10 SAct 0x8080 SErr 0x4050000 action 0xe frozen
[  163.281060] ata8.00: irq_stat 0x00000040, connection status changed
[  163.287342] ata8: SError: { PHYRdyChg CommWake DevExch }
[  163.292670] ata8.00: failed command: READ FPDMA QUEUED
[  163.297813] ata8.00: cmd 60/18:38:d8:a9:5c/00:00:61:00:00/40 tag 7 ncq dma 12288 in
                        res 50/00:18:d8:a9:5c/00:00:61:00:00/40 Emask 0x10 (ATA bus error)
[  163.313455] ata8.00: status: { DRDY }
[  163.317141] ata8.00: failed command: READ FPDMA QUEUED
[  163.322284] ata8.00: cmd 60/18:78:30:59:76/00:00:61:00:00/40 tag 15 ncq dma 12288 in
                        res 50/00:18:d8:a9:5c/00:00:61:00:00/40 Emask 0x10 (ATA bus error)
[  163.338001] ata8.00: status: { DRDY }
[  163.341683] ata8: hard resetting link
[  164.053460] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  164.081111] ata8.00: configured for UDMA/133
[  164.081341] sd 7:0:0:0: [sdf] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  164.081343] sd 7:0:0:0: [sdf] tag#7 Sense Key : Illegal Request [current] 
[  164.081344] sd 7:0:0:0: [sdf] tag#7 Add. Sense: Unaligned write command
[  164.081346] sd 7:0:0:0: [sdf] tag#7 CDB: Read(10) 28 00 61 5c a9 d8 00 00 18 00
[  164.081347] print_req_error: I/O error, dev sdf, sector 1633462744 flags 700
[  164.088405] zio pool=dpool vdev=/dev/mapper/slot6crypt error=5 type=1 offset=836328730624 size=12288 flags=1808a0
[  164.088425] sd 7:0:0:0: [sdf] tag#15 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  164.088426] sd 7:0:0:0: [sdf] tag#15 Sense Key : Illegal Request [current] 
[  164.088427] sd 7:0:0:0: [sdf] tag#15 Add. Sense: Unaligned write command
[  164.088428] sd 7:0:0:0: [sdf] tag#15 CDB: Read(10) 28 00 61 76 59 30 00 00 18 00
[  164.088428] print_req_error: I/O error, dev sdf, sector 1635146032 flags 700
[  164.095480] zio pool=dpool vdev=/dev/mapper/slot6crypt error=5 type=1 offset=837190574080 size=12288 flags=1808a0
[  164.095482] ata8: EH complete
[  167.065416] ata7.00: exception Emask 0x10 SAct 0xc00000 SErr 0x4040000 action 0xe frozen
[  167.073513] ata7.00: irq_stat 0x00000040, connection status changed
[  167.079788] ata7: SError: { CommWake DevExch }
[  167.084249] ata7.00: failed command: READ FPDMA QUEUED
[  167.089389] ata7.00: cmd 60/60:b0:d8:d8:0b/00:00:59:00:00/40 tag 22 ncq dma 49152 in
                        res 50/00:60:60:b7:0c/00:00:59:00:00/40 Emask 0x10 (ATA bus error)
[  167.105099] ata7.00: status: { DRDY }
[  167.108764] ata7.00: failed command: READ FPDMA QUEUED
[  167.113907] ata7.00: cmd 60/60:b8:60:b7:0c/00:00:59:00:00/40 tag 23 ncq dma 49152 in
                        res 50/00:60:60:b7:0c/00:00:59:00:00/40 Emask 0x10 (ATA bus error)
[  167.129621] ata7.00: status: { DRDY }
[  167.133285] ata7: hard resetting link
[  167.845466] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  167.855367] ata7.00: configured for UDMA/133
[  167.855548] sd 6:0:0:0: [sde] tag#22 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  167.855562] sd 6:0:0:0: [sde] tag#22 Sense Key : Illegal Request [current] 
[  167.855563] sd 6:0:0:0: [sde] tag#22 Add. Sense: Unaligned write command
[  167.855565] sd 6:0:0:0: [sde] tag#22 CDB: Read(10) 28 00 59 0b d8 d8 00 00 60 00
[  167.855566] print_req_error: I/O error, dev sde, sector 1493948632 flags 700
[  167.862622] zio pool=dpool vdev=/dev/mapper/slot5crypt error=5 type=1 offset=764897505280 size=49152 flags=40080ca0
[  167.862628] sd 6:0:0:0: [sde] tag#23 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  167.862629] sd 6:0:0:0: [sde] tag#23 Sense Key : Illegal Request [current] 
[  167.862645] sd 6:0:0:0: [sde] tag#23 Add. Sense: Unaligned write command
[  167.862645] sd 6:0:0:0: [sde] tag#23 CDB: Read(10) 28 00 59 0c b7 60 00 00 60 00
[  167.862646] print_req_error: I/O error, dev sde, sector 1494005600 flags 700
[  167.869712] zio pool=dpool vdev=/dev/mapper/slot5crypt error=5 type=1 offset=764926672896 size=49152 flags=40080ca0
[  167.869715] ata7: EH complete
[  195.866293] ata6: limiting SATA link speed to 3.0 Gbps
[  195.866295] ata6.00: exception Emask 0x0 SAct 0x102 SErr 0x50000 action 0x6 frozen
[  195.873874] ata6: SError: { PHYRdyChg CommWake }
[  195.878512] ata6.00: failed command: READ FPDMA QUEUED
[  195.883653] ata6.00: cmd 60/18:08:f0:59:50/00:00:51:00:00/40 tag 1 ncq dma 12288 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  195.898670] ata6.00: status: { DRDY }
[  195.902337] ata6.00: failed command: READ FPDMA QUEUED
[  195.907480] ata6.00: cmd 60/38:40:40:52:14/00:00:51:00:00/40 tag 8 ncq dma 28672 in
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  195.922496] ata6.00: status: { DRDY }
[  195.926165] ata6: hard resetting link
[  196.240878] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  196.802601] ata6.00: configured for UDMA/133
[  196.804425] ata6.00: device reported invalid CHS sector 0
[  196.804443] sd 5:0:0:0: [sdd] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  196.804444] sd 5:0:0:0: [sdd] tag#1 Sense Key : Illegal Request [current] 
[  196.804445] sd 5:0:0:0: [sdd] tag#1 Add. Sense: Unaligned write command
[  196.804447] sd 5:0:0:0: [sdd] tag#1 CDB: Read(10) 28 00 51 50 59 f0 00 00 18 00
[  196.804448] print_req_error: I/O error, dev sdd, sector 1364220400 flags 700
[  196.811513] zio pool=dpool vdev=/dev/mapper/slot4crypt error=5 type=1 offset=698476650496 size=12288 flags=1808a0
[  196.811536] sd 5:0:0:0: [sdd] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  196.811537] sd 5:0:0:0: [sdd] tag#8 Sense Key : Illegal Request [current] 
[  196.811537] sd 5:0:0:0: [sdd] tag#8 Add. Sense: Unaligned write command
[  196.811538] sd 5:0:0:0: [sdd] tag#8 CDB: Read(10) 28 00 51 14 52 40 00 00 38 00
[  196.811539] print_req_error: I/O error, dev sdd, sector 1360286272 flags 700
[  196.818588] zio pool=dpool vdev=/dev/mapper/slot4crypt error=5 type=1 offset=696462376960 size=28672 flags=40080ca0
[  196.818590] ata6: EH complete
[  200.690450] ata6.00: exception Emask 0x10 SAct 0x100008 SErr 0x4050000 action 0xe frozen
[  200.698545] ata6.00: irq_stat 0x00000040, connection status changed
[  200.704828] ata6: SError: { PHYRdyChg CommWake DevExch }
[  200.710156] ata6.00: failed command: READ FPDMA QUEUED
[  200.715298] ata6.00: cmd 60/18:18:28:70:4e/00:00:59:00:00/40 tag 3 ncq dma 12288 in
                        res 50/00:18:28:70:4e/00:00:59:00:00/40 Emask 0x10 (ATA bus error)
[  200.730928] ata6.00: status: { DRDY }
[  200.734595] ata6.00: failed command: READ FPDMA QUEUED
[  200.739734] ata6.00: cmd 60/18:a0:b8:38:38/00:00:59:00:00/40 tag 20 ncq dma 12288 in
                        res 50/00:18:28:70:4e/00:00:59:00:00/40 Emask 0x10 (ATA bus error)
[  200.755449] ata6.00: status: { DRDY }
[  200.759115] ata6: hard resetting link
[  201.478431] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  201.550589] ata6.00: configured for UDMA/133
[  201.550772] sd 5:0:0:0: [sdd] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  201.550774] sd 5:0:0:0: [sdd] tag#3 Sense Key : Illegal Request [current] 
[  201.550776] sd 5:0:0:0: [sdd] tag#3 Add. Sense: Unaligned write command
[  201.550778] sd 5:0:0:0: [sdd] tag#3 CDB: Read(10) 28 00 59 4e 70 28 00 00 18 00
[  201.550779] print_req_error: I/O error, dev sdd, sector 1498312744 flags 700
[  201.557839] zio pool=dpool vdev=/dev/mapper/slot4crypt error=5 type=1 offset=767131930624 size=12288 flags=1808a0
[  201.557845] sd 5:0:0:0: [sdd] tag#20 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  201.557846] sd 5:0:0:0: [sdd] tag#20 Sense Key : Illegal Request [current] 
[  201.557847] sd 5:0:0:0: [sdd] tag#20 Add. Sense: Unaligned write command
[  201.557848] sd 5:0:0:0: [sdd] tag#20 CDB: Read(10) 28 00 59 38 38 b8 00 00 18 00
[  201.557848] print_req_error: I/O error, dev sdd, sector 1496856760 flags 700
[  201.564911] zio pool=dpool vdev=/dev/mapper/slot4crypt error=5 type=1 offset=766386466816 size=12288 flags=1808a0
[  201.564914] ata6: EH complete
[  204.694296] ata8.00: exception Emask 0x10 SAct 0x10040 SErr 0x4050000 action 0xe frozen
[  204.702312] ata8.00: irq_stat 0x00000040, connection status changed
[  204.708597] ata8: SError: { PHYRdyChg CommWake DevExch }
[  204.713929] ata8.00: failed command: READ FPDMA QUEUED
[  204.719077] ata8.00: cmd 60/08:30:c0:59:9f/00:00:12:00:00/40 tag 6 ncq dma 4096 in
                        res 50/00:58:08:59:9f/00:00:12:00:00/40 Emask 0x10 (ATA bus error)
[  204.734625] ata8.00: status: { DRDY }
[  204.738288] ata8.00: failed command: READ FPDMA QUEUED
[  204.743445] ata8.00: cmd 60/58:80:08:59:9f/00:00:12:00:00/40 tag 16 ncq dma 45056 in
                        res 50/00:58:08:59:9f/00:00:12:00:00/40 Emask 0x10 (ATA bus error)
[  204.759159] ata8.00: status: { DRDY }
[  204.762828] ata8: hard resetting link
[  205.474571] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  205.512273] ata8.00: configured for UDMA/133
[  205.512576] sd 7:0:0:0: [sdf] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  205.512577] sd 7:0:0:0: [sdf] tag#6 Sense Key : Illegal Request [current] 
[  205.512578] sd 7:0:0:0: [sdf] tag#6 Add. Sense: Unaligned write command
[  205.512580] sd 7:0:0:0: [sdf] tag#6 CDB: Read(10) 28 00 12 9f 59 c0 00 00 08 00
[  205.512581] print_req_error: I/O error, dev sdf, sector 312433088 flags 700
[  205.519550] zio pool=dpool vdev=/dev/mapper/slot6crypt error=5 type=1 offset=159961546752 size=4096 flags=1808a0
[  205.519574] sd 7:0:0:0: [sdf] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  205.519575] sd 7:0:0:0: [sdf] tag#16 Sense Key : Illegal Request [current] 
[  205.519575] sd 7:0:0:0: [sdf] tag#16 Add. Sense: Unaligned write command
[  205.519576] sd 7:0:0:0: [sdf] tag#16 CDB: Read(10) 28 00 12 9f 59 08 00 00 58 00
[  205.519577] print_req_error: I/O error, dev sdf, sector 312432904 flags 700
[  205.526538] zio pool=dpool vdev=/dev/mapper/slot6crypt error=5 type=1 offset=159961452544 size=45056 flags=40080ca0
[  205.526541] ata8: EH complete
[  205.638343] ata7.00: exception Emask 0x10 SAct 0x8080 SErr 0x4040000 action 0xe frozen
[  205.646267] ata7.00: irq_stat 0x00000040, connection status changed
[  205.652539] ata7: SError: { CommWake DevExch }
[  205.656991] ata7.00: failed command: READ FPDMA QUEUED
[  205.662150] ata7.00: cmd 60/20:38:e8:b7:00/00:00:12:00:00/40 tag 7 ncq dma 16384 in
                        res 50/00:20:70:82:00/00:00:12:00:00/40 Emask 0x10 (ATA bus error)
[  205.677775] ata7.00: status: { DRDY }
[  205.681441] ata7.00: failed command: READ FPDMA QUEUED
[  205.686594] ata7.00: cmd 60/20:78:70:82:00/00:00:12:00:00/40 tag 15 ncq dma 16384 in
                        res 50/00:20:70:82:00/00:00:12:00:00/40 Emask 0x10 (ATA bus error)
[  205.702310] ata7.00: status: { DRDY }
[  205.705993] ata7: hard resetting link
[  206.418610] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  206.432832] ata7.00: configured for UDMA/133
[  206.433009] sd 6:0:0:0: [sde] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  206.433010] sd 6:0:0:0: [sde] tag#7 Sense Key : Illegal Request [current] 
[  206.433011] sd 6:0:0:0: [sde] tag#7 Add. Sense: Unaligned write command
[  206.433012] sd 6:0:0:0: [sde] tag#7 CDB: Read(10) 28 00 12 00 b7 e8 00 00 20 00
[  206.433013] print_req_error: I/O error, dev sde, sector 302036968 flags 700
[  206.439985] zio pool=dpool vdev=/dev/mapper/slot5crypt error=5 type=1 offset=154638733312 size=16384 flags=1808a0
[  206.440007] sd 6:0:0:0: [sde] tag#15 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  206.440008] sd 6:0:0:0: [sde] tag#15 Sense Key : Illegal Request [current] 
[  206.440009] sd 6:0:0:0: [sde] tag#15 Add. Sense: Unaligned write command
[  206.440010] sd 6:0:0:0: [sde] tag#15 CDB: Read(10) 28 00 12 00 82 70 00 00 20 00
[  206.440010] print_req_error: I/O error, dev sde, sector 302023280 flags 700
[  206.446970] zio pool=dpool vdev=/dev/mapper/slot5crypt error=5 type=1 offset=154631725056 size=16384 flags=1808a0
[  206.446973] ata7: EH complete
[  211.422463] ata8.00: exception Emask 0x10 SAct 0x8000000 SErr 0x4050000 action 0xe frozen
[  211.430641] ata8.00: irq_stat 0x00000040, connection status changed
[  211.436912] ata8: SError: { PHYRdyChg CommWake DevExch }
[  211.442230] ata8.00: failed command: READ FPDMA QUEUED
[  211.447369] ata8.00: cmd 60/08:d8:d0:04:0f/02:00:00:00:00/40 tag 27 ncq dma 266240 in
                        res 50/00:50:78:04:0f/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[  211.463168] ata8.00: status: { DRDY }
[  211.466835] ata8: hard resetting link
[  212.178766] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  212.194763] ata8.00: configured for UDMA/133
[  212.195086] sd 7:0:0:0: [sdf] tag#27 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  212.195088] sd 7:0:0:0: [sdf] tag#27 Sense Key : Illegal Request [current] 
[  212.195090] sd 7:0:0:0: [sdf] tag#27 Add. Sense: Unaligned write command
[  212.195091] sd 7:0:0:0: [sdf] tag#27 CDB: Read(10) 28 00 00 0f 04 d0 00 02 08 00
[  212.195092] print_req_error: I/O error, dev sdf, sector 984272 flags 700
[  212.201805] zio pool=dpool vdev=/dev/mapper/slot6crypt error=5 type=1 offset=499752960 size=266240 flags=40080cb0
[  212.201813] ata8: EH complete
[  216.434765] ata8: limiting SATA link speed to 3.0 Gbps
[  216.434768] ata8.00: exception Emask 0x10 SAct 0x80 SErr 0x4050000 action 0xe frozen
[  216.442533] ata8.00: irq_stat 0x00000040, connection status changed
[  216.448803] ata8: SError: { PHYRdyChg CommWake DevExch }
[  216.454117] ata8.00: failed command: READ FPDMA QUEUED
[  216.459257] ata8.00: cmd 60/10:38:70:64:0d/02:00:01:00:00/40 tag 7 ncq dma 270336 in
                        res 50/00:58:18:64:0d/00:00:01:00:00/40 Emask 0x10 (ATA bus error)
[  216.474968] ata8.00: status: { DRDY }
[  216.478648] ata8: hard resetting link
[  217.190915] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  217.202859] ata8.00: configured for UDMA/133
[  217.204664] sd 7:0:0:0: [sdf] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  217.204665] sd 7:0:0:0: [sdf] tag#7 Sense Key : Illegal Request [current] 
[  217.204666] sd 7:0:0:0: [sdf] tag#7 Add. Sense: Unaligned write command
[  217.204668] sd 7:0:0:0: [sdf] tag#7 CDB: Read(10) 28 00 01 0d 64 70 00 02 10 00
[  217.204669] print_req_error: I/O error, dev sdf, sector 17654896 flags 700
[  217.211553] zio pool=dpool vdev=/dev/mapper/slot6crypt error=5 type=1 offset=9035112448 size=270336 flags=40080cb0
[  217.211575] ata8: EH complete
[  226.726877] ata8.00: exception Emask 0x10 SAct 0x200000 SErr 0x4050000 action 0xe frozen
[  226.734988] ata8.00: irq_stat 0x00000040, connection status changed
[  226.741254] ata8: SError: { PHYRdyChg CommWake DevExch }
[  226.746565] ata8.00: failed command: READ FPDMA QUEUED
[  226.751706] ata8.00: cmd 60/58:a8:a8:ee:2e/00:00:01:00:00/40 tag 21 ncq dma 45056 in
                        res 50/00:a8:00:ec:2e/00:02:01:00:00/40 Emask 0x10 (ATA bus error)
[  226.767420] ata8.00: status: { DRDY }
[  226.771086] ata8: hard resetting link
[  227.483183] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  227.499067] ata8.00: configured for UDMA/133
[  227.499206] sd 7:0:0:0: [sdf] tag#21 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  227.499207] sd 7:0:0:0: [sdf] tag#21 Sense Key : Illegal Request [current] 
[  227.499208] sd 7:0:0:0: [sdf] tag#21 Add. Sense: Unaligned write command
[  227.499210] sd 7:0:0:0: [sdf] tag#21 CDB: Read(10) 28 00 01 2e ee a8 00 00 58 00
[  227.499211] print_req_error: I/O error, dev sdf, sector 19852968 flags 700
[  227.506098] zio pool=dpool vdev=/dev/mapper/slot6crypt error=5 type=1 offset=10160525312 size=45056 flags=1808b0
[  227.506105] ata8: EH complete
[  230.718976] ata5.00: exception Emask 0x10 SAct 0x1000000 SErr 0x4050000 action 0xe frozen
[  230.727173] ata5.00: irq_stat 0x00000040, connection status changed
[  230.733451] ata5: SError: { PHYRdyChg CommWake DevExch }
[  230.738768] ata5.00: failed command: READ FPDMA QUEUED
[  230.743908] ata5.00: cmd 60/78:c0:c8:0b:3c/04:00:01:00:00/40 tag 24 ncq dma 585728 in
                        res 50/00:58:70:0b:3c/00:00:01:00:00/40 Emask 0x10 (ATA bus error)
[  230.759713] ata5.00: status: { DRDY }
[  230.763382] ata5: hard resetting link
[  231.475281] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  231.489245] ata5.00: configured for UDMA/133
[  231.489393] sd 4:0:0:0: [sdc] tag#24 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  231.489394] sd 4:0:0:0: [sdc] tag#24 Sense Key : Illegal Request [current] 
[  231.489395] sd 4:0:0:0: [sdc] tag#24 Add. Sense: Unaligned write command
[  231.489396] sd 4:0:0:0: [sdc] tag#24 CDB: Read(10) 28 00 01 3c 0b c8 00 04 78 00
[  231.489397] print_req_error: I/O error, dev sdc, sector 20712392 flags 700
[  231.496280] zio pool=dpool vdev=/dev/mapper/slot3crypt error=5 type=1 offset=10600550400 size=585728 flags=40080cb0
[  231.496287] ata5: EH complete
[  251.575584] ata8.00: exception Emask 0x10 SAct 0x180 SErr 0x4050000 action 0xe frozen
[  251.583417] ata8.00: irq_stat 0x00000040, connection status changed
[  251.589690] ata8: SError: { PHYRdyChg CommWake DevExch }
[  251.595005] ata8.00: failed command: READ FPDMA QUEUED
[  251.600148] ata8.00: cmd 60/50:38:60:8a:96/00:00:01:00:00/40 tag 7 ncq dma 40960 in
                        res 50/00:08:50:88:96/00:02:01:00:00/40 Emask 0x10 (ATA bus error)
[  251.615773] ata8.00: status: { DRDY }
[  251.619440] ata8.00: failed command: READ FPDMA QUEUED
[  251.624580] ata8.00: cmd 60/50:40:b8:8a:96/00:00:01:00:00/40 tag 8 ncq dma 40960 in
                        res 50/00:08:50:88:96/00:02:01:00:00/40 Emask 0x10 (ATA bus error)
[  251.640206] ata8.00: status: { DRDY }
[  251.643872] ata8: hard resetting link
[  252.355771] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  252.368058] ata8.00: configured for UDMA/133
[  252.368195] sd 7:0:0:0: [sdf] tag#7 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  252.368196] sd 7:0:0:0: [sdf] tag#7 Sense Key : Illegal Request [current] 
[  252.368198] sd 7:0:0:0: [sdf] tag#7 Add. Sense: Unaligned write command
[  252.368199] sd 7:0:0:0: [sdf] tag#7 CDB: Read(10) 28 00 01 96 8a 60 00 00 50 00
[  252.368200] print_req_error: I/O error, dev sdf, sector 26643040 flags 700
[  252.375087] zio pool=dpool vdev=/dev/mapper/slot6crypt error=5 type=1 offset=13637042176 size=40960 flags=1808b0
[  252.375109] sd 7:0:0:0: [sdf] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  252.375110] sd 7:0:0:0: [sdf] tag#8 Sense Key : Illegal Request [current] 
[  252.375111] sd 7:0:0:0: [sdf] tag#8 Add. Sense: Unaligned write command
[  252.375112] sd 7:0:0:0: [sdf] tag#8 CDB: Read(10) 28 00 01 96 8a b8 00 00 50 00
[  252.375113] print_req_error: I/O error, dev sdf, sector 26643128 flags 700
[  252.381989] zio pool=dpool vdev=/dev/mapper/slot6crypt error=5 type=1 offset=13637087232 size=40960 flags=1808b0
[  252.381991] ata8: EH complete
[  258.843715] ata8.00: exception Emask 0x10 SAct 0x80010 SErr 0x4050000 action 0xe frozen
[  258.851730] ata8.00: irq_stat 0x00000040, connection status changed
[  258.858044] ata8: SError: { PHYRdyChg CommWake DevExch }
[  258.863366] ata8.00: failed command: READ FPDMA QUEUED
[  258.868506] ata8.00: cmd 60/10:20:48:0c:ac/02:00:01:00:00/40 tag 4 ncq dma 270336 in
                        res 50/00:58:f0:0b:ac/00:00:01:00:00/40 Emask 0x10 (ATA bus error)
[  258.884237] ata8.00: status: { DRDY }
[  258.887900] ata8.00: failed command: READ FPDMA QUEUED
[  258.893043] ata8.00: cmd 60/58:98:58:0e:ac/00:00:01:00:00/40 tag 19 ncq dma 45056 in
                        res 50/00:58:f0:0b:ac/00:00:01:00:00/40 Emask 0x10 (ATA bus error)
[  258.908756] ata8.00: status: { DRDY }
[  258.912423] ata8: hard resetting link
[  259.623929] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  259.675597] ata8.00: configured for UDMA/133
[  259.675772] sd 7:0:0:0: [sdf] tag#4 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  259.675774] sd 7:0:0:0: [sdf] tag#4 Sense Key : Illegal Request [current] 
[  259.675775] sd 7:0:0:0: [sdf] tag#4 Add. Sense: Unaligned write command
[  259.675776] sd 7:0:0:0: [sdf] tag#4 CDB: Read(10) 28 00 01 ac 0c 48 00 02 10 00
[  259.675777] print_req_error: I/O error, dev sdf, sector 28052552 flags 700
[  259.682665] zio pool=dpool vdev=/dev/mapper/slot6crypt error=5 type=1 offset=14358712320 size=270336 flags=40080cb0
[  259.682688] sd 7:0:0:0: [sdf] tag#19 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  259.682689] sd 7:0:0:0: [sdf] tag#19 Sense Key : Illegal Request [current] 
[  259.682690] sd 7:0:0:0: [sdf] tag#19 Add. Sense: Unaligned write command
[  259.682691] sd 7:0:0:0: [sdf] tag#19 CDB: Read(10) 28 00 01 ac 0e 58 00 00 58 00
[  259.682691] print_req_error: I/O error, dev sdf, sector 28053080 flags 700
[  259.689566] zio pool=dpool vdev=/dev/mapper/slot6crypt error=5 type=1 offset=14358982656 size=45056 flags=1808b0
[  259.689569] ata8: EH complete
[  277.468146] ata8: limiting SATA link speed to 1.5 Gbps
[  277.468148] ata8.00: exception Emask 0x10 SAct 0x9000 SErr 0x4050000 action 0xe frozen
[  277.476093] ata8.00: irq_stat 0x00000040, connection status changed
[  277.482372] ata8: SError: { PHYRdyChg CommWake DevExch }
[  277.487691] ata8.00: failed command: READ FPDMA QUEUED
[  277.492835] ata8.00: cmd 60/58:60:00:cb:31/00:00:03:00:00/40 tag 12 ncq dma 45056 in
                        res 50/00:68:98:c8:31/00:02:03:00:00/40 Emask 0x10 (ATA bus error)
[  277.508555] ata8.00: status: { DRDY }
[  277.512229] ata8.00: failed command: READ FPDMA QUEUED
[  277.517366] ata8.00: cmd 60/58:78:58:cb:31/00:00:03:00:00/40 tag 15 ncq dma 45056 in
                        res 50/00:68:98:c8:31/00:02:03:00:00/40 Emask 0x10 (ATA bus error)
[  277.533080] ata8.00: status: { DRDY }
[  277.536747] ata8: hard resetting link
[  278.248328] ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[  278.272289] ata8.00: configured for UDMA/133
[  278.272459] sd 7:0:0:0: [sdf] tag#12 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  278.272460] sd 7:0:0:0: [sdf] tag#12 Sense Key : Illegal Request [current] 
[  278.272461] sd 7:0:0:0: [sdf] tag#12 Add. Sense: Unaligned write command
[  278.272462] sd 7:0:0:0: [sdf] tag#12 CDB: Read(10) 28 00 03 31 cb 00 00 00 58 00
[  278.272463] print_req_error: I/O error, dev sdf, sector 53594880 flags 700
[  278.279348] zio pool=dpool vdev=/dev/mapper/slot6crypt error=5 type=1 offset=27436384256 size=45056 flags=1808b0
[  278.279370] sd 7:0:0:0: [sdf] tag#15 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  278.279371] sd 7:0:0:0: [sdf] tag#15 Sense Key : Illegal Request [current] 
[  278.279372] sd 7:0:0:0: [sdf] tag#15 Add. Sense: Unaligned write command
[  278.279373] sd 7:0:0:0: [sdf] tag#15 CDB: Read(10) 28 00 03 31 cb 58 00 00 58 00
[  278.279374] print_req_error: I/O error, dev sdf, sector 53594968 flags 700
[  278.286251] zio pool=dpool vdev=/dev/mapper/slot6crypt error=5 type=1 offset=27436429312 size=45056 flags=1808b0
[  278.286254] ata8: EH complete
[  363.007111] ata7.00: exception Emask 0x10 SAct 0x80000008 SErr 0x4050000 action 0xe frozen
[  363.015384] ata7.00: irq_stat 0x00000040, connection status changed
[  363.021655] ata7: SError: { PHYRdyChg CommWake DevExch }
[  363.026980] ata7.00: failed command: READ FPDMA QUEUED
[  363.032139] ata7.00: cmd 60/58:18:08:8a:03/00:00:05:00:00/40 tag 3 ncq dma 45056 in
                        res 50/00:58:58:89:03/00:00:05:00:00/40 Emask 0x10 (ATA bus error)
[  363.047771] ata7.00: status: { DRDY }
[  363.051436] ata7.00: failed command: READ FPDMA QUEUED
[  363.056577] ata7.00: cmd 60/58:f8:b0:89:03/00:00:05:00:00/40 tag 31 ncq dma 45056 in
                        res 50/00:58:58:89:03/00:00:05:00:00/40 Emask 0x10 (ATA bus error)
[  363.072289] ata7.00: status: { DRDY }
[  363.075955] ata7: hard resetting link
[  363.787376] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  363.801367] ata7.00: configured for UDMA/133
[  363.803215] sd 6:0:0:0: [sde] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  363.803217] sd 6:0:0:0: [sde] tag#3 Sense Key : Illegal Request [current] 
[  363.803218] sd 6:0:0:0: [sde] tag#3 Add. Sense: Unaligned write command
[  363.803219] sd 6:0:0:0: [sde] tag#3 CDB: Read(10) 28 00 05 03 8a 08 00 00 58 00
[  363.803221] print_req_error: I/O error, dev sde, sector 84118024 flags 700
[  363.810101] zio pool=dpool vdev=/dev/mapper/slot5crypt error=5 type=1 offset=43064233984 size=45056 flags=1808b0
[  363.810124] sd 6:0:0:0: [sde] tag#31 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  363.810125] sd 6:0:0:0: [sde] tag#31 Sense Key : Illegal Request [current] 
[  363.810126] sd 6:0:0:0: [sde] tag#31 Add. Sense: Unaligned write command
[  363.810127] sd 6:0:0:0: [sde] tag#31 CDB: Read(10) 28 00 05 03 89 b0 00 00 58 00
[  363.810128] print_req_error: I/O error, dev sde, sector 84117936 flags 700
[  363.817005] zio pool=dpool vdev=/dev/mapper/slot5crypt error=5 type=1 offset=43064188928 size=45056 flags=1808b0
[  363.817008] ata7: EH complete
[  376.867685] ata5.00: exception Emask 0x10 SAct 0x10 SErr 0x4050000 action 0xe frozen
[  376.875448] ata5.00: irq_stat 0x00000040, connection status changed
[  376.881722] ata5: SError: { PHYRdyChg CommWake DevExch }
[  376.887060] ata5.00: failed command: READ FPDMA QUEUED
[  376.892203] ata5.00: cmd 60/20:20:18:26:37/04:00:05:00:00/40 tag 4 ncq dma 540672 in
                        res 50/00:58:c0:25:37/00:00:05:00:00/40 Emask 0x10 (ATA bus error)
[  376.907912] ata5.00: status: { DRDY }
[  376.911581] ata5: hard resetting link
[  377.623975] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  377.640004] ata5.00: configured for UDMA/133
[  377.640167] sd 4:0:0:0: [sdc] tag#4 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  377.640168] sd 4:0:0:0: [sdc] tag#4 Sense Key : Illegal Request [current] 
[  377.640182] sd 4:0:0:0: [sdc] tag#4 Add. Sense: Unaligned write command
[  377.640183] sd 4:0:0:0: [sdc] tag#4 CDB: Read(10) 28 00 05 37 26 18 00 04 20 00
[  377.640184] print_req_error: I/O error, dev sdc, sector 87500312 flags 700
[  377.647072] zio pool=dpool vdev=/dev/mapper/slot3crypt error=5 type=1 offset=44795965440 size=540672 flags=40080cb0
[  377.647094] ata5: EH complete
[  385.600157] ata7: limiting SATA link speed to 3.0 Gbps
[  385.600159] ata7.00: exception Emask 0x10 SAct 0x600000 SErr 0x4050000 action 0xe frozen
[  385.608254] ata7.00: irq_stat 0x00000040, connection status changed
[  385.614532] ata7: SError: { PHYRdyChg CommWake DevExch }
[  385.619844] ata7.00: failed command: READ FPDMA QUEUED
[  385.624985] ata7.00: cmd 60/58:a8:d8:91:55/00:00:05:00:00/40 tag 21 ncq dma 45056 in
                        res 50/00:08:d0:90:55/00:01:05:00:00/40 Emask 0x10 (ATA bus error)
[  385.640696] ata7.00: status: { DRDY }
[  385.644362] ata7.00: failed command: READ FPDMA QUEUED
[  385.649505] ata7.00: cmd 60/58:b0:30:92:55/00:00:05:00:00/40 tag 22 ncq dma 45056 in
                        res 50/00:08:d0:90:55/00:01:05:00:00/40 Emask 0x10 (ATA bus error)
[  385.665216] ata7.00: status: { DRDY }
[  385.668884] ata7: hard resetting link
[  386.380318] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  386.400326] ata7.00: configured for UDMA/133
[  386.402134] sd 6:0:0:0: [sde] tag#21 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  386.402135] sd 6:0:0:0: [sde] tag#21 Sense Key : Illegal Request [current] 
[  386.402136] sd 6:0:0:0: [sde] tag#21 Add. Sense: Unaligned write command
[  386.402138] sd 6:0:0:0: [sde] tag#21 CDB: Read(10) 28 00 05 55 91 d8 00 00 58 00
[  386.402139] print_req_error: I/O error, dev sde, sector 89493976 flags 700
[  386.409028] zio pool=dpool vdev=/dev/mapper/slot5crypt error=5 type=1 offset=45816721408 size=45056 flags=1808b0
[  386.409050] sd 6:0:0:0: [sde] tag#22 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  386.409051] sd 6:0:0:0: [sde] tag#22 Sense Key : Illegal Request [current] 
[  386.409052] sd 6:0:0:0: [sde] tag#22 Add. Sense: Unaligned write command
[  386.409053] sd 6:0:0:0: [sde] tag#22 CDB: Read(10) 28 00 05 55 92 30 00 00 58 00
[  386.409053] print_req_error: I/O error, dev sde, sector 89494064 flags 700
[  386.415929] zio pool=dpool vdev=/dev/mapper/slot5crypt error=5 type=1 offset=45816766464 size=45056 flags=1808b0
[  386.415931] ata7: EH complete
[  395.888379] ata7.00: exception Emask 0x10 SAct 0x80000000 SErr 0x4050000 action 0xe frozen
[  395.896668] ata7.00: irq_stat 0x00000040, connection status changed
[  395.902945] ata7: SError: { PHYRdyChg CommWake DevExch }
[  395.908288] ata7.00: failed command: READ FPDMA QUEUED
[  395.913442] ata7.00: cmd 60/58:f8:50:f9:79/00:00:05:00:00/40 tag 31 ncq dma 45056 in
                        res 50/00:20:30:f5:79/00:04:05:00:00/40 Emask 0x10 (ATA bus error)
[  395.929154] ata7.00: status: { DRDY }
[  395.932820] ata7: hard resetting link
[  396.644713] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  396.660645] ata7.00: configured for UDMA/133
[  396.662498] sd 6:0:0:0: [sde] tag#31 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  396.662500] sd 6:0:0:0: [sde] tag#31 Sense Key : Illegal Request [current] 
[  396.662501] sd 6:0:0:0: [sde] tag#31 Add. Sense: Unaligned write command
[  396.662502] sd 6:0:0:0: [sde] tag#31 CDB: Read(10) 28 00 05 79 f9 50 00 00 58 00
[  396.662503] print_req_error: I/O error, dev sde, sector 91879760 flags 700
[  396.669386] zio pool=dpool vdev=/dev/mapper/slot5crypt error=5 type=1 offset=47038242816 size=45056 flags=1808b0
[  396.669393] ata7: EH complete
[  411.937054] ata7.00: exception Emask 0x10 SAct 0x80000 SErr 0x4050000 action 0xe frozen
[  411.945076] ata7.00: irq_stat 0x00000040, connection status changed
[  411.951367] ata7: SError: { PHYRdyChg CommWake DevExch }
[  411.956687] ata7.00: failed command: READ FPDMA QUEUED
[  411.961825] ata7.00: cmd 60/58:98:f8:7b:b5/00:00:05:00:00/40 tag 19 ncq dma 45056 in
                        res 50/00:10:e8:79:b5/00:02:05:00:00/40 Emask 0x10 (ATA bus error)
[  411.977555] ata7.00: status: { DRDY }
[  411.981223] ata7: hard resetting link
[  412.696988] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  412.705369] ata7.00: configured for UDMA/133
[  412.705537] sd 6:0:0:0: [sde] tag#19 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  412.705539] sd 6:0:0:0: [sde] tag#19 Sense Key : Illegal Request [current] 
[  412.705540] sd 6:0:0:0: [sde] tag#19 Add. Sense: Unaligned write command
[  412.705541] sd 6:0:0:0: [sde] tag#19 CDB: Read(10) 28 00 05 b5 7b f8 00 00 58 00
[  412.705542] print_req_error: I/O error, dev sde, sector 95779832 flags 700
[  412.712437] zio pool=dpool vdev=/dev/mapper/slot5crypt error=5 type=1 offset=49035079680 size=45056 flags=1808b0
[  412.712443] ata7: EH complete
[  481.999222] ata5.00: exception Emask 0x10 SAct 0x2000000 SErr 0x4040000 action 0xe frozen
[  482.007400] ata5.00: irq_stat 0x00000040, connection status changed
[  482.013671] ata5: SError: { CommWake DevExch }
[  482.018117] ata5.00: failed command: READ FPDMA QUEUED
[  482.023257] ata5.00: cmd 60/d0:c8:30:5b:4c/04:00:07:00:00/40 tag 25 ncq dma 630784 in
                        res 50/00:58:d8:5a:4c/00:00:07:00:00/40 Emask 0x10 (ATA bus error)
[  482.039054] ata5.00: status: { DRDY }
[  482.042722] ata5: hard resetting link
[  482.755401] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  482.769473] ata5.00: configured for UDMA/133
[  482.771233] sd 4:0:0:0: [sdc] tag#25 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  482.771234] sd 4:0:0:0: [sdc] tag#25 Sense Key : Illegal Request [current] 
[  482.771235] sd 4:0:0:0: [sdc] tag#25 Add. Sense: Unaligned write command
[  482.771236] sd 4:0:0:0: [sdc] tag#25 CDB: Read(10) 28 00 07 4c 5b 30 00 04 d0 00
[  482.771238] print_req_error: I/O error, dev sdc, sector 122444592 flags 700
[  482.778209] zio pool=dpool vdev=/dev/mapper/slot3crypt error=5 type=1 offset=62687436800 size=630784 flags=40080cb0
[  482.778216] ata5: EH complete
[  498.571587] ata5: limiting SATA link speed to 3.0 Gbps
[  498.571589] ata5.00: exception Emask 0x10 SAct 0x20 SErr 0x4050000 action 0xe frozen
[  498.579338] ata5.00: irq_stat 0x00000040, connection status changed
[  498.585625] ata5: SError: { PHYRdyChg CommWake DevExch }
[  498.590935] ata5.00: failed command: READ FPDMA QUEUED
[  498.596075] ata5.00: cmd 60/28:28:80:80:8a/05:00:07:00:00/40 tag 5 ncq dma 675840 in
                        res 50/00:58:28:80:8a/00:00:07:00:00/40 Emask 0x10 (ATA bus error)
[  498.611787] ata5.00: status: { DRDY }
[  498.615454] ata5: hard resetting link
[  499.327851] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  499.343276] ata5.00: configured for UDMA/133
[  499.343443] sd 4:0:0:0: [sdc] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  499.343444] sd 4:0:0:0: [sdc] tag#5 Sense Key : Illegal Request [current] 
[  499.343445] sd 4:0:0:0: [sdc] tag#5 Add. Sense: Unaligned write command
[  499.343446] sd 4:0:0:0: [sdc] tag#5 CDB: Read(10) 28 00 07 8a 80 80 00 05 28 00
[  499.343448] print_req_error: I/O error, dev sdc, sector 126517376 flags 700
[  499.350438] zio pool=dpool vdev=/dev/mapper/slot3crypt error=5 type=1 offset=64772702208 size=675840 flags=40080cb0
[  499.350460] ata5: EH complete
[  505.699719] ata5.00: exception Emask 0x10 SAct 0x90000000 SErr 0x4050000 action 0xe frozen
[  505.707993] ata5.00: irq_stat 0x00000040, connection status changed
[  505.714259] ata5: SError: { PHYRdyChg CommWake DevExch }
[  505.719569] ata5.00: failed command: READ FPDMA QUEUED
[  505.724713] ata5.00: cmd 60/40:e0:50:2f:a2/05:00:07:00:00/40 tag 28 ncq dma 688128 in
                        res 50/00:58:f8:2e:a2/00:00:07:00:00/40 Emask 0x10 (ATA bus error)
[  505.740518] ata5.00: status: { DRDY }
[  505.744183] ata5.00: failed command: READ FPDMA QUEUED
[  505.749326] ata5.00: cmd 60/48:f8:90:34:a2/01:00:07:00:00/40 tag 31 ncq dma 167936 in
                        res 50/00:58:f8:2e:a2/00:00:07:00:00/40 Emask 0x10 (ATA bus error)
[  505.765126] ata5.00: status: { DRDY }
[  505.768793] ata5: hard resetting link
[  506.480042] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  506.494089] ata5.00: configured for UDMA/133
[  506.494268] sd 4:0:0:0: [sdc] tag#28 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  506.494269] sd 4:0:0:0: [sdc] tag#28 Sense Key : Illegal Request [current] 
[  506.494270] sd 4:0:0:0: [sdc] tag#28 Add. Sense: Unaligned write command
[  506.494272] sd 4:0:0:0: [sdc] tag#28 CDB: Read(10) 28 00 07 a2 2f 50 00 05 40 00
[  506.494273] print_req_error: I/O error, dev sdc, sector 128069456 flags 4700
[  506.501348] sd 4:0:0:0: [sdc] tag#31 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  506.501348] sd 4:0:0:0: [sdc] tag#31 Sense Key : Illegal Request [current] 
[  506.501349] sd 4:0:0:0: [sdc] tag#31 Add. Sense: Unaligned write command
[  506.501365] sd 4:0:0:0: [sdc] tag#31 CDB: Read(10) 28 00 07 a2 34 90 00 01 48 00
[  506.501365] print_req_error: I/O error, dev sdc, sector 128070800 flags 700
[  506.508331] zio pool=dpool vdev=/dev/mapper/slot3crypt error=5 type=1 offset=65567367168 size=856064 flags=40080cb0
[  506.508334] ata5: EH complete
[  521.132244] ata5.00: exception Emask 0x10 SAct 0x30 SErr 0x4050000 action 0xe frozen
[  521.140004] ata5.00: irq_stat 0x00000040, connection status changed
[  521.146280] ata5: SError: { PHYRdyChg CommWake DevExch }
[  521.151598] ata5.00: failed command: READ FPDMA QUEUED
[  521.156738] ata5.00: cmd 60/58:20:98:44:1b/00:00:08:00:00/40 tag 4 ncq dma 45056 in
                        res 50/00:b8:e0:42:1b/00:01:08:00:00/40 Emask 0x10 (ATA bus error)
[  521.172366] ata5.00: status: { DRDY }
[  521.176027] ata5.00: failed command: READ FPDMA QUEUED
[  521.181170] ata5.00: cmd 60/58:28:f0:44:1b/00:00:08:00:00/40 tag 5 ncq dma 45056 in
                        res 50/00:b8:e0:42:1b/00:01:08:00:00/40 Emask 0x10 (ATA bus error)
[  521.196795] ata5.00: status: { DRDY }
[  521.200465] ata5: hard resetting link
[  521.912451] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  521.926523] ata5.00: configured for UDMA/133
[  521.926688] sd 4:0:0:0: [sdc] tag#4 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  521.926690] sd 4:0:0:0: [sdc] tag#4 Sense Key : Illegal Request [current] 
[  521.926691] sd 4:0:0:0: [sdc] tag#4 Add. Sense: Unaligned write command
[  521.926692] sd 4:0:0:0: [sdc] tag#4 CDB: Read(10) 28 00 08 1b 44 98 00 00 58 00
[  521.926693] print_req_error: I/O error, dev sdc, sector 136004760 flags 700
[  521.933674] zio pool=dpool vdev=/dev/mapper/slot3crypt error=5 type=1 offset=69630242816 size=45056 flags=1808b0
[  521.933696] sd 4:0:0:0: [sdc] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  521.933697] sd 4:0:0:0: [sdc] tag#5 Sense Key : Illegal Request [current] 
[  521.933698] sd 4:0:0:0: [sdc] tag#5 Add. Sense: Unaligned write command
[  521.933699] sd 4:0:0:0: [sdc] tag#5 CDB: Read(10) 28 00 08 1b 44 f0 00 00 58 00
[  521.933699] print_req_error: I/O error, dev sdc, sector 136004848 flags 700
[  521.940658] zio pool=dpool vdev=/dev/mapper/slot3crypt error=5 type=1 offset=69630287872 size=45056 flags=1808b0
[  521.940661] ata5: EH complete
[  529.968429] ata5.00: exception Emask 0x10 SAct 0x28020 SErr 0x4050000 action 0xe frozen
[  529.976441] ata5.00: irq_stat 0x00000040, connection status changed
[  529.982713] ata5: SError: { PHYRdyChg CommWake DevExch }
[  529.988045] ata5.00: failed command: READ FPDMA QUEUED
[  529.993187] ata5.00: cmd 60/58:28:68:11:3b/00:00:08:00:00/40 tag 5 ncq dma 45056 in
                        res 50/00:58:c8:0b:3b/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
[  530.008812] ata5.00: status: { DRDY }
[  530.012477] ata5.00: failed command: READ FPDMA QUEUED
[  530.017618] ata5.00: cmd 60/08:78:20:0c:3b/01:00:08:00:00/40 tag 15 ncq dma 135168 in
                        res 50/00:58:c8:0b:3b/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
[  530.033415] ata5.00: status: { DRDY }
[  530.037082] ata5.00: failed command: READ FPDMA QUEUED
[  530.042225] ata5.00: cmd 60/c8:88:a0:0d:3b/03:00:08:00:00/40 tag 17 ncq dma 495616 in
                        res 50/00:58:c8:0b:3b/00:00:08:00:00/40 Emask 0x10 (ATA bus error)
[  530.058024] ata5.00: status: { DRDY }
[  530.061691] ata5: hard resetting link
[  530.772687] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  530.786899] ata5.00: configured for UDMA/133
[  530.787038] sd 4:0:0:0: [sdc] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  530.787039] sd 4:0:0:0: [sdc] tag#5 Sense Key : Illegal Request [current] 
[  530.787040] sd 4:0:0:0: [sdc] tag#5 Add. Sense: Unaligned write command
[  530.787042] sd 4:0:0:0: [sdc] tag#5 CDB: Read(10) 28 00 08 3b 11 68 00 00 58 00
[  530.787043] print_req_error: I/O error, dev sdc, sector 138088808 flags 700
[  530.794014] zio pool=dpool vdev=/dev/mapper/slot3crypt error=5 type=1 offset=70697275392 size=45056 flags=1808b0
[  530.794037] sd 4:0:0:0: [sdc] tag#15 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[  530.794038] sd 4:0:0:0: [sdc] tag#15 Sense Key : Illegal Request [current] 
[  530.794039] sd 4:0:0:0: [sdc] tag#15 Add. Sense: Unaligned write command

The errors appear randomly for the four disks, but only one is really failing badly within ZFS:

root@debian:[~]# zpool status
  pool: dpool
 state: DEGRADED
  scan: resilvered 19,9G in 0 days 00:32:14 with 0 errors on Sun Apr 26 19:21:30 2020
config:

	NAME                       STATE     READ WRITE CKSUM
	dpool                      DEGRADED     0     0     0
	  raidz1-0                 DEGRADED     0     0     0
	    slot3crypt             ONLINE       0     0     0
	    replacing-1            UNAVAIL      0     0     0  insufficient replicas
	      1293895313619506864  UNAVAIL      0     0     0  was /dev/mapper/slot4crypt/old
	      slot4crypt           FAULTED      0    26     0  too many errors
	    slot5crypt             ONLINE       0     0     0
	    slot6crypt             ONLINE       0     0     0
	logs	
	  mirror-1                 ONLINE       0     0     0
	    slot1p2crypt2          ONLINE       0     0     0
	    slot2p2crypt2          ONLINE       0     0     0

errors: No known data errors

Without ZFS, the disk sdd (slot4crypt) works without any issues, and S.M.A.R.T. shows no problems at all.
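For anyone triaging the same symptom, a quick way to see which disks are emitting the sense error is to count the matching kernel-log lines per device (a sketch; the sample line below stands in for real dmesg output):

```shell
# Count "Unaligned write command" sense errors per device.
# A sample dmesg line is used here; on a live system, pipe in `dmesg` instead.
sample='sd 7:0:0:0: [sdf] tag#4 Add. Sense: Unaligned write command'
printf '%s\n' "$sample" \
  | grep 'Unaligned write command' \
  | grep -o '\[sd[a-z]*\]' | tr -d '[]' \
  | sort | uniq -c | sort -rn
```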

@c0d3z3r0
Copy link
Contributor

Uhm, interesting. #4873 (comment) seems to work for me, too. But WHY?!

@c0d3z3r0
Copy link
Contributor

c0d3z3r0 commented May 4, 2020

Finally, this turned out to be a system firmware issue. See https://review.coreboot.org/c/coreboot/+/40877

You can try this:

echo max_performance | sudo tee /sys/class/scsi_host/host*/link_power_management_policy

If that solves the problem, it's pretty likely that you have the same problem. Try to get a BIOS/UEFI update from your vendor.
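If the workaround does help, one way to make it persist across reboots is a udev rule (a sketch, not tied to any particular distro; the file name is hypothetical, and the attribute is the same libata sysfs knob as in the command above):

```shell
# /etc/udev/rules.d/50-sata-link-power.rules (hypothetical file name)
# Re-apply the policy whenever a SATA host appears, so the setting
# survives reboots and controller rescans.
ACTION=="add", SUBSYSTEM=="scsi_host", KERNEL=="host*", ATTR{link_power_management_policy}="max_performance"
```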

@dd1dd1
Copy link
Author

dd1dd1 commented May 4, 2020

That's interesting. My "unaligned write" errors have gone away after I moved the disks from an oldish socket 1366 mobo to a brand new socket 1151 mobo. It could have been the old BIOS on the old mobo not initializing the SSD SATA link correctly... K.O.

@flo82
Copy link

flo82 commented May 12, 2020

had the same problems with brand-new 14TB drives on a Skylake motherboard. Setting GRUB_CMDLINE_LINUX="libata.force=3.0" fixed this issue for me. This forces the SATA controller to run at 3 Gbit/s, which is still plenty for a rusty drive.

I have not tested the maximum_performance switch for the ports. Waiting for a firmware/bios update... let's see
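For reference, the kernel documents the libata.force link-speed caps as the values "1.5Gbps" and "3.0Gbps"; a Debian-style GRUB setup would look roughly like this (a sketch; the per-link number is illustrative):

```shell
# /etc/default/grub
GRUB_CMDLINE_LINUX="libata.force=3.0Gbps"        # cap every SATA link at 3.0 Gbps
# or limit a single link only, e.g. ata5:
#GRUB_CMDLINE_LINUX="libata.force=5:3.0Gbps"
# Afterwards regenerate the config and reboot:
#   sudo update-grub                                  (Debian/Ubuntu)
#   sudo grub2-mkconfig -o /boot/grub2/grub.cfg       (RHEL/CentOS)
```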

@t-np
Copy link

t-np commented Aug 13, 2020

I have the same issues on an AMD X399 Threadripper board with two 16TB Seagate Exos SATA drives. I mostly get errors on one of them; I checked both recently with Seagate's Windows tool and they seem fine. Currently I suspect the HDD enclosure and cabling of not handling sustained full SATA 6 Gbit/s speeds. I'll rework the case this weekend, move the disks around, and see if it helps.

link_power_management_policy was already set to max_performance

@bjquinn
Copy link

bjquinn commented Sep 14, 2020

I'm getting this too on four separate machines now. Swapping drives, cables, drive bay, etc. didn't work. But putting all the drives on an HBA solved it. Supermicro H11DSi motherboard on all 4 machines, seems only to happen when using SATA controller on motherboard. BIOS already upgraded to latest. Using zfs via latest Proxmox 6.2 (zfs root/boot). A scrub can trigger it every time, though some of the machines can go a while without issues if they're not stressed. I've reached out to SuperMicro, but they don't have anything helpful yet.

I'm primarily seeing this on HGST 10TB SATA drives, though I did see it a few times on an Intel SSD.

@t-np
Copy link

t-np commented Sep 14, 2020

Swapping drives, cables, drive bay, etc. didn't work.

i switched cabling and used a new sata drive cage, so far my problems are gone

@bjquinn
Copy link

bjquinn commented Sep 15, 2020

Swapping drives, cables, drive bay, etc. didn't work.

i switched cabling and used a new sata drive cage, so far my problems are gone

Yeah, that's what's weird about this one. I've resolved similar problems that way in the past, but it's not resolving it this time, on multiple different systems.

@cluck
Copy link

cluck commented May 21, 2021

Can confirm too that this happens on at least three similar hosts, all with an ASRock TRX40D8-2N2T mainboard (SATA III via the TRX40 chipset and/or an ASMedia ASM1061) and varying hard disks and SATA cables. The same does not happen with FreeNAS/TrueNAS running on the same setup, nor when the disks are attached over an add-in LSI HBA card instead of the mainboard connectors.

Also confirming that the failure reproducibly happens only after many gigabytes of data have been sent over the SATA wires. The failures seem to suddenly affect multiple ATA links concurrently. My guess at this point is a kernel bug with some HBAs.

To reproduce: boot a Debian Buster (or Proxmox) from an additional disk, build a raidz2 over four disks attached to the TRX40 or the ASM1061, then concurrently run bonnie++ -d /path/to/zfs -u root -r $MB_RAM and fio --name=randrw --rw=randrw --direct=1 --ioengine=libaio --bs=16k --numjobs=1600 --rwmixread=10 --size=1M --runtime=600 --group_reporting, and watch it fail after ~15-20 min.

Found with Proxmox kernel 5.4.73-1-pve, and still present in 5.4.114-1-pve.
zfs-2.0.3-1~bpo10+1
zfs-kmod-2.0.4-pve1
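The two load generators from the recipe above can be run concurrently with a small wrapper (a sketch; assumes bonnie++ and fio are installed, /tank is a placeholder for the pool mountpoint, and --directory was added so fio writes into the pool):

```shell
#!/bin/sh
# Stress the pool with both workloads at once; watch `dmesg -w` in
# another terminal for the ATA bus errors / unaligned-write messages.
MB_RAM=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
bonnie++ -d /tank -u root -r "$MB_RAM" &
fio --name=randrw --rw=randrw --direct=1 --ioengine=libaio --bs=16k \
    --numjobs=1600 --rwmixread=10 --size=1M --runtime=600 \
    --group_reporting --directory=/tank
wait
```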

@bjquinn
Copy link

bjquinn commented May 21, 2021

I can reproduce with a simple zpool scrub. It's true that it only happens after many GBs of data transfer, but I can trigger it every time.

I was also able to eliminate the problem with an add in LSI 9300-4I HBA, but of course that's not a true fix. However, I had to get our servers stable, so I bought HBAs for all of them, and I haven't seen a single problem since.

Edit: I also had to update the HBA firmware to the latest in order to avoid hard reboots under the same type of load. I would assume that's unrelated (it was NOT the same "unaligned write error" in dmesg, it was a hard freeze/reboot), but I thought I should mention it. Anyway, an HBA with fully updated firmware does seem to have eliminated the problem for me across the several servers I was having this issue with, as a workaround.

@cluck
Copy link

cluck commented May 22, 2021

Update: Kernel 5.10¹ from Debian Buster Backports and Kernel 5.11 from the Proxmox pvetest repo are stable on the TRX40 SATA links. Both still fail on the ASMedia ASM1062², though less spectacularly so in dmesg.

¹ Debian Backports 5.10.0-0.bpo.5-amd64
² I previously wrote ASM 1061, as the Mainboard manual reports it, but lspci reports ASM1062 (rev 02).

@t-np
Copy link

t-np commented May 23, 2021

My problems returned in a different form on my personal host, where I pass through PCIe devices including the GPU. This is the host where I originally had the problems with the two 16TB SATA Exos drives; those issues seemed to go away after I replaced the disk cage and cabling, but the problem seems to have just moved on to the NVMe port. I had an unused 2TB Samsung 970 EVO in that port, in a ZFS pool. Almost every time I scrubbed that NVMe I got the following errors on random sectors:

Nov 8 00:26:18 kernel: [306450.056343] blk_update_request: critical medium error, dev nvme0n1, sector 172517081 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Nov 8 00:26:18 kernel: [306450.056380] zio pool=zfs vdev=/dev/disk/by-id/nvme-eui.0025385b81b21500-part1 error=61 type=1 offset=88327696896 size=131072 flags=1808b0
Nov 8 00:26:18 zed: eid=54 class=io pool_guid=0xFD58158A8A98C87E vdev_path=/dev/disk/by-id/nvme-eui.0025385b81b21500-part1
Nov 8 00:26:47 kernel: [306479.686602] blk_update_request: critical medium error, dev nvme0n1, sector 300451723 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Nov 8 00:26:47 kernel: [306479.686640] zio pool=zfs vdev=/dev/disk/by-id/nvme-eui.0025385b81b21500-part1 error=61 type=1 offset=153830233600 size=131072 flags=40080cb0

It somehow got worse and was no longer limited to scrub problems. My Win10 guest, which gets multiple PCIe devices passed through from the Proxmox host, could not initialize hardware when booting up and locked the host up solid while the Windows circle rotated and Windows sorted out the hardware; sometimes it got a screen further and locked up on "applying configuration", in rarer cases it reached the desktop and then locked up, and in very rare cases it locked up hours, days, or even a week later. I fiddled around for a long time and got a few hints in the console that it seemed to be a hardware initialization issue: sometimes it wrote something into dmesg just before locking up, sometimes about the USB3 adapters that were passed through, but I also noticed the NVMe device, which was not passed through at all.

This Samsung was counting up SMART integrity errors every time I ran a zfs scrub, and I'm now far over
Data Integrity Errors: 31 and Error Information Log Entries: 48, but the Windows Samsung Magician tool says everything is fine. Samsung did not want to RMA the device initially, and I might be out of warranty now...
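For reference, those counters can be read on Linux with the nvme-cli package (a sketch; /dev/nvme0 is a placeholder for the affected device):

```shell
# Dump the NVMe SMART/health log and pick out the error counters
# discussed above (run as root).
nvme smart-log /dev/nvme0 | grep -Ei 'critical_warning|media_errors|num_err_log_entries'
```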

In the end this was the single issue that had been plaguing me for the last few months, where I needed to hope and gamble to get my guest up and running. I first thought it was related to the Proxmox kernel version and the ZFS module, as there were some known issues that locked up the host in the latest kernel.
After I removed the 2TB Samsung 970 EVO NVMe, everything is fine and runs without any issues on Proxmox kernel 5.11.17-1~bpo10.

@stuckj
Copy link

stuckj commented Sep 30, 2021

This started happening to me pretty regularly on a Supermicro virtualization server (X10SDV-8C-TLN4F). I have four 500GB Crucial MX500s in a 2x2 mirrored pool with one hot spare. Every couple of weeks I see the hot spare kick in. The drive that drops out always fails with the error mentioned at the top of this issue (unaligned write). Soft-resetting the host bus doesn't work; unplugging the SATA cable doesn't work. You have to unplug power to the drive and plug it back in to get it back up. All five drives are brand new (I replaced the old SSDs when this problem cropped up thinking they were failing... guess they weren't). It's not always the same drive this happens with (it's happened across three different ones).

The old drives were Samsung EVO 860s and had the same problem, so it's not manufacturer specific. The server is still running the original BIOS, so I may try updating that to see if it makes a difference.

This smells like a kernel bug though rather than a ZFS bug given that the failure is at the ATA level. Unless ZFS can write something that can result in an unaligned write.

@stuckj
Copy link

stuckj commented Sep 30, 2021

I do have trim enabled on the pool. Perhaps it could be related to this: #8552. Though these aren't Samsung drives. I can try the noncq setting on the next reboot when I can do some downtime.

@stuckj
Copy link

stuckj commented Sep 30, 2021

Also, you can disable NCQ without a reboot using this technique: https://unix.stackexchange.com/a/611496
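For reference, the linked technique boils down to dropping the device's queue depth to 1 through sysfs, which stops the kernel from issuing NCQ commands (a sketch; sdc is a placeholder for the affected disk, run as root, and the setting resets on reboot):

```shell
cat /sys/block/sdc/device/queue_depth        # current depth, e.g. 32
echo 1 > /sys/block/sdc/device/queue_depth   # depth 1 = NCQ effectively off
# Boot-time alternative: libata.force=noncq (all ports),
# or e.g. libata.force=5.00:noncq for a single device.
```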

@stuckj
Copy link

stuckj commented Oct 3, 2021

Disabling NCQ with the above technique (the technique that doesn't require rebooting) was not effective for me. I'm going to try disabling autotrim and just leave it to the monthly manual trim (which I believe should run in a couple of hours since the default is the first Sunday of each month).
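For reference, the pool-level commands involved (the pool name tank is a placeholder):

```shell
zpool set autotrim=off tank     # disable automatic TRIM on the pool
zpool get autotrim tank         # verify the property
zpool trim tank                 # start a one-off manual TRIM
zpool status -t tank            # show per-vdev TRIM progress
```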

@stuckj
Copy link

stuckj commented Oct 4, 2021

Nope. That didn't work either. :-P I'm starting to suspect failing hardware for me now too (or a kernel bug in a recent update). To my knowledge, the D-1541 Xeons are SoCs, so all the SATA controllers are actually in the CPU. It seems weird for that to fail, but there must be some components on the MB. I'm moving the drives to the backplane to see if that resolves the issue. I only had them on the internal ports for speed, since they're individual SATA III ports vs. the shared SATA II backplane.

@livelace
Copy link

livelace commented Oct 4, 2021

@stuckj Hello
I see your "fight" and want to share mine.
Since I wrote this short note, I haven't had the "unaligned write command" problem.
Hope this can help somehow.

@bjquinn
Copy link

bjquinn commented Oct 4, 2021

@livelace thanks for the writeup. In my case though the issue predates zstd, for whatever that's worth.

@stuckj
Copy link

stuckj commented Oct 4, 2021

Yep, thanks @livelace. Doesn't apply in my case either though since the current problem I'm having is with Crucial MX500's (5 of them, all brand new, 2x2 mirrored pairs and a spare). Also using lz4 rather than zstd for me.

@stuckj
Copy link

stuckj commented Oct 5, 2021

I do have a bit of a performance hit, but so far everything is working on the backplane, whereas every day for the past week I was getting at least one drive failure. Prior to the switch yesterday, I had four fail at once, which required me to reboot everything.

I guess it's possible that I have failing SATA ports on the motherboard, but I don't think it's likely. Also, this started happening after applying both a kernel update and a ZFS update. I had been holding off on moving past 0.8.4 (and the updated kernel) because I wanted the same ZFS version for my znapzend container. I finally got around to building a znapzend container based on the Proxmox repos, though, so I made the ZFS and kernel jump shortly before this started happening.

I'm not sure how best to narrow this down to a ZFS or kernel bug, but it seems like it's one of those.

@stuckj

stuckj commented Oct 6, 2021

Huh. These are the same symptoms from years ago on the exact same MB I'm using, though with Intel SSDs instead of Crucial. In that case, a drive firmware update fixed it. Crucial doesn't appear to have newer firmware for the MX500's I'm using, though:

https://community.intel.com/t5/Solid-State-Drives/S3610-SSDs-have-failed-quot-READ-WRITE-FPDMA-QUEUED-quot-ATA/m-p/291172

Guess I'm just stuck with the backplane for the time being (or until I upgrade the machine?). Meh, it makes drive maintenance easier I suppose.

@frasc

frasc commented Dec 16, 2021

I had a very similar issue, that is failed writes with the dmesg message:
[Mon May 31 10:15:51 2021] sd 5:0:0:0: [sdj] tag#1 Add. Sense: Unaligned write command

My Hardware configuration introduces something different from the other cases:

  • one pool is 6 disks, 4 TB, sector size 4096, plugged into a SATA II controller, as this is an old motherboard
  • the 2nd pool is 5 disks, 4 TB, sector size 4096. They are in an external enclosure, connected via eSATA and a port multiplier (PMP) in the enclosure. The PMP provides 5 SATA II ports

I was having the issue with 'scrub' on both pools.

The way I solved it, or at least hope to have solved it, is by backing up all data, recreating the pools and restoring the data.
I haven't seen the 'Unaligned write command' for a few months now.

There is something that probably needs to be confirmed:
The older pools (with 'Unaligned command') were created under Ubuntu with ZFS 1.x. The problem appeared after Ubuntu upgraded from ZFS 1.x to ZFS 2.0.1
The two new pools (so far without 'Unaligned command') have been created under Gentoo with ZFS 2.1 and ZFS 2.0.4.

I have no doubt that the ZFS team verifies that a pool survives a ZFS upgrade.

The question is whether there could be a corner case where strange issues happen. I mean, ZFS is a hugely complicated filesystem, and it's conceivable that a subtle regression could have been introduced.
If the pools needed to be recreated when upgrading ZFS to a major release, that'd be most unfortunate.

Thank you.

@bmilde

bmilde commented Aug 30, 2023

@lorenz Interesting, that might explain why I've gotten the unaligned write error too.

I've tried everything from updating SSD firmware, maximum link power settings, and disabling TRIM, to upgrading the Linux kernel and downgrading the port speeds to 3 Gbps. Nothing worked; slower speeds just meant it took longer for the problem to resurface.

What has likely fixed it now is a cable replacement. I noticed that only 2 drives were failing, and they had a different brand of cables. The new cables are thicker too, so they probably have much better shielding. Definitely try replacing your SATA cable if you're seeing this error!

Side note: ZFS didn't handle random writes erroring out gracefully; the pool was broken beyond repair and I had to rebuild from backups.
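For anyone chasing the same cable suspicion: the SATA link has its own CRC counter, so you can check it directly (a sketch; /dev/sda is a placeholder):

```shell
# SMART attribute 199 counts CRC failures on the SATA link itself; a raw value
# that rises under load points at cabling, backplane or controller, not platters.
dev=/dev/sda   # placeholder device
if command -v smartctl >/dev/null 2>&1 && [ -e "$dev" ]; then
    smartctl -A "$dev" | grep -i UDMA_CRC_Error_Count || true
fi
```

A raw value that keeps climbing after a cable swap suggests the problem is elsewhere on the link.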

@meyergru

meyergru commented Sep 6, 2023

I am in the same boat here. The system is an X570-based mainboard with FCH controllers plus an ASMedia ASM1166.
The drives are 8 WD Ultrastar 18 TB units, 6 of which are known good because they ran on an ARECA RAID controller for 2 years with no problems. The drives are in MBP155 cages, which certainly add another potential point of contact failure.
Everything is running under Proxmox 8.0.4 with kernel 6.2.x.

I had ZFS errors galore and tried:

  1. Exchanging SATA cables - no dice.
  2. Suspected that the ASM1166 is problematic. Ordered a JMB585, not yet arrived.
  3. Limiting SATA speed to 3 Gbit/s, since the former ARECA did only SATA II. This is not really a viable option for the ASM1166: dmesg first says the speed is limited, but it is afterwards raised to 6 Gbit/s. It does work for the FCH controller, though.
  4. In the meantime, I found that the cage could have been the culprit as all affected drives were on that cage. Changed the MOLEX splitter cables to direct power supply connections.
  5. After that, everything SEEMED to work, but now I get errors even on the FCH-connected drives.
  6. I now have set link_power_management_policy to max_performance, scrubbing right now and hoping for the best.

Also, I see those errors turn up only after a few hours have passed (or a lot of data has been written).

Using the JMB585 could still be an option, even though drives on the motherboard controller now show these errors too, because I can probably limit the SATA speed with that controller, which was impossible with the ASM1166. I will try that as a last-but-one resort if limiting link power does not resolve this.

I hate the thought of having to use a HBA adapter consuming more power.

P.S.: The JMB585 can be limited to 3 Gbps. Otherwise, no change; I still get errors on random disks. I have ordered an LSI 9211-8i now. However, this points to a real problem in the interaction between libata and ZFS.

P.P.S.: I disabled NCQ and the problem is gone. I did not bother to try the LSI controller. I will follow up with some insights.

@meyergru

OpenZFS for Linux problem with libata - root cause identified?

Just to reiterate what I wrote about this here: I have a Linux box with 8 WDC 18 TByte SATA drives, 4 of which are connected through the mainboard controllers (AMD FCH variants) and 4 through an ASMedia ASM1166. They form a raidz2 running under Proxmox with a 6.2 kernel. During my nightly backups, the drives would regularly fail and errors showed up in the logs, more often than not "unaligned write errors".

First thing to note is that one poster in the thread mentioned that the "Unaligned write" is a bug in libata, in that "other" errors are mapped to this one in the scsi translation code (https://lore.kernel.org/all/20230623181908.2032764-1-lorenz@brun.one/). Thus, the error itself is meaningless.

In the thread, several possible remedies were offered, such as:

  1. Faulty SATA cables (I replaced them all, no change, but I admit this could be the problem in some cases)
  2. Faulty disks (Mine were known to be good, and also, errors were randomly distributed among them)
  3. Power saving in the SATA link or the PCI bus (disabling this did not help)
  4. Problematic controllers (Both the FCH and the ASM1166 chips as well as a JMB585 showed the same behaviour)
  5. Limiting SATA speed to SATA 3.0 Gbps or even to 1.5 Gbps (3.0 Gbps did not help, and was not even possible with the ASM1166 as the speed was always reset to 6.0 Gbps, but I could check with FCH and JMB585 controllers)
  6. Disabling NCQ (guess what, this helped!)
  7. Replacing the SATA controllers with an LSI 9211-8i (I guess this would have helped, as others have reported, because it probably does not use NCQ)

I am 99% sure that it boils down to a bad interaction between OpenZFS and libata with NCQ enabled and I have a theory why this is so:
When you look at how NCQ works, it is a queue of up to 32 (or, to be exact, 31 for implementation reasons) tasks that can be given to the disk drive. Those tasks can be handled in any order by the drive hardware, e.g. in order to minimize seek times. Thus, when you give the drive 3 tasks, like "read sectors 1, 42 and 2", the drive might decide to reorder them and read sector 42 last, saving one seek in the process.

Now imagine a time of high I/O pressure, like when I do my nightly backups. OpenZFS has some queues of its own whose tasks are handed to the drives, and for each task started, OpenZFS expects a result (but in no particular order). However, when a task returns, it opens up a slot in the NCQ queue, which is immediately filled with another task because of the high I/O pressure. That means that sector 42 could potentially never be read at all, provided that other tasks are prioritized higher by the drive hardware.

I believe, this is exactly what is happening and if one task result is not received within the expected time frame, a timeout with an unspecific error occurs.

This is the result of putting one (or more) quite large queues within OpenZFS before a smaller hardware queue (NCQ).

It explains why both solutions 6 and probably 7 from my list above cure the problem: Without NCQ, every task must first be finished before the next one can be started. It also explains why this problem is not as evident with other filesystems - were this a general problem with libata, it would have been fixed long ago.

I would even guess that reducing the SATA speed to 1.5 Gbps would help (one guy reported this) - I bet this is simply because the resulting speed of ~150 MByte/s is somewhat lower than what modern hard disks can sustain, such that the disk can always finish tasks before the next one is started, whereas 3 Gbps is still faster than modern spinning rust.

If I am right, two things should be considered:

a. The problem should be analysed and fixed in a better way, like throttling the libata NCQ queue if pressure gets too high, just before timeouts are thrown. This would give the drive time to finish existing tasks.
b. There should be a warning or some kind of automatism to disable NCQ for OpenZFS for the time being.

I also think that the performance impact of disabling NCQ with OpenZFS is probably negligible, because OpenZFS has prioritized queues for different operations anyway.
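For reference, a sketch of the two usual ways to disable NCQ (the device name sda is a placeholder):

```shell
# Runtime, per device: setting queue_depth to 1 stops the kernel from issuing
# queued (FPDMA) commands to that disk.
dev=sda   # placeholder for the affected disk
qd=/sys/block/$dev/device/queue_depth
if [ -r "$qd" ]; then
    cat "$qd"        # current depth; typically 31 or 32 while NCQ is active
    # echo 1 > "$qd" # requires root; takes effect immediately, not persistent
fi
# Persistent, for all ports: boot with libata.force=noncq on the kernel
# command line (per-port form: libata.force=X.00:noncq).
```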

@dd1dd1
Author

dd1dd1 commented Sep 11, 2023

(I am the OP, I have some experience with linux kernel drivers, and embedded firmware development)

I like how you wrote it all up, but I doubt you can bring closure to this problem. IMO, the list of "remedies" is basically snake oil; if any of these remedies were "a solution", this bug would have been closed a long time ago.

I think "NCQ timeout" does not explain this problem:
(a) If the NCQ specs permitted a queued command to be postponed indefinitely (and cause a timeout), it would be a serious bug in the specs. That is unlikely, but one has to read them carefully to be sure. If the specs permitted it, it would likely have caused trouble in places other than ZFS, people would have complained, and the specs would have been fixed. Of course it is always possible that specific HDD firmware has bugs and can postpone some queued commands indefinitely (and cause a timeout), even if the specs do not permit it.
(b) If a queued-command timeout were "it", disabling NCQ would have been "the solution", and as best we can tell, it is not.

I think we now have to wait for the libata bug fix to make it into production kernels. Then we will see what the actual error is. "unaligned write command" never made sense to me, and now we know it is most likely bogus.

K.O.

@meyergru

meyergru commented Sep 11, 2023

I did not imply that NCQ allows a command to be left indefinitely in itself. It can only be postponed by the hardware in that it may reorder the commands in any way it likes. This is just how NCQ works.

Thus, an indefinite postponement can only occur if someone "pressures" the queue consistently - actually, the drive is free to reorder new incoming commands and intersperse them with previous ones; as a matter of fact, there is no difference between issuing 32 commands in short succession and issuing a few more only after some have finished. Call that behaviour a design flaw, but I think it exists, and the problem in question surfaces only when some other conditions are met.

And I strongly believe that OpenZFS can cause exactly that situation, especially with write patterns of raidz under high I/O pressure. I doubt that this bug would occur with other filesystems where no such complex patterns from several internal queues ever happen.

As to why the "fixes" worked sometimes (or seemed to have worked): As I said, #6 and #7 both disable NCQ. Reducing the speed to 1.5 Gbps will most likely reduce the I/O pressure enough to make the problem go away and other solutions may help people who really have hardware problems.

Also, so far I have read of nobody who tried disabling NCQ without also doing something else alongside (e.g. reducing speed as well). I refrained from disabling NCQ first only because I thought it would hurt performance - which it did not. Thus, my experiments ruled out one potential cause after another, leaving only the disabling of NCQ as the effective cure. I admit that I should probably wait a few more nights before jumping to conclusions; however, these problems were consistent with every setup I tried so far. (P.S.: It has now been three days in a row without problems.)

Nothing written here nor anything I have tried so far refutes my theory.

I agree there is a slight chance of my WDC drives having a problem with NCQ in the first place - I have seen comments on some Samsung SSDs having that problem with certain firmware revisions. But that would not have gone unnoticed, I bet.

@m6100

m6100 commented Nov 24, 2023

Just FYI the Unaligned Write error is a bug in Linux's libata:

https://lore.kernel.org/all/20230623181908.2032764-1-lorenz@brun.one/

Unfortunately, this patch was never applied, and the issue got no further attention after a short discussion. There also seems to have been no other cleanup on this topic; at least I couldn't find anything related in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/drivers/ata/libata-scsi.c

Basically libata is not implementing the SAT (SCSI-to-ATA) translation correctly and ends up converting ATA errors it shouldn't, producing nonsense. This is still a real error (probably a link reset or similar), but the SCSI conversion is broken.

I think you are correct on this. I'm also seeing the error on a system where I put a new 6.0 Gbps hard disk with 512-byte sectors into an older cartridge with old cabling (which I'm going to replace). For this disk, zpool status lists 164 write errors after writing about 100 GB, and the pool is degraded. The hard disk's UDMA_CRC_Error_Count SMART raw value increased from 0 to 3, but it otherwise has no problems. The dmesg output also indicates a prior interface/bus error that is then decoded as an unaligned write on the tagged command:

[18701828.321386] ata4.00: exception Emask 0x10 SAct 0x3c00002 SErr 0x400100 action 0x6 frozen
[18701828.322053] ata4.00: irq_stat 0x08000000, interface fatal error
[18701828.322652] ata4: SError: { UnrecovData Handshk }
[18701828.323256] ata4.00: failed command: WRITE FPDMA QUEUED
[18701828.323885] ata4.00: cmd 61/58:08:b8:da:63/00:00:48:00:00/40 tag 1 ncq dma 45056 out
res 40/00:08:b8:da:63/00:00:48:00:00/40 Emask 0x10 (ATA bus error)
[...]
[18701830.479670] sd 4:0:0:0: [sdo] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=2s
[18701830.479679] sd 4:0:0:0: [sdo] tag#1 Sense Key : Illegal Request [current]
[18701830.479687] sd 4:0:0:0: [sdo] tag#1 Add. Sense: Unaligned write command
[18701830.479696] sd 4:0:0:0: [sdo] tag#1 CDB: Write(16) 8a 00 00 00 00 00 48 63 da b8 00 00 00 58 00 00

@lorenz
Contributor

lorenz commented Nov 24, 2023

The reason it never got applied is mostly that, as it turns out, this is a deeper architectural issue with libata: there is no valid SCSI error code here. Sadly, I'm not familiar enough with the Linux SCSI midlayer to implement the necessary changes.

CRC errors are not the only type of link error. You are probably losing the SATA link which causes reset/retraining which is one of the known things libata doesn't handle correctly.

@ngrigoriev

ngrigoriev commented Jun 8, 2024

I have just got this problem on a brand new WD Red WD40EFPX. I bought it yesterday to replace the failed mirror drive. Getting these errors while ZFS is resilvering it.

No question about faulty controller, cable or anything else hardware-related. The system worked for a very long time, the failed component I have replaced is the disk. The new disk is unlikely to be bad. One way or another, it is related to the new disk interaction with the old system.

@lorenz
Contributor

lorenz commented Jun 9, 2024

This is not really a ZFS issue, it's a hardware/firmware issue being handled badly by the Linux kernel's SCSI subsystem. This issue should probably be closed here.

@ngrigoriev These errors are in pretty much all cases hardware/firmware-related. Post the kernel log if you want me to take a look at the issue.

@ngrigoriev

ngrigoriev commented Jun 9, 2024

@lorenz

I was able to finish the resilvering process, but at the very end it started failing again. And this was with "libata.force=3.0 libata.force=noncq"

Right after reboot:

[   74.500345] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[   74.500372] ata3.00: failed command: WRITE DMA EXT
[   74.500391] ata3.00: cmd 35/00:08:e8:28:00/00:00:16:00:00/e0 tag 23 dma 4096 out
                        res 40/00:20:58:6f:05/00:00:00:00:00/e0 Emask 0x4 (timeout)
[   74.500409] ata3.00: status: { DRDY }
[   74.500423] ata3: hard resetting link
[   74.976320] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[   74.977298] ata3.00: configured for UDMA/133
[   74.977339] sd 2:0:0:0: [sdb] tag#23 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=31s
[   74.977347] sd 2:0:0:0: [sdb] tag#23 Sense Key : Illegal Request [current]
[   74.977354] sd 2:0:0:0: [sdb] tag#23 Add. Sense: Unaligned write command
[   74.977363] sd 2:0:0:0: [sdb] tag#23 CDB: Write(16) 8a 00 00 00 00 00 16 00 28 e8 00 00 00 08 00 00
[   74.977373] blk_update_request: I/O error, dev sdb, sector 369109224 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[   74.977403] zio pool=******** vdev=/dev/disk/by-id/ata-WDC_WD40EFPX-68C6CN0_WD-WX12D14P6EH0-part1 error=5 type=2 offset=188982874112 size=4096 flags=180880
[   74.977433] ata3: EH complete

Drive:

Device Model:     WDC WD40EFPX-68C6CN0
Serial Number:    WD-WX12D14P6EH0
LU WWN Device Id: 5 0014ee 216332baf
Firmware Version: 81.00A81
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 1.5 Gb/s)
Local Time is:    Sun Jun  9 09:50:07 2024 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
...
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       1
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       17
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       3
194 Temperature_Celsius     0x0022   111   107   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       8
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

I understand that it is not directly ZFS's fault, but, interestingly enough, it is triggered by ZFS specifically. This machine has 5 HDDs. None of them demonstrated this issue for years. It only started happening with this new WD Red drive that replaced the failed one.

@ngrigoriev

ngrigoriev commented Jun 10, 2024

Tried everything I have read about: all combinations of the options, libata.force (noncq, 3.0, even 1.5). Nothing really worked. After a couple of hours at most, even after a successful resilver of the entire drive, a bunch of errors would just appear. I also noticed that if I set the speed to 1.5 Gbps, the other drives on this controller start getting similar problems.

SCSI power settings are forced to max_power for all hosts.

I have a combination of different drives in this home NAS, some are 6Gbps, some are 3.0.

What I am trying now: I have connected this new drive to the second SATA port on the motherboard instead of the PCIe SATA controller, and I have removed all libata settings. So far so good, keeping fingers crossed. If that does not help, then I am out of options. I have already changed the cable to be sure. Another controller? Well, I have effectively already tried two different ones: ASMedia and the onboard Intel one.

@lorenz
Contributor

lorenz commented Jun 10, 2024

Basically what's happening is that your disk does not respond within 7 or 10s (the Linux ATA command timeout) to a write command the kernel sent. ATA does not have a good way to abort commands (SCSI and NVMe do), so the kernel "aborts" the command by resetting the link.
Problem is the link reset is improperly implemented in Linux, resulting in spurious write errors and bogus error codes. Basically Linux does not automatically retry outstanding I/O requests, instead failing them with an improper error code as this is not standardized behavior and as such doesn't have a proper error code.

Unless you've hot-plugged the disk I suspect you have either a cabling issue or one side of the link (either the SATA controller or the disk controller) is bad as we see CRC errors on the link. This would explain the weird timeouts as the command might have been dropped due to a bad CRC.
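As an aside, the per-device command timeout lives in sysfs and can be raised; a sketch, assuming the disk is sdb. Note this tunes the SCSI-midlayer timeout, so it only helps if the command eventually completes rather than being lost on the link:

```shell
# Inspect (and optionally raise) the SCSI command timeout for one disk.
dev=sdb   # placeholder; use the disk that is timing out
t=/sys/block/$dev/device/timeout
if [ -r "$t" ]; then
    cat "$t"          # current timeout in seconds (the SCSI-layer default is 30)
    # echo 60 > "$t"  # requires root; not persistent across reboots
fi
```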

@ngrigoriev

ngrigoriev commented Jun 10, 2024

Yes, it seems so, and it only happens under heavy write activity, apparently.

Is there a way to control this timeout? (I understand it is not the place to ask this kind of question :( )

@fcmircea

fcmircea commented Jun 10, 2024 via email

@G8EjlKeK7CwVQP2acz2B

G8EjlKeK7CwVQP2acz2B commented Jun 10, 2024 via email

@fcmircea

fcmircea commented Jun 10, 2024 via email

@Shellcat-Zero

I'll chime in and say that my issues went away after upgrading my hardware, also specifically going from consumer to server grade hardware, but my disks are not using a dedicated HBA. The issue only arose after upgrading from Ubuntu 18.04 to 22.04, and I'm primarily faulting the old PSU at this point for wreaking havoc on my disks and possibly damaging an HBA that I trialed on the old consumer-grade motherboard.

@lorenz
Contributor

lorenz commented Jun 10, 2024

@ngrigoriev Generally for ZFS you should be using ATA Error Recovery Control so that the disk does not delay command completion for too long. Your disk should have that feature. With that you should never hit the timeouts because the disk takes too long. But considering I'm seeing CRC errors on your disk I think the command might not actually have arrived at all, ERC does not help there. The only way forward is a link reset, which Linux does perform. Problem is again that the SAT (SCSI/ATA translation layer)'s error handler (EH) is implemented incorrectly on Linux, thus you get visible faults in ZFS from this. The unaligned write code is completely bogus, it basically indicates that the link has been reset.
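The ERC feature mentioned here can be inspected and set with smartctl (a sketch; /dev/sda is a placeholder, and not every consumer drive supports SCT ERC):

```shell
# Query SCT Error Recovery Control; with ERC set, the drive gives up on a bad
# sector and reports an error instead of stalling past the kernel's timeout.
dev=/dev/sda   # placeholder device
if command -v smartctl >/dev/null 2>&1 && [ -e "$dev" ]; then
    smartctl -l scterc "$dev" || true   # show current read/write recovery timers
    # Set both timers to 7.0 s (the unit is 100 ms):
    # smartctl -l scterc,70,70 "$dev"
fi
```

The setting is usually lost on power cycle, so it is commonly reapplied from a boot script or udev rule.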

@ngrigoriev

The UDMA_CRC_Error_Count does not seem to correlate with the number of errors ZFS reported: it sits at 8 now, while ZFS reported many hundreds of errors yesterday.

It appears that moving this particular new drive to the motherboard's controller (same SATA cable) has resolved the issue. Not a single error overnight for any ZFS pool, and the disk works at 6 Gbps, with NCQ enabled :)

Previously the system worked well with the same number of disks, all on the PCIe controller based on the ASMedia 1060, without a single error. What broke the balance was the new WD Red drive replacing the dying Seagate Barracuda. Active writing to this disk seems to cause the failed commands. Different commands, actually:

     15  FLUSH CACHE EXT
      1  READ FPDMA QUEUED
      7  WRITE DMA
     62  WRITE DMA EXT
      4  WRITE FPDMA QUEUED

Most of them are timeouts, few are "ATA bus error".

It seems that going forward I will need to find a replacement SATA controller. More of the old disks will die, to be replaced by newer ones, and it seems that the newer disks bring the trouble...

@ngrigoriev

I have replaced the controller; I found a reasonably priced LSI SAS2008 on eBay. I moved all the disks there, except one, and have had no errors for a month.

Ironically, I had to keep one of the disks on the onboard controller - and not the one that was misbehaving with ZFS! It turned out that the smart, super-capable LSI does not recognize some SATA drives, like certain Seagate IronWolf models, and there is nothing you can do about that at all :)

To be honest, this experience did shake my confidence ;) I was assuming that in 2024 (or even in 2010) we should not have any compatibility or stability problems with SATA and Linux.

@meyergru

It has been a while since the original problem turned up. In my case, it was definitely caused neither by a hardware issue with the drives or cabling, nor by a driver/chipset problem; see also this issue:

#15270

However, I think that the original problem was within OpenZFS alone and may be fixed by now:

What I found is that the hint in pull #15414 leads to pull #15588. That pull has some interesting notes which could completely explain the errors reported here, and they are not too far off my suspicions about I/O pressure causing this.

In the end, the final pull that was accepted is #16032, and it is contained in OpenZFS 2.2.4. To verify that the fix is present in your applicable Linux version, you can cat "/sys/module/zfs/parameters/zfs_vdev_disk_classic". If this parameter is present and has the value 1, the fix is present and you can probably safely remove libata.force=noncq. I did this on my Proxmox 8.2.4 installation, which now has OpenZFS 2.2.4 under the hood, instead of 2.12 in late 2023.

If you still experience problems, their root cause may be something different than the underlying OpenZFS problem from 2020.

I think that this issue should be closed now, and I have done that for #15270.

@robn
Member

robn commented Aug 12, 2024

Not quite: zfs_vdev_disk_classic=1 is to use the "classic" version, that is, the same code that has existed since forever. Set it to 0 to use the new submission method.

(I have no opinion on this particular issue; just pointing out the inverted option).
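Taking the correction into account, a sketch for checking which submission path is active (the parameter only exists on builds that include the change):

```shell
# Report which vdev disk submission path OpenZFS is using.
p=/sys/module/zfs/parameters/zfs_vdev_disk_classic
if [ -r "$p" ]; then
    # 1 = classic (long-standing) submission code; 0 = new submission method
    echo "zfs_vdev_disk_classic=$(cat "$p")"
else
    echo "parameter absent: module not loaded, or OpenZFS predates the change"
fi
```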

@morgoth6

Not sure if my problem is related, but it looks like it. I started to observe frequent CKSUM errors after upgrading ZFS to test the recently merged Direct IO support. The disks are 100% OK and don't show any signs of trouble (in both SMART and stress testing), and all machines have ECC memory, etc. It looks like:
eid=161 class=checksum pool='pikPool' vdev=ata-SAMSUNG_MZ7KH960HAJR-00005_S47NNA0RB14576 size=20480 offset=835569537024 priority=4 err=52 flags=0x1000b0 bookmark=645:134:0:36359
eid=160 class=data pool='pikPool' priority=4 err=52 flags=0x18080b1 bookmark=645:134:0:36359
eid=162 class=checksum pool='pikPool' vdev=ata-SAMSUNG_MZ7KH960HAJR-00005_S47NNA0RB14573 size=20480 offset=835569537024 priority=4 err=52 flags=0x1000b0 bookmark=645:134:0:36359
eid=163 class=checksum pool='pikPool' vdev=ata-SAMSUNG_MZ7KH960HAJR-00005_S47NNA0T700207 size=24576 offset=835569532928 priority=4 err=52 flags=0x1000b0 bookmark=645:134:0:36359
I encountered this problem a long time ago when testing the Direct I/O patches (before they were merged) but ignored it (I was sure I had messed something up when applying the patch), but it seems it is still present after the merge.
A few observations below, as maybe they will be helpful for someone (I am more than happy to provide more info if required):

  • It seems that in all cases I observed, it appears after messing with XATTRs or ACLs. For example, I can reproduce it quite easily on a Qemu qcow2 file with Win10 (cache="none" io="threads" discard="unmap"). The linked error:
    warning : virSecurityValidateTimestamp:205 : Invalid XATTR timestamp detected on /var/lib/libvirt/qemu/domain-4-vAC/master-key.aes secdriver=selinux warning : virSecurityValidateTimestamp:205 : Invalid XATTR timestamp detected on /var/lib/libvirt/qemu/domain-4-vAC/master-key.aes secdriver=dac warning : virSecurityValidateTimestamp:205 : Invalid XATTR timestamp detected on /mnt/pikPool/libvirt/zimages/vACam10.qcow2 secdriver=selinux warning : virSecurityValidateTimestamp:205 : Invalid XATTR timestamp detected on /var/lib/libvirt/qemu/nvram/vAC_VARS.fd secdriver=selinux
    Not sure if it is relevant, but all pools are using ashift=12.

After that, the same files show up in zpool status -v as damaged. Funny thing: qemu-img check always reports them as OK.

  • This problem never resulted in any data loss here, and in most cases 1-2 scrub & clear passes were enough for it to "fix itself". In the worst case, a qemu-img convert ... to a new qcow2 and replacing the old file was enough.
  • Using zfs set direct=disabled ... seems to work around this, as there have been no errors since (3-5 days).
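The workaround in the last bullet, spelled out (a sketch; the dataset name is a placeholder taken from the log paths above, and the direct property only exists on releases with Direct IO support):

```shell
# Disable Direct IO for one dataset so O_DIRECT requests go through the ARC.
ds=pikPool/libvirt   # placeholder dataset name
if command -v zfs >/dev/null 2>&1 && zfs list "$ds" >/dev/null 2>&1; then
    zfs get direct "$ds"            # standard | always | disabled
    zfs set direct=disabled "$ds"
fi
```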

@bwatkinson
Contributor

@morgoth6 I actually have been able to reproduce this myself. I am working on a patch for it right now. Should have a PR open to master in the next day or two to address this.

@requeijaum

I think my WD Red (not PLUS) drives died in June this year, due to this problem.

@stuckj

stuckj commented Oct 11, 2024

That seems unlikely. These errors can come FROM a dying drive, but they wouldn't hurt your drive. If you got unaligned write errors on a healthy drive, the result would be a failed write, meaning your drive wouldn't physically be doing anything.
