the return of "Unaligned write command" errors #10094
Some details. OS is CentOS-7, kernel 3.10.0-1062.12.1.el7.x86_64, zfs-0.8.3-1.

Problem disks:

    # ./smart-status.perl
    Disk      model                    serial        temperature  realloc  pending  uncorr  CRC err  RRER
    /dev/sda  WDC WDS100T2B0A-00SM50   195004A00B9C  26           .        ?        ?       .        ?
    /dev/sdg  WDC WDS100T2B0A-00SM50   195008A008F8  26           .        ?        ?       .        ?
    # more /sys/class/block/sda/queue/physical_block_size
    512
    # more /sys/class/block/sdg/queue/physical_block_size
    4096
    # zpool status zssd1tb
      pool: zssd1tb
     state: ONLINE
      scan: scrub repaired 0B in 0 days 00:05:55 with 0 errors on Mon Mar  2 13:24:15 2020
    config:
            NAME                                         STATE     READ WRITE CKSUM
            zssd1tb                                      ONLINE       0     0     0
              mirror-0                                   ONLINE       0     0     0
                ata-WDC_WDS100T2B0A-00SM50_195004A00B9C  ONLINE       0     0     0
                ata-WDC_WDS100T2B0A-00SM50_195008A008F8  ONLINE       0     0     0
    errors: No known data errors

Typical error from "dmesg":

    [1974990.399004] ata8.00: exception Emask 0x0 SAct 0x8000000 SErr 0x0 action 0x6 frozen
    [1974990.399009] ata8.00: failed command: WRITE FPDMA QUEUED
    [1974990.399013] ata8.00: cmd 61/08:d8:e0:27:70/00:00:74:00:00/40 tag 27 ncq 4096 out
                     res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
    [1974990.399015] ata8.00: status: { DRDY }
    [1974990.399018] ata8: hard resetting link
    [1974990.707014] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    [1974990.710637] ata8.00: configured for UDMA/133
    [1974990.710690] sd 7:0:0:0: [sdg] tag#27 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [1974990.710693] sd 7:0:0:0: [sdg] tag#27 Sense Key : Illegal Request [current] [descriptor]
    [1974990.710695] sd 7:0:0:0: [sdg] tag#27 Add. Sense: Unaligned write command
    [1974990.710698] sd 7:0:0:0: [sdg] tag#27 CDB: Write(10) 2a 00 74 70 27 e0 00 00 08 00
    [1974990.710699] blk_update_request: I/O error, dev sdg, sector 1953507296
    [1974990.710703] zio pool=zssd1tb vdev=/dev/disk/by-id/ata-WDC_WDS100T2B0A-00SM50_195008A008F8-part1 error=5 type=2 offset=1000194686976 size=4096 flags=180ac0
    [1974990.710708] ata8: EH complete
Such errors almost always indicate a failing SSD/HDD. The system even tried to reset the SATA link, and the write still failed.
shodashok - failing HDD: no (this is an SSD). Failing SSD: unlikely: (a) the SSD is brand new; (b) failing devices usually throw read/write I/O errors, not "illegal request - unaligned write command"; (c) bum SSD firmware (hello, Intel!): unlikely, both SSDs have the same firmware and only one throws the errors; (d) overheated SSD malfunction: no, the SSD is at room temperature and cool to the touch. K.O.
Additional information: I moved the two problem SSDs to different SATA ports, and now both report a physical sector size of 512 (one was reporting 4096). Now let's see if the errors go away. K.O.
@dd1dd1 I had an identical problem on a Crucial MX500 SSD. After some days, the SSD controller failed catastrophically (the SSD was no longer detected on the BIOS screen). So I really doubt it is a ZFS problem. The fact that simply swapping the SATA port changed the reported sector size is really suspicious.
I'm experiencing the same issues. These errors appeared out of nowhere. I highly doubt that all 4 disks died at exactly the same time.
The errors appear randomly across the four disks, but only one is really failing badly within ZFS:
Without ZFS the disk sdd (slot4crypt) works without any issues. S.M.A.R.T. is not showing any issues at all.
Uhm, interesting. #4873 (comment) seems to work for me, too. But WHY?!
Finally, this turned out to be a system firmware issue. See https://review.coreboot.org/c/coreboot/+/40877

You can try this:

    echo max_performance | sudo tee /sys/class/scsi_host/host*/link_power_management_policy

If that solves the problem, it's pretty likely that you have the same issue. Try to get a BIOS/UEFI update from your vendor.
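If that helps and you want the setting to survive a reboot, one common approach is a udev rule along these lines (a sketch; the file name is arbitrary and the attribute path assumes the usual scsi_host sysfs layout):

```
# /etc/udev/rules.d/99-sata-lpm.rules  (file name is arbitrary)
# Force every SATA/SCSI host to the max_performance link power policy at boot
ACTION=="add", SUBSYSTEM=="scsi_host", KERNEL=="host*", ATTR{link_power_management_policy}="max_performance"
```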
That's interesting. My "unaligned write" errors went away after I moved the disks from an oldish socket 1366 mobo to a brand-new socket 1151 mobo. It could have been the old BIOS on the old mobo not initializing the SSD SATA link correctly... K.O.
I had the same problem with brand-new 14 TB drives on a Skylake motherboard. Setting GRUB_CMDLINE_LINUX="libata.force=3.0" fixed the issue for me. This forces the SATA controller to run at 3 Gbit/s, which is still plenty for spinning rust. I have not tested the max_performance link-power switch for the ports. Waiting for a firmware/BIOS update... let's see.
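For reference, a sketch of what that looks like in the GRUB config (the fully spelled-out value is 3.0Gbps; the exact regeneration command depends on the distro):

```
# /etc/default/grub
GRUB_CMDLINE_LINUX="libata.force=3.0Gbps"    # limit all SATA links to 3.0 Gbps

# Regenerate the GRUB config afterwards, e.g.:
#   sudo update-grub                                # Debian/Ubuntu
#   sudo grub2-mkconfig -o /boot/grub2/grub.cfg     # RHEL/CentOS
```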
I have the same issue on an AMD X399 Threadripper board with two 16 TB Seagate Exos SATA drives. I mostly get errors on one of them; I recently checked both with Seagate's Windows tool and both seem fine. Currently I suspect the HDD enclosure and cabling cannot handle sustained full SATA 6 Gbps speeds. I'll rework the case this weekend, move the disks around, and see if it helps. link_power_management_policy was already set to max_performance.
I'm getting this too on four separate machines now. Swapping drives, cables, drive bays, etc. didn't help, but putting all the drives on an HBA solved it. It's a Supermicro H11DSi motherboard on all 4 machines, and it seems to happen only when using the SATA controller on the motherboard. The BIOS is already upgraded to the latest version. I'm using ZFS via the latest Proxmox 6.2 (ZFS root/boot). A scrub can trigger it every time, though some of the machines can go a while without issues if they're not stressed. I've reached out to Supermicro, but they don't have anything helpful yet. I'm primarily seeing this on HGST 10 TB SATA drives, though I did see it a few times on an Intel SSD.
I switched cabling and used a new SATA drive cage; so far my problems are gone.
Yeah, that's what's weird about this one. I've resolved similar problems that way in the past, but it's not resolving it this time, on multiple different systems.
Can confirm too that this happens on at least three similar hosts, all with an ASRock TRX40D8-2N2T mainboard (SATA III via the TRX40, and/or via an ASMedia ASM1061) and varying hard disks and SATA cables. I can't see the same happening with FreeNAS/TrueNAS running on the same setup, nor when the setup is altered so that the disks are attached to an add-in LSI HBA card instead of the mainboard connectors. Also confirming that the failure reproducibly happens only after many gigabytes of data have been sent over the SATA wires. The failures seem to suddenly affect multiple ATA links concurrently. My guess at this point is a kernel bug with some HBAs. To reproduce: boot a Debian Buster (or Proxmox) from an additional disk, build a raidz2 over four disks attached to the TRX40 or the ASM1061, then concurrently run Found with Proxmox kernel 5.4.73-1-pve, and still present in 5.4.114-1-pve.
I can reproduce it with a simple zpool scrub. It's true that it only happens after many GBs of data transfer, but I can trigger it every time. I was also able to eliminate the problem with an add-in LSI 9300-4i HBA, but of course that's not a true fix. However, I had to get our servers stable, so I bought HBAs for all of them, and I haven't seen a single problem since. Edit: I also had to update the HBA firmware to the latest version in order to avoid hard reboots under the same type of load. I assume that's unrelated (it was NOT the same "unaligned write" error in dmesg, it was a hard freeze/reboot), but I thought I should mention it. Anyway, an HBA with fully updated firmware does seem to have eliminated the problem as a workaround across the several servers I was having this issue with.
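For anyone else trying to reproduce this, the trigger described above is just a scrub under sustained load; a rough sketch (pool name is illustrative):

```
# Kick off a scrub and watch for READ/WRITE/CKSUM counters and kernel messages
zpool scrub tank
zpool status -v tank
dmesg -w | grep -iE 'unaligned|frozen'
```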
Update: kernel 5.10¹ from Debian Buster Backports and kernel 5.11 from the Proxmox pvetest repo are stable on the TRX40 SATA links. They both continue to fail on the ASMedia ASM1062², but less spectacularly so in dmesg. ¹ Debian Backports 5.10.0-0.bpo.5-amd64
My problems returned in a different way on my personal host, where I pass through PCIe devices including the GPU. This is the host where I originally had the problems with the two 16 TB SATA Exos drives, where the issues seemed to go away after I replaced the disk cage and cabling. But it seems the problem just moved on to the NVMe port. I had one Samsung 970 EVO 2 TB in that port, not yet in use, and that disk was in a ZFS pool. Almost every time I scrubbed that NVMe I got the following errors on random sectors
It somehow got worse and was no longer limited to scrub problems. My Windows 10 guest, which passes through multiple PCIe devices from the Proxmox host, could not initialize hardware when booting up and locked the host up solid while the Windows spinner rotates and Windows sorts out hardware. Sometimes it got a screen further and locked up on applying configuration, more rarely it got to the desktop and then locked up, and in very rare cases it locked up hours, days or even a week later. I fiddled around for a long time and got a few hints in the console that it seemed to be a hardware initialization issue. Sometimes it wrote something into dmesg just before locking up: sometimes USB3 adapters that were passed through, but I also noticed the NVMe device that was not passed through at all. This Samsung was counting up SMART integrity errors every time I ran a ZFS scrub, and I am right now far over In the end this was the single issue that had been plaguing me for the last few months, where I needed to hope and gamble to get my guest up and running. I first thought it was related to the kernel version of Proxmox and the ZFS module, as there were some known issues that locked up the host in the latest kernel.
This started happening to me pretty regularly on a Supermicro virtualization server (X10SDV-8C-TLN4F). I have four 500 GB Crucial MX500s in a 2x2 mirrored pool with one hot spare. Every couple of weeks I see the hot spare kick in. The drive that dies always dies with the error mentioned at the top of this issue (unaligned write). Soft-resetting the host bus doesn't work, and unplugging the SATA cable doesn't work; you have to unplug power to the drive and plug it back in to get it back up. All 5 drives are brand new (I replaced the old SSDs when this problem cropped up, thinking they were failing... guess they weren't). It's not always the same drive this happens with (it has happened across 3 different ones). The old drives were Samsung EVO 860s and had the same problem, so it's not manufacturer-specific. It's still running the original BIOS, so I may try updating that to see if it makes a difference. This smells like a kernel bug rather than a ZFS bug, though, given that the failure is at the ATA level - unless ZFS can write something that results in an unaligned write.
I do have TRIM enabled on the pool. Perhaps it could be related to this: #8552. Though these aren't Samsung drives. I can try the noncq setting on the next reboot when I can take some downtime.
Also, you can disable NCQ without a reboot using this technique: https://unix.stackexchange.com/a/611496
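As far as I can tell, the non-reboot approach amounts to dropping the device's queue depth to 1, which effectively disables NCQ at runtime (device name is illustrative):

```
# Effectively disable NCQ on /dev/sdg without rebooting
echo 1 | sudo tee /sys/block/sdg/device/queue_depth
cat /sys/block/sdg/device/queue_depth    # should now read 1
```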
Disabling NCQ with the above technique (the one that doesn't require rebooting) was not effective for me. I'm going to try disabling autotrim and just leave it to the monthly manual trim (which I believe should run in a couple of hours, since the default is the first Sunday of each month).
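For reference, checking and turning off automatic TRIM at the pool level looks roughly like this (pool name is illustrative):

```
zpool get autotrim tank        # shows on/off
zpool set autotrim=off tank    # stop background TRIM
zpool trim tank                # a manual TRIM can still be started on demand
```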
Nope, that didn't work either. :-P I'm starting to suspect failing hardware for me too now (or a kernel bug in a recent update). To my knowledge, the D-1541 Xeons are an SoC, so all the SATA controllers are actually in the CPU. It seems weird that it would fail, but there must be some components on the motherboard. I'm moving the drives to the backplane to see if that resolves the issue. I only had them on the internal ports for speed, since those are individual SATA III ports vs. the shared SATA II backplane.
@livelace thanks for the writeup. In my case, though, the issue predates zstd, for whatever that's worth.
Yep, thanks @livelace. It doesn't apply in my case either, though, since the current problem I'm having is with Crucial MX500s (5 of them, all brand new, 2x2 mirrored pairs and a spare). I'm also using lz4 rather than zstd.
I do take a bit of a performance hit, but so far everything is working on the backplane, whereas every day for the past week I was getting at least 1 drive failure. Prior to the switch yesterday I had 4 fail at once, which required me to reboot everything. I guess it's possible that I have failing SATA ports on the motherboard, but I don't think it's likely. Also, this started happening after applying both a kernel update and a ZFS update. I had been holding off on moving on from 0.8.4 and the updated kernel because I wanted the same ZFS version for my znapzend container. I finally got around to making a znapzend container based on the Proxmox repos, so I made the ZFS and kernel jump just before this started happening. I'm not sure how best to narrow this down to a ZFS or kernel bug, but it seems like it's one of those.
Huh. These are the same symptoms from years ago on the exact same motherboard I'm using, though in that case it was Intel SSDs instead of Crucial, and it was the drive firmware that was updated to fix it. Crucial doesn't appear to have new firmware for the MX500s I'm using, though. I guess I'm just stuck with the backplane for the time being (or until I upgrade the machine?). Meh, it makes drive maintenance easier, I suppose.
I had a very similar issue, that is, failed writes with the dmesg message: My hardware configuration introduces something different from the other cases:
I was having the issue with 'scrub' on both pools. The way I solved it, or at least hope to have solved it, was by backing up all data, recreating the pools and restoring the data. There is something that probably needs to be confirmed: I have no doubt that the ZFS team verifies that a pool survives a ZFS upgrade. The question is whether there could be a corner case where strange issues happen. I mean, ZFS is a hugely complicated filesystem, and it's conceivable that something could have been introduced. Thank you.
@lorenz Interesting, that might explain why I've gotten the unaligned write error too. I've tried everything from updating SSD firmware, maximum link power settings and no TRIM, to upgrading the Linux kernel and downgrading the port speeds to 3 Gbps. Nothing worked; slower speed just meant it took longer for the problem to resurface. What has likely fixed it now is a cable replacement. I noticed that only 2 drives were failing and they had a different brand of cables. The new cables are thicker too, so they probably have much better shielding. Definitely try replacing your SATA cable if you're seeing this error! Side note: ZFS didn't handle random writes erroring out gently; the pool was broken beyond repair and I had to rebuild from backups.
I am in the same boat here. The system is an X570-based mainboard with FCH controllers plus an ASMedia ASM1166. I had ZFS errors galore and tried:
Also, I see those errors turning up only after a few hours of time (or lots of data) have passed. Using the JMB585 could still be an option, even now that drives on the motherboard controller show these errors, because I can probably limit the SATA speed with that controller, which was impossible with the ASM1166. I will try that as a last-but-one resort if limiting link power does not resolve this. I hate the thought of having to use an HBA adapter that consumes more power. P.S.: The JMB585 can be limited to 3 Gbps. Otherwise, no change, I still get errors on random disks. I have ordered an LSI 9211-8i now. However, this points to a real problem in the interaction between libata and ZFS. P.P.S.: I disabled NCQ and the problem is gone. I did not bother to try the LSI controller. I will follow up with some insights.
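For anyone wanting to try the same thing, a sketch of the kernel command line options involved; the per-port form is optional and the port number is illustrative (it matches the ataN numbers in dmesg):

```
# Disable NCQ for all SATA devices
libata.force=noncq

# Or only for a specific port/device, e.g. ata4, device 0
libata.force=4.00:noncq
```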
OpenZFS for Linux problem with libata - root cause identified?

Just to reiterate what I wrote about this here: I have a Linux box with 8 WDC 18 TB SATA drives, 4 of which are connected through the mainboard controllers (AMD FCH variants) and 4 through an ASMedia ASM1166. They form a raidz2 running under Proxmox with a 6.2 kernel. During my nightly backups, the drives would regularly fail and errors showed up in the logs, more often than not "unaligned write" errors. First thing to note: one poster in this thread mentioned that the "unaligned write" message is a bug in libata, in that "other" errors are mapped to this one in the SCSI translation code (https://lore.kernel.org/all/20230623181908.2032764-1-lorenz@brun.one/). Thus, the error text itself is meaningless. In the thread, several possible remedies were offered, such as:
I am 99% sure that it boils down to a bad interaction between OpenZFS and libata with NCQ enabled, and I have a theory why this is so: imagine a time of high I/O pressure, like when I do my nightly backups. OpenZFS has some queues of its own, which are then handed to the drives, and for each task started, OpenZFS expects a result (but in no particular order). However, when a task returns, it opens up a slot in the NCQ queue, which is immediately filled with another task because of the high I/O pressure. That means that sector 42 could potentially never be read at all, provided that other tasks are prioritized higher by the drive hardware. I believe this is exactly what is happening, and if one task result is not received within the expected time frame, a timeout with an unspecific error occurs. This is the result of putting one (or more) quite large queues within OpenZFS in front of a smaller hardware queue (NCQ). It explains why both solutions 6 and probably 7 from my list above cure the problem: without NCQ, every task must first be finished before the next one can be started. It also explains why this problem is not as evident with other filesystems - were this a general problem with libata, it would have been fixed long ago. I would even guess that reducing SATA speed to 1.5 Gbps would help (one person reported this) - I bet this is simply because the resulting speed of ~150 MB/s is somewhat lower than that of modern hard disks, so the disk can always finish tasks before the next one is started, whereas 3 Gbps is still faster than modern spinning rust. If I am right, two things should be considered: a. The problem should be analysed and fixed in a better way, like throttling the libata NCQ queue when pressure gets too high, just before timeouts are thrown. This would give the drive time to finish existing tasks. I also think that the performance impact of disabling NCQ with OpenZFS is probably negligible, because OpenZFS has prioritized queues for different operations anyway.
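As a rough illustration of the queue-size mismatch described above, the two depths can be compared directly (device name is illustrative; the ZFS value is the zfs_vdev_max_active module parameter):

```
# Upper bound on I/Os ZFS keeps in flight per vdev...
cat /sys/module/zfs/parameters/zfs_vdev_max_active
# ...versus the drive-side NCQ queue depth (typically 31/32 with NCQ enabled)
cat /sys/block/sdg/device/queue_depth
```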
(I am the OP; I have some experience with Linux kernel drivers and embedded firmware development.) I like how you wrote it all up, but I doubt you can bring closure to this problem. IMO, the list of "remedies" is basically snake oil; if any of these remedies were "a solution", this bug would have been closed a long time ago. I think "NCQ timeout" does not explain this problem: I think we now have to wait for the libata bug fix to make it into production kernels; then we will see what the actual error is. "Unaligned write command" never made sense to me, and now we know it is most likely bogus. K.O.
I did not imply that NCQ in itself allows a command to be left pending indefinitely. A command can only be postponed by the hardware, in that it may reorder the commands in any way it likes - this is just how NCQ works. Thus, indefinite postponing can only occur if someone "pressures" the queue consistently - actually, the drive is free to reorder newly incoming commands and intersperse them with previous ones; as a matter of fact, there is no difference between issuing 32 commands in short succession and issuing a few more only after some have finished. Call that behaviour a design flaw, but I think it exists, and the problem in question surfaces only when some other conditions are met. And I strongly believe that OpenZFS can cause exactly that situation, especially with the write patterns of raidz under high I/O pressure. I doubt that this bug would occur with other filesystems, where no such complex patterns from several internal queues ever happen. As to why the "fixes" worked sometimes (or seemed to have worked): as I said, #6 and #7 both disable NCQ. Reducing the speed to 1.5 Gbps will most likely reduce the I/O pressure enough to make the problem go away, and other solutions may help people who really have hardware problems. Also, I have not read of anybody so far who disabled NCQ without also doing something else alongside (e.g. reducing speed as well). I refrained from disabling NCQ first only because I thought it would hurt performance - which it did not. Thus, my experiments ruled out one potential cause after another, leaving only the disabling of NCQ as the effective cure. I admit that I probably should wait a few more nights before jumping to conclusions, but these problems were consistent with every setup I tried so far. (P.S.: It has now been three days in a row without problems.) Nothing written here, nor anything I have tried so far, refutes my theory. I agree there is a slight chance that my WDC drives have a problem with NCQ in the first place - I have seen comments about some Samsung SSDs having that problem with certain firmware revisions. But that would not have gone unnoticed, I bet.
Unfortunately, this patch was never applied, and the issue got no further attention after a short discussion. There also seems to have been no other cleanup done on this topic; at least I couldn't find anything related in https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/drivers/ata/libata-scsi.c
I think you are correct on this. I'm also seeing the error on a system where I put a new 6.0 Gbps hard disk with 512-byte sectors into an older cartridge with old cabling (which I'm going to replace). For this disk, zpool status lists 164 write errors and a degraded state after writing about 100 GB. The hard disk increased its UDMA_CRC_Error_Count SMART raw value from 0 to 3 but otherwise has no problems. The dmesg info also indicates a prior interface/bus error that is then decoded as an unaligned write on the tagged command:

    [18701828.321386] ata4.00: exception Emask 0x10 SAct 0x3c00002 SErr 0x400100 action 0x6 frozen
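For reference, the link CRC counter mentioned above can be watched with smartctl while the load runs (device name is illustrative):

```
# UDMA_CRC_Error_Count increments on cable/link-level corruption
sudo smartctl -A /dev/sdd | grep -i crc
```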
The reason it never got applied is mostly that, as it turns out, this is a deeper architectural issue with libata: there is no valid SCSI error code here. Sadly, I'm not familiar enough with the Linux SCSI midlayer to implement the necessary changes. CRC errors are not the only type of link error; you are probably losing the SATA link, which causes a reset/retraining, which is one of the known things libata doesn't handle correctly.
I have just hit this problem on a brand-new WD Red WD40EFPX. I bought it yesterday to replace the failed mirror drive, and I'm getting these errors while ZFS is resilvering it. No question of a faulty controller, cable or anything else hardware-related: the system worked for a very long time, and the failed component I replaced is the disk. The new disk is unlikely to be bad. One way or another, it is related to the new disk's interaction with the old system.
This is not really a ZFS issue; it's a hardware/firmware issue being handled badly by the Linux kernel's SCSI subsystem. This issue should probably be closed here. @ngrigoriev These errors are in pretty much all cases hardware/firmware-related. Post the kernel log if you want me to take a look at the issue.
I was able to finish the resilvering process, but at the very end it started failing again, and this was with "libata.force=3.0 libata.force=noncq". Right after reboot:
Drive:
I understand that it is not directly ZFS's fault, but, interestingly enough, it is triggered by ZFS specifically. This machine has 5 HDDs; none of them showed this issue for years. It only started happening with this new WD Red drive that replaced the failed one.
I've tried everything I have read about: all combinations of the options, libata.force (noncq, 3.0, even 1.5). Nothing really worked; at most, after a couple of hours, even after a successful resilver of the entire drive, a bunch of errors would just appear. I also noticed that if I set the speed to 1.5 Gbps, other drives on this controller start having similar problems. The SCSI host link power management policy is forced to maximum performance for all hosts. I have a combination of different drives in this home NAS; some are 6 Gbps, some are 3.0. What I am trying now: I have connected this new drive to the second SATA port on the motherboard instead of the PCIe SATA controller, and I have removed all the libata settings. So far so good, fingers crossed. If that does not help, then I am out of options. I have already changed the cable to be sure. Another controller? Well, I have effectively tried two different ones already: ASMedia and the onboard Intel one.
Basically, what's happening is that your disk does not respond within 7 or 10 s (the Linux ATA command timeout) to a write command the kernel sent. ATA does not have a good way to abort commands (SCSI and NVMe do), so the kernel "aborts" the command by resetting the link. Unless you've hot-plugged the disk, I suspect you have either a cabling issue or one side of the link (either the SATA controller or the disk controller) is bad, as we see CRC errors on the link. This would explain the weird timeouts, as the command might have been dropped due to a bad CRC.
Yes, it seems so, and apparently it only happens under heavy write activity. Is there a way to control this timeout? (I understand this is not the place to ask this kind of question :( )
But why do these errors disappear when moving to a dedicated RAID card like the older LSI? Or are they just hidden away?
Presumably because the driver/card is SAS, which is a form of SCSI, which a previous respondent said does not suffer from this ATA-only issue? Joe Buehler
Looks like a consumer- versus server-grade hardware issue in the end.
I'll chime in and say that my issues went away after upgrading my hardware, also specifically going from consumer- to server-grade hardware, though my disks are not using a dedicated HBA. The issue only arose after upgrading from Ubuntu 18.04 to 22.04, and at this point I'm primarily faulting the old PSU for wreaking havoc on my disks and possibly damaging an HBA that I trialed on the old consumer-grade motherboard.
@ngrigoriev Generally, for ZFS you should be using ATA Error Recovery Control so that the disk does not delay command completion for too long; your disk should have that feature. With that, you should never hit the timeouts because the disk takes too long. But considering I'm seeing CRC errors on your disk, I think the command might not actually have arrived at all, and ERC does not help there. The only way forward is a link reset, which Linux does perform. The problem, again, is that the SAT (SCSI/ATA translation) layer's error handler (EH) is implemented incorrectly on Linux, so you get visible faults in ZFS from this. The unaligned write code is completely bogus; it basically indicates that the link has been reset.
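A sketch of checking and setting SCT Error Recovery Control with smartctl, as suggested above (values are in tenths of a second, so 70 means 7.0 s; device name is illustrative and the drive must support SCT ERC):

```
sudo smartctl -l scterc /dev/sda          # show the current read/write ERC timeouts
sudo smartctl -l scterc,70,70 /dev/sda    # cap error recovery at 7.0 s for reads and writes
```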
The UDMA_CRC_Error_Count attribute does not seem to correlate with the number of errors ZFS reported: it stays at 8 now, while I had many hundreds of errors reported by ZFS yesterday. It appears that moving this particular new drive to the motherboard's controller (same SATA cable) has resolved the issue. Not a single error overnight for any ZFS pool, and the disk works at 6 Gbps, with NCQ not disabled :) Previously the system worked well, without a single error, with the same number of disks all on the PCIe controller based on the ASMedia 1060. What broke the balance was the new WD Red drive replacing the dying Seagate Barracuda. Active writing to this disk seems to cause the failed commands. Different commands, actually.
Most of them are timeouts, a few are "ATA bus error". It seems that going forward I will need to find a replacement SATA controller. More of the old disks will die and be replaced by newer ones, and it seems that the newer disks bring the troubles...
I have replaced the controller; I found a reasonably priced LSI SAS2008 on eBay. I moved all the disks there except one, and there have been no errors for a month. Ironically, I had to keep one of the disks on the onboard controller - not the one that was misbehaving with ZFS! It turned out that the smart, super-capable LSI does not recognize some SATA drives, like certain Seagate IronWolf models, and there is nothing you can do about that at all :) To be honest, this experience did shake my confidence ;) I was assuming that in 2024 (or even in 2010) we should not have any compatibility or stability problems with SATA and Linux.
It has been a while since the original problem turned up. In my case, it was definitely caused neither by a hardware issue with the drives or cabling nor by a driver/chipset problem; see also this issue: However, I think that the original problem was within OpenZFS alone and may be fixed by now. What I found is that the hint to pull #15414 leads to pull #15588. That pull has some interesting notes which could completely explain the errors reported here and are not too far off my suspicions about I/O pressure causing this. In the end, the final pull that was accepted is #16032, and it is contained in OpenZFS 2.2.4. To verify that the fix is present in your Linux installation, you can check /sys/module/zfs/parameters/zfs_vdev_disk_classic. If this parameter is present and has the value 1, the fix is present and you can probably safely remove libata.force=noncq. I did this on my Proxmox 8.2.4 installation, which now has OpenZFS 2.2.4 under the hood, instead of 2.1.12 in late 2023. If you still experience problems, their root cause may be something different from the underlying OpenZFS problem from 2020. I think this issue should be closed now; I have done that for 15270.
Not quite: zfs_vdev_disk_classic=1 is to use the "classic" version, that is, the same code that has existed since forever. Set it to 0 to use the new submission method. (I have no opinion on this particular issue; just pointing out the inverted option). |
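Putting the two previous comments together, a quick check of which submission path is active (assuming an OpenZFS version that exposes the parameter, i.e. 2.2.x):

```
cat /sys/module/zfs/parameters/zfs_vdev_disk_classic
# 1 = classic submission code (the long-standing behaviour)
# 0 = the new BIO submission method from PR #16032
```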
Not sure if my problem is related, but it looks like it. I started to observe frequent CKSUM errors after upgrading ZFS to test the recently merged DirectIO support. The disks are 100% OK and don't show any signs of trouble (in both SMART and stress testing), and all machines have ECC memory, etc. It looks like:
After that, the same files are shown in
@morgoth6 I have actually been able to reproduce this myself. I am working on a patch for it right now and should have a PR open against master in the next day or two to address this.
I think my WD Red (not Plus) drives died in June this year due to this problem.
That seems unlikely. These errors could be FROM a dying drive, but they wouldn't hurt your drive. If you got unaligned write errors on a healthy drive, the result would be a failed write, meaning your drive wouldn't physically be doing anything.
Reporting an unusual situation. I have a ZFS mirror array across two 1 TB SSDs. It regularly spews "Unaligned write command" errors. From reading reports here and elsewhere, this problem used to exist and was fixed years ago; it is not supposed to happen today. So, a puzzle.
It turns out that the two SSDs report different physical sector sizes: one reports 512 bytes, one reports 4096 bytes. Same vendor, same model, same firmware. (WTH?!)
zpool reports the default ashift 0 (autodetect).
zdb reports ashift 12 (correct for 4096-byte sectors).
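For reference, roughly how those values can be read back (device name is illustrative; the pool name is the one from the earlier comment):

```
cat /sys/class/block/sda/queue/physical_block_size   # physical sector size reported by the disk
zpool get ashift zssd1tb                             # 0 = autodetect at vdev creation time
zdb -C zssd1tb | grep ashift                         # effective ashift stored in the pool config
```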
So everything seems to be correct, but the errors are there.
The "unaligned write command" errors only some from the "4096 bytes" SSD. After these write errors, "zpool scrub" runs without errors ("repaired 0B"). Two other zfs mirror pools on the same machine run without errors (2x10TB and 2x6TB disks, all report 4096 physical sector size).
K.O.