Ignore UltraDMA CRC Error Count unless it is increasing? #364
Comments
That's interesting. There are a couple of other issues where users have requested the ability to "mute" notifications for specific SMART attributes and set custom failure thresholds. I think this falls under a similar category. I'll keep this open for now, but I may merge/close this issue as a dupe in the future. |
Gonna subscribe to follow this, but I wanna point out that this can be a show-stopping issue: a bad SATA cable can cause the CRC error count to rise and, unless I missed a flag for it somewhere, there is no way to reset it. In my case, a bad SATA controller caused all 5 of the disks I currently use to increment this value. Since Scrutiny considers any nonzero value to be a drive failure, I get no meaningful information from the dashboard, as all 5 drives have reported as faulty from the moment I spun up Scrutiny. |
Very much agree here. There are many similar metrics that need to be overridable. A CRC error is often just a bad cable (in my case I reseated the drive in an enclosure). I have a single command timeout (due to the USB bus being reset), 13 CRC errors (due to a bad cable) and, oddly, a spin-up time warning at 91 (though that's also the normalized value, and it's the same across all drives). This is identical across 8 (Seagate IronWolf) drives, and I have no reason to believe they all have the identical failure. Currently Scrutiny alarms me with failing drives, but these numbers aren't increasing now that the issue is addressed. We need a way to recognize when the values are no longer increasing, either by setting a new baseline threshold at the current value or by some other means. Otherwise the Scrutiny status gets turned off and we rely on SMART data only. |
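The baseline idea from the comment above could look roughly like this. It is a minimal Python sketch for illustration only, not Scrutiny's actual logic; the function name `counter_status` and the `baseline` parameter are hypothetical.

```python
def counter_status(raw_value: int, baseline: int = 0) -> str:
    """Classify a cumulative SMART counter such as UltraDMA CRC Error Count.

    `baseline` is a hypothetical user-accepted value: the errors that had
    already accumulated when the underlying problem (bad cable, badly
    seated drive) was fixed.
    """
    if raw_value < 0 or baseline < 0:
        raise ValueError("counters cannot be negative")
    if raw_value > baseline:
        return "failed"  # new errors have appeared since the baseline
    return "passed"      # nothing new; older errors are treated as history


# 13 CRC errors left over from a bad cable, accepted as the baseline.
print(counter_status(13, baseline=13))  # passed
print(counter_status(14, baseline=13))  # failed
print(counter_status(13))               # failed (no baseline set)
```

With the default baseline of 0 this reproduces the behavior described in this thread, where any recorded error marks the drive as failed.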
I agree with the comments here. I used a JMicron bridge chip (JMS561) which did a bad SATA command translation and SMART registered this error at
It was an error at the time, but it isn't anymore, and everything is working fine. Perhaps this value could be treated as informational rather than as an error. |
This question is still important. |
@AnalogJ
|
Just came across exactly the same issue. I've set up Scrutiny for the first time and found that one of my disks has a CRC error count of 27. This disk is over 5 years old, so the errors could have happened at any time. As a result, though, that disk is considered to be failing, and I'm notified as such with no way of dismissing the alert or setting this value as the 'new normal'. The concept of setting a value to be the 'new normal' seems to be used by CheckMK for these scenarios: https://forum.checkmk.com/t/udma-crc-errors-not-resetting/32068. That way the alerts don't have to be permanently muted, nor does a different arbitrary limit have to be set. This same problem was also mentioned in the more recent #553 |
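One way the 'new normal' workflow described above might be modeled is sketched below in Python. The class `AttributeMonitor` and its methods are hypothetical names made up for this illustration; this is not how Scrutiny or CheckMK is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeMonitor:
    """Tracks one cumulative counter for one disk (hypothetical helper)."""

    accepted: int = 0  # the current 'new normal'; 0 means nothing accepted
    alerts: list[int] = field(default_factory=list)  # readings that alerted

    def acknowledge(self, current_value: int) -> None:
        """Accept the current reading as the 'new normal'."""
        self.accepted = current_value

    def observe(self, raw_value: int) -> bool:
        """Return True if this reading should raise an alert."""
        if raw_value > self.accepted:
            self.alerts.append(raw_value)
            return True
        return False


monitor = AttributeMonitor()
print(monitor.observe(27))  # True: 27 old CRC errors trigger an alert
monitor.acknowledge(27)     # user accepts 27 as the 'new normal'
print(monitor.observe(27))  # False: the value is not increasing
print(monitor.observe(28))  # True: a new error appeared after the accept
```

Alerts are never muted permanently in this sketch, and no arbitrary limit is chosen: the only threshold is the value the user explicitly accepted.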
This would be super useful; I currently have to keep Scrutiny's metrics off because otherwise it perpetually says failed just because of a badly seated drive a few years ago. |
Hi,
I'm running the latest image (ghcr.io/analogj/scrutiny:master-omnibus) in Docker on Unraid. One of my disks had an issue a long time ago that was due to a bad cable and, as a result, the "UltraDMA CRC Error Count" is elevated (87). Scrutiny reports this as a failed disk even though the value is not incrementing. Should this be reported as a failed disk when it's working fine and, as far as I know, at no increased risk of failure? If so, is it possible to mark it as "accepted" and then monitor for the value incrementing?
Thanks for a great app btw.