[BUG] Lots of disks marked as "failed" #336
Comments
Hey @joe-eklund. Yeah, Seagate drives seem to report some of their SMART data in a non-standard way, which definitely causes issues for some users. In your case it looks like:
This is definitely a common request, and something I'm working on (as I find time). It's currently tracked in #275. While you cannot yet configure the failure status in the dashboard, you can configure how/when you get notified -- limiting to only
I went and looked and a handful of drives don't even have that value at all (I guess they must be a different model or have a different firmware, even though they are still all Exos 10 TBs). Others have 0 as the number of the value, some have a
Looks like I have three drives marked as failed in Scrutiny that have
I see. I will at least go turn on failure notifications for critical only. That is definitely an improvement. I will keep an eye on #275 for disabling Scrutiny analysis on non-critical attributes.
I've also noticed that some of my disks are reporting as failed, and they are all Seagate. Would it be possible to implement a warning status that would be raised for non-critical metrics that are above their thresholds? I really only want to see disks marked as failed when they have data integrity issues or have stopped working.
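To make the request above concrete, here is a minimal sketch of the proposed status logic. It is not how Scrutiny is implemented; `Status`, `Attribute`, `drive_status`, and both boolean flags are hypothetical names chosen for illustration, and the attribute list is made-up example data. The idea: a drive is only "failed" when a critical attribute is over its threshold, and non-critical attributes can at most raise a "warning".

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASSED = "passed"
    WARNING = "warning"
    FAILED = "failed"


@dataclass
class Attribute:
    name: str
    critical: bool        # hypothetical flag: attribute is treated as critical
    over_threshold: bool  # hypothetical flag: value is past its threshold


def drive_status(attributes):
    """Return FAILED only if a critical attribute is over its threshold.

    Non-critical attributes over their threshold raise at most a WARNING.
    """
    status = Status.PASSED
    for attr in attributes:
        if not attr.over_threshold:
            continue
        if attr.critical:
            return Status.FAILED
        status = Status.WARNING
    return status


# Illustrative data only: two non-critical attributes over threshold,
# one critical attribute within its threshold.
seagate_example = [
    Attribute("Hardware ECC Recovered", critical=False, over_threshold=True),
    Attribute("High Fly Writes", critical=False, over_threshold=True),
    Attribute("Reallocated Sectors Count", critical=True, over_threshold=False),
]

print(drive_status(seagate_example).value)  # prints "warning"
```

With logic along these lines, the drives described in this issue would show up as warnings instead of failures, while a drive with a failing critical attribute would still be marked as failed.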
Hi!
Sorry for the bad news, but this is a known issue and there's just not much we can do on the Scrutiny side.
Describe the bug
I have 24 Seagate 10 TB Exos drives. 11 of the 24 are marked as "failed" in the Scrutiny dashboard. When inspected, none of the 11 have any critical attributes marked as failed. They all have one or both of Hardware ECC Recovered and High Fly Writes marked as failed.
I have extensively read through #255, https://github.com/AnalogJ/scrutiny/blob/master/docs/TROUBLESHOOTING_DEVICE_COLLECTOR.md#seagate-drives-failing, and some other issues that referenced similar things. Looks like Seagate has been a problem child.
This makes me question if these are "incorrectly" marked as failed or not. I will say I followed the troubleshooting instructions and I had started out with 12 disks marked as failed, then it dropped to 11 after I followed the recommendations at https://github.com/AnalogJ/scrutiny/blob/master/docs/TROUBLESHOOTING_DEVICE_COLLECTOR.md#seagate-drives-failing.
So my two questions are:
Screenshots:



My collector YAML looks like:
I can provide log files if needed. Thanks!