Invoke-IcingaCheckDiskHealth error on __CreateDefaultThresholdObject #199
Thanks for the issue. Might this be fixed with Icinga/icinga-powershell-framework#277 as well?
Unfortunately not:
Can you please navigate inside the compare function and add the following line: Write-IcingaConsoleNotice -Message 'InputValue: "{0}"' -Objects $InputValue; Afterwards run your command again: Write-IcingaFrameworkCodeCache; icinga { Invoke-IcingaCheckDiskHealth; }; You should receive an output of all input values sent to the compare function. Which of them is causing the exception? In general the exception should directly follow the invalid value.
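For convenience, the requested debug line and the follow-up commands as a ready-to-paste sketch (the exact file and line inside the framework are not quoted in this thread, so the placement is an assumption):

```powershell
# Debug line to add inside the framework's compare/threshold function
# (placement is an assumption; the thread does not quote the exact file):
Write-IcingaConsoleNotice -Message 'InputValue: "{0}"' -Objects $InputValue;

# Rebuild the code cache and re-run the check so the change is picked up:
Write-IcingaFrameworkCodeCache;
icinga { Invoke-IcingaCheckDiskHealth; };
```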
Here is the shortened list:
Where is this hashtable collection coming from? This is interesting. Could you please modify the line to this: Write-IcingaConsoleNotice -Message 'InputValue: "{0}" Hashtable: "{1}"' -Objects $InputValue, ($InputValue | Out-String);
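The same debug line as above, only extended so the hashtable content is dumped as well:

```powershell
# Extended debug line: additionally render the whole object via Out-String,
# so the content of the hashtable becomes visible in the console output.
Write-IcingaConsoleNotice -Message 'InputValue: "{0}" Hashtable: "{1}"' -Objects $InputValue, ($InputValue | Out-String);
```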
Can you please test the linked PR for the Framework? This should resolve the issue.
The exception is gone, but the disk operational status of some disks is "System.Collections.Hashtable+ValueCollection"
We need to figure out why on certain disks the value is set to a hashtable instead of a proper value. Is this occurring on all systems or only certain ones?
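The plugin's actual code isn't quoted in this thread, but here is a minimal, hypothetical repro of how that exact string can show up when a hashtable's value collection is formatted instead of a single value:

```powershell
# Minimal illustration (not the plugin's code): a hashtable standing in for
# the operational status lookup of one disk.
$status = @{ 2 = 'OK' };

# String interpolation unrolls the collection and prints its element(s):
"Status: $($status.Values)"            # -> Status: OK

# The -f format operator calls ToString() on the collection object itself:
'Status: {0}' -f $status.Values        # -> Status: System.Collections.Hashtable+ValueCollection
```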
The behavior only occurs on Storage Spaces Direct (S2D) nodes, but on all of them. I've got an S2D cluster with identical hardware. Every single node has a mirrored C partition (hardware RAID1 with hotspare, so Windows will only see 1 disk) and 5 additional SSDs for S2D. Windows should see 6 physical disks, but the check shows 7 disks + _Total (8). How can I find out which disk is no. 6?
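One way to answer that, assuming the disk numbers in the check correspond to the Windows "PhysicalDisk" performance counter instances (which would also explain the _Total entry); note that the counter set name differs on localized systems:

```powershell
# List the PhysicalDisk performance counter instances; each instance name
# starts with the disk index and, where present, the mounted drive letters,
# e.g. "0 C:" or plain "6" for a disk without a volume.
(Get-Counter -ListSet 'PhysicalDisk').PathsWithInstances |
    Select-String -Pattern '% Disk Time';
```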
It looks like disk number 6 is the S2D pool. When I check logical disks only, it shows disk 0 (C) and 6 (S2D pool) and some errors:
This issue will be addressed with v1.7.0. I have a working version available, which requires additional tuning. Within the next few days we can ship a PR for testing on those systems. Thanks for all the input!
Can you please add some more detailed information about the disks, like the volume name (if it's a CSV) or the serial number, in the output?
That should be doable, as the entire disk information is already present. It just needs to be added.
With 1.7.0 the errors are gone, but now the check is always critical on an S2D cluster. On the S2D master I get this output:
And on the other nodes the output is:
As you can see, on the master there are 2 more disks (disk 6 and 7 are Cluster Shared Volumes) which have an unknown operational status.
At least the output looks a lot better than before. Now we just need to figure out why the status is unknown.
I've updated the framework (1.7.1) and plugins (1.7.0) on another server (2019 Standard) which is using Storage Spaces (not S2D, this is not a cluster member), and there is the same error on disk 14 (which is a healthy volume on the storage pool).
And the serial number info is missing on all of the physical disks :( so it's really hard to know which number in the list represents which physical disk.
Thank you! The problem with the missing metadata is simply that these disks do not provide it through the functions we use. I don't know which type of disks these are, but if no data is available, it will not be represented there. Honestly I have no idea how to resolve this problem, but adding a flag to ignore unknown states on the operational status is possible for the moment. Sadly I have no test environment to reproduce this issue exactly and work out how to "resolve" it properly, for example by comparing different values or states. Anyone else got an idea?
A "Get-PhysicalDisk" shows all the metadata (FriendlyName, SerialNumber etc.)
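For reference, a quick way to see that metadata on the host itself (only an illustration; as noted in the next reply, the plugin does not use this cmdlet):

```powershell
# Storage module cmdlet: shows the per-disk metadata Windows itself reports.
Get-PhysicalDisk |
    Select-Object DeviceId, FriendlyName, SerialNumber, MediaType, OperationalStatus, HealthStatus |
    Sort-Object { [int]$_.DeviceId } |
    Format-Table -AutoSize;
```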
Such an "ignore unknown states" switch would help a lot, so the check is no longer critical.
We do not use Get-PhysicalDisk. We use CIM/WMI to fetch the data. How is the result with this command? Get-IcingaWindowsInformation MSFT_PhysicalDisk -Namespace 'root\Microsoft\Windows\Storage' | Select-Object SerialNumber, DeviceId, FriendlyName, CanPool, MediaType;
Can you please test the linked PR with a new version of the plugins? I did some re-work on how the fetched data is processed. As we used the PhysicalDisk as our main entry point, all other disks were not processed with metadata. Please do a before-and-after comparison to ensure the data is correct. With some luck, the Unknown on the operational status is fixed as well.
Here are the results on a server using Storage Spaces:
The cmdlet now shows the friendly name of every disk, but the serial number is still missing on most of the disks: disks 0-12 are part of a storage pool (with 2 hotspares and SSD cache) and the serial numbers are missing on every one of them. The serial number is only shown on disks with a volume: C (disk 13), D (disk 14) and E (disk 15).

Also, the cmdlet now runs much longer than before. Without the PR it takes about 20 seconds to get a result, and now it runs for about 80 seconds.

And on a Storage Spaces Direct (S2D) cluster master the check is not critical anymore, but there is still no information about the disk name/serial:
The output of the command from above:
Thank you for the testing. I updated the linked PR; the SerialNumber issue should be fixed now. For the S2D server: the problem here is that the DeviceId is not matching the actual disk id, which I never thought would happen. Could you please run: $MSFT_Disks = Get-IcingaWindowsInformation MSFT_PhysicalDisk -Namespace 'root\Microsoft\Windows\Storage'; $MSFT_Disks[1] | ConvertTo-Json -Depth 2; If I can't match the disk metadata to the performance counter metrics, we might have a problem.
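The two requested commands as a block, plus an optional overview loop (the loop is only an illustration, not something asked for in the thread) to compare the reported DeviceId with the array index for every disk:

```powershell
# Commands requested above:
$MSFT_Disks = Get-IcingaWindowsInformation MSFT_PhysicalDisk -Namespace 'root\Microsoft\Windows\Storage';
$MSFT_Disks[1] | ConvertTo-Json -Depth 2;

# Optional: print DeviceId and FriendlyName for every disk to spot where
# the DeviceId diverges from the expected disk index.
for ($i = 0; $i -lt $MSFT_Disks.Count; $i++) {
    Write-Output ('Index {0}: DeviceId={1} FriendlyName={2}' -f $i, $MSFT_Disks[$i].DeviceId, $MSFT_Disks[$i].FriendlyName);
}
```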
And for the performance problem: this will hopefully be fixed once we add a patch to the Framework. Let's get the disk health to work first, then we can work on this issue. I already have a solution in mind for that :D
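A sketch for the before/after runtime comparison mentioned above (run once with the current plugin version and once with the PR applied):

```powershell
# Time a full check run; compare TotalSeconds before and after the PR.
Measure-Command {
    icinga { Invoke-IcingaCheckDiskHealth; };
} | Select-Object TotalSeconds;
```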
The problem is - as far as I can tell - that I don't know which data is present on cluster disks and regular disks. The cluster counters are not present if you have no clusters available, and they would cause many more problems on top when sorting all the objects and working with them together. I have no cluster system here to test this, but are the counters themselves named identically? Because both will have to work: regular disks and cluster disks. An alternative: use the disk metadata with priority over the performance counters and always display it. In case we can map the performance counters, we add them to the disks; if not, we just add the counters without the metadata (see the sketch below).
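A hypothetical sketch of that fallback (not the plugin's real code; counter set name differs on localized systems): start from the disk metadata, attach performance counter data where the ids can be matched, and keep unmatched counter instances as bare entries.

```powershell
# Fetch disk metadata and the PhysicalDisk counter instance names.
$Disks     = Get-CimInstance -ClassName MSFT_PhysicalDisk -Namespace 'root\Microsoft\Windows\Storage';
$Instances = (Get-Counter -ListSet 'PhysicalDisk').PathsWithInstances |
    ForEach-Object { if ($_ -match '\(([^)]+)\)') { $Matches[1] } } |
    Sort-Object -Unique;

$DiskObjects = @{ };

foreach ($Disk in $Disks) {
    # Metadata is always added, even when no counter instance matches this disk.
    $DiskObjects[[string]$Disk.DeviceId] = @{ 'Metadata' = $Disk; 'CounterInstance' = $null; };
}

foreach ($Instance in $Instances) {
    if ($Instance -eq '_Total') { continue; }

    # Instances look like "0 C:" or plain "6"; the leading token is the disk index.
    $Id = ($Instance -split ' ')[0];

    if ($DiskObjects.ContainsKey($Id)) {
        $DiskObjects[$Id].CounterInstance = $Instance;
    } else {
        # No matching metadata: keep the counter data without metadata.
        $DiskObjects[$Id] = @{ 'Metadata' = $null; 'CounterInstance' = $Instance; };
    }
}

$DiskObjects;
```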
Here are the counter names:
You could add a switch to show only physical disk data or cluster disk data.
I would propose the following:
I would rather not put too many things into one plugin and would use a dedicated cluster disk health plugin for this instead of over-extending the disk health plugin. As for the performance issue, that one is fixed: Icinga/icinga-powershell-framework#388
That sounds like a good plan!
I've noticed that the check is now hiding data for the virtual disk(s) created on the storage space. On the Storage Spaces server (not an S2D cluster node) it only shows this:
Before the PR, more metadata was listed -> #199 (comment)
Is this issue still present?
Yep, there was no new version after my last comment.
Hello, today disk 4 and 5 encountered the same issue:
[CRITICAL] Physical Disk Package: 3 Critical 4 Ok
[CRITICAL] Disk #3, Disk #4, Disk #5
I ended up excluding those 3 disks from the check, so "real" issues will still be notified, but this workaround isn't the best idea.
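The workaround described above as a sketch, assuming the installed plugin version exposes an -ExcludeDisk parameter (verify with Get-Help Invoke-IcingaCheckDiskHealth); the disk ids are only taken from the quoted output as an example:

```powershell
# Exclude the affected disks so remaining disks still alert on real issues.
icinga { Invoke-IcingaCheckDiskHealth -ExcludeDisk 3, 4, 5; };
```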
Well, it has now turned from the hashtable problem back to the initial thread topic:
[CRITICAL] Physical Disk Package: 3 Critical 3 Ok
[CRITICAL] Disk #0, Disk #4, Disk #5
Version 1.5.0
I got the following error on a single 2019 S2D node
On the other node of that cluster the cmdlet is working. Maybe it's because this node is not the master of the cluster shared volume?
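One way to verify that speculation (requires the FailoverClusters module on the node; not part of the plugin itself):

```powershell
# Show which node currently owns each Cluster Shared Volume.
Get-ClusterSharedVolume | Select-Object Name, OwnerNode, State;
```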