Description
We have a health check endpoint, and the liveness probe is failing because it is configured to use the memory health check.
Adding the memory diagnostic via `AddResourceUtilizationHealthCheck` causes the container to fail on startup with the following error:
```text
Unhandled exception. System.InvalidOperationException: We tried to read '/sys/fs/cgroup/user.slice/memory.current', and we expected to get a positive number but instead it was: '0
'.
   at Microsoft.Shared.Diagnostics.Throw.InvalidOperationException(String message)
   at Microsoft.Extensions.Diagnostics.ResourceMonitoring.Linux.LinuxUtilizationParserCgroupV2.GetMemoryUsageInBytesFromSlices(String pattern)
   at Microsoft.Extensions.Diagnostics.ResourceMonitoring.Linux.LinuxUtilizationParserCgroupV2.GetMemoryUsageInBytes()
   at Microsoft.Extensions.Diagnostics.ResourceMonitoring.Linux.LinuxUtilizationProvider.GetSnapshot()
   at Microsoft.Extensions.Diagnostics.ResourceMonitoring.ResourceMonitorService..ctor(ISnapshotProvider provider, ILogger`1 logger, IOptions`1 options, IEnumerable`1 publishers, TimeProvider timeProvider)
   at Microsoft.Extensions.Diagnostics.ResourceMonitoring.ResourceMonitorService..ctor(ISnapshotProvider provider, ILogger`1 logger, IOptions`1 options, IEnumerable`1 publishers)
```
The code here looks like it should loop over `memory.current` in every `*.slice` directory.
However, this line seems to imply that if any of those files contains 0 or is empty, an exception is thrown, because `GetNextNumber` returns -1 when `cat user.slice/memory.current` returns `0`.
Here is example output from my running pod. I would expect this to succeed, since the second file contains a non-zero number:
```console
service@<pod>:/sys/fs/cgroup$ cat user.slice/memory.current
0
service@<pod>:/sys/fs/cgroup$ cat system.slice/memory.current
13601837056
service@<pod>:/sys/fs/cgroup$
```
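A minimal sketch of the expected behavior described above: sum `memory.current` across every `*.slice` directory, treat `0` as a valid reading, and fail only when a file is empty or non-numeric. This is not the library's code. The function name `sum_slice_memory` and its optional root-directory argument are hypothetical, added only so the logic can be exercised against a copy of the directory tree. It assumes bash.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the library's implementation:
# sum memory.current across all *.slice directories, accepting 0 as valid.
sum_slice_memory() {
  local root="${1:-/sys/fs/cgroup}"   # cgroup v2 mount point by default
  local total=0 file value
  for file in "$root"/*.slice/memory.current; do
    [ -e "$file" ] || continue        # the glob matched nothing
    value="$(tr -d '[:space:]' < "$file")"   # drop the trailing newline
    # Reject only empty or non-numeric content; "0" is a legitimate reading.
    if ! [[ "$value" =~ ^[0-9]+$ ]]; then
      echo "unexpected content in $file: '$value'" >&2
      return 1
    fi
    total=$(( total + 10#$value ))    # force base 10 for the addition
  done
  echo "$total"
}
```

Run inside the pod with no argument, it reads `/sys/fs/cgroup`. For the two files shown above, it adds 0 + 13601837056 = 13601837056 instead of failing on the zero.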
Reproduction Steps
Include the resource utilization health check and start the web app:
```csharp
services.AddHealthChecks()
    .AddResourceUtilizationHealthCheck(options =>
    {
        options.MemoryThresholds = new ResourceUsageThresholds
        {
            DegradedUtilizationPercentage = 80,
            UnhealthyUtilizationPercentage = 90,
        };
    });
```
Expected behavior
The application should start, and the health check should report the correct memory usage.
Actual behavior
The application crashes on startup.
Regression?
This works on Windows.
Known Workarounds
Remove the health check.
Configuration
.NET 8.0 web app
Kubernetes
```text
PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
```