AddResourceUtilizationHealthCheck crashes on Linux when '/sys/fs/cgroup/user.slice/memory.current' is 0 #6232

@THEgaDJet

Description

We have a health check endpoint, and our liveness probe fails because it is configured to use the memory health check.

Adding the memory check via AddResourceUtilizationHealthCheck causes the container to fail on startup with the following error:

Unhandled exception. System.InvalidOperationException: We tried to read '/sys/fs/cgroup/user.slice/memory.current', and we expected to get a positive number but instead it was: '0
'.
   at Microsoft.Shared.Diagnostics.Throw.InvalidOperationException(String message)
   at Microsoft.Extensions.Diagnostics.ResourceMonitoring.Linux.LinuxUtilizationParserCgroupV2.GetMemoryUsageInBytesFromSlices(String pattern)
   at Microsoft.Extensions.Diagnostics.ResourceMonitoring.Linux.LinuxUtilizationParserCgroupV2.GetMemoryUsageInBytes()
   at Microsoft.Extensions.Diagnostics.ResourceMonitoring.Linux.LinuxUtilizationProvider.GetSnapshot()
   at Microsoft.Extensions.Diagnostics.ResourceMonitoring.ResourceMonitorService..ctor(ISnapshotProvider provider, ILogger`1 logger, IOptions`1 options, IEnumerable`1 publishers, TimeProvider timeProvider)
   at Microsoft.Extensions.Diagnostics.ResourceMonitoring.ResourceMonitorService..ctor(ISnapshotProvider provider, ILogger`1 logger, IOptions`1 options, IEnumerable`1 publishers)

The code in LinuxUtilizationParserCgroupV2.GetMemoryUsageInBytesFromSlices appears to loop through memory.current in every *.slice directory.
However, the per-file check seems to imply that if any one of those files contains 0 or is empty, an exception is thrown, because GetNextNumber returns -1 when user.slice/memory.current contains 0.

Here is example output from my running pod. I would expect this to succeed, since system.slice/memory.current returns a non-zero value:

service@<pod>:/sys/fs/cgroup$ cat user.slice/memory.current
0
service@<pod>:/sys/fs/cgroup$ cat system.slice/memory.current
13601837056
service@<pod>:/sys/fs/cgroup$
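For comparison, here is a minimal sketch of the behavior I would expect, assuming the intended result is the sum of memory.current across all *.slice directories and that 0 is a valid reading for an idle slice. SliceMemoryReader and its methods are hypothetical names used only for illustration; this is not the library's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical helper (not the library's code): sums memory.current across
// cgroup v2 slices and treats 0 as a legitimate reading.
public static class SliceMemoryReader
{
    // Parses one memory.current file's contents, e.g. "0\n" or "13601837056\n".
    // Returns null when the content is empty or not a non-negative integer.
    public static long? ParseMemoryCurrent(string content)
    {
        string trimmed = content.Trim();
        if (long.TryParse(trimmed, NumberStyles.None, CultureInfo.InvariantCulture, out long value))
        {
            return value; // 0 is valid: an idle slice uses no memory
        }
        return null;
    }

    // Sums all readings. Throws only for unparseable content, or when no
    // slice reported any usage at all.
    public static long SumSlices(IEnumerable<string> contents)
    {
        long total = 0;
        foreach (string content in contents)
        {
            long? value = ParseMemoryCurrent(content);
            if (value is null)
            {
                throw new InvalidOperationException($"Unexpected memory.current content: '{content}'.");
            }
            total += value.Value;
        }
        if (total <= 0)
        {
            throw new InvalidOperationException("No slice reported a positive memory usage.");
        }
        return total;
    }

    // Reads memory.current from every top-level "*.slice" directory under root.
    public static long ReadFromCgroupRoot(string root = "/sys/fs/cgroup")
    {
        var contents = new List<string>();
        foreach (string dir in Directory.EnumerateDirectories(root, "*.slice"))
        {
            string file = Path.Combine(dir, "memory.current");
            if (File.Exists(file))
            {
                contents.Add(File.ReadAllText(file));
            }
        }
        return SumSlices(contents);
    }
}
```

With only the two slices shown above, SliceMemoryReader.SumSlices(new[] { "0\n", "13601837056\n" }) would return 13601837056 instead of throwing. Keeping the all-zero check is a judgment call: it means a cgroup that genuinely reports nothing still fails loudly.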

Reproduction Steps

Add the resource utilization health check and start the web app:

services.AddHealthChecks()
    .AddResourceUtilizationHealthCheck(options =>
    {
        options.MemoryThresholds = new ResourceUsageThresholds
        {
            DegradedUtilizationPercentage = 80,
            UnhealthyUtilizationPercentage = 90,
        };
    });
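To make the configured thresholds concrete, here is a minimal sketch of how the two percentages could map a memory reading to a health status. ThresholdSketch, SketchStatus, UtilizationPercent and Classify are hypothetical names, and the strict `>` comparison is an assumption; the library's exact boundary handling may differ. In our case this evaluation is never reached, because the stack trace shows the crash happening while ResourceMonitorService takes its first snapshot.

```csharp
using System;

// Hypothetical sketch (not the library's evaluation code) of how
// DegradedUtilizationPercentage = 80 and UnhealthyUtilizationPercentage = 90
// partition memory utilization.
public enum SketchStatus { Healthy, Degraded, Unhealthy }

public static class ThresholdSketch
{
    // Memory utilization as a percentage of the memory limit.
    public static double UtilizationPercent(long usedBytes, long limitBytes)
    {
        if (limitBytes <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(limitBytes), "The memory limit must be positive.");
        }
        return (double)usedBytes / limitBytes * 100.0;
    }

    // Assumes a reading strictly above a threshold triggers that status.
    public static SketchStatus Classify(double utilizationPercent, double degraded = 80, double unhealthy = 90)
    {
        if (utilizationPercent > unhealthy) return SketchStatus.Unhealthy;
        if (utilizationPercent > degraded) return SketchStatus.Degraded;
        return SketchStatus.Healthy;
    }
}
```

For example, 850 bytes used against a 1,000-byte limit is 85%, which falls between the two thresholds and would classify as Degraded.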

Expected behavior

The application should start, and the health check should report the correct memory utilization.

Actual behavior

The application crashes on startup with the exception above.

Regression?

This works on Windows.

Known Workarounds

Remove the health check

Configuration

.NET 8.0 web app

Kubernetes

PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
