system.mem.used doesn't agree with procps free(1) #3106

Open
mfischer-zd opened this issue Jan 5, 2017 · 11 comments
@mfischer-zd
Contributor

mfischer-zd commented Jan 5, 2017

As of a very old commit (ca59049), system.mem.used no longer agrees with the used column of free in procps. (Oddly, the comment in the commit says the change is to be consistent with procps, but it isn't.) This should be fixed right away: the system.mem.used metric is currently too high in many cases, impeding our diagnosis of problems.

The algorithm can be found at https://gitlab.com/procps-ng/procps/blob/master/proc/sysinfo.c#L679-797

In sum:

mem_used = kb_main_total - kb_main_free - kb_main_cached - kb_main_buffers;
if (mem_used < 0)
  mem_used = kb_main_total - kb_main_free;

See #64 where this issue was first raised.
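
For reference, here's a minimal Python sketch of the same calculation, assuming kB values parsed from /proc/meminfo (the parse_meminfo and procps_used_kb names are illustrative, not agent code):

# Minimal sketch of the procps-ng calculation above; names are illustrative.
def parse_meminfo(path='/proc/meminfo'):
    info = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(':')
            info[key] = int(rest.split()[0])  # values are in kB
    return info

def procps_used_kb(info):
    used = info['MemTotal'] - info['MemFree'] - info['Cached'] - info['Buffers']
    if used < 0:  # buffers + cache can exceed total - free
        used = info['MemTotal'] - info['MemFree']
    return used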

@pjoakimsson

This calculation is now done in system.py:

if 'MemAvailable' in meminfo:
    memData['physUsable'] = int(meminfo.get('MemAvailable', 0)) / 1024
else:
    # Usable is relative since cached and buffers are actually used to speed things up.
    memData['physUsable'] = memData['physFree'] + memData['physBuffers'] + memData['physCached']
if memData['physTotal'] > 0:
    memData['physPctUsable'] = float(memData['physUsable']) / float(memData['physTotal'])

If you are using kernel 3.14+ you shouldn't be affected, as the check uses the kernel's own estimate of available memory (MemAvailable in /proc/meminfo).

The issue arises when you run on kernels without MemAvailable, where the agent does not take at least reclaimable slabs (SReclaimable) into consideration.
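
For illustration, a sketch of a fallback that would also count reclaimable slab, assuming a dict of kB values parsed from /proc/meminfo (this is not what the agent does today):

# Hypothetical fallback for kernels without MemAvailable: approximate
# usable memory as free + buffers + cached + reclaimable slab.
# Sketch only; the agent currently omits the SReclaimable term.
def phys_usable_kb(info):
    if 'MemAvailable' in info:
        return info['MemAvailable']  # kernel 3.14+: trust the kernel's estimate
    return (info.get('MemFree', 0)
            + info.get('Buffers', 0)
            + info.get('Cached', 0)
            + info.get('SReclaimable', 0))  # reclaimable slab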

I've reported this issue to support back in November.

@mfischer-zd
Contributor Author

@pjoakimsson Are we talking about the same thing? I'm referring to the system.mem.used metric, not the system.mem.available metric.

@remh
Contributor

remh commented Jan 11, 2017

Hi @mfischer-zd. It's because, at the time of this commit, procps was doing:

kb_main_used = kb_main_total - kb_main_free;

See:
https://gitlab.com/procps-ng/procps/blob/v3.3.0/proc/sysinfo.c#L663

That's why we introduced the system.mem.usable metric.

It seems that procps has since changed that.

I'm going to close this issue, as fixing it would mean a big change in the meaning of the metric, and we already have a metric that reports the right thing.

@remh remh closed this as completed Jan 11, 2017
@mfischer-zd
Contributor Author

What is the metric that "reports the right thing"? system.mem.used is not reporting the correct value; that is what we need fixed.

@remh
Contributor

remh commented Jan 11, 2017

Hey sorry for the confusion.

So what happened is that, traditionally, Linux memory tools (procps' free) included shared and cached memory in total used memory. (In practice it was just taking the total amount of memory and subtracting the amount of free memory reported by /proc/meminfo; see old versions of procps: https://gitlab.com/procps-ng/procps/blob/v3.3.0/proc/sysinfo.c#L663).

We decided to mimic that behavior in our system.mem.used metric to be as close as possible to existing tools.

However, as you know, this metric is not that useful, since cached and buffered memory can be reclaimed for use by the OS / processes.

That's why we introduced a system.mem.usable metric:
https://github.com/DataDog/dd-agent/blob/master/checks/system/unix.py#L375-L379
For recent kernels that expose it (via MemAvailable) we use that information directly; for others we compute it ourselves as the sum of free + buffered + cached memory.

However, it seems that procps changed that 2 years ago in the following commit:
https://gitlab.com/procps-ng/procps/commit/6cb75efef85f735b72e6c96f197f358f511f8ed9

We did not change our metric because doing so would change its meaning for everyone already relying on it.

So tl;dr:

  • system.mem.usable is the best metric for watching memory usage
  • system.mem.total - system.mem.usable would show the actual amount of memory in use.
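
To make the arithmetic concrete, here's a sketch of computing that second expression straight from /proc/meminfo, assuming a dict of kB values as in the snippets above (the function name is mine):

# Sketch: "used" as total minus usable, mirroring the
# system.mem.total - system.mem.usable dashboard arithmetic.
def used_from_usable_kb(info):
    usable = info.get('MemAvailable',
                      info['MemFree'] + info['Buffers'] + info['Cached'])
    return info['MemTotal'] - usable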

@mfischer-zd
Contributor Author

We think the actual amount of used memory should be graphed in system.mem.used, regardless of the other metrics available. This is not an unreasonable request. I disagree with your characterization of the current calculation as "expected" - it is documented, but that's a different thing.

Even free(1), in the version you cited, provided an adjustment (https://gitlab.com/procps-ng/procps/blob/v3.3.0/free.c#L295-302) to subtract buffers and cache.

There's an open question as to whether preserving the current formula is a good thing for your remaining customer base. My feeling is that they'd benefit more from future accuracy than from preserving the broken behavior of previous versions. Besides, past data wouldn't change; only metrics submitted after an agent upgrade would.

However, if any dashboards already correct for the existing incorrect data, those would begin to break, which is a shame. On the other hand, I don't think forcing the user to make corrections in dashboards should be encouraged as a way to solve the agent's misbehavior.

To keep multiple parties happy, perhaps it's worth making a compromise such as providing a new metric like system.mem.used_corrected or a similar name that provides the expected value.

@pjoakimsson

@mfischer-zd
Well, sort of both yes and no.

The mem_used in new procps you are referring to does not count cache and buffers as used memory. To me, cache and buffers are still used memory, so dd-agent (and the kernel) is displaying the correct value.

You could use system.mem.total - system.mem.usable to get a value that matches the new procps, but it will only be correct on newer kernel versions.
The reason is that dd-agent isn't taking SReclaimable into consideration, which I have already reported.

@mfischer-zd
Contributor Author

Suppose "used" memory included buffers and cache (which Datadog's metric does today, but procps no longer does). Why would you want to graph it? What would such a value suggest when it reaches the amount of installed (total) memory? The natural conclusion, particularly for a less-experienced observer, would be that the system is running out of memory and that immediate action needs to be taken.

But that is not necessarily so, and that is the fallacy that http://www.linuxatemyram.com/ tried to dispel. That's also why procps started discounting buffers and cache as long as 5 years ago, and no longer includes them at all in more recent versions.

We want metrics that are not merely accurate (in whatever sense of the word you prefer), but useful. And, as @remh tacitly admits, system.mem.used is not useful as currently implemented.

A useful graph of memory on Linux looks like this (perhaps with some amendments):

[screenshot: stacked Linux memory graph; the blue area is used memory, with buffered and cached stacked above]

The blue area, used memory, should be what system.mem.used reports. But instead I have to discount buffered and cached, then re-add them in the stack graph.
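
For what it's worth, a sketch of the series such a stacked graph needs, computed from a dict of /proc/meminfo kB values (the function name is mine; the free band is optional):

# Components of the stacked graph described above: the "blue area"
# is total minus free, cache, and buffers, which are then re-added
# as their own bands. Sketch only.
def memory_stack_kb(info):
    buffers = info['Buffers']
    cached = info['Cached']
    free = info['MemFree']
    used = info['MemTotal'] - free - cached - buffers  # the blue area
    return {'used': used, 'buffers': buffers, 'cached': cached, 'free': free}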

The user shouldn't have to go through these sorts of gymnastics to get something like this. Nor should the user have to risk embarrassing herself when she submits a plot of system.mem.used to make an argument that we need more memory, only to find that the value was incorrect or misleading.

@remh
Contributor

remh commented Jan 13, 2017

Agreed, we can improve things to make it clearer what's useful. Reopening this for discussion.

We are going to make the default host dashboard use usable memory as a first step (should be done by the beginning of next week).

We are considering adding a new metric; however, it doesn't seem necessary, as we already have usable memory.

We could add a new metric (yet to be named) that would be system.mem.total - system.mem.usable.
But it might end up confusing things even more? Happy to hear the different points of view.

@remh remh reopened this Jan 13, 2017
@mfischer-zd
Contributor Author

If you don't create a new metric or correct its calculation in the agent, anyone who wants to plot the discounted value of used memory (which is the correct value for most diagnostic purposes) will have to build an arithmetic expression in the UI. This is a suboptimal user experience, in my view.

There are a few different options here:

  1. Correct the computed value of used memory in the agent, and inform users that, once they upgrade to the release with the change, the reported value will change.

  2. Add a configuration option to the agent that, if enabled, reports the corrected value for used memory. To ease the transition, set it to false by default. Determine a transition period, after which the default becomes true, and warn users accordingly.

  3. Make the agent submit multiple metrics, one with the undiscounted value, and one with the discounted value under a different name.

Solution (2) seems to strike the best balance between correcting the behavior and giving users the opportunity to adjust in due course; a rough sketch of how it might look follows.
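
A rough sketch of what option (2) could look like in the check, assuming a dict of kB values from /proc/meminfo; the flag name use_corrected_mem_used is hypothetical, not an existing option:

# Sketch of option (2): a config flag gating the corrected calculation.
# 'use_corrected_mem_used' is a hypothetical option name.
def mem_used_kb(info, use_corrected_mem_used=False):
    if use_corrected_mem_used:
        used = info['MemTotal'] - info['MemFree'] - info['Cached'] - info['Buffers']
        if used < 0:
            used = info['MemTotal'] - info['MemFree']
        return used
    # Legacy behavior: buffers and cache count as used.
    return info['MemTotal'] - info['MemFree']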

@irabinovitch
Contributor

@mfischer-zd I wanted to follow up on our conversation regarding this issue. My $0.02 is that, of the outlined options, #3 would be preferred. As a general rule of thumb, I believe we should not change the definition or behavior of a metric once it has been released, and metrics should have consistent definitions. In the past we've renamed metrics when a change in behavior was required, and I believe we should stick with this approach.

With that in mind, I think we should come to an agreement on a name for the undiscounted memory metric while we look into implementing it in a future agent release. I have made a similar recommendation internally.

To be transparent on timelines though:

  • 5.11.1, being released this week, is an APM-only release.
  • 5.12 is scheduled to be released in Feb/March, but I believe it is under feature freeze, as it is primarily intended to split integrations out and provide tooling to allow installation of individual releases.

As for next steps, I am discussing with our product team and agent engineering team what the timeline for the following release will be, and how this feature request may fit in. I believe the Integration SDK work may be able to speed this process along, but I'll check back in with you here on this issue next week with updates on my findings.
