-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Add cgroup memory usage/limit to OS stats on Linux #26166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cgroup memory usage/limit to OS stats on Linux #26166
Conversation
c468f38 to
79ba58f
Compare
|
The mixed cluster tests are failing in the CI build of this PR. The reason is that I am planning to backport this to 6.1 and have set up the version checks in the serialisation code accordingly, but since this is not yet merged to 6.1 it makes the BWC cluster crash. |
This change adds cgroup memory usage/limit to the OS stats section of the node stats on Linux. This information is useful because in Docker containers the standard node stats report the host memory limit, not taking account of extra restrictions that may have been applied to the container.
This matches what is done for cpu and cpuacct
79ba58f to
e0cf08a
Compare
|
One way to deal with this is to set the serialization version and skips in master to 7.0.0, then backport to 6.x with these set to 6.1.0, then push one more commit to master moving to 6.1.0. With this method, it's possible to have green builds every step of the way. This is preferred. |
|
Thanks for the tip @jasontedor. I have changed the skips to 7.0.0-alpha1 and now the CI is passing. |
* master: (278 commits) Move pre-6.0 node checkpoint to SequenceNumbers Invalid JSON request body caused endless loop (elastic#26680) added comment fix line length violation Moved the check to fetch phase. This basically means that we throw a better error message instead of an AOBE and not adding more restrictions. inner hits: Do not allow inner hits that use _source and have a non nested object field as parent Separate Painless Whitelist Loading from the Painless Definition (elastic#26540) convert more admin requests to writeable (elastic#26566) Handle release of 5.6.1 Allow `InputStreamStreamInput` array size validation where applicable (elastic#26692) Update global checkpoint with permit after recovery Filter pre-6.0 nodes for checkpoint invariants Skip bad request REST test on pre-6.0 Reenable BWC tests after disabling for backport Add global checkpoint tracking on the primary [Test] Fix reference/cat/allocation/line_8 test failure [Docs] improved description for fs.total.available_in_bytes (elastic#26657) Fix discovery-file plugin to use custom config path fix testSniffNodes to use the new error message Add check for invalid index in WildcardExpressionResolver (elastic#26409) ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed in another channel, I think we should avoid losing the ability to compare what a user sees in /sys/fs/cgroup/memory versus the output from the stats APIs. Otherwise, I'm good with this PR.
The original idea was to store these values as Long, truncating any values outside the range of long. However, this meant that in the relatively common case of no limit being applied, users would not see the same value in the OS stats as they see by querying Linux directly. This change places a burden on consumers of the strings to convert the strings to numbers and decide what to do about extremely large values, but there will be very few consumers and they would need to have a policy for dealing with "no limit" in any case.
|
@jasontedor as discussed I changed the type of the cgroup memory stats to Please could you take another look? |
jasontedor
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
|
Thanks for reviewing @jasontedor. |
This change adds cgroup memory usage/limit to the OS stats section of the node stats on Linux. This information is useful because in Docker containers the standard node stats report the host memory limit, not taking account of extra restrictions that may have been applied to the container. The original idea was to store these values as Long, truncating any values outside the range of long. However, this meant that in the relatively common case of no limit being applied, users would not see the same value in the OS stats as they see by querying Linux directly. So instead the values are stored as String. This change places a burden on consumers of the strings to convert the strings to numbers and decide what to do about extremely large values, but there will be very few consumers and they would need to have a policy for dealing with "no limit" in any case.
This change adds cgroup memory usage/limit to the OS stats section of
the node stats on Linux. This information is useful because in Docker
containers the standard node stats report the host memory limit, not
taking account of extra restrictions that may have been applied to the
container.