
Conversation

@droberts195

This change adds cgroup memory usage/limit to the OS stats section of
the node stats on Linux. This information is useful because in Docker
containers the standard node stats report the host memory limit, not
taking account of extra restrictions that may have been applied to the
container.
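For context, the values in question live in cgroup (v1) control files under /sys/fs/cgroup/memory on Linux. The sketch below shows roughly how those files can be read; it is not the PR's implementation, and the hard-coded mount point and the class/method names are assumptions for illustration (the real code resolves the node's own control group path rather than assuming the mount point).

```java
// Minimal, illustrative sketch of reading the cgroup (v1) memory limit and
// usage on Linux. Hard-coding /sys/fs/cgroup/memory is an assumption made
// here for brevity; it is not how the PR locates the control group.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class CgroupMemoryReader {

    private static final Path MEMORY_CGROUP = Paths.get("/sys/fs/cgroup/memory");

    /** Raw contents of memory.limit_in_bytes, exactly as Linux reports it. */
    static String readLimitInBytes() throws IOException {
        return readSingleLine(MEMORY_CGROUP.resolve("memory.limit_in_bytes"));
    }

    /** Raw contents of memory.usage_in_bytes, exactly as Linux reports it. */
    static String readUsageInBytes() throws IOException {
        return readSingleLine(MEMORY_CGROUP.resolve("memory.usage_in_bytes"));
    }

    private static String readSingleLine(Path path) throws IOException {
        // Each control file holds a single line containing a decimal number.
        return Files.readAllLines(path).get(0).trim();
    }
}
```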

@droberts195 requested a review from jasontedor August 11, 2017 15:12
@droberts195 force-pushed the add_memory_cgroup_to_stats branch from c468f38 to 79ba58f August 11, 2017 15:40
@droberts195 (Author)

The mixed cluster tests are failing in the CI build of this PR. The reason is that I am planning to backport this to 6.1 and have set up the version checks in the serialisation code accordingly, but since this is not yet merged to 6.1 it makes the BWC cluster crash.

David Roberts added 6 commits August 14, 2017 10:07
This change adds cgroup memory usage/limit to the OS stats section of
the node stats on Linux.  This information is useful because in Docker
containers the standard node stats report the host memory limit, not
taking account of extra restrictions that may have been applied to the
container.
This matches what is done for cpu and cpuacct
@droberts195 force-pushed the add_memory_cgroup_to_stats branch from 79ba58f to e0cf08a August 14, 2017 09:07
@jasontedor (Member)

One way to deal with this is to set the serialization version and skips in master to 7.0.0, then backport to 6.x with these set to 6.1.0, then push one more commit to master moving to 6.1.0. With this method, it's possible to have green builds every step of the way. This is preferred.
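A rough sketch of the version-gated serialization pattern being described, in the style of Elasticsearch's StreamInput/StreamOutput. The version constant used here (Version.V_7_0_0_alpha1) stands in for whichever constant the change actually gates on and is not taken from the PR:

```java
// Sketch only: write the new fields to peers that understand them, and skip
// them entirely for older nodes so mixed-version clusters stay compatible.
import java.io.IOException;

import org.elasticsearch.Version;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;

final class CgroupMemorySerialization {

    static void writeTo(StreamOutput out, String limitInBytes, String usageInBytes) throws IOException {
        if (out.getVersion().onOrAfter(Version.V_7_0_0_alpha1)) {
            out.writeOptionalString(limitInBytes);
            out.writeOptionalString(usageInBytes);
        }
    }

    static String[] readFrom(StreamInput in) throws IOException {
        if (in.getVersion().onOrAfter(Version.V_7_0_0_alpha1)) {
            return new String[] { in.readOptionalString(), in.readOptionalString() };
        }
        // Older peers never send the fields, so report them as absent.
        return new String[] { null, null };
    }
}
```

Once the backport to 6.x is in place, the gate is lowered to the 6.x constant in a follow-up commit to master, which is what keeps the builds green at every step.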

@droberts195 (Author)

Thanks for the tip @jasontedor. I have changed the skips to 7.0.0-alpha1 and now the CI is passing.

Merge branch 'master' into add_memory_cgroup_to_stats

* master: (278 commits)
  Move pre-6.0 node checkpoint to SequenceNumbers
  Invalid JSON request body caused endless loop (elastic#26680)
  added comment
  fix line length violation
  Moved the check to fetch phase. This basically means that we throw a better error message instead of an AOBE and not adding more restrictions.
  inner hits: Do not allow inner hits that use _source and have a non nested object field as parent
  Separate Painless Whitelist Loading from the Painless Definition (elastic#26540)
  convert more admin requests to writeable (elastic#26566)
  Handle release of 5.6.1
  Allow `InputStreamStreamInput` array size validation where applicable (elastic#26692)
  Update global checkpoint with permit after recovery
  Filter pre-6.0 nodes for checkpoint invariants
  Skip bad request REST test on pre-6.0
  Reenable BWC tests after disabling for backport
  Add global checkpoint tracking on the primary
  [Test] Fix reference/cat/allocation/line_8 test failure
  [Docs] improved description for fs.total.available_in_bytes (elastic#26657)
  Fix discovery-file plugin to use custom config path
  fix testSniffNodes to use the new error message
  Add check for invalid index in WildcardExpressionResolver (elastic#26409)
  ...
@jasontedor (Member) left a comment

As we discussed in another channel, I think we should avoid losing the ability to compare what a user sees in /sys/fs/cgroup/memory versus the output from the stats APIs. Otherwise, I'm good with this PR.

David Roberts added 3 commits September 22, 2017 14:32
The original idea was to store these values as Long, truncating any values
outside the range of long.  However, this meant that in the relatively common
case of no limit being applied, users would not see the same value in the OS
stats as they see by querying Linux directly.  This change places a burden on
consumers of the strings to convert the strings to numbers and decide what to
do about extremely large values, but there will be very few consumers and they
would need to have a policy for dealing with "no limit" in any case.
@droberts195 (Author)

@jasontedor as discussed I changed the type of the cgroup memory stats to String because the value used for "no limit" doesn't fit in long and BigInteger isn't supported by XContent.

Please could you take another look?
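For illustration, a consumer of the new string-typed values might handle them roughly as follows; the helper name and the use of BigInteger with a signed-long cutoff are assumptions made for this sketch, not part of the change itself. Keeping the raw strings also preserves the ability to compare the API output directly with /sys/fs/cgroup/memory, as noted in the review above.

```java
// Sketch of one policy a consumer could adopt: parse the raw string and treat
// anything too large for a signed long as "no limit", which is how an
// unlimited cgroup shows up per the discussion in this PR.
import java.math.BigInteger;

final class CgroupMemoryValues {

    private static final BigInteger LONG_MAX = BigInteger.valueOf(Long.MAX_VALUE);

    /**
     * Returns the number of bytes, or -1 when the value does not fit in a
     * signed long and is therefore treated as "no limit".
     */
    static long parseBytesOrNoLimit(String rawValue) {
        BigInteger value = new BigInteger(rawValue.trim());
        if (value.compareTo(LONG_MAX) > 0) {
            return -1L; // treat as "no limit"
        }
        return value.longValueExact();
    }
}
```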

@jasontedor (Member) left a comment

LGTM.

@droberts195 merged commit a292740 into elastic:master Oct 3, 2017
@droberts195 deleted the add_memory_cgroup_to_stats branch October 3, 2017 11:08
@droberts195 (Author)

Thanks for reviewing @jasontedor.

droberts195 pushed a commit that referenced this pull request Oct 3, 2017
This change adds cgroup memory usage/limit to the OS stats section of
the node stats on Linux.  This information is useful because in Docker
containers the standard node stats report the host memory limit, not
taking account of extra restrictions that may have been applied to the
container.

The original idea was to store these values as Long, truncating any values
outside the range of long.  However, this meant that in the relatively common
case of no limit being applied, users would not see the same value in the OS
stats as they see by querying Linux directly.  So instead the values are stored
as String.  This change places a burden on consumers of the strings to
convert the strings to numbers and decide what to do about extremely large
values, but there will be very few consumers and they would need to have a
policy for dealing with "no limit" in any case.
droberts195 pushed a commit that referenced this pull request Oct 3, 2017
droberts195 pushed a commit that referenced this pull request Oct 3, 2017