Singularity, cgroup memory.limits, mmaped strangeness #5850
UPDATE: we tested this on a RHEL 8 box with good results; under memory pressure, even with Singularity, OOM was doing its job. Test box:
setup:
Same test, 5 GB cgroup, 10x2GB stressors, expecting to see 2 unkilled, but...
Interesting to note: OOM kills ALL of the processes inside for some reason here, and doesn't leave 2 running as on RHEL 7:
The behavior where the OOM killing works on RHEL 8 but not on RHEL 7 is what we've observed previously. As best we can tell at present this is a kernel version-specific issue related to cgroups memory handling with namespaces in use, and not a Singularity issue. I am able to reproduce with other means (a different test program) reliably on RHEL 7, but not on RHEL 8 or Fedora 33. It also does not reproduce if I manually install a mainline kernel on RHEL 7. However, there is conflicting info in #5800 where a new kernel on Ubuntu 18.04 did not address the problem.
Hello, This is a templated response that is being sent out to all open issues. We are working hard on 'rebuilding' the Singularity community, and a major task on the agenda is finding out what issues are still outstanding. Please consider the following:
Thanks,
Hello Carter, sorry for the late reply; here is a summary of the situation regarding this issue. As David confirmed above, it seems this issue is the result of the interplay between the 3.x kernel's cgroups memory controller in CentOS 7 and Singularity under memory pressure. Since this issue has gone somewhat stale here, we'll eventually upgrade our cluster to get the kernel to at least 4.x in CentOS/RHEL 8, where the OOM subsystem is reworked and this issue no longer exists. Until then, we're also experimenting with an alternative workaround we developed here: https://github.com/pja237/kp_oom If there is any news on this topic from your side, we'd be happy to discuss it further; otherwise I guess you can close the ticket. Best,
This issue has been automatically marked as stale because it has not had activity in over 60 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions. |
This issue has been automatically closed because no response was provided within 7 days. |
@pja237 Have you rolled that fix out to production, or is there another workaround for this issue? We ran into this the hard way on our HPC cluster after rolling out Singularity, and have yet to determine a fix. Upgrading our cluster from RHEL 7 to RHEL 8 is not possible, at least for the next 6 months.
Hey @nlvw, in the absence of better (any) solutions, and needing to postpone the upgrade to RHEL 8 on the cluster to 2022 (expected Q2-Q3), we went ahead with the kp_oom workaround in production. The biggest risk of this method is that if something goes wrong there is no gentle way to recover: the kernel panics and the node reboots. We accepted that, since a hard reboot was the worst-case manifestation of this issue anyway. Since May 2021 we've upgraded the kernel, Slurm and Singularity multiple times, so at the moment we're running:
Experiences so far:
To give you a feel for how often we get it, I just did a
Hope this helps you.
Apology
I'm opening this issue even though it seems to be related to, or might be the cause of, these similar issues:
#5041
#5800
Perhaps these findings will help you help us understand what is going on and how we could mitigate against these situations.
Version of Singularity:
Reproduced successfully with two versions:
OS:
Expected behavior
When Singularity processes running in a cgroup reach memory.limit_in_bytes, they should be killed by the OOM killer.
In some cases this happens, but we noticed that in the specific case below it does not, and this causes several quite negative effects on the affected nodes.
Actual behavior
When the running processes in question use the following mmap/memset pattern (though it's probably not limited to it):
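A minimal sketch of that pattern, reconstructed from this report (the real test program is the mempoc.c gist linked under "Steps to reproduce this behavior"); the 2 GB size matches the test below, everything else is illustrative:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_SIZE (2UL * 1024 * 1024 * 1024)   /* 2 GB per process, as in the test below */

int main(void)
{
    /* anonymous, private mapping -- not file backed */
    void *buf = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }

    /* touch the whole region at once; this is the step that should push the
       cgroup over memory.limit_in_bytes and trigger the OOM killer */
    memset(buf, 1, MAP_SIZE);

    pause();   /* keep the memory resident */
    return EXIT_SUCCESS;
}
```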
...this somehow blocks the cgroup/OOM mechanism from kicking in (details below in "Steps to reproduce"), drops the cgroup's processes into uninterruptible sleep, produces sudden IO load on the node, and in some cases renders the node completely unusable.
In real workloads we have experienced:
Note: this code pattern is the one tested as a proof of concept, but it is possible that file-backed mmap and/or other mem*() functions have the same or a similar effect.
Steps to reproduce this behavior
Download and compile mempoc.c (https://gist.github.com/pja237/b0e9a49be64a20ad1af905305487d41a).
NOTE: touching memory in the commented for-loop:
https://gist.github.com/pja237/b0e9a49be64a20ad1af905305487d41a#file-mempoc-c-L41
DOES NOT PRODUCE THE ISSUE; OOM handles this perfectly!
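For contrast, the commented per-page touch presumably looks something like the snippet below (this is an assumption about the shape of the loop at gist line 41, not a copy of it). Swapped in for the memset() call in the sketch above, this variant is the one the OOM killer reportedly handles correctly:

```c
/* assumed shape of the commented-out loop: fault the mapping in one byte per
   page instead of memset()-ing the whole region in one call */
for (size_t off = 0; off < MAP_SIZE; off += 4096)
    ((char *)buf)[off] = 1;
```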
Versions:
Set up the cgroup (e.g. 5 GB for this case):
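The exact commands are not preserved in this report; as a rough illustration only (not the reporter's original steps), the usual cgroup-v1 sequence is: create a memory cgroup, write the limit, and add a PID to it. A small C sketch of those three steps, where the cgroup name "mempoc" and the mount point /sys/fs/cgroup/memory are assumptions:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *cg = "/sys/fs/cgroup/memory/mempoc";   /* assumed cgroup-v1 memory mount */
    char path[256];
    FILE *f;

    if (mkdir(cg, 0755) != 0 && errno != EEXIST) {     /* create the cgroup */
        perror("mkdir");
        return 1;
    }

    /* write the 5 GB limit */
    snprintf(path, sizeof(path), "%s/memory.limit_in_bytes", cg);
    if (!(f = fopen(path, "w"))) { perror("memory.limit_in_bytes"); return 1; }
    fprintf(f, "%llu\n", 5ULL * 1024 * 1024 * 1024);
    fclose(f);

    /* move a PID into the cgroup (e.g. your shell, so everything it starts is
       accounted against the limit); defaults to this helper's own PID */
    pid_t pid = (argc > 1) ? (pid_t)atoi(argv[1]) : getpid();
    snprintf(path, sizeof(path), "%s/cgroup.procs", cg);
    if (!(f = fopen(path, "w"))) { perror("cgroup.procs"); return 1; }
    fprintf(f, "%d\n", (int)pid);
    fclose(f);

    return 0;
}
```

The same thing is normally done from a shell by creating the directory under /sys/fs/cgroup/memory and echoing the limit and the PID into those two files.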
Run mempoc without singularity
Spin up 10 children, each mmapping and memsetting 2 GB (total: 20 GB).
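To make the step concrete, here is a self-contained sketch of what spinning up the 10 workers amounts to, assuming each worker body is the 2 GB mmap()/memset() pattern sketched earlier (the real driver is in mempoc.c and may differ):

```c
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define NWORKERS 10
#define MAP_SIZE (2UL * 1024 * 1024 * 1024)   /* 2 GB per worker */

/* worker body: the mmap()+memset() pattern from the earlier sketch */
static void run_worker(void)
{
    void *buf = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf != MAP_FAILED)
        memset(buf, 1, MAP_SIZE);
    pause();                        /* keep the memory resident */
}

int main(void)
{
    for (int i = 0; i < NWORKERS; i++) {
        if (fork() == 0) {          /* child: allocate and touch 2 GB */
            run_worker();
            _exit(0);
        }
    }
    while (wait(NULL) > 0)          /* parent: reap children, OOM-killed or otherwise */
        ;
    return 0;
}
```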
Expected result: 8 get OOM-killed, 2×2 GB remain alive.
Observing the OOM kills, we see the expected behaviour (8 killed):
And two remain (they fit in the 5 GB cgroup):
Run mempoc with singularity
Singularity image built from:
Run mempoc with singularity and wait a bit...
Processes are now stuck in uninterruptible sleep instead of being killed,
or in some cases the OOM killer did fire, but with no effect.
iostat shows IO load that did not previously exist on the machine.
Another method to reproduce
Install the stress-ng package from EPEL (or a similar repository) and run:
How did you install Singularity
RPM from EPEL: