# runc has problems due to leaked mount information #2404
The leaked mounts are most probably the result of wrong mount propagation being used. The bug it causes might be worked around by making …
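For background on the propagation mechanics being discussed, here is a minimal Go sketch using `golang.org/x/sys/unix` (illustrative only, not a fix proposed in this thread): marking a subtree private stops mounts made under it from propagating to peer mount namespaces, while `MS_SHARED` enables the bidirectional propagation CSI plugins rely on.

```go
package main

import (
	"log"

	"golang.org/x/sys/unix"
)

func main() {
	// Remount "/" recursively as private: mounts created under it in this
	// mount namespace no longer propagate to peer namespaces (and vice
	// versa). Replacing MS_PRIVATE with MS_SHARED restores bidirectional
	// propagation -- the mode CSI plugins need, and the one that lets
	// mounts leak when the propagation setup is wrong.
	if err := unix.Mount("", "/", "", unix.MS_PRIVATE|unix.MS_REC, ""); err != nil {
		log.Fatalf("remounting / as private: %v", err)
	}
}
```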
Hi @kolyshkin, regarding wrong mount propagation: is there something we can do about it? The first one is needed, because the CSI plugin does do mounts for other containers (before the other pod starts), and I think … Thank you for looking into it :-)
One other approach to work around it would be to check the … I still don't understand why …
When I debugged into it, I saw that the entries in …
In fact we should always use `/sys/fs/cgroup`; this seems to be the de facto standard these days. It will still be interesting to see a `/proc/self/mountinfo` where other cgroup entries precede those with the `/sys/fs/cgroup` mountpoint.
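One way to look for such an ordering is to dump the cgroup entries from `/proc/self/mountinfo` in file order; a small sketch using the `github.com/moby/sys/mountinfo` parser (chosen here for illustration):

```go
package main

import (
	"fmt"
	"strings"

	"github.com/moby/sys/mountinfo"
)

func main() {
	// Walk /proc/self/mountinfo in file order and print every cgroup entry,
	// flagging those outside /sys/fs/cgroup -- the leaked duplicates that
	// can precede the real ones and confuse a first-match parser.
	mounts, err := mountinfo.GetMounts(nil)
	if err != nil {
		panic(err)
	}
	for _, m := range mounts {
		if m.FSType != "cgroup" && m.FSType != "cgroup2" {
			continue
		}
		flag := ""
		if !strings.HasPrefix(m.Mountpoint, "/sys/fs/cgroup") {
			flag = "  <-- leaked?"
		}
		fmt.Println(m.Mountpoint + flag)
	}
}
```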
We are also experiencing the same thing with the csi-rbd plugin. Found @chrischdi's thread and was able to delete the extra mounts in order for kubelet to come up. We are on CoreOS, kernel 4.19.106 (CoreOS 2345.3.0).
@kolyshkin We have the same problem too, and I found that not only the cgroup mount was leaked into the host mount namespace: all mounts in the CSI container that use bidirectional mount propagation were leaked. Maybe there are some bugs in runc, such as in the `prepareRoot` or `mountToRootfs` func?
Here are some logs of the leaked CSI container 7d0849b82c486573d1. I observed the leak at Aug 11 14:15:59, and the container start failed at Aug 11 14:16:07, which means the leak happened before the CSI container was running, so I suspect the problem may be in runc.
I am experiencing the same leak problem, and I have spent some days trying to find the root cause, but I haven't found it yet. I found one obvious problem after inspecting my host environment where the mount leak occurred and digging into the runc code:
After experiencing the same problem four times, I found that all containers with issues had … In addition, not only cgroup mounts leak: in my environment, all mounts in config.json leaked into the host mount namespace, such as … Here is some information from my problem case; hope it can be of some use: …
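The propagation mode of each mount is visible in the optional fields of `/proc/self/mountinfo` (a `shared:N` peer-group tag); a short sketch that lists the shared mounts, i.e. the ones whose submounts can propagate out:

```go
package main

import (
	"fmt"
	"strings"

	"github.com/moby/sys/mountinfo"
)

func main() {
	// Print every mount whose propagation is shared (the "shared:N"
	// peer-group tag in mountinfo's optional fields). Mounts created under
	// a shared mount propagate to its peers, which is how mounts using
	// bidirectional propagation can surface in the host mount namespace.
	mounts, err := mountinfo.GetMounts(nil)
	if err != nil {
		panic(err)
	}
	for _, m := range mounts {
		if strings.Contains(m.Optional, "shared:") {
			fmt.Printf("%-50s %s\n", m.Mountpoint, m.Optional)
		}
	}
}
```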
I encountered the same problem. I also think that the … Did you finally locate the cause?
We found that the `getParentMount` func called from `rootfsParentMountPrivate` returns the wrong mount point. In the common k8s case it should return the container's overlayfs mount point, while it returned `/run`, which is the parent mount of the overlayfs mount point. I suspect the root cause may be a bug in the kernel: sometimes a just-created overlayfs mount point cannot be observed in the new mount namespace? cc @zhaodiaoer
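For context, `getParentMount`-style logic scans `/proc/self/mountinfo` for the entry whose mountpoint is the longest path prefix of the rootfs; a self-contained sketch of that logic using `github.com/moby/sys/mountinfo` (a simplification, not runc's actual code):

```go
package main

import (
	"fmt"

	"github.com/moby/sys/mountinfo"
)

// parentMount mimics getParentMount-style logic: among the mountinfo entries
// whose mountpoints are path prefixes of rootfs, pick the longest one. If the
// just-created overlayfs mount at rootfs is not yet visible in
// /proc/self/mountinfo, this silently returns the next mount up the tree
// (e.g. "/run") -- the misbehavior described above.
func parentMount(rootfs string) (*mountinfo.Info, error) {
	mounts, err := mountinfo.GetMounts(mountinfo.ParentsFilter(rootfs))
	if err != nil {
		return nil, err
	}
	if len(mounts) == 0 {
		return nil, fmt.Errorf("no parent mount found for %q", rootfs)
	}
	best := mounts[0]
	for _, m := range mounts[1:] {
		if len(m.Mountpoint) > len(best.Mountpoint) {
			best = m
		}
	}
	return best, nil
}

func main() {
	m, err := parentMount("/run/foo/rootfs")
	if err != nil {
		panic(err)
	}
	fmt.Println(m.Mountpoint)
}
```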
I think I have found the root cause of this problem; let me explain the complete picture: TL;DR: … Detail version: …
I am well aware of the mountinfo reading bug; in fact, I have a whole repo devoted to the issue: https://github.com/kolyshkin/procfs-test. This is a kernel bug, which is fixed in kernel v5.8 (see the above repo for details). Distro vendors should either upgrade their kernels or backport the relevant patch (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9f6c61f96f2d97cbb5f7fa85607bc398f843ff0f). Theoretically, we could add a retry in getParentMount; practically, this is very bad performance-wise.
@zhaodiaoer thanks for investigating that. If you can figure out a reliable way to know if/when we should re-try reading mounts in …
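A sketch of that bounded-retry idea, with the open question above expressed as a caller-supplied predicate (hypothetical code, not a merged fix):

```go
package main

import (
	"fmt"
	"time"

	"github.com/moby/sys/mountinfo"
)

// retryParentMount re-reads /proc/self/mountinfo until the longest-prefix
// parent of path satisfies ok(), or attempts run out. The predicate is
// exactly the open question above: how to tell reliably that a read raced
// with the pre-5.8 kernel bug and should be retried.
func retryParentMount(path string, ok func(*mountinfo.Info) bool, attempts int) (*mountinfo.Info, error) {
	for i := 0; i < attempts; i++ {
		mounts, err := mountinfo.GetMounts(mountinfo.ParentsFilter(path))
		if err != nil {
			return nil, err
		}
		var best *mountinfo.Info
		for _, m := range mounts {
			if best == nil || len(m.Mountpoint) > len(best.Mountpoint) {
				best = m
			}
		}
		if best != nil && ok(best) {
			return best, nil
		}
		time.Sleep(10 * time.Millisecond) // let the mount table settle before re-reading
	}
	return nil, fmt.Errorf("no satisfactory parent mount for %q after %d reads", path, attempts)
}

func main() {
	// Hypothetical predicate: accept only a parent deeper than "/run".
	m, err := retryParentMount("/run/foo/rootfs", func(i *mountinfo.Info) bool {
		return i.Mountpoint != "/" && i.Mountpoint != "/run"
	}, 5)
	fmt.Println(m, err)
}
```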
Alas, this is a kernel bug, not a mountinfo package bug (otherwise we would have fixed it by now).
Can anyone who has seen this issue test the proposed patch in #4417 and report (in that PR, not here!) if it fixes the issue?
Yes, after this kernel bug fix, the mount leak issue will be solved. Very important information, thanks!
I am still thinking of a solution, but I haven't come up with one yet...
I'm coming over here by debugging kubernetes/kubernetes#91023 together with containerd v1.3.4, which ships runc: …
We have identified that there is some kind of leak of cgroup mounts which results in e.g. the following lines in `/proc/self/mountinfo`: …

When such a leak exists, runc tries to use a wrong cgroup during `libcontainer/rootfs_linux`'s `prepareRootfs`.

I was able to reproduce the bug by: …

This results in the above output.
I was able to debug a bit into runc here and found the following:

The function `GetCgroupMounts(false)` in this case returns the wrong mountpoint for the systemd cgroup (`/run/foo/rootfs/sys/fs/cgroup/systemd` instead of `/sys/fs/cgroup/systemd`). This is because in `/proc/self/mountinfo` the mount `/run/foo/rootfs/sys/fs/cgroup/systemd` occurred before `/sys/fs/cgroup/systemd` (which seems weird to me, because reading `/proc/self/mountinfo` myself and processing it would order them the other way around).

As a POC I added the following patch to runc, which fixed it for my test case: …
Of course this does not work for upstream; at least, to fix the original leak I would need to match on something like `/run/containerd/`.