libct/cg/sd: fix dbus connection leak (alternative) #2937
Rebased (minor conflicts in libct/int), addressed a review nit.
CI failure is a known flake (#2907) which is appearing a lot lately.
I have rewritten this PR to be less intrusive (don't change any of the callers). PTAL @cyphar @AkihiroSuda @mrunalp
Using a per-cgroup-manager dbus connection means that every cgroup manager instance gets a new connection, and those connections are never closed, ultimately resulting in the file descriptor limit being hit.

Revert to using a single global dbus connection for everything, without changing the callers.

NOTE that it is assumed a runtime can't use both root and rootless dbus at the same time. If this happens, we panic.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Add a test to check that container.Run does not leak file descriptors.

Before the previous commit, it fails like this:

    exec_test.go:2030: extra fd 8 -> socket:[659703]
    exec_test.go:2030: extra fd 11 -> socket:[658715]
    exec_test.go:2033: found 2 extra fds after container.Run

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Rebased to include #2941 to fix CI failures.
@kolyshkin Do you want #2936 or this one?
Good question. @cyphar thinks #2936 is better (but he had not seen the last iteration of this one). @mrunalp thinks this one is better. I slightly favor this one -- it is less elegant than #2936 but practically makes more sense (why do we need more than 1 connection anyway?).
Merging. We can revisit #2936 later if it turns out to be better.
TL;DR: fixes a regression (open fd leak) caused by PR #2923. This is an alternative to #2936.
When running libcontainer/integration/TestSystemdFreeze 2000 times,
I got a "too many open files" error after a while.
Indeed, systemd cgroup manager never closes its dbus connection.
This was not a problem before PR #2923 because we only had a single connection
for the whole runtime. Now it is per manager instance, so we leak a
connection (apparently two sockets) per cgroup manager instance.
The fix is to go back to using a single global dbus connection.
UPDATE: this is now done to minimize the diff size.
This also makes it impossible to use both rootful and rootless dbus connections
at the same time. From what I understand, no runtime ever wants or needs to
do that, so it's not an issue in practice. An assertion is added to make sure this
never happens.
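
As a rough sketch of the approach described above (not the actual runc code), a single global connection with a root/rootless assertion could look like this. The names `fakeConn`, `dial`, `getConn` and `dialCount` are hypothetical stand-ins; a real implementation would open a systemd dbus connection instead of allocating a placeholder struct.

```go
package main

import "sync"

// fakeConn stands in for a real systemd dbus connection (hypothetical type).
type fakeConn struct{ rootless bool }

var (
	dbusMu       sync.Mutex
	dbusC        *fakeConn // the single, process-wide connection
	dbusInited   bool
	dbusRootless bool
	dialCount    int // counts how many connections were opened
)

// dial is a hypothetical stand-in for opening a real dbus connection.
func dial(rootless bool) (*fakeConn, error) {
	dialCount++
	return &fakeConn{rootless: rootless}, nil
}

// getConn returns the shared connection, opening it on first use.
// It panics if a caller asks for a different mode (root vs rootless)
// than the one the connection was first opened with.
func getConn(rootless bool) (*fakeConn, error) {
	dbusMu.Lock()
	defer dbusMu.Unlock()

	if dbusInited && dbusRootless != rootless {
		panic("dbus: can't use both root and rootless connections")
	}
	if dbusC != nil {
		return dbusC, nil
	}
	c, err := dial(rootless)
	if err != nil {
		return nil, err
	}
	dbusC, dbusInited, dbusRootless = c, true, rootless
	return c, nil
}
```

With this shape, any number of cgroup manager instances calling `getConn(false)` share one connection, so the number of open sockets no longer grows with the number of managers.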
A test case is added in a separate commit. Before the fix, it always fails
like this: