libct: Signal: honor RootlessCgroups #4395
Conversation
@@ -388,11 +388,18 @@ func (c *Container) Signal(s os.Signal) error {
	// leftover processes. Handle this special case here.
	if s == unix.SIGKILL && !c.config.Namespaces.IsPrivate(configs.NEWPID) {
		if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
			if c.config.RootlessCgroups { // may not have an access to cgroup
"RootlessCgroups" is misnomer, as this is set to false on modern rootless environments (cgroup v2 + systemd).
Should be renamed to something else in a separate PR.
Force-pushed from 0cd13f7 to f35ae2c.
Force-pushed from f35ae2c to ca303ca.
@@ -388,11 +388,18 @@ func (c *Container) Signal(s os.Signal) error {
	// leftover processes. Handle this special case here.
	if s == unix.SIGKILL && !c.config.Namespaces.IsPrivate(configs.NEWPID) {
		if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
			if c.config.RootlessCgroups { // may not have an access to cgroup
				return c.signal(s)
signalAllProcesses has a cgroupv1-friendly fallback, what is missing from there? Maybe the real issue is that we do m.Exists() in signalAllProcesses (I guess that's where the error is coming from)?

@kolyshkin wdyt?
> signalAllProcesses has a cgroupv1-friendly fallback

Not when it lacks access to the cgroup; it just returns ErrNotRunning immediately:
runc/libcontainer/init_linux.go, lines 696 to 701 in 961b803:

// signalAllProcesses freezes then iterates over all the processes inside the
// manager's cgroups sending the signal s to them.
func signalAllProcesses(m cgroups.Manager, s unix.Signal) error {
	if !m.Exists() {
		return ErrNotRunning
	}
And it seems correct that m.Exists() returns false from the perspective of m, as the cgroup actually does not exist in this case.
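For illustration only (the path below is a made-up example, not runc's actual layout): in a rootless cgroup v1 setup the unprivileged user typically cannot create a per-container cgroup under /sys/fs/cgroup, so the directory that m.Exists() checks for was simply never created.

```bash
# Hypothetical example path; runc's real cgroup layout differs.
if ! test -d /sys/fs/cgroup/memory/mycontainer; then
  echo "cgroup does not exist -> signalAllProcesses returns ErrNotRunning"
fi
```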
Right, but the function name doesn't imply that it's cgroup-specific (though it kind of is). Idk...
signalAllProcesses is only called when a shared pid ns is used (as otherwise it's enough to kill pid 1), and to know all the PIDs we need a dedicated cgroup. Meaning that a rootless container with a shared pidns won't work, as there is no way to find all the PIDs (well, theoretically, we can find all the children of the container init in this case, but I am not sure it's a good idea).

Not sure how it works in runc 1.1 -- most probably it just doesn't (i.e. we leak processes).
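To make the "we need a dedicated cgroup to know all the PIDs" point concrete, here is a minimal shell illustration (not runc code; $CGROUP_PATH stands for the container's cgroup directory):

```bash
# With a dedicated cgroup, every process of the container is listed in
# cgroup.procs, so all of them can be signalled even with a shared pid
# namespace. Without such a cgroup there is no equivalent authoritative list.
while read -r pid; do
  kill -KILL "$pid" 2>/dev/null || true
done < "$CGROUP_PATH/cgroup.procs"
```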
@@ -388,11 +388,18 @@ func (c *Container) Signal(s os.Signal) error {
	// leftover processes. Handle this special case here.
	if s == unix.SIGKILL && !c.config.Namespaces.IsPrivate(configs.NEWPID) {
		if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
			if c.config.RootlessCgroups { // may not have an access to cgroup
				return c.signal(s)
Another question: for a container with a shared pid namespace, does killing only the init process cause all the other processes in the container to be killed? If not, these non-init processes will be leaked after we delete the container. Maybe these non-init processes are managed by downstream tools (containerd or rootlesskit)?
Only the container init process is killed; the other processes are leaked until the pidns init process is killed.

The leaked processes are not managed by anything, so it is still highly recommended to use cgroup v2 systemd delegation.
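For reference, the commonly documented way to enable cgroup v2 delegation for rootless users on a systemd host looks roughly like this (the exact controller list varies per distro; adapt as needed):

```bash
# Delegate controllers to user sessions via a systemd drop-in, then reload
# systemd (a re-login may be required before it takes effect).
sudo mkdir -p /etc/systemd/system/user@.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/user@.service.d/delegate.conf
[Service]
Delegate=cpu cpuset io memory pids
EOF
sudo systemctl daemon-reload
```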
runc run -d --console-socket "$CONSOLE_SOCKET" attached_ctr
[ "$status" -eq 0 ]
testcontainer attached_ctr running
Could you please exec another process in this container? For example:

runc exec -d attached_ctr sleep infinity

But I think it's hard to simulate a situation where runc can't access the cgroup path.
And we should check whether the second process has been killed after we kill the container.
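A rough sketch of what that check could look like, reusing the conventions visible in the quoted test (the runc wrapper and $status); the --pid-file flag for exec and the leak expectation are assumptions here, not something this PR verifies:

```bash
# exec a second process and remember its host PID
runc exec -d --pid-file "$BATS_TMPDIR/sleep.pid" attached_ctr sleep infinity
[ "$status" -eq 0 ]
pid="$(cat "$BATS_TMPDIR/sleep.pid")"

runc kill attached_ctr KILL
[ "$status" -eq 0 ]

# With cgroup access the second process should be gone; rootless without
# delegation is expected to leak it, so this assertion would need to be
# relaxed or inverted depending on the environment.
! kill -0 "$pid" 2>/dev/null
```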
The second process is just leaked when the cgroup is not delegated, as in v1.1.

The scope of this PR is limited to just fixing the regression #4394.

It is still quite hard to handle the leaked processes in a robust way, and it should be discussed in a separate issue.
(Probably we should walk the procfs tree to track descendants of the container init process.)
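As a rough illustration of that idea (a shell sketch only; runc itself would do this in Go, and PID reuse plus reparented orphans are exactly what makes a robust version hard):

```bash
# Recursively collect the descendants of the container init PID via procfs
# (ps reads /proc) and signal them. $INIT_PID is a placeholder.
# Note: processes already reparented away from the container init are missed.
descendants() {
  local pid child
  pid=$1
  for child in $(ps -o pid= --ppid "$pid"); do
    echo "$child"
    descendants "$child"
  done
}
descendants "$INIT_PID" | xargs -r kill -KILL
```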
The container init might be already gone (see #4102).
Maybe.
Seems so.
@@ -44,6 +44,7 @@ func destroy(c *Container) error {
	// and destroy is supposed to remove all the container resources, we need
	// to kill those processes here.
	if !c.config.Namespaces.IsPrivate(configs.NEWPID) {
		// Likely to fail when c.config.RootlessCgroups is true
		_ = signalAllProcesses(c.cgroupManager, unix.SIGKILL)
	}
	if err := c.cgroupManager.Destroy(); err != nil {
Another question: if runc has no access to the cgroup path, this cgroup Destroy call will also return an error.
Destroy seems to ignore ENOENT.
Related:
- HostPID Pod Container Cgroup path was residual after container restarts #4040
- regression: can't kill and delete the container with shared(host) pid ns when the init process has dead #4047
- Fix `runc kill` and `runc delete` for containers with no init and no private PID namespace #4102
I think this is adequate given the circumstances, but the commit message and the code need to be amended to explain that in this case we leak processes (maybe add a warning, too).
I will open a separate PR to warn about such configuration.
`signalAllProcesses()` depends on the cgroup and is expected to fail when runc is running rootless without access to the cgroup.

When `RootlessCgroups` is set to `true`, runc just ignores the error from `signalAllProcesses` and may leave some processes running. (See the comments in PR 4395.) In the future, runc should walk the process tree to avoid such a leak.

Note that `RootlessCgroups` is a misnomer; it is set to `false` despite the name when cgroup v2 delegation is configured. This is expected to be renamed in a separate commit.

Fix issue 4394

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Force-pushed from ca303ca to 429e06a.
Done. Added a warning too.
LGTM, thanks!
@kolyshkin thanks for all the related issues/PRs, it really helps to review this :)
LGTM
@@ -388,11 +388,21 @@ func (c *Container) Signal(s os.Signal) error {
	// leftover processes. Handle this special case here.
	if s == unix.SIGKILL && !c.config.Namespaces.IsPrivate(configs.NEWPID) {
		if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
			if c.config.RootlessCgroups { // may not have an access to cgroup
				logrus.WithError(err).Warn("failed to kill all processes, possibly due to lack of cgroup (Hint: enable cgroup v2 delegation)")
Should we remove .WithError(err) here? Because when killing such a running container, it will log an error like: `failed to kill all processes, possibly due to lack of cgroup (Hint: enable cgroup v2 delegation) error=container not running`. It will make the user confused.

(I'm sorry to bring this up after the merge; I've been busy these days.)
For other types of errors it still makes sense to print the err?
`signalAllProcesses()` depends on the cgroup and is expected to fail when runc is running rootless without access to the cgroup.

When `RootlessCgroups` is set to `true`, runc just ignores the error from `signalAllProcesses` and may leave some processes running. (See the comments in this PR.) In the future, runc should walk the process tree to avoid such a leak.

Note that `RootlessCgroups` is a misnomer; it is set to `false` despite the name when cgroup v2 delegation is configured. This is expected to be renamed in a separate commit.

Fix #4394