-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libct: Signal: honor RootlessCgroups #4395
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -388,11 +388,21 @@ func (c *Container) Signal(s os.Signal) error { | |||||||||||||
// leftover processes. Handle this special case here. | ||||||||||||||
if s == unix.SIGKILL && !c.config.Namespaces.IsPrivate(configs.NEWPID) { | ||||||||||||||
if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil { | ||||||||||||||
if c.config.RootlessCgroups { // may not have an access to cgroup | ||||||||||||||
logrus.WithError(err).Warn("failed to kill all processes, possibly due to lack of cgroup (Hint: enable cgroup v2 delegation)") | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shuold we remove (I'm sorry to say it after merged, because I'm in a busy state these days.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For other types of errors it still makes sense to print the err? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||||||||||
// Some processes may leak when cgroup is not delegated | ||||||||||||||
// https://github.com/opencontainers/runc/pull/4395#pullrequestreview-2291179652 | ||||||||||||||
return c.signal(s) | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@kolyshkin wdyt? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
No when lacking an access to the cgroup. runc/libcontainer/init_linux.go Lines 696 to 701 in 961b803
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And it seems correct that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, but the function name doesn't imply that it's cgroup-specific (though it kind of is). Idk... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. signalAllProcesses is only called when shared pid ns is used (as otherwise it's enough to kill pid 1), and to know all the PIDs we need a dedicated cgroup. Meaning, that rootless container with shared pidns won't work, as there is no way to find all PIDs (well, theoretically, we can find all the children of container init in this case, but I am not sure it's a good idea). Not sure how it works in runc 1.1 -- most probably it just doesn't (i.e. we leak processes). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Another question, for a container with shared pid namespace, just only kill the init process can cause all other processes in this container killed? If not, these non-init processes will be leaked after we delete the container. Maybe these non-init processes are managed by downstream tools(containerd or rootlesskit)? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The only container init process is killed, and other processes are leaked until the pidns init process is killed. The leaked processes are not managed by anything, so it is still highly recommended to use cgroup v2 systemd delegation. |
||||||||||||||
} | ||||||||||||||
return fmt.Errorf("unable to kill all processes: %w", err) | ||||||||||||||
} | ||||||||||||||
return nil | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
return c.signal(s) | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
func (c *Container) signal(s os.Signal) error { | ||||||||||||||
// To avoid a PID reuse attack, don't kill non-running container. | ||||||||||||||
if !c.hasInit() { | ||||||||||||||
return ErrNotRunning | ||||||||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -44,6 +44,7 @@ func destroy(c *Container) error { | |
// and destroy is supposed to remove all the container resources, we need | ||
// to kill those processes here. | ||
if !c.config.Namespaces.IsPrivate(configs.NEWPID) { | ||
// Likely to fail when c.config.RootlessCgroups is true | ||
_ = signalAllProcesses(c.cgroupManager, unix.SIGKILL) | ||
} | ||
if err := c.cgroupManager.Destroy(); err != nil { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Another question, if runc has no access to the cgroup path, this cgroup destroy call will throw the error either. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -133,3 +133,27 @@ test_host_pidns_kill() { | |
test_host_pidns_kill | ||
unset KILL_INIT | ||
} | ||
|
||
# https://github.com/opencontainers/runc/issues/4394 (cgroup v1, rootless) | ||
@test "kill KILL [shared pidns]" { | ||
update_config '.process.args = ["sleep", "infinity"]' | ||
|
||
runc run -d --console-socket "$CONSOLE_SOCKET" target_ctr | ||
[ "$status" -eq 0 ] | ||
testcontainer target_ctr running | ||
target_pid="$(__runc state target_ctr | jq .pid)" | ||
update_config '.linux.namespaces |= map(if .type == "user" or .type == "pid" then (.path = "/proc/'"$target_pid"'/ns/" + .type) else . end) | del(.linux.uidMappings) | del(.linux.gidMappings)' | ||
|
||
runc run -d --console-socket "$CONSOLE_SOCKET" attached_ctr | ||
[ "$status" -eq 0 ] | ||
testcontainer attached_ctr running | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you please exec another process to this container? For example:
But I think it's hard to simulate the situation that runc can't access the cgroup path. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And should detect whether the second process has been killed or not after we kill the container. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The second process is just leaked when cgroup is not delegated, as in v1.1. The scope of the PR is limited to just fix the regression #4394. It is still quite hard to handle the leaked processes in a robust way, and it should be discussed in a separate issue. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The container init might be already gone (see #4102). |
||
|
||
runc kill attached_ctr 9 | ||
[ "$status" -eq 0 ] | ||
|
||
runc delete --force attached_ctr | ||
[ "$status" -eq 0 ] | ||
|
||
runc delete --force target_ctr | ||
[ "$status" -eq 0 ] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"RootlessCgroups" is misnomer, as this is set to false on modern rootless environments (cgroup v2 + systemd).
Should be renamed to something else in a separate PR.