Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libct: Signal: honor RootlessCgroups #4395

Merged
merged 1 commit into from
Sep 11, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions libcontainer/container_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -388,11 +388,21 @@ func (c *Container) Signal(s os.Signal) error {
// leftover processes. Handle this special case here.
if s == unix.SIGKILL && !c.config.Namespaces.IsPrivate(configs.NEWPID) {
if err := signalAllProcesses(c.cgroupManager, unix.SIGKILL); err != nil {
if c.config.RootlessCgroups { // may not have an access to cgroup
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"RootlessCgroups" is misnomer, as this is set to false on modern rootless environments (cgroup v2 + systemd).

Should be renamed to something else in a separate PR.

logrus.WithError(err).Warn("failed to kill all processes, possibly due to lack of cgroup (Hint: enable cgroup v2 delegation)")
Copy link
Member

@lifubang lifubang Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shuold we remove .WithError(err) here? Because for kill a such type running container, it will log with an error: `failed to kill all processes, possibly due to lack of cgroup (Hint: enable cgroup v2 delegation) error=container not running". It will make the user confused.

(I'm sorry to say it after merged, because I'm in a busy state these days.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other types of errors it still makes sense to print the err?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Some processes may leak when cgroup is not delegated
// https://github.com/opencontainers/runc/pull/4395#pullrequestreview-2291179652
return c.signal(s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

signalAllProcesses has a cgroupv1-friendly fallback, what is missing from there? Maybe the real issue is that we do m.Exists() in signalAllProcesses (I guess that's where the error is coming from?)?

@kolyshkin wdyt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

signalAllProcesses has a cgroupv1-friendly fallback

No when lacking an access to the cgroup.
It just returns ErrNotRunning immediately :

// signalAllProcesses freezes then iterates over all the processes inside the
// manager's cgroups sending the signal s to them.
func signalAllProcesses(m cgroups.Manager, s unix.Signal) error {
if !m.Exists() {
return ErrNotRunning
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it seems correct that m.Exists() returns false from the perspective of m, as the cgroup actually does not exist in this case

Copy link
Member

@cyphar cyphar Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but the function name doesn't imply that it's cgroup-specific (though it kind of is). Idk...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

signalAllProcesses is only called when shared pid ns is used (as otherwise it's enough to kill pid 1), and to know all the PIDs we need a dedicated cgroup. Meaning, that rootless container with shared pidns won't work, as there is no way to find all PIDs (well, theoretically, we can find all the children of container init in this case, but I am not sure it's a good idea).

Not sure how it works in runc 1.1 -- most probably it just doesn't (i.e. we leak processes).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question, for a container with shared pid namespace, just only kill the init process can cause all other processes in this container killed? If not, these non-init processes will be leaked after we delete the container. Maybe these non-init processes are managed by downstream tools(containerd or rootlesskit)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only container init process is killed, and other processes are leaked until the pidns init process is killed.

The leaked processes are not managed by anything, so it is still highly recommended to use cgroup v2 systemd delegation.

}
return fmt.Errorf("unable to kill all processes: %w", err)
}
return nil
}

return c.signal(s)
}

func (c *Container) signal(s os.Signal) error {
// To avoid a PID reuse attack, don't kill non-running container.
if !c.hasInit() {
return ErrNotRunning
Expand Down
2 changes: 2 additions & 0 deletions libcontainer/init_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -695,6 +695,8 @@ func setupPersonality(config *configs.Config) error {

// signalAllProcesses freezes then iterates over all the processes inside the
// manager's cgroups sending the signal s to them.
//
// signalAllProcesses returns ErrNotRunning when the cgroup does not exist.
func signalAllProcesses(m cgroups.Manager, s unix.Signal) error {
if !m.Exists() {
return ErrNotRunning
Expand Down
1 change: 1 addition & 0 deletions libcontainer/state_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@ func destroy(c *Container) error {
// and destroy is supposed to remove all the container resources, we need
// to kill those processes here.
if !c.config.Namespaces.IsPrivate(configs.NEWPID) {
// Likely to fail when c.config.RootlessCgroups is true
_ = signalAllProcesses(c.cgroupManager, unix.SIGKILL)
}
if err := c.cgroupManager.Destroy(); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question, if runc has no access to the cgroup path, this cgroup destroy call will throw the error either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Destroy seems to ignore ENOENT

Expand Down
24 changes: 24 additions & 0 deletions tests/integration/kill.bats
Original file line number Diff line number Diff line change
Expand Up @@ -133,3 +133,27 @@ test_host_pidns_kill() {
test_host_pidns_kill
unset KILL_INIT
}

# https://github.com/opencontainers/runc/issues/4394 (cgroup v1, rootless)
@test "kill KILL [shared pidns]" {
update_config '.process.args = ["sleep", "infinity"]'

runc run -d --console-socket "$CONSOLE_SOCKET" target_ctr
[ "$status" -eq 0 ]
testcontainer target_ctr running
target_pid="$(__runc state target_ctr | jq .pid)"
update_config '.linux.namespaces |= map(if .type == "user" or .type == "pid" then (.path = "/proc/'"$target_pid"'/ns/" + .type) else . end) | del(.linux.uidMappings) | del(.linux.gidMappings)'

runc run -d --console-socket "$CONSOLE_SOCKET" attached_ctr
[ "$status" -eq 0 ]
testcontainer attached_ctr running
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please exec another process to this container? For example:

runc exec -d attached_ctr sleep infinity

But I think it's hard to simulate the situation that runc can't access the cgroup path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And should detect whether the second process has been killed or not after we kill the container.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second process is just leaked when cgroup is not delegated, as in v1.1.

The scope of the PR is limited to just fix the regression #4394.

It is still quite hard to handle the leaked processes in a robust way, and it should be discussed in a separate issue.
(Probably we should walk the procfs tree to track descendants of the container init process)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The container init might be already gone (see #4102).


runc kill attached_ctr 9
[ "$status" -eq 0 ]

runc delete --force attached_ctr
[ "$status" -eq 0 ]

runc delete --force target_ctr
[ "$status" -eq 0 ]
}
Loading