Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1.2 regression] [cgroup v1 + rootless] nerdctl run -d --name=bar --pid=container:foo ; nerdctl rm -f bar hangs #4394

Closed
Tracked by #4114 ...
AkihiroSuda opened this issue Sep 4, 2024 · 8 comments · Fixed by #4395

Comments

@AkihiroSuda
Copy link
Member

AkihiroSuda commented Sep 4, 2024

nerdctl run -d --name=foo busybox sleep infinity
nerdctl run -d --name=bar --pid=container:foo busybox sleep infinity
nerdctl rm -f bar
# runc v1.1.14 works, but v1.2.0-rc.3 hangs here.
# However, `nerdctl stop bar && nerdctl rm bar` still works.

The issue is reproducible with:

  • Ubuntu 20.04 (cgroup v1) + rootless

Not reproducible with:

  • Ubuntu 20.04 (cgroup v1) + rootful
  • Ubuntu 20.04 (rebooted in cgroup v2) + rootless
  • Ubuntu 20.04 (rebooted in cgroup v2) + rootful
  • Ubuntu 24.04 (cgroup v2) + rootless
  • Ubuntu 24.04 (cgroup v2) + rootful

containerd version (v1.7.16, v2.0.0-rc.4) and nerdctl version (v1.7.6, v2.0.0-rc.1) do not seem to matter.

I haven't figured out how to reproduce the issue with a plain nerd-less bats.


Regression in 9583b3d libct: move killing logic to container.Signal :

@AkihiroSuda
Copy link
Member Author

Looks like nerdctl rm -f hangs when runc kill <ID> 9 is failing with ERRO[0000] unable to kill all processes: container not running (although runc list says the container is running):

$ containerd-rootless-setuptool.sh nsenter
# runc --root /run/containerd/runc/default list
ID                                                                 PID         STATUS      BUNDLE                                                                                                                   CREATED                          OWNER
328ab284487a2ba4edde0c6e6ca25b086c2a650bd04e61a44be8f545cdf6fb36   518750      running     /run/containerd/io.containerd.runtime.v2.task/default/328ab284487a2ba4edde0c6e6ca25b086c2a650bd04e61a44be8f545cdf6fb36   2024-09-04T17:47:10.651799954Z   root
6481b12d9fa8291603eac4868ce498f226d6b75fe7a826ee907de825ca2a5738   518905      running     /run/containerd/io.containerd.runtime.v2.task/default/6481b12d9fa8291603eac4868ce498f226d6b75fe7a826ee907de825ca2a5738   2024-09-04T17:47:11.180816681Z   root
# runc --root /run/containerd/runc/default kill 6481b12d9fa8291603eac4868ce498f226d6b75fe7a826ee907de825ca2a5738 9
ERRO[0000] unable to kill all processes: container not running

@cyphar
Copy link
Member

cyphar commented Sep 4, 2024

I guess this is caused by some of the changes we made to the stop/kill logic a while ago? Oh boy...

@AkihiroSuda
Copy link
Member Author

Regression in 9583b3d libct: move killing logic to container.Signal :

@AkihiroSuda
Copy link
Member Author

@lifubang
Copy link
Member

lifubang commented Sep 6, 2024

nerdctl rm -f bar
# runc v1.1.14 works, but v1.2.0-rc.3 hangs here.
# However, `nerdctl stop bar && nerdctl rm bar` still works.

This seems strange, it seems that both nerdctl stop bar && nerdctl rm bar and nerdctl rm -f bar call the runc kill command to kill the container.

@kolyshkin
Copy link
Contributor

So, we have cgroup v1 rootless container which shares pidns with another container.

For shared pidns case, runc kill can not just kill container init (as it's not pid 1), and have to actually kill all the processes in the container cgroup. And if there's no cgroup, it's impossible to do.

I think the best course of action is to disallow such configuration, as it's not working and won't work. I.e. runc run/start should error out (for cgroup v1 + rootless + shared pidns).

@AkihiroSuda
Copy link
Member Author

Printing a warning would be fine, but erroring out is a breaking change

@kolyshkin
Copy link
Contributor

Related:

We already have a few test cases for same/similar issue (#4047), but they all require rootless_cgroup.

Given the circumstances, I think #4395 is OK, but we also need some warnings added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants