nsenter: cloned_binary: use overlayfs instead of bind-mount #2006

cyphar · 2019-03-08T09:18:42Z

A /proc/self/exe which is based on a read-only bind-mount can be made
read-write somewhat trivially with CAP_SYS_ADMIN. Though mounts are
blocked by the default AppArmor policy (and capability set), using
overlayfs is far more resilient to being messed with.

The main downside of this approach is that overlayfs was added in Linux
3.18, which is after memfd_create(2) was added -- and the whole point of
this exercise was to have a sane setup which worked on older kernel
versions.

Follow-up of #1984.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
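
To make the "somewhat trivially" claim above concrete, here is a minimal sketch of how a process holding CAP_SYS_ADMIN could clear the read-only flag on a bind-mount with a single mount(2) call. The helper names `bind_remount_flags` and `make_bind_writable` are hypothetical and are not taken from the patch; only the mount(2) flags themselves are standard.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: flags for changing the per-mount flags of an
 * existing bind-mount. Leaving MS_RDONLY out makes the mount read-write.
 */
static unsigned long bind_remount_flags(int read_only)
{
	unsigned long flags = MS_REMOUNT | MS_BIND;

	if (read_only)
		flags |= MS_RDONLY;
	return flags;
}

/*
 * Hypothetical helper: clear the read-only flag on the bind-mount at
 * `target`. Returns 0 on success and -1 with errno set on failure
 * (typically EPERM when the caller lacks CAP_SYS_ADMIN).
 */
static int make_bind_writable(const char *target)
{
	assert(target != NULL);
	return mount(NULL, target, NULL, bind_remount_flags(0), NULL);
}
```

The sketch only illustrates why a read-only bind-mount is a weak barrier once mount(2) is permitted. It says nothing about whether the mount is reachable from inside a given container.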


cyphar · 2019-03-08T09:28:50Z

Note that we can also just decide that giving CAP_SYS_ADMIN to a privileged container is already so ridiculously unsafe that we don't provide any security guarantees about such a setup (even with this patch, CAP_SYS_ADMIN in a privileged container is an insane configuration).

thaJeztah · 2019-03-08T12:19:55Z

I think CAP_SYS_ADMIN is still needed to run systemd in a container, which is not an uncommon scenario.

rhatdan · 2019-03-08T12:22:07Z

@thaJeztah No, systemd does not require CAP_SYS_ADMIN if it is configured correctly. Podman runs systemd just fine without CAP_SYS_ADMIN. It can even run it as non-root (rootless mode).

A container with CAP_SYS_ADMIN is still blocked by SELinux, although giving a container CAP_SYS_ADMIN is pretty much equivalent to --privileged.

cyphar · 2019-03-08T23:49:28Z

Yeah, my view is that CAP_SYS_ADMIN with a non-userns container is simply unsafe, and it's your funeral if you plan to run a configuration like that with untrusted code. While this fix is "neat", it requires Linux 3.18, which means it doesn't fix the old-kernel issue.
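
For reference, the version comparison behind the "old-kernel issue" can be sketched as below. memfd_create(2) first appeared in Linux 3.17 and overlayfs in 3.18, so any kernel new enough for overlayfs already has memfd_create(2). The helper names `release_at_least` and `running_kernel_at_least` are hypothetical and not part of runc. Distribution kernels often backport features, so a version check like this is only a rough heuristic; probing the feature directly is more reliable.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper: compare a kernel release string such as
 * "3.10.0-957.el7.x86_64" against major.minor.
 * Returns 1 if the release is at least major.minor, 0 if it is older,
 * and -1 if the string cannot be parsed.
 */
static int release_at_least(const char *release, int major, int minor)
{
	int maj = 0, min = 0;

	assert(release != NULL);
	if (sscanf(release, "%d.%d", &maj, &min) != 2)
		return -1;
	return maj > major || (maj == major && min >= minor);
}

/* Hypothetical helper: the same check against the running kernel. */
static int running_kernel_at_least(int major, int minor)
{
	struct utsname u;

	if (uname(&u) < 0)
		return -1;
	return release_at_least(u.release, major, minor);
}
```

With these helpers, `running_kernel_at_least(3, 18)` would gate the overlayfs path and `running_kernel_at_least(3, 17)` the memfd_create(2) path, which shows why the overlayfs path never covers a kernel that the memfd path misses.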

Ace-Tang · 2019-03-11T06:06:25Z

Hi @cyphar, I re-read your commit. If runc's try-bind path succeeds, it bind-mounts the host runc binary to another location, makes that mount read-only, and gets an fd to it. But if the container has CAP_SYS_ADMIN, it can still make the fd writable and change the real runc binary on the host.

Am I understanding this correctly?

cyphar mentioned this pull request Mar 8, 2019

nsenter: cloned_binary: "memfd" cleanups #1984

Merged

cyphar closed this Mar 8, 2019

This was referenced Mar 10, 2019

[release/1.2 backport] update runc to 2b18fe1d885ee5083ef9f0838fee39b62d653e30 containerd/containerd#3082

Merged

[release/1.1 backport] update runc to 2b18fe1d885ee5083ef9f0838fee39b62d653e30 containerd/containerd#3083

Merged

cyphar deleted the memfd-overlayfd branch March 10, 2019 04:27

chrischdi mentioned this pull request Mar 12, 2019

Unable to start container using low memory limit coreos/bugs#2551

Open

cyphar mentioned this pull request Mar 16, 2019

fix cloned_binary fallback corner case #2016

Open
