nsenter: cloned_binary: use overlayfs instead of bind-mount #2006

Closed

cyphar (Member) commented Mar 8, 2019

A /proc/self/exe which is based on a read-only bind-mount can be made
read-write somewhat trivially with CAP_SYS_ADMIN. Though mounts are
blocked by the default AppArmor policy (and capability set), using
overlayfs is far more resilient to being messed with.
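As a rough illustration of the paragraph above, the sketch below shows the mount(2) flag combination that flips an existing bind mount between read-only and read-write. It is not taken from the runc source: the names `bind_remount_flags` and `remount_bind` are hypothetical, the flag values are the ones defined in `<linux/mount.h>`, and actually calling `remount_bind` needs CAP_SYS_ADMIN in the relevant mount namespace.

```python
import ctypes
import os

# Flag values from <linux/mount.h>.
MS_RDONLY = 0x1
MS_REMOUNT = 0x20
MS_BIND = 0x1000


def bind_remount_flags(read_only: bool) -> int:
    """Return the mount(2) flags that remount an existing bind mount."""
    flags = MS_REMOUNT | MS_BIND
    if read_only:
        flags |= MS_RDONLY
    return flags


def remount_bind(target: str, read_only: bool) -> None:
    """Remount the bind mount at `target` (hypothetical helper).

    Requires CAP_SYS_ADMIN; raises OSError if the kernel refuses.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mount.argtypes = [
        ctypes.c_char_p,  # source
        ctypes.c_char_p,  # target
        ctypes.c_char_p,  # filesystemtype
        ctypes.c_ulong,   # mountflags
        ctypes.c_void_p,  # data
    ]
    libc.mount.restype = ctypes.c_int
    # source, filesystemtype and data are ignored for a bind remount,
    # so NULL is passed for each of them.
    flags = bind_remount_flags(read_only)
    if libc.mount(None, os.fsencode(target), None, flags, None) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)
```

Whoever sets up the mount would call `remount_bind(path, read_only=True)`. The concern raised here is that a process holding CAP_SYS_ADMIN could, in principle, issue the same call with `read_only=False`.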

The main downside of this approach is that overlayfs was added in Linux
3.18, which is after memfd_create(2) was added -- and the whole point of
this exercise was to have a sane setup which worked on older kernel
versions.
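The version argument above can be checked with a small sketch. It assumes mainline version numbers only (memfd_create(2) in 3.17, overlayfs in 3.18) and ignores distribution backports; `parse_release` and `overlayfs_is_needed` are hypothetical names used purely for illustration.

```python
import re

# Mainline kernel versions that introduced each feature.
MEMFD_CREATE_SINCE = (3, 17)  # memfd_create(2)
OVERLAYFS_SINCE = (3, 18)     # overlayfs, as stated in the description


def parse_release(release: str) -> tuple:
    """Extract (major, minor) from a string such as '4.19.0-5-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def overlayfs_is_needed(release: str) -> bool:
    """True only if the kernel has overlayfs but lacks memfd_create(2)."""
    version = parse_release(release)
    return OVERLAYFS_SINCE <= version < MEMFD_CREATE_SINCE
```

On a live system the input would be `os.uname().release`. Because 3.18 comes after 3.17, `overlayfs_is_needed` returns `False` for every release string, which is the downside described above: any mainline kernel new enough for overlayfs already has memfd_create(2).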

Follow-up of #1984.
Signed-off-by: Aleksa Sarai <asarai@suse.de>

cyphar (Member, Author) commented Mar 8, 2019

Note that we can also just decide that giving CAP_SYS_ADMIN to a privileged container is already so ridiculously unsafe that we don't provide any security guarantees about such a setup (even with this patch, CAP_SYS_ADMIN in a privileged container is an insane configuration).

thaJeztah (Member) commented

I think CAP_SYS_ADMIN is still needed to run systemd in a container, which is not an uncommon scenario

rhatdan (Contributor) commented Mar 8, 2019

@thaJeztah No, systemd does not require CAP_SYS_ADMIN if configured correctly. Podman runs systemd just fine without CAP_SYS_ADMIN. It can even run it as non-root (rootless mode).

A container with CAP_SYS_ADMIN is still blocked by SELinux, although giving a container CAP_SYS_ADMIN is pretty much equivalent to --privileged.

cyphar (Member, Author) commented Mar 8, 2019

Yeah, my view is that CAP_SYS_ADMIN with a non-userns container is simply unsafe, and it's your funeral if you plan to run a configuration like that with untrusted code. While this fix is "neat", the kernel requirement is 3.18, which means it doesn't fix the old-kernel issue.

Ace-Tang (Contributor) commented

Hi @cyphar, I re-read your commit. If runc's try-bind succeeds, it bind-mounts the host runc to another location, makes it read-only, and gets an fd for it. But if the container has CAP_SYS_ADMIN, it can still make the fd writable and change the real runc on the host.

Am I understanding this correctly?
