validator: ensure user doesn't try to mount /sys without userns #807

cyphar · 2016-05-08T14:28:22Z

Inside a user namespace, mount permissions become more nuanced. sysfs
requires you to have either CAP_SYS_ADMIN in the root user namespace
(for us this means that we're not using a user namespace) or that you
have CAP_SYS_ADMIN in the user namespace the network namespace was
created in. This means that having just a user namespace and no private
network namespace will result in errors when mounting sysfs[1]. Warn the
user about this in the validator.

Closes #799.

Signed-off-by: Aleksa Sarai <asarai@suse.de>

/cc @dqminh @mrunalp

[1]: http://lists.linuxfoundation.org/pipermail/containers/2013-August/033388.html
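For readers following along, here is a minimal sketch of the kind of check the description talks about. It is not the code from this PR: `Config`, `Mount`, `NamespaceType`, `checkSysfs`, and `ErrSysfsNeedsNetns` are hypothetical stand-ins for the runtime's real configuration types, and the rule it encodes is only the one stated above (a user namespace without a private network namespace cannot mount sysfs).

```go
package main

import (
	"errors"
	"fmt"
)

// NamespaceType, Mount, and Config are hypothetical stand-ins for the
// runtime's real configuration types; they exist only for this sketch.
type NamespaceType string

const (
	NEWUSER NamespaceType = "user"
	NEWNET  NamespaceType = "network"
)

type Mount struct {
	Destination string
	Device      string // filesystem type, e.g. "sysfs"
}

type Config struct {
	Namespaces []NamespaceType
	Mounts     []Mount
}

// hasNamespace reports whether the config requests a namespace of type t.
func (c *Config) hasNamespace(t NamespaceType) bool {
	for _, ns := range c.Namespaces {
		if ns == t {
			return true
		}
	}
	return false
}

// ErrSysfsNeedsNetns is a hypothetical error value for this sketch.
var ErrSysfsNeedsNetns = errors.New("sysfs in a user namespace also requires a private network namespace")

// checkSysfs rejects a sysfs mount when the config has a user namespace
// but no network namespace. Every other combination is left to the kernel.
func checkSysfs(c *Config) error {
	if !c.hasNamespace(NEWUSER) || c.hasNamespace(NEWNET) {
		return nil
	}
	for _, m := range c.Mounts {
		if m.Device == "sysfs" {
			return fmt.Errorf("mount %q: %w", m.Destination, ErrSysfsNeedsNetns)
		}
	}
	return nil
}

func main() {
	cfg := &Config{
		Namespaces: []NamespaceType{NEWUSER},
		Mounts:     []Mount{{Destination: "/sys", Device: "sysfs"}},
	}
	fmt.Println(checkSysfs(cfg))
}
```

Adding `NEWNET` to `Namespaces`, or dropping `NEWUSER`, makes `checkSysfs` return nil, which matches the two cases the description says are allowed.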

cyphar · 2016-05-16T15:36:08Z

PTAL.

/ping @mrunalp @dqminh @hqhq

dqminh · 2016-05-16T16:55:11Z

mmm, I think it's still possible that the user has cap_sys_admin in both the user namespace and the root one? (though I don't know what the purpose of that would be). I don't think there's a lightweight method to check whether a user possesses cap_sys_admin either.

So I think this LGTM. I guess we can tune it if users complain :p

cyphar · 2016-05-16T22:48:30Z

@dqminh I'm not sure how that works, actually, because you lose all capabilities in your previous user namespace if you setns into a new user namespace (or unshare one). Maybe there's some odd setup where this happens, but it shouldn't happen within runC -- but we can change it if someone complains (as you said).

crosbymichael · 2016-05-16T22:53:36Z

I'm not really a fan of these checks that prohibit runc from trying and letting the system return the error. The point is that when/if this is fixed in future kernels, we have to make a change and push an update for runc, instead of not having this type of kernel validation and letting the kernel return the errors.

cyphar · 2016-05-16T22:57:42Z

My main justification when it comes to things like this is that the errors people get from the operating system don't make sense (you get an error when trying to mount /sys, but there's no other information apart from EACCES). And if the error happens in the new process we create, we don't even send the error text to the parent, so the user only gets exit status 1 as an error message. I noticed this particularly when trying to implement rootless containers: I had to keep recompiling runC with debugging statements and messing around with process setup so I could actually get the error.

crosbymichael · 2016-05-16T23:18:55Z

idk, pros and cons on both sides. i don't know what is best

cyphar · 2016-05-16T23:25:38Z

At the very least, we should probably fix the current issues with not actually outputting the error you get inside the init process. Unfortunately, we still have to do a lot if we want to return errors from inside nsenter (we'll have to serialise the JSON manually in C).
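To make the idea concrete, here is a rough sketch of passing an init-side error back to the parent as JSON over a pipe. This is not runC's actual protocol: `initError`, `reportError`, `readError`, and `roundTrip` are hypothetical names, and for simplicity both ends of the pipe live in one process rather than being split across a parent and a child.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"os"
)

// initError is a hypothetical wire format for errors raised during init.
type initError struct {
	Stage   string `json:"stage"`
	Message string `json:"message"`
}

// reportError is the child side: it serialises err as JSON onto w.
func reportError(w io.Writer, stage string, err error) error {
	return json.NewEncoder(w).Encode(initError{Stage: stage, Message: err.Error()})
}

// readError is the parent side. A pipe closed without any data is treated
// as success; anything else becomes a descriptive error.
func readError(r io.Reader) error {
	var ie initError
	if err := json.NewDecoder(r).Decode(&ie); err != nil {
		if errors.Is(err, io.EOF) {
			return nil
		}
		return fmt.Errorf("decoding init error: %w", err)
	}
	return fmt.Errorf("container init failed during %s: %s", ie.Stage, ie.Message)
}

// roundTrip simulates child and parent in a single process. An empty msg
// means the "child" succeeded and wrote nothing before closing the pipe.
func roundTrip(stage, msg string) error {
	r, w, err := os.Pipe()
	if err != nil {
		return err
	}
	defer r.Close()
	if msg != "" {
		if err := reportError(w, stage, errors.New(msg)); err != nil {
			w.Close()
			return err
		}
	}
	w.Close() // closing the write end lets the reader see EOF
	return readError(r)
}

func main() {
	fmt.Println(roundTrip("mount /sys", "operation not permitted"))
}
```

The point of the sketch is only that the parent ends up with the stage and the error text instead of a bare exit status. Doing the same from the C side of nsenter would mean producing that JSON by hand, which is the extra work mentioned above.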

mrunalp · 2016-05-16T23:26:02Z

I agree with @crosbymichael. The configuration isn't 100% foolproof and we can provide better examples on how to make certain features work.

hqhq · 2016-05-17T07:10:58Z

I also think we should not prohibit runc from trying, because in theory the config could be valid and kernel rules might change. The poor error message in this case is also a pain; maybe some kind of warning could be a compromise?

crosbymichael · 2016-05-17T22:02:38Z

Ya, if we cannot get a useful error message out of the C code then we should probably do a warning and not abort start for something like this.

cyphar · 2016-10-01T13:15:59Z

The nsenter rewrites have improved the logging part of the nsenter code (though it still leaves something to be desired).


GordonTheTurtle added the status/0-triage label May 8, 2016

This was referenced May 8, 2016

Cannot create user namespaced container without network namespaces #799

Open

Rootless Containers #774

Merged

cyphar closed this Oct 1, 2016

cyphar deleted the sysfs-validator-netns branch October 1, 2016 13:16

stefanberger pushed a commit to stefanberger/runc that referenced this pull request Sep 8, 2017

Merge pull request opencontainers#807 from Mashimiao/config-small-fix

bc3a283

config.md: fix typo of context

rodnymolina mentioned this pull request Sep 12, 2020

K8s + Sysbox: mount sysfs fails (EPERM) during pod creation nestybox/sysbox#67

Closed
