user namespace bugfixes and features #6865
Change file related checks to use user namespaces and make sure involved uids/gids are mappable in the current namespace. Note that checks without file ownership information will still not take user namespaces into account, as some of these should be handled via 'zfs allow' (otherwise root in a user namespace could issue commands such as `zpool export`). Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Closes openzfs#6800
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
An unprivileged container usually has its own user and group list, and zfs allow should be able to both view and modify them from the outside without having to add temporary entries with the mapped uids/gids. Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
When executing 'zfs allow' from within a user namespace, the uids and gids must be mapped in accordance with the namespace. Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Regular users can only remove permissions from themselves in addition to requiring the allow permission to do so. It makes more sense for privileged users in a user namespace to be able to manage permissions of all users of that namespace. Thus, when the user has CAP_SYS_ADMIN in their current namespace, use the same check as for 'zfs allow' instead. Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
The thrust of this looks good! I'm glad to see some user namespace test cases. Hopefully we'll be able to add more over time.
It's unfortunate that while we can delegate mount privileges to a user in a user namespace, the same user when running in the global namespace can't exercise those same privileges.
I think it would be worth investigating if the existing `is_global_zone` logic in the ZTS could be modified to mean `is_global_namespace` instead. This could potentially allow you to run the ZTS in a user namespace. Right now it's hardwired to always assume it's running in the global zone.
if (socketpair(AF_UNIX, SOCK_STREAM, 0, syncfd) != 0) {
	perror("socketpair");
	return (errno);
Returning `errno` here is dodgy since after a successful call to perror(3) `errno` is technically undefined. You should save `errno` in a local variable if you want to use it later.
I was wondering about that, since `chg_usr_exec.c` is doing the same.
Yup, that's not good either and should really be fixed at some point.
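A minimal sketch of the suggestion in this thread, assuming a hypothetical helper named `make_sync_pair` (it is not part of the patch): the `errno` value is copied into a local variable right after the failing call, so the later `perror()` cannot affect what is returned.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, for illustration only: create a socket pair and,
 * on failure, return the errno value captured right after the failing call.
 */
static int
make_sync_pair(int domain, int syncfd[2])
{
	if (socketpair(domain, SOCK_STREAM, 0, syncfd) != 0) {
		/* Capture errno before calling any other libc function. */
		int saved_errno = errno;

		perror("socketpair");
		return (saved_errno);
	}

	return (0);
}
```

Calling `make_sync_pair(AF_UNIX, fds)` is expected to return 0, while an unsupported domain yields the non-zero error code from `socketpair()` itself rather than whatever `errno` holds after `perror()`.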
	return (exit_code);
error_errno:
	exit_code = errno;
error:
As mentioned above you can't safely assume `errno` is still valid here. Given that, I think it would be simpler to drop the `error*` labels and do the needed error handling in each conditional.
Given that it's just a helper for tests, most of the cases don't really need `errno` anyway, and returning `1` would work just as well. A "failed" or "not failed" exit status should suffice after `perror()` has printed the message.
Agreed.
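The shape agreed on here could look roughly like the following sketch. `run_helper_sketch` is a hypothetical name and the body is only a placeholder for the real test helper; the point is that each conditional prints its own message and returns a plain failure status, with no shared `error*` labels and no reliance on `errno` after `perror()`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the test helper: every failing call is
 * handled where it happens and reports a plain "failed" status.
 */
static int
run_helper_sketch(int domain)
{
	int syncfd[2];

	if (socketpair(domain, SOCK_STREAM, 0, syncfd) != 0) {
		perror("socketpair");
		return (EXIT_FAILURE);
	}

	/* The real helper would fork and synchronize over syncfd here. */

	if (close(syncfd[0]) != 0) {
		perror("close");
		(void) close(syncfd[1]);
		return (EXIT_FAILURE);
	}
	if (close(syncfd[1]) != 0) {
		perror("close");
		return (EXIT_FAILURE);
	}

	return (EXIT_SUCCESS);
}
```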
	close(syncfd[0]);
done:
	while (waitpid(child, &wstatus, 0) != child) {
		/* Keep it simple. */
nit: I don't think the comment here is needed.
@@ -695,6 +695,11 @@ tags = ['functional', 'truncate']
tests = [ 'upgrade_userobj_001_pos' ]
tags = ['functional', 'upgrade']

# user_namespace_001 - https://github.com/zfsonlinux/zfs/issues/6800
We shouldn't need to link to the issue once this functionality works and has test coverage. So this can be dropped.
dist_pkgdata_SCRIPTS = \
	setup.ksh \
	cleanup.ksh \
	user_namespace_001.ksh
`user_namespace_common.kshlib` and `user_namespace_common.cfg` need to be added to the `Makefile.am`. This is what caused the bots to fail. Please double check the permissions on the scripts too.
{
	ASSERT3S(all, ==, B_FALSE);

	if (cr != CRED() && (cr != kcred))
		return (err);

-	if (!capable(capability))
+	if (!(ns ? ns_capable(ns, capability) : capable(capability)))
There's going to need to be some compatibility code added for older kernels which don't have `ns_capable` or a `cr->user_ns`. In which case this functionality needs to be automatically disabled.
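One possible shape for such compatibility code, as a sketch that builds outside the kernel. Everything with a `_sketch` suffix is a stand-in invented for illustration (the macro `HAVE_NS_CAPABLE_SKETCH` represents whatever a configure check would define); none of it is the real kernel or SPL interface. When the macro is not defined, the namespace argument is ignored and the check falls back to the global capability test.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's struct user_namespace (illustration only). */
struct user_namespace_sketch {
	bool grants_cap;
};

/* Pretend state: does the caller hold the capability globally? */
static bool host_grants_cap = false;

/* Stand-in for capable(). */
static bool
capable_sketch(int cap)
{
	(void) cap;
	return (host_grants_cap);
}

#ifdef HAVE_NS_CAPABLE_SKETCH
/* Stand-in for ns_capable(), only built when the kernel provides it. */
static bool
ns_capable_sketch(struct user_namespace_sketch *ns, int cap)
{
	(void) cap;
	return (ns->grants_cap);
}
#endif

/*
 * Namespace-aware check where available; otherwise the feature is
 * disabled and only the global capability counts.
 */
static bool
priv_capable_sketch(struct user_namespace_sketch *ns, int cap)
{
#ifdef HAVE_NS_CAPABLE_SKETCH
	if (ns != NULL)
		return (ns_capable_sketch(ns, cap));
#else
	(void) ns;
#endif
	return (capable_sketch(cap));
}
```

Keeping the `#ifdef` inside one wrapper means call sites stay identical on old and new kernels.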
@@ -339,4 +339,5 @@ struct file_system_type zpl_fs_type = {
	.get_sb = zpl_get_sb,
#endif /* HAVE_MOUNT_NODEV */
	.kill_sb = zpl_kill_sb,
	.fs_flags = FS_USERNS_MOUNT,
This wasn't a valid flag until the 3.8 kernel, so this functionality will need to be gracefully disabled in older kernels.
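A sketch of one way to disable the flag gracefully, assuming it is enough to test whether the `FS_USERNS_MOUNT` macro exists (a configure-time check would be an alternative). The struct and the `ZPL_FS_FLAGS_SKETCH` macro are stand-ins made up for this example so that it builds outside the kernel.

```c
/* Stand-in for the kernel's struct file_system_type (illustration only). */
struct file_system_type_sketch {
	const char *name;
	int fs_flags;
};

/*
 * Use FS_USERNS_MOUNT only where the kernel headers define it;
 * otherwise fall back to no flags, which disables userns mounts.
 */
#ifdef FS_USERNS_MOUNT
#define	ZPL_FS_FLAGS_SKETCH	FS_USERNS_MOUNT
#else
#define	ZPL_FS_FLAGS_SKETCH	0
#endif

static const struct file_system_type_sketch zpl_fs_type_sketch = {
	.name = "zfs",
	.fs_flags = ZPL_FS_FLAGS_SKETCH,
};
```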
	if (error == EACCES)
		error = dsl_deleg_access(osname, "mount", cred);
	if (error)
		return (error);
nit: can you assign `cred = CRED()` in the declaration.
	if ((error = dsl_deleg_get(zc->zc_name, &nvp)) != 0)
		return (error);
#ifdef CONFIG_USER_NS
	if ((error = deleg_map_user_ns(&nvp)) == 0)
		error = put_nvlist(zc, nvp);
We need this `put_nvlist` in the `!CONFIG_USER_NS` case as well.
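One way to restructure the hunk so that the list is returned in both configurations, sketched with stand-in functions (every `_sketch` name is hypothetical and only mimics the calls quoted above): only the mapping step sits inside the `#ifdef`, and the put step runs whenever no error has occurred.

```c
#include <stddef.h>

/* Stand-in for nvlist_t (illustration only). */
typedef struct nvlist_sketch {
	int mapped;
} nvlist_sketch_t;

/* Counts how often the list was handed back to the caller. */
static int put_calls = 0;

/* Stand-in for dsl_deleg_get(). */
static int
deleg_get_sketch(nvlist_sketch_t **nvp)
{
	static nvlist_sketch_t nv;

	nv.mapped = 0;
	*nvp = &nv;
	return (0);
}

#ifdef CONFIG_USER_NS
/* Stand-in for deleg_map_user_ns(). */
static int
deleg_map_user_ns_sketch(nvlist_sketch_t **nvp)
{
	(*nvp)->mapped = 1;
	return (0);
}
#endif

/* Stand-in for put_nvlist(). */
static int
put_nvlist_sketch(nvlist_sketch_t *nvp)
{
	(void) nvp;
	put_calls++;
	return (0);
}

static int
get_fsacl_sketch(void)
{
	nvlist_sketch_t *nvp = NULL;
	int error;

	if ((error = deleg_get_sketch(&nvp)) != 0)
		return (error);
#ifdef CONFIG_USER_NS
	/* Only the ID mapping depends on user namespace support. */
	error = deleg_map_user_ns_sketch(&nvp);
#endif
	/* The list is returned with or without CONFIG_USER_NS. */
	if (error == 0)
		error = put_nvlist_sketch(nvp);

	return (error);
}
```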
	 * allow.
	 */
	if (zc->zc_perm_action == B_FALSE ||
	    ns_capable(cr->user_ns, CAP_SYS_ADMIN)) {
Same story, compatibility code needed for `ns_capable()` on old kernels.
@Blub any updates on this? |
It's on my todo list but unfortunately not at the top currently. |
Hi, I've been testing this patch for some days now, and I didn't find a way to prohibit the host from mounting the datasets created by the containers. At each reboot, they get mounted on the host, and mounting them from the container results in an "already mounted" error. Is this something that should be handled by the user (in some way that I haven't found yet) or is it a limitation of this patch ? Thanks a lot for your work. |
Just as a suggestion, how about only allowing datasets with This prevents container mounts from being available to the host by default to prevent these kinds of issues and allows the mount points to safely reflect where they should be in the container itself. Downsides:
@pstch: the issue is that if a dataset is already mounted, you can't mount it a second time with the |
Regarding
I can push it with my next updates. |
@pstch, thanks for testing. During development I'm mostly testing with |
Yes, this is exactly what the |
Codecov Report
@@ Coverage Diff @@
## master #6865 +/- ##
==========================================
+ Coverage 75.23% 76.45% +1.22%
==========================================
Files 298 328 +30
Lines 94503 104006 +9503
==========================================
+ Hits 71100 79521 +8421
- Misses 23403 24485 +1082
|
Sure. Will open new ones once I rebased them and incorporated the remaining requested changes. |
Any news? |
ping^^ @Blub |
Haven't gotten around to continue with this yet... |
When rebasing this PR on current master, I can successfully mount, and the tests pass, but I get the following uid/gid after creating a file in a dataset mounted in the global namespace, and then creating another file in the same dataset mounted in a user namespace:
root@test:/mnt# ls -l
total 1
-rw-r--r-- 1 1000000 1000000 0 Nov 18 04:56 created_on_container
-rw-r--r-- 1 root root 0 Nov 18 04:56 created_on_host
root@test:/mnt# echo >> created_on_container
root@test:/mnt# echo >> created_on_host
I have some trouble understanding whether this is the intended behaviour (and am getting worried that I broke something when rebasing), and whether UIDs/GIDs should be mapped when mounted in user namespaces. I would personally have thought that yes, they would be mapped. This LWN article says that they should not:
I don't really understand why that mapping should not happen, so I have some difficulty understanding the proper behaviour of user namespace mounts.
I would also like to say that I don't think #7294 is related to this issue. Delegating mount/unmount to user namespaces seems to work ( |
Just an FYI for those following this: I pushed the rebase, and another patch on top to make the super block always owned by init_user_ns for now. After re-reading @DeHackEd's comment I'm thinking limiting userns mounts to the zoned property probably makes sense, too, and in this case it may even make sense to have the super block owned by the user namespace. It wouldn't really change much from the perspective of a container, but it would allow switching a container to a different user namespace (or turning it into a privileged container and back) without utilizing tools such as fuidshift. Not sure if it's of much use other than that? @stgraber might have some thoughts?
@Blub Thanks for the push, great work. Just for sure, have you solved the issue with pool structure of other user shown in |
What do you mean exactly? |
I have tested the last rebase of this PR (with the patch that makes the superblock owned by init_user_ns) on Debian, and I can say that it seems to work well. As mentioned above, it's possible to integrate userns mounts with the zoned property. There are a few different possibilities:
If some concept of zone becomes available, it becomes possible to integrate it in a few different ways with userns mounts:
The current code does not seem to handle the "superblock owned by userns" case (which is basically the situation before the patch that made it owned by init_user_ns) well, and leads to this weird situation when interacting with a dataset mounted in a user namespace:
The files appear with the same UID/GIDs when mounted in init_user_ns, so there is no UID/GID translation done when reading files and when calling chown, but there seems to be a problem with the way the owner of created files and directories is determined. I am currently testing the global/user zone approach (using userns-owned superblocks for zoned datasets), while trying to find a cause/solution for the above issue. |
So I think I'd start with a PR for the |
After reading the ZoL code more thoroughly, I think that there is already some logic enabled for zones, even if it is never used at this time (because we are always in the global zone). For example, if we use a non-global zone for user namespaces, user namespaces will ONLY be able to mount zoned datasets. This means that without changing the current logic, it is only possible to restrict userns mounts to zoned datasets (which would be the result of implementing crgetzoneid/INGLOBALZONE/zone_dataset_visible). I'll create a PR for the global/user zones approach (which does not use inode numbers, just 0/1 for init/user namespaces), trying to implement the required things in zone.h without changing the current logic, once I have working code and if the required PRs have been merged, so that it's possible to discuss which approach would make more sense.
Does anyone know what happened to this PR? It was closed in March 2018 but conversation continued through December 2018 (previous comment) and it appears to have stopped there. Was anything done toward getting user namespaces working? I am interested regarding access to /dev/zfs inside containers: https://github.com/lxc/lxd/issues/4184 |
I also wondered what happened here, for the same reasons as @stevegilbert23 |
This work was taken up in #12263. |
thanks! |
This series can be seen as 4 separate "chunks":
Chunk 1: setgid mode bugfix & regression test:
user id range. (I saw no reason for anything more complex than that.)
Chunk 2: mounting from user namespaces (RFC):
in combination with user namespaces. E.g. giving (`zfs allow`ing) create+mount permissions to a container.
since it made writing the test case of patch 6 more convenient.
Chunk 3: mapping user ids when using zfs allow from within user namespaces.
`ZFS_IOC_GET_FSACL` and `ZFS_IOC_SET_FSACL` to perform user id mapping (as well as checking!) on the sent/received data. Otherwise root in a user namespace would not be able to run `zfs allow` with the user IDs as seen from within its namespace, but would have to perform the mapping to real IDs.
This is also what easily enables users to create allow entries for user IDs which do not exist in the host namespace's `/etc/passwd` and therefore would show up empty and indistinguishable to the host (making patch 5 a requirement).
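To illustrate the ID translation described in this chunk, here is a small userspace sketch of mapping through a single extent in the format of `/proc/<pid>/uid_map` (first ID inside the namespace, first ID outside, length). The struct and function names are made up for this example and are not the functions used by the patch, which relies on the kernel's own mapping helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* One mapping extent, in the field order of /proc/<pid>/uid_map. */
struct id_extent_sketch {
	uint32_t inside_first;	/* first ID as seen inside the namespace */
	uint32_t outside_first;	/* first ID as seen by the host */
	uint32_t count;		/* number of consecutive IDs mapped */
};

/* Translate a namespace-relative ID to a host ID; false if unmapped. */
static bool
id_to_host_sketch(const struct id_extent_sketch *e, uint32_t inside,
    uint32_t *host)
{
	if (inside < e->inside_first || inside - e->inside_first >= e->count)
		return (false);
	*host = e->outside_first + (inside - e->inside_first);
	return (true);
}

/* Translate a host ID to a namespace-relative ID; false if unmapped. */
static bool
id_from_host_sketch(const struct id_extent_sketch *e, uint32_t host,
    uint32_t *inside)
{
	if (host < e->outside_first || host - e->outside_first >= e->count)
		return (false);
	*inside = e->inside_first + (host - e->outside_first);
	return (true);
}
```

With an assumed extent of `{0, 1000000, 65536}`, root inside the container (ID 0) corresponds to host ID 1000000, and a host ID outside the extent (such as host root) has no representation inside the namespace, which is the case the checking part has to reject.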
Chunk 4: change the 'unallow' check:
root in containers) to remove permissions of others if they're also allowed
to add the permission.
Checklist:
make checkstyle
)Signed-off-by
.