Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot rm file from folder in nfs-share when folder has write permissions for others #13217

Closed
sdimitro opened this issue Mar 14, 2022 · 18 comments
Labels
Status: Stale No recent activity for issue Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@sdimitro
Copy link
Contributor

sdimitro commented Mar 14, 2022

Current Minimal Reproducer

Setup consists of two VMs one exposing an empty folder over NFS and the other one mounting it under ~/nfs-share. Then to reproduce the issue do the following on the client side:

~/nfs-share$ mkdir test-dir
~/nfs-share$ chmod 757 test-dir/
~/nfs-share$ cd test-dir/
~/nfs-share/test-dir$ touch foo
~/nfs-share/test-dir$ rm foo
rm: cannot remove 'foo': Operation not permitted

What is interesting is that if we change the permissions of test-dir to 755 we're able to delete the file. I tried all the cases from 750 to 757 and it seems like whenever others have write permissions (specifically 752, 753, 756, and 757) for the directory containing the file then we are not able to delete the file. This is obviously a bug because since we are the owner of the file we should be able to delete it.

Further Analysis

Looking at our internal VMs at Delphix we don't hit this issue with older VMs that don't contain the following commit:
235a856

The above commit seems to have introduced another regression at some point that was later fixed (see 66e6d3f ). Unfortunately this commit does not fix the issue filed here.

Will update the issue as I dig further but I figured I file a bug here first in case people with more context of the above commit can point out the culprit quicker than myself.

@sdimitro sdimitro added the Type: Defect Incorrect behavior (e.g. crash, hang) label Mar 14, 2022
@sdimitro
Copy link
Contributor Author

cc: @pbhenson , @anodos325 , @behlendorf , @don-brady

@anodos325
Copy link
Contributor

anodos325 commented Mar 14, 2022

Yes, I recall this is an issue with how the the internal NFSv4 ACL is generated to represent a POSIX mode.

NFSv4 everyone@ represents literally everyone, and so the only way to remove permissions to represent something odd like 757 is to set an explicit deny WRITE_DATA entry for group@.

I suppose for case of trivial NFSv4 ACL (ZFS_ACL_TRIVIAL z_pflag set), we can do a simpler access check (if we aren't doing it already) based solely on mode. Like inode_permissions().

Care does need to be taken though to make sure you don't accidentally open things up too much.

@anodos325
Copy link
Contributor

I'll see if I can throw together a quick proof-of-concept fix today (but don't stop your investigation please) :)

@sdimitro
Copy link
Contributor Author

@anodos325 thanks for the quick reply! Yeah let me know how it goes for you, I'll also continue analyzing what's happening on my end.

BTW, the machine that initially hit this issue was using NFSv3, even though my above minimal reproducer is in v4. I'll try the above in v3 too and post my results.

@anodos325
Copy link
Contributor

Sorry, using "NFSv4" here is a bit misleading. ZFS has internal ACLs that are evaluated via zfs_zaccess and friends regardless of how files are accessed.

@anodos325
Copy link
Contributor

In case of trivial ACLs (ones that can be expressed as mode without losing information), we can probably optimize by looking exclusively at inode->i_mode rather than iterating through aces in the internal ACL (which would also allow us to avoid improper EPERM here). This will require careful testing though for edge cases.

@sdimitro
Copy link
Contributor Author

@anodos325 yeah there is definitely something going on with the ACLs being processed. Looking at at module/os/linux/zfs/zfs_acl.c:

2263 static int
2264 zfs_zaccess_aces_check(znode_t *zp, uint32_t *working_mode,
2265     boolean_t anyaccess, cred_t *cr)
2266 {
...
2293         while ((acep = zfs_acl_next_ace(aclp, acep, &who, &access_mask,
2294             &iflags, &type))) {
2295                 uint32_t mask_matched;
2296
2297                 if (!zfs_acl_valid_ace_type(type, iflags))
2298                         continue;
2299
2300                 if (S_ISDIR(ZTOI(zp)->i_mode) &&
2301                     (iflags & ACE_INHERIT_ONLY_ACE))
2302                         continue;
2303
2304                 /* Skip ACE if it does not affect any AoI */
2305                 mask_matched = (access_mask & *working_mode);
2306                 if (!mask_matched)
2307                         continue;
2308
2309                 entry_type = (iflags & ACE_TYPE_FLAGS);
2310
2311                 checkit = B_FALSE;
2312
2313                 switch (entry_type) {
2314                 case ACE_OWNER:
2315                         if (uid == fowner)
2316                                 checkit = B_TRUE;
2317                         break;
2318                 case OWNING_GROUP:
2319                         who = gowner;
2320                         zfs_fallthrough;
2321                 case ACE_IDENTIFIER_GROUP:
2322                         checkit = zfs_groupmember(zfsvfs, who, cr);
2323                         break;
...
2343                 }
2344
2345                 if (checkit) {
2346                         if (type == DENY) {
2347                                 DTRACE_PROBE3(zfs__ace__denies,
2348                                     znode_t *, zp,
2349                                     zfs_ace_hdr_t *, acep,
2350                                     uint32_t, mask_matched);
2351                                 deny_mask |= mask_matched;
2352                         } else {
...
2361                         }
2362                         *working_mode &= ~mask_matched;
2363                 }
...
2368         }
2372         /* Put the found 'denies' back on the working mode */
2373         if (deny_mask) {
2374                 *working_mode |= deny_mask;
2375                 return (SET_ERROR(EACCES));
2376         } else if (*working_mode) {
...
2378         }
2379
2380         return (0);
2381 }

I used bpftrace to trace the zfs__ace__denies above and saw the following when trying to destroy the file with 757 perms (there was no output when I tried to delete the file with 755 perms):

$ sudo bpftrace  -e 'kprobe:trace_zfs_zfs__ace__denies {printf("%p, %p, %d", arg0, arg1, arg2);}'
0xffff89564c645ee0, 0xffff8955d30cc108, 6413
^C

Using sdb I also verified the specific ACE header:

sdb> echo 0xffff8955d30cc108 | cast zfs_ace_hdr_t * | deref
(zfs_ace_hdr_t){
	.z_type = (uint16_t)1,         // <----- type = DENY
	.z_flags = (uint16_t)8256, // <----- flags = ACE_OWNING_GROUP -> (ACE_GROUP | ACE_IDENTIFIER_GROUP)
	.z_access_mask = (uint32_t)70,
}

That said, the above was the 2nd ACE in our znode's ACL. The first ACE was an ALLOW entry for the owner (which we are in this case):

sdb> echo 0xffff89564c645ee0 | cast znode_t * | deref | member z_acl_cached->z_curr_node->z_acldata | cast zfs_ace_hdr_t * | deref
(zfs_ace_hdr_t){
	.z_type = (uint16_t)0,          // <----- type = ALLOW
	.z_flags = (uint16_t)4096,  // <----- flags = ACE_OWNER
	.z_access_mask = (uint32_t)70,
}

I'm still getting familiar with the code, but from what it seems on the above, when perms are 757 a DENY ACE is generated for the owning group which make us disregard the ALLOW ACE for the owner, and makes this function return EACCESS.

Is that the expected behavior? I'd expect that if we have an ALLOW as the owner we wouldn't need to look further for the owning group.

@sdimitro
Copy link
Contributor Author

Another thing to point out here is that the above happens only when group doesn't have write perms, but others do.

@anodos325
Copy link
Contributor

anodos325 commented Mar 15, 2022

Right. That's what I was alluding to as the root cause. It is expected and correct behavior for an explicit deny entry to take precedence over an allow entry C.F. RFC 5661 Section 6.

I have WIP fix. Will do testing tomorrow and give you a link to try out. Basically, in zfs_zaccess_common() return result of generic_permission(ZTOI(zp), <kernel permissions mask>) if ZFS_ACL_TRIVIAL z_pflag is set. This should bypass the zfs aces check and give the correct behavior for trivial ACL case. It should also give an nice perf boost to for this case I think.

@anodos325
Copy link
Contributor

Sorry. It was a distracted Monday and I was juggling a few things.

I'd expect that if we have an ALLOW as the owner we wouldn't need to look further for the owning group.

Yes, owner should override. I just realized that I can't reproduce your issue on our NFS server. I added support for ZFS native ACLs there, and discovered issue with crgetuid / crgetgid (not returning fsuid / fsgid). Perhaps, the uid being checked against the ACL is wrong in your case (hence, not getting owner-override).

root@scalebuilder:/CODE/scale-build/sources/openzfs# git diff HEAD
diff --git a/module/os/linux/spl/spl-cred.c b/module/os/linux/spl/spl-cred.c
index 8fe1cc30b..82347a2c6 100644
--- a/module/os/linux/spl/spl-cred.c
+++ b/module/os/linux/spl/spl-cred.c
@@ -128,7 +128,7 @@ groupmember(gid_t gid, const cred_t *cr)
 uid_t
 crgetuid(const cred_t *cr)
 {
-       return (KUID_TO_SUID(cr->euid));
+       return (KUID_TO_SUID(cr->fsuid));
 }
 
 /* Return the real user id */
@@ -156,7 +156,7 @@ crgetfsuid(const cred_t *cr)
 gid_t
 crgetgid(const cred_t *cr)
 {
-       return (KGID_TO_SGID(cr->egid));
+       return (KGID_TO_SGID(cr->fsgid));
 }
 
 /* Return the real group id */
root@scalebuilder:/CODE/scale-build

That's part of the PR here: #13186

@sdimitro
Copy link
Contributor Author

Thanks for looking into a fix. I'll give this a try but out of curiosity why are we changing crgetuid and crgetgid instead of just using crgetfsuid and crgetfsgid that are basically the same for our intents and purposes?

@anodos325
Copy link
Contributor

anodos325 commented Mar 15, 2022

There are areas of common code between FreeBSD and Linux that call crgetuid() IIRC. Using fsuid / fsgid is correct from kernel docs I've read. Setting fsuid / fsgid always updates euid and egid, but not the other way around. knfsd sets the fsids.

@anodos325
Copy link
Contributor

anodos325 commented Mar 15, 2022

What kernel version on you testing on? I just noticed that there may be some places where we need additional plumbing for user namespaces in permissions-related areas.

@sdimitro
Copy link
Contributor Author

sdimitro commented Mar 15, 2022

There are areas of common code between FreeBSD and Linux that call crgetuid() IIRC. Using fsuid / fsgid is correct from kernel docs I've read. Setting fsuid / fsgid always updates euid and egid, but not the other way around. knfsd sets the fsids.

I was mostly talking about using the fsuid-based functions in module/os/linux/zfs/zfs_acl.c - this is linux-specific code, not common Linux+FreeBSD code and it makes the bug go away from what it seems in my preliminary testing. My fear is that changing crgetuid/crgetgid is beyond the scope of fixing this issue and affects other parts of the system.

What kernel version on you testing on? I just noticed that there may be some places where we need additional plumbing for user namespaces in permissions-related areas.

The specific VM that hit the issue is version 5.4.0.

Are you part of the OpenZFS slack workspace by any chance?

@anodos325
Copy link
Contributor

My fear is that changing crgetuid/crgetgid is beyond the scope of fixing this issue and affects other parts of the system.

The reason why I switched up crgetuid/crgetgid was that I don't think there's a legitimate case where we would use the egid and euid (instead of fsid), and due to common code shared between FreeBSD and Linux leaves around a loaded foot-gun. :)

@ghost ghost mentioned this issue Mar 15, 2022
13 tasks
@behlendorf
Copy link
Contributor

behlendorf commented Mar 15, 2022

I don't think there's a legitimate case where we would use the egid and euid (instead of fsid),

The only cases I can think of where this might be a concern are for the delegation and ioctl permission checks. In these cases, I think the expectation is that the euid would be used, though from a practical stand point I don't think it actually matters. Still it'd be great if you could take a second look and give it some thought.

It's also a bit unfortunate that with the #13221 change we no longer have a way to get the euid if we wanted it. Now if we're comfortable using the fsuid for everything let's go ahead and retire the crgetfsuid() and crgetfsgid() functions and then use crgetuid() and crgetgid() everywhere. It looks like we only have a few callers in the Linux specific code. This would at least get rid of that confusion.

@anodos325
Copy link
Contributor

Tangentially related (because of my rambling mis-diagnosis at the beginning comments), the optimization I mentioned (avoiding zfs_zaccess_aces_check() in favor of generic_permission() is here: truenas@15c11d4

It passes CI related to permissions, but I haven't tested comprehensively yet.

@stale
Copy link

stale bot commented Mar 18, 2023

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Mar 18, 2023
@stale stale bot closed this as completed Jun 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Stale No recent activity for issue Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests

3 participants