Cannot rm file from folder in nfs-share when folder has write permissions for others #13217
Comments
cc: @pbhenson , @anodos325 , @behlendorf , @don-brady |
Yes, I recall this is an issue with how the internal NFSv4 ACL is generated to represent a POSIX mode. NFSv4 everyone@ represents literally everyone, so the only way to remove permissions to represent something odd like 757 is to set an explicit deny WRITE_DATA entry for group@. I suppose that for the case of a trivial NFSv4 ACL (ZFS_ACL_TRIVIAL z_pflag set), we can do a simpler access check (if we aren't doing it already) based solely on mode. Like Care does need to be taken, though, to make sure you don't accidentally open things up too much. |
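To make the group@ deny entry concrete, here is a rough Python model of how a mode could be expanded into an ordered trivial ACL. This is a sketch under assumptions, not the ZFS code: the function name `trivial_acl`, the tuple layout, and the single-letter permissions are made up for illustration, and real ACEs carry full access masks rather than r/w/x letters.

```python
import stat


def _perms(mode, r, w, x):
    """Collect the r/w/x letters that are set in `mode` for one class."""
    return {p for p, flag in (("r", r), ("w", w), ("x", x)) if mode & flag}


def _fmt(perm_set):
    """Render a permission set in stable rwx order."""
    return "".join(p for p in "rwx" if p in perm_set)


def trivial_acl(mode):
    """Return ordered (who, type, perms) ACEs for a POSIX mode (simplified)."""
    owner = _perms(mode, stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR)
    group = _perms(mode, stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP)
    other = _perms(mode, stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH)

    # everyone@ also matches the owner and the owning group, so a bit that a
    # broader class has but a narrower class lacks needs an explicit deny.
    deny_owner = (group | other) - owner
    deny_group = other - group
    # The owner may also be a member of the owning group, so bits denied to
    # group@ that the owner should keep are allowed to owner@ first.
    allow_owner_first = owner & deny_group

    aces = []
    if allow_owner_first:
        aces.append(("owner@", "allow", _fmt(allow_owner_first)))
    if deny_owner:
        aces.append(("owner@", "deny", _fmt(deny_owner)))
    if deny_group:
        aces.append(("group@", "deny", _fmt(deny_group)))
    aces.append(("owner@", "allow", _fmt(owner)))
    aces.append(("group@", "allow", _fmt(group)))
    aces.append(("everyone@", "allow", _fmt(other)))
    return aces


for ace in trivial_acl(0o757):
    print(ace)
```

In this model, the only modes between 750 and 757 that produce a deny entry are 752, 753, 756, and 757, which lines up with the modes reported as failing in the reproducer.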
I'll see if I can throw together a quick proof-of-concept fix today (but don't stop your investigation please) :) |
@anodos325 thanks for the quick reply! Yeah let me know how it goes for you, I'll also continue analyzing what's happening on my end. BTW, the machine that initially hit this issue was using NFSv3, even though my above minimal reproducer is in v4. I'll try the above in v3 too and post my results. |
Sorry, using "NFSv4" here is a bit misleading. ZFS has internal ACLs that are evaluated via zfs_zaccess and friends regardless of how files are accessed. |
In case of trivial ACLs (ones that can be expressed as mode without losing information), we can probably optimize by looking exclusively at inode->i_mode rather than iterating through aces in the internal ACL (which would also allow us to avoid improper EPERM here). This will require careful testing though for edge cases. |
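For comparison, a mode-only check picks exactly one permission class and never walks any ACEs. The following is a minimal Python sketch of that idea, assuming a hypothetical helper `mode_allows`; it ignores capabilities, root overrides, and everything else the real kernel path handles.

```python
W_OK_BIT = 0o2  # write bit within a single rwx triplet


def mode_allows(mode, file_uid, file_gid, uid, gids, wanted):
    """Classic POSIX class selection: owner, else group, else other."""
    if uid == file_uid:
        shift = 6          # owner triplet
    elif file_gid in gids:
        shift = 3          # group triplet
    else:
        shift = 0          # other triplet
    granted = (mode >> shift) & 0o7
    return (granted & wanted) == wanted


# Owner of a 757 directory who is also in the owning group: owner class wins.
print(mode_allows(0o757, 1000, 1000, 1000, {1000}, W_OK_BIT))
# Non-owner member of the owning group: group class (r-x) applies.
print(mode_allows(0o757, 1000, 1000, 1001, {1000}, W_OK_BIT))
```

The point of the sketch is only that, with class selection, the group bits are never consulted for the owner, so a missing group write bit cannot get in the owner's way.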
@anodos325 yeah, there is definitely something going on with the ACLs being processed. Looking at module/os/linux/zfs/zfs_acl.c:
I used bpftrace to trace the
Using
That said, the above was the 2nd ACE in our znode's ACL. The first ACE was an ALLOW entry for the owner (which we are in this case):
I'm still getting familiar with the code, but from what I can see above, when perms are 757 a DENY ACE is generated for the owning group, which makes us disregard the ALLOW ACE for the owner and makes this function return EACCES. Is that the expected behavior? I'd expect that if we have an ALLOW as the owner, we wouldn't need to look further at the owning group. |
Another thing to point out here is that the above happens only when |
Right. That's what I was alluding to as the root cause. It is expected and correct behavior for an explicit deny entry to take precedence over an allow entry that comes after it (cf. RFC 5661 Section 6). I have a WIP fix. Will do testing tomorrow and give you a link to try out. Basically, in zfs_zaccess_common() return result of |
Sorry. It was a distracted Monday and I was juggling a few things.
Yes, owner should override. I just realized that I can't reproduce your issue on our NFS server. I added support for ZFS native ACLs there, and discovered an issue with crgetuid / crgetgid (not returning fsuid / fsgid). Perhaps the uid being checked against the ACL is wrong in your case (hence, not getting the owner override).
That's part of the PR here: #13186 |
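The owner-override point, and why a wrong uid would defeat it, can be illustrated with a small Python model of ordered ACE evaluation. This is a sketch, not zfs_zaccess_aces_check(): `check_access` and `ACL_757` are hypothetical names, the first two ACEs follow the ordering described earlier in the thread (owner ALLOW first, group DENY second), and the remaining entries are assumed.

```python
ACL_757 = [
    ("owner@", "allow", "w"),      # early allow so the group deny skips the owner
    ("group@", "deny", "w"),       # removes the write that everyone@ would grant
    ("owner@", "allow", "rwx"),
    ("group@", "allow", "rx"),
    ("everyone@", "allow", "rwx"),
]


def check_access(aces, wanted, is_owner, in_group):
    """Walk ACEs in order; the first ACE to decide a wanted bit wins."""
    remaining = set(wanted)
    for who, kind, perms in aces:
        applies = (
            who == "everyone@"
            or (who == "owner@" and is_owner)
            or (who == "group@" and in_group)
        )
        if not applies:
            continue
        hit = remaining & set(perms)
        if not hit:
            continue
        if kind == "deny":
            return False           # an undecided bit was explicitly denied
        remaining -= hit           # these bits are now granted
        if not remaining:
            return True
    return False                   # bits never granted are denied by default


# Caller correctly identified as the owner: the first ACE grants write.
print(check_access(ACL_757, "w", is_owner=True, in_group=True))
# Same caller, but the uid lookup fails to match the owner: group deny hits.
print(check_access(ACL_757, "w", is_owner=False, in_group=True))
```

If the identity compared against the ACL is not the one the NFS server is acting as, the second call is the path taken, which would be consistent with the failure described in this issue.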
Thanks for looking into a fix. I'll give this a try but out of curiosity why are we changing |
There are areas of common code between FreeBSD and Linux that call crgetuid(), IIRC. Using fsuid / fsgid is correct from the kernel docs I've read. Setting euid / egid always updates fsuid / fsgid, but not the other way around. knfsd sets the fsids. |
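On Linux, the four ids a task carries can be inspected from userspace: the Uid and Gid lines of /proc/<pid>/status list the real, effective, saved, and filesystem ids in that order. A small Python sketch for pulling them apart follows; the helper name `parse_ids` and the sample text are made up for illustration. The filesystem id normally tracks the effective id and only diverges when something sets it separately, for example via setfsuid(2).

```python
def parse_ids(status_text, field="Uid"):
    """Extract real/effective/saved/fs ids from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            real, effective, saved, fs = (int(x) for x in line.split()[1:5])
            return {"real": real, "effective": effective, "saved": saved, "fs": fs}
    raise ValueError(f"no {field} line found")


# Hypothetical sample: a task whose fsuid has diverged from its euid.
sample = "Name:\texample\nUid:\t0\t0\t0\t1000\nGid:\t0\t0\t0\t1000\n"
ids = parse_ids(sample)
print(ids["effective"], ids["fs"])
```

To look at a live process, the same function can be fed the contents of /proc/self/status.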
What kernel version are you testing on? I just noticed that there may be some places where we need additional plumbing for user namespaces in permissions-related areas. |
I was mostly talking about using the
The specific VM that hit the issue is version 5.4.0. Are you part of the OpenZFS slack workspace by any chance? |
The reason I switched up crgetuid/crgetgid was that I don't think there's a legitimate case where we would use the egid and euid (instead of the fsids), and leaving them as they were in common code shared between FreeBSD and Linux leaves a loaded foot-gun lying around. :) |
The only cases I can think of where this might be a concern are for the delegation and ioctl permission checks. In these cases, I think the expectation is that the It's also a bit unfortunate that with the #13221 change we no longer have a way to get the |
Tangentially related (because of my rambling misdiagnosis in the first comments), the optimization I mentioned (avoiding zfs_zaccess_aces_check() in favor of generic_permission()) is here: truenas@15c11d4. It passes CI related to permissions, but I haven't tested comprehensively yet. |
Current Minimal Reproducer
Setup consists of two VMs, one exposing an empty folder over NFS and the other mounting it under ~/nfs-share. Then, to reproduce the issue, do the following on the client side:
What is interesting is that if we change the permissions of test-dir to 755, we're able to delete the file. I tried all the cases from 750 to 757, and it seems that whenever others have write permissions (specifically 752, 753, 756, and 757) for the directory containing the file, we are not able to delete the file. This is obviously a bug: since we are the owner of the file, we should be able to delete it.
Further Analysis
Looking at our internal VMs at Delphix we don't hit this issue with older VMs that don't contain the following commit:
235a856
The above commit seems to have introduced another regression at some point that was later fixed (see 66e6d3f). Unfortunately, that later commit does not fix the issue filed here.
Will update the issue as I dig further, but I figured I'd file a bug here first in case people with more context on the above commit can point out the culprit quicker than I can.
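For anyone who wants to repeat the 750 to 757 sweep from the reproducer, something like the following Python sketch could be pointed at the NFS mount. It is not the original reproducer: the function name `sweep`, the file name, and the use of temporary directory names are assumptions, and it only records whether the owner could remove a file it had just created.

```python
import os
import tempfile


def sweep(base, modes=range(0o750, 0o760)):
    """For each directory mode, create a file as the owner and try to remove it."""
    results = {}
    for mode in modes:
        directory = tempfile.mkdtemp(dir=base)   # stands in for test-dir
        os.chmod(directory, mode)
        path = os.path.join(directory, "file")   # hypothetical file name
        with open(path, "w"):
            pass
        try:
            os.remove(path)
            results[oct(mode)] = True
        except OSError:
            results[oct(mode)] = False
        try:
            os.rmdir(directory)                  # best-effort cleanup
        except OSError:
            pass
    return results


if __name__ == "__main__":
    # Replace with the NFS mount point (e.g. ~/nfs-share) to test the share.
    for mode, ok in sweep(tempfile.mkdtemp()).items():
        print(mode, "deleted" if ok else "could not delete")
```

Run against a local directory, it mainly serves as a baseline to compare with the results from the share.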