chown erases file permissions for v2-v4 filesystems #1264

Closed
db48x opened this issue Feb 5, 2013 · 32 comments

@db48x

db48x commented Feb 5, 2013

Basically, chown wipes a file's permission bits, but only if the file is world-readable.

Edit: This happens with zfs filesystem version 4, not version 5.

[db48x@celebdil temp]$ ll
total 512
-rw-rw-r--. 1 db48x db48x 0 Feb 5 01:13 foo
[db48x@celebdil temp]$ chown db48x foo
[db48x@celebdil temp]$ ll
total 512
----------. 1 db48x db48x 0 Feb 5 01:13 foo

[db48x@celebdil temp]$ ll
total 512
-rw-rw----. 1 db48x db48x 0 Feb 5 01:13 foo
[db48x@celebdil temp]$ chown db48x foo
[db48x@celebdil temp]$ ll
total 512
-rw-rw----. 1 db48x db48x 0 Feb 5 01:13 foo
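
For anyone who wants to check a directory of their own, the transcript above can be condensed into a small script. This is only a sketch: the function name `check_chown_preserves_mode` is made up for illustration, and it assumes GNU coreutils (`stat -c`, and `mktemp` with a template path). It compares the mode before and after a no-op chown to the current user.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZFS or coreutils): returns 0 when a
# no-op chown leaves the permission bits of a fresh 664 file untouched.
check_chown_preserves_mode() {
    dir=${1:-.}
    f=$(mktemp "$dir/chown-test.XXXXXX") || return 2
    chmod 664 "$f"                 # world-readable, as in the report
    before=$(stat -c '%a' "$f")    # octal mode before chown
    chown "$(id -u)" "$f"          # owner is unchanged, so no root needed
    after=$(stat -c '%a' "$f")     # octal mode after chown
    rm -f "$f"
    echo "before=$before after=$after"
    [ "$before" = "$after" ]
}

# Pass the directory to test; defaults to the temp directory here.
check_chown_preserves_mode "${TMPDIR:-/tmp}"
```

On an unaffected filesystem this should print `before=664 after=664`. A different `after` value would presumably correspond to the behaviour shown above.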

@behlendorf
Contributor

I'm unable to reproduce this under RHEL6.2 with 0.6.0-rc14. Try reproducing this strange behavior on an ext4 file system; it may not be due to ZFS.

-bash-4.1$ ll
total 1
-rw-rw-r-- 1 behlendo behlendo 0 Feb  5 11:20 foo
-bash-4.1$ chown behlendo foo
-bash-4.1$ ll
total 1
-rw-rw-r-- 1 behlendo behlendo 0 Feb  5 11:20 foo
-bash-4.1$ ll
total 1
-rw-rw---- 1 behlendo behlendo 0 Feb  5 11:20 foo
-bash-4.1$ chown behlendo foo
-bash-4.1$ ll
total 1
-rw-rw---- 1 behlendo behlendo 0 Feb  5 11:20 foo

@nedbass
Contributor

nedbass commented Feb 5, 2013

@behlendorf He mentioned on zfs-discuss that it doesn't happen on ext4. Also he's running FC18 kernel-3.7.5-201.fc18.x86_64 so you may want to try it in your vm.

@ryao
Contributor

ryao commented Feb 5, 2013

I regularly use chown on Gentoo Linux. I have yet to see this happen.

@db48x Would you try installing Gentoo Prefix on your system and using its chown?

http://www.gentoo.org/proj/en/gentoo-alt/prefix/

@behlendorf
Contributor

Thanks @nedbass, I just tried with 3.7.4-204.fc18.x86_64 and wasn't able to reproduce the issue there either. @db48x, can you strace the chown and post the results so we can see what mode bits are being passed to the kernel?

@nedbass
Contributor

nedbass commented Feb 5, 2013

@behlendorf Did you try with xattr=sa?

Here's the strace info he posted to the list.

[db48x@celebdil temp]$ grep chownat ~/chown.*.strace
/home/db48x/chown.660.strace:26683 fchownat(AT_FDCWD, "foo", 500, 4294967295, 0) = 0
/home/db48x/chown.664.strace:26671 fchownat(AT_FDCWD, "foo", 500, 4294967295, 0) = 0
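
A note on reading these lines: fchownat() takes an owner and a group but no mode argument, and per chown(2) an ID of -1 means "leave unchanged". The 4294967295 in the group position is simply -1 viewed as an unsigned 32-bit value, which can be checked with shell arithmetic. The variable name below is only illustrative.

```shell
# -1 reinterpreted as an unsigned 32-bit integer, i.e. (gid_t)-1,
# which chown(2) treats as "do not change this ID".
no_change=$(( -1 & 0xFFFFFFFF ))
echo "$no_change"   # prints 4294967295
```

So both traces show chown asking only for owner 500, with the group left alone. Nothing in the call itself touches the permission bits.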

@nedbass
Contributor

nedbass commented Feb 5, 2013

I notice from the ls output in his mailing list post that the file has an SELinux ACL.

[db48x@celebdil temp]$ ll    
total 512    
-rw-rw-r--. 1 db48x db48x 0 Feb  5 01:13 foo

Perhaps that is a relevant detail. @db48x what does getfattr -n security.selinux foo say?

@db48x
Author

db48x commented Feb 5, 2013

Certainly:

[db48x@celebdil temp]$ ll
total 512
-rw-rw-r--. 1 db48x db48x 0 Feb  5 15:29 foo
[db48x@celebdil temp]$ getfattr -n security.selinux foo
# file: foo
security.selinux="system_u:object_r:file_t:s0"

[db48x@celebdil temp]$ chown db48x foo
[db48x@celebdil temp]$ ll
total 512
----------. 1 db48x db48x 0 Feb  5 15:29 foo
[db48x@celebdil temp]$ getfattr -n security.selinux foo
# file: foo
security.selinux="system_u:object_r:file_t:s0"

[db48x@celebdil temp]$ ll -Z
----------. db48x db48x system_u:object_r:file_t:s0      foo
[db48x@celebdil temp]$

Looks OK to me though. SELinux is actually in permissive mode, because it stopped working (not sure why I upgraded to FC18, honestly). This temp directory is in my home directory, so it ought to be labelled home_dir_t or some such, but restorecon thinks that it is fine the way it is. I've spent several hours over the past couple of weekends trying to figure out how SELinux really works, to no avail. If that turns out to be the cause then I won't mind having been the canary in the coal mine.

Let me look into this Gentoo Prefix thing next.

@db48x
Author

db48x commented Feb 6, 2013

Gentoo Prefix failed to build GCC; some compile error. An interesting idea though.

@nedbass
Contributor

nedbass commented Feb 6, 2013

I can't reproduce it either on FC18 with selinux enabled (permissive or enforcing). Just as a sanity check, what does type chown say?

@ryao
Contributor

ryao commented Feb 6, 2013

@db48x Would you make a Gentoo chroot and see if you can reproduce this with Gentoo's chown? You should be able to make one by doing something like the following:

zfs create -o mountpoint=/mnt/gentoo rpool/gentoo
cd /mnt/gentoo
wget "ftp://distfiles.gentoo.org/pub/gentoo/releases/amd64/current-stage3/stage3-amd64-*.tar.bz2"
tar xjpf stage3*
mount -t proc proc /mnt/gentoo/proc
mount --rbind /dev /mnt/gentoo/dev
mount --rbind /sys /mnt/gentoo/sys
cp -L /etc/resolv.conf /mnt/gentoo/etc/ 
chroot /mnt/gentoo /bin/bash
source /etc/profile
PS1="(chroot) ${PS1}"

Now try to reproduce your issue. After you are finished, you can exit the chroot and destroy it by doing:

exit
cd
umount -l /mnt/gentoo/sys /mnt/gentoo/dev /mnt/gentoo/dev/pts /mnt/gentoo/proc
zfs destroy rpool/gentoo

@db48x
Author

db48x commented Feb 6, 2013

Using the gentoo binaries works fine:

(gentoo) celebdil temp # ll
total 0
(gentoo) celebdil temp # touch foo
(gentoo) celebdil temp # ll
total 512
-rw-r--r-- 1 root root 0 Feb  6 06:50 foo
(gentoo) celebdil temp # chown root foo
(gentoo) celebdil temp # ll
total 512
-rw-r--r-- 1 root root 0 Feb  6 06:50 foo

I don't have an strace in here though, so I can't peek in and see what's different.

@nedbass
Contributor

nedbass commented Feb 6, 2013

I notice there's no SELinux ACL indicated in your ls output for this test. Did you try it with an SELinux ACL?

I'd still like to see type chown from your native environment, just to confirm it's not wrapped in an alias or something.

What other permutations of your reproducer have you tried? Does it depend on directory, file name, or user? Can you reproduce it in a newly created filesystem?

@db48x
Author

db48x commented Feb 6, 2013

The file does have an selinux ACL, but the gentoo ls doesn't see it.

(gentoo) celebdil temp # getfattr -n security.selinux foo
# file: foo
security.selinux="system_u:object_r:file_t:s0"

(gentoo) celebdil temp # ll -Z
total 512
-rw-r--r-- 1 root root ? 0 Feb  6 06:50 foo

oh, and

[db48x@celebdil temp]$ type chown
chown is /usr/bin/chown
[db48x@celebdil temp]$ file `which chown`
/usr/bin/chown: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=0xca2d8cf2e99b10271853c0ee4c1bb8b04e07759c, stripped

Ahh, the question about new filesystems is good; I hadn't tried that. If I do this in my newly-created gentoo filesystem using the normal tools, it doesn't happen:

[root@celebdil ~]# cd /gentoo/temp
[root@celebdil temp]# ll
total 512
-rw-r--r--. 1 db48x root 0 Feb  5 22:50 foo
[root@celebdil temp]# rm foo
rm: remove regular empty file ‘foo’? y
[root@celebdil temp]# touch foo
[root@celebdil temp]# ll
total 512
-rw-r--r--. 1 root root 0 Feb  5 23:23 foo
[root@celebdil temp]# chown root foo
[root@celebdil temp]# ll
total 512
-rw-r--r--. 1 root root 0 Feb  5 23:23 foo
[root@celebdil temp]# chown db48x foo
[root@celebdil temp]# ll
total 512
-rw-r--r--. 1 db48x root 0 Feb  5 23:23 foo
[root@celebdil temp]# 

Let me try again when I don't have to be root:

[db48x@celebdil ~]$ sudo zfs create -o mountpoint=/mnt/temp tank/temp
[sudo] password for db48x:
[db48x@celebdil ~]$ cd /mnt/temp
[db48x@celebdil temp]$ sudo chown db48x.db48x .
[db48x@celebdil temp]$ touch foo
[db48x@celebdil temp]$ ll
total 512
-rw-rw-r--. 1 db48x db48x 0 Feb  5 23:28 foo
[db48x@celebdil temp]$ chown db48x foo
[db48x@celebdil temp]$ ll
total 512
-rw-rw-r--. 1 db48x db48x 0 Feb  5 23:28 foo

So that must mean that something is wrong with my current filesystem. Hard to say whether that's a result of the ACL, of xattrs in general (which I enabled long ago; I couldn't use most of the filesystems in this pool for a while because until a few versions ago zfs would lock up while reading them), or of something else.

Can you suggest a way to compare the new and the old filesystems to see how they differ on disk?

@nedbass
Contributor

nedbass commented Feb 6, 2013

My first suspicion would be that it's related to having xattr=sa. I know you said on the list that it still happens with xattr=on, but did you confirm that with a newly created file?

@db48x
Author

db48x commented Feb 6, 2013

Yes, I've created the file anew for each test.

@db48x
Author

db48x commented Feb 6, 2013

Also, the tank/temp filesystem I just created has xattr=sa, since it inherited it from tank... Oh, but it doesn't. I turned xattrs off entirely for a test just before I went to sleep last night, so tank/temp doesn't have xattrs. Let me do some more testing.

@db48x
Author

db48x commented Feb 6, 2013

[db48x@celebdil mnt]$ sudo zfs create -o mountpoint=/mnt/temp -o xattr=on tank/temp-xattr
[db48x@celebdil mnt]$ sudo zfs set mountpoint=/mnt/temp-xattr tank/temp-xattr
[db48x@celebdil mnt]$ cd /mnt/temp-xattr
[db48x@celebdil temp-xattr]$ sudo chown db48x.db48x .
[db48x@celebdil temp-xattr]$ touch foo
[db48x@celebdil temp-xattr]$ ll
total 512
-rw-rw-r--. 1 db48x db48x 0 Feb  6 00:06 foo
[db48x@celebdil temp-xattr]$ chown db48x foo
[db48x@celebdil temp-xattr]$ ll
total 512
-rw-rw-r--. 1 db48x db48x 0 Feb  6 00:06 foo
[db48x@celebdil temp-xattr]$ sudo zfs create -o mountpoint=/mnt/temp-xattr-sa -o xattr=sa tank/temp-xattr-sa
[db48x@celebdil temp-xattr]$ cd /mnt/temp-xattr-sa/
[db48x@celebdil temp-xattr-sa]$ sudo chown db48x.db48x .
[db48x@celebdil temp-xattr-sa]$ touch foo
[db48x@celebdil temp-xattr-sa]$ ll
total 512
-rw-rw-r--. 1 db48x db48x 0 Feb  6 00:07 foo
[db48x@celebdil temp-xattr-sa]$ chown db48x foo
[db48x@celebdil temp-xattr-sa]$ ll
total 512
-rw-rw-r--. 1 db48x db48x 0 Feb  6 00:07 foo

So it doesn't happen on new filesystems with or without xattrs, but it does happen on my old filesystem, with or without xattrs enabled.

@nedbass
Contributor

nedbass commented Feb 6, 2013

Ah, the file system version seems to be the key. I can reproduce the issue if I create a pool with pool version 23, file system version 4, or lower. Did you upgrade your pool version at some point? That would explain why your tank filesystem, which has version=4, has the issue but new filesystems with version=5 don't.

@db48x
Author

db48x commented Feb 6, 2013

Yes, I upgraded to my current pool and filesystem versions some time ago, long before I noticed this bug.

@nedbass
Contributor

nedbass commented Feb 6, 2013

@db48x You should be able to avoid this bug by doing zfs upgrade tank. This may however trigger an SPL warning and stack trace on your console (or a panic if you built with --enable-debug). See #1268.

@db48x
Author

db48x commented Feb 6, 2013

Alas, if only that were the case. I tried it last night, but it didn't fix the problem.

@db48x
Author

db48x commented Feb 9, 2013

Ahh. I had a brainwave, and did a zfs upgrade this time instead of a zpool upgrade. That does indeed fix it; this only happens on filesystem version 4. Thanks Ned, for your help tracking this down :)
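
Since the pool version and the filesystem version are tracked and upgraded separately, a quick way to spot datasets that are still affected is to list each filesystem's `version` property. The following is a rough sketch, not an official tool: `needs_fs_upgrade` is a made-up helper name, the threshold of 5 comes from this thread, and the loop only runs if a `zfs` binary is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: true when a ZFS filesystem version is below 5,
# the first version reported in this thread as unaffected.
needs_fs_upgrade() {
    [ "$1" -lt 5 ]
}

# Only query ZFS when the command is actually available.
if command -v zfs >/dev/null 2>&1; then
    tab=$(printf '\t')
    # -H gives tab-separated, header-less output: <name><TAB><version>
    zfs get -H -o name,value -t filesystem version |
    while IFS="$tab" read -r name version; do
        if needs_fs_upgrade "$version"; then
            echo "$name is at filesystem version $version; consider: zfs upgrade $name"
        fi
    done
fi
```

Note that it only reports candidates and leaves running `zfs upgrade` to the administrator, since an upgraded filesystem can no longer be used by older software.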

@behlendorf
Contributor

Thanks for the update. There's clearly an issue here so I want to leave the bug open. But since it only impacts zfs v4 file systems (which are fairly rare these days) and the fix is to upgrade them to v5 we're going to treat this one as a low priority.

@clefru
Contributor

clefru commented Dec 31, 2013

FYI this also affects zfs v3 file systems. Problem is gone after zfs upgrade to v5.

A couple of search terms to help others find this bug: npm install fail -EPERM zfs homedir.

@mmehnert
Contributor

This still happens to me on git HEAD with the pool upgraded to latest version. :-(
EDIT: Sorry. After rereading all the posts in this report, actually "zfs upgrade" solved this. I did not know this even existed...

@FransUrbo
Contributor

Since this seems to occur only with filesystem version four, not five, should we close this as not-applicable, or rename it to something that indicates that it only happens when fs=4?

@behlendorf changed the title from "chown erases file permissions" to "chown erases file permissions for version 4 datasets" on Jun 10, 2014
@behlendorf
Contributor

I've updated the description to reflect that this only impacts version 4 filesystems. But there is a real bug here which should be fixed even if very few people run version 4 filesystems these days.

@clefru
Contributor

clefru commented Jun 10, 2014

I saw this also on zfs v3, see my comment above.

@behlendorf changed the title from "chown erases file permissions for version 4 datasets" to "chown erases file permissions for v3,v4 filesystems" on Jun 11, 2014
@behlendorf
Contributor

@clefru Thanks for keeping me honest, I've updated the description again.

@jspiros

jspiros commented Oct 7, 2014

I just ran into this problem with zfs v2, zpool v28, after upgrading from 30b92c1 to whatever the "0.6.3-1~wheezy" Debian packages are based on.

@behlendorf changed the title from "chown erases file permissions for v3,v4 filesystems" to "chown erases file permissions for v2-v4 filesystems" on Oct 7, 2014
@behlendorf
Contributor

I've updated the title accordingly. The same fix should apply: upgrade the zfs filesystem version if you can.

@behlendorf behlendorf removed this from the 0.7.0 milestone Oct 7, 2014
@behlendorf behlendorf added Bug - Minor and removed Bug labels Oct 7, 2014
@jspiros

jspiros commented Oct 7, 2014

@behlendorf Yeah, @DeHackEd remembered this bug when I was panicking on IRC and suggested upgrading right away. Everything is at v5 now, and I haven't experienced it again so far. Will report back if I notice anything different.

dweeezil added a commit to dweeezil/zfs that referenced this issue Oct 7, 2014
In zfs_acl_chown_setattr(), the zfs_mode_compute() function is used to
create a traditional mode value based on an ACL.  If no ACL exists, this
processing shouldn't be done.  Problems caused by this were most evident
on version 4 filesystems which not only don't have system attributes,
but also don't typically have ACLs. On such filesystems, performing a
chown() operation could have the effect of dirtying the mode bits in
memory but not on the file system as follows:

	# create a file with typical mode of 664
	echo test > test
	chown anyuser test
	ls -l test

and the mode will show up as all zeroes.  Unmounting/mounting and/or
exporting/importing the filesystem will reveal the proper mode again.

Fixes openzfs#1264
dweeezil added a commit to dweeezil/zfs that referenced this issue Oct 7, 2014
In zfs_acl_chown_setattr(), the zfs_mode_compute() function is used to
create a traditional mode value based on an ACL.  If no ACL exists, this
processing shouldn't be done.  Problems caused by this were most evident
on version 4 filesystems which not only don't have system attributes,
but also frequently have empty ACLs. On such filesystems, performing a
chown() operation could have the effect of dirtying the mode bits in
memory but not on the file system as follows:

	# create a file with typical mode of 664
	echo test > test
	chown anyuser test
	ls -l test

and the mode will show up as all zeroes.  Unmounting/mounting and/or
exporting/importing the filesystem will reveal the proper mode again.

Fixes openzfs#1264
dweeezil added a commit to dweeezil/zfs that referenced this issue Oct 21, 2014
@behlendorf behlendorf added this to the 0.6.4 milestone Oct 21, 2014
ryao pushed a commit to ryao/zfs that referenced this issue Nov 29, 2014
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#1264
behlendorf pushed a commit that referenced this issue Dec 23, 2014
In zfs_acl_chown_setattr(), the zfs_mode_compute() function is used to
create a traditional mode value based on an ACL.  If no ACL exists, this
processing shouldn't be done.  Problems caused by this were most evident
on version 4 filesystems which not only don't have system attributes,
but also frequently have empty ACLs. On such filesystems, performing a
chown() operation could have the effect of dirtying the mode bits in
memory but not on the file system as follows:

	# create a file with typical mode of 664
	echo test > test
	chown anyuser test
	ls -l test

and the mode will show up as all zeroes.  Unmounting/mounting and/or
exporting/importing the filesystem will reveal the proper mode again.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1264