ZFS is returning corrupted pointers for posix acl #4944

Closed
lorddoskias opened this issue Aug 8, 2016 · 6 comments
@lorddoskias
Contributor

lorddoskias commented Aug 8, 2016

So while doing my Monday evening hacking on zfs I started getting strange splats in RCU:

[   23.307300] ------------[ cut here ]------------
[   23.308024] WARNING: CPU: 3 PID: 1097 at kernel/rcu/tree.c:3115 __call_rcu.constprop.62+0x181/0x2f0
[   23.309345] Modules linked in: zfs(O) zunicode(O) zcommon(O) znvpair(O) zavl(O) spl(O)
[   23.310584] CPU: 3 PID: 1097 Comm: touch Tainted: G           O    4.7.0-nbor #23
[   23.311635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[   23.312969]  0000000000000000 ffff8800364efbe0 ffffffff813dfcb3 0000000000000000
[   23.314068]  0000000000000000 ffff8800364efc20 ffffffff810508fb 00000c2b81225f9a
[   23.315163]  0000000000000008 ffff880036500009 ffffffff81cb3240 ffff880036550498
[   23.316256] Call Trace:
[   23.316613]  [<ffffffff813dfcb3>] dump_stack+0x85/0xc2
[   23.317340]  [<ffffffff810508fb>] __warn+0xcb/0xf0
[   23.318021]  [<ffffffff810509ed>] warn_slowpath_null+0x1d/0x20
[   23.318858]  [<ffffffff810bb4b1>] __call_rcu.constprop.62+0x181/0x2f0
[   23.319770]  [<ffffffff810bb63a>] kfree_call_rcu+0x1a/0x20
[   23.320557]  [<ffffffff811c8445>] generic_permission+0x185/0x190
[   23.321411]  [<ffffffff811c847b>] __inode_permission+0x2b/0xb0
[   23.322274]  [<ffffffff811c8514>] inode_permission+0x14/0x50
[   23.323085]  [<ffffffff811cb63e>] link_path_walk+0x6e/0x530
[   23.323874]  [<ffffffff811c92a9>] ? path_init+0x5c9/0x750
[   23.324561]  [<ffffffff811c9235>] ? path_init+0x555/0x750
[   23.325245]  [<ffffffff811cc32d>] path_openat+0x7d/0x960
[   23.325893]  [<ffffffff811cdd9e>] do_filp_open+0x7e/0xe0
[   23.326601]  [<ffffffff8161fa97>] ? _raw_spin_unlock+0x27/0x40
[   23.327424]  [<ffffffff811de179>] ? __alloc_fd+0xf9/0x210
[   23.328193]  [<ffffffff811bbf47>] do_sys_open+0x127/0x200
[   23.328962]  [<ffffffff811bc03e>] SyS_open+0x1e/0x20
[   23.329668]  [<ffffffff81620300>] entry_SYSCALL_64_fastpath+0x23/0xc1
[   23.330582]  [<ffffffff810a0a6f>] ? trace_hardirqs_off_caller+0x1f/0xc0
[   23.331530] ---[ end trace 0edd18e8f8682b6d ]---

The call chain here is: generic_permission() -> acl_permission_check() -> check_acl() -> posix_acl_release() -> kfree_rcu()

RCU requires the rcu_head pointer to be aligned, and here it is not, so the warning above triggers and then the machine goes KABOOM. After some tracing, it seems that zfs is returning a bogus pointer value to the generic get_acl(), which calls inode->i_op->get_acl(inode, type). The pointer seems to be off by one: ffff880036573881, and the a_rcu is ffff880036573889, which causes the warning.

I can reliably reproduce this simply by running the zfs-test suite. The first test to pass is Test: /usr/share/zfs/zfs-tests/tests/functional/acl/posix/setup (run as root) [00:00] [PASS],
and after that everything goes south.
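
The alignment problem described above can be checked with a few lines of userspace C. This is only a sketch: `is_aligned` and `align_down` are hypothetical helpers, not kernel functions, and the 8-byte alignment used in the usage note is an assumption (the report does not show the exact mask the kernel tests).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if addr is a multiple of align.
 * align must be a nonzero power of two. */
static int is_aligned(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}

/* Hypothetical helper: round addr down to the previous multiple of align. */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}
```

With the two addresses from the report, `is_aligned(0xffff880036573881ULL, 8)` and `is_aligned(0xffff880036573889ULL, 8)` are both 0, and `align_down` yields `0xffff880036573880` and `0xffff880036573888` respectively. Both addresses also have their lowest bit set, so even a check of bit 0 alone would reject them.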

@behlendorf
Contributor

@lorddoskias can you determine whether one of the recent changes that touched the ACL code caused this?

4b908d3 Linux 4.8 compat: posix_acl_valid()
e85a639 Retire HAVE_CURRENT_UMASK and HAVE_POSIX_ACL_CACHING
938cfeb Linux 4.8 compat: new s_user_ns member of struct super_block

@behlendorf behlendorf added this to the 0.7.0 milestone Aug 8, 2016
@lorddoskias
Contributor Author

Actually I don't think it's any of those since I was initially testing with head commit e24e62a948e1 Fix memory leak in function add_config() (10 days ago)

So the issue must be older.

@behlendorf
Contributor

Well, in that case e42d466 is a likely reason why you're seeing it now, although that change may just have exposed a long-standing issue.

e42d466 Fix config for posix_acl_release() GPL test

@lorddoskias
Contributor Author

I still get the issue with that commit reverted and autogen && configure && compile re-run.

@behlendorf behlendorf added the Bug label Aug 8, 2016
@lorddoskias
Contributor Author

I tested the patch and it fixes the issue. Thanks.

nedbass pushed a commit to nedbass/zfs that referenced this issue Aug 26, 2016
Starting from Linux 4.7, get_acl sets the ACL cache pointer to a temporary
sentinel value before calling i_op->get_acl. Therefore we can't compare
against ACL_NOT_CACHED and return.

Since Linux 3.14, get_acl already checks the cache for us, so we
disable this in zpl_get_acl.

Linux 4.7 also calls set_cached_acl for us, so we disable it in zpl_get_acl.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#4944
Closes openzfs#4946
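
The failure mode described in this commit message can be modelled in userspace C. This is a simplified sketch under stated assumptions, not the kernel or ZFS code: every type and function below is hypothetical, `NOT_CACHED` stands in for `ACL_NOT_CACHED`, and the sentinel is assumed to be an odd-valued pointer, which is what the off-by-one address in the report suggests.

```c
#include <stdint.h>

struct acl { int refcount; };	/* stand-in for struct posix_acl */

/* Models ACL_NOT_CACHED: an all-ones pointer that is never a real ACL. */
#define NOT_CACHED ((struct acl *)UINTPTR_MAX)

struct inode_model {
	struct acl *cached;	/* cache slot owned by the generic layer */
	struct acl *on_disk;	/* what the filesystem would actually load */
};

typedef struct acl *(*fs_get_acl_fn)(struct inode_model *);

/* Assumed sentinel scheme: caller context address plus one (always odd). */
static struct acl *make_sentinel(const void *ctx)
{
	return (struct acl *)((uintptr_t)ctx + 1);
}

/* Odd pointers are never real ACLs in this model. */
static int is_sentinel(const struct acl *p)
{
	return ((uintptr_t)p & 1) != 0;
}

/* Old-style callback: trusts any cache value other than NOT_CACHED. */
static struct acl *fs_get_acl_old(struct inode_model *ip)
{
	if (ip->cached != NOT_CACHED)
		return ip->cached;	/* bug: may be the caller's sentinel */
	return ip->on_disk;
}

/* Fixed callback: leaves cache lookup and update to the generic layer. */
static struct acl *fs_get_acl_fixed(struct inode_model *ip)
{
	return ip->on_disk;
}

/* Model of the generic layer: check cache, park a sentinel, ask the fs. */
static struct acl *generic_get_acl(struct inode_model *ip, fs_get_acl_fn fs_get)
{
	static int ctx;		/* its address plays the role of the caller */
	struct acl *acl = ip->cached;

	if (!is_sentinel(acl))
		return acl;	/* genuine cached value */

	ip->cached = make_sentinel(&ctx);
	acl = fs_get(ip);
	ip->cached = acl;	/* the generic layer caches the result */
	return acl;
}
```

In this model, `generic_get_acl(&inode, fs_get_acl_old)` hands back the odd-valued sentinel as if it were an ACL, while the same call with `fs_get_acl_fixed` returns the `on_disk` pointer and leaves it in the cache slot.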
nedbass pushed a commit to nedbass/zfs that referenced this issue Sep 3, 2016
nedbass pushed a commit to nedbass/zfs that referenced this issue Sep 5, 2016
nedbass pushed a commit to nedbass/zfs that referenced this issue Sep 5, 2016
tuxoko pushed a commit to tuxoko/zfs that referenced this issue Sep 8, 2016
DeHackEd pushed a commit to DeHackEd/zfs that referenced this issue Oct 19, 2016
DeHackEd pushed a commit to DeHackEd/zfs that referenced this issue Oct 29, 2016