Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZIL claiming should not start user accounting #7163

Merged
merged 1 commit into from
Feb 21, 2018

Conversation

tcaputi
Copy link
Contributor

@tcaputi tcaputi commented Feb 13, 2018

Currently, ZIL claiming dirties objsets which causes
dsl_pool_sync() to attempt to perform user accounting on
them. This causes problems for encrypted datasets that were
raw received before the system went offline since they
cannot perform user accounting until they have their keys
loaded. This triggers an ASSERT in zio_encrypt(). Since
encryption was added, the code now depends on the fact that
data should only be written when objsets are owned. This
patch adds a check in dmu_objset_do_userquota_updates()
to ensure that useraccounting is only done when the objsets
are actually owned.

Fixes: #6916

Signed-off-by: Tom Caputi tcaputi@datto.com

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes good sense and LGTM. Just a minor build issue remaining.

@tcaputi tcaputi force-pushed the more_encryption_fixes branch from 0674dbd to da6f40e Compare February 13, 2018 18:15
@behlendorf
Copy link
Contributor

On the surface it looks like the upgrade_userobj_001_pos test case will need to be updated to wait a bit longer after mount to verify the upgrade was successful.

@sempervictus
Copy link
Contributor

The project quota commit broke this one :)

@tcaputi tcaputi force-pushed the more_encryption_fixes branch 3 times, most recently from b0500b0 to ed94075 Compare February 16, 2018 18:41
@loli10K
Copy link
Contributor

loli10K commented Feb 16, 2018

The following issue seems to have something to do with user accounting, posting here in case it's relevant:

root@linux:~# cat /sys/module/zfs/version 
0.7.0-312_ged94075
root@linux:~# POOLNAME='testpool'
root@linux:~# truncate -s 512m /var/tmp/file
root@linux:~# zpool create $POOLNAME -f /var/tmp/file
root@linux:~# echo 'password' | zfs create -o encryption=on -o keyformat=passphrase $POOLNAME/sendfs
root@linux:~# zfs snap $POOLNAME/sendfs@snap1
root@linux:~# zfs snap $POOLNAME/sendfs@snap2
root@linux:~# zfs send $POOLNAME/sendfs@snap1 | zfs recv $POOLNAME/recvfs
root@linux:~# zfs send -w -i $POOLNAME/sendfs@snap1 $POOLNAME/sendfs@snap2 | zfs recv $POOLNAME/recvfs
[   59.630094] VERIFY3(0 == os->os_flags & (1ULL << 0)) failed (0 == 1)
[   59.631063] PANIC at dsl_crypt.c:2079:dsl_crypto_recv_key_check()
[   59.631910] Kernel panic - not syncing: VERIFY3(0 == os->os_flags & (1ULL << 0)) failed (0 == 1)
[   59.631910] 
[   59.633213] CPU: 0 PID: 779 Comm: zfs Tainted: P           O  3.16.0-4-amd64 #1 Debian 3.16.51-3
[   59.633959] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   59.633959]  0000000000000000 ffffffff8151ea00 ffffffffa041397e ffff880039c73788
[   59.633959]  ffffffff8151b77e ffffffff00000010 ffff880039c73798 ffff880039c73738
[   59.633959]  ffffffff8151bf2e ffff880039c737b8 0000000000000007 0000000000000006
[   59.633959] Call Trace:
[   59.633959]  [<ffffffff8151ea00>] ? dump_stack+0x5d/0x78
[   59.633959]  [<ffffffff8151b77e>] ? panic+0xc8/0x206
[   59.633959]  [<ffffffff8151bf2e>] ? printk+0x54/0x56
[   59.633959]  [<ffffffffa040924a>] ? spl_panic+0x10a/0x110 [spl]
[   59.633959]  [<ffffffff815234de>] ? mutex_lock+0xe/0x2a
[   59.633959]  [<ffffffffa054c688>] ? dmu_object_info_from_dnode+0x78/0x130 [zfs]
[   59.633959]  [<ffffffffa05348fb>] ? dbuf_verify_user+0x6b/0x160 [zfs]
[   59.633959]  [<ffffffffa0534a03>] ? dmu_buf_get_user+0x13/0x20 [zfs]
[   59.633959]  [<ffffffffa0570e1a>] ? dsl_dataset_hold_obj_flags+0xba/0xca0 [zfs]
[   59.633959]  [<ffffffffa040cebe>] ? tsd_hash_search.isra.1+0x3e/0x90 [spl]
[   59.633959]  [<ffffffffa040cf50>] ? tsd_get+0x40/0x90 [spl]
[   59.633959]  [<ffffffffa05ac905>] ? rrn_find+0x25/0x50 [zfs]
[   59.633959]  [<ffffffffa05ad156>] ? rrw_held+0x66/0xd0 [zfs]
[   59.633959]  [<ffffffff815234de>] ? mutex_lock+0xe/0x2a
[   59.633959]  [<ffffffffa0552efa>] ? dmu_objset_from_ds+0x9a/0x250 [zfs]
[   59.633959]  [<ffffffffa058634d>] ? dsl_crypto_recv_key_check+0x6fd/0x8a0 [zfs]
[   59.633959]  [<ffffffffa0585c50>] ? dsl_crypto_key_sync+0x1d0/0x1d0 [zfs]
[   59.633959]  [<ffffffffa059bb31>] ? dsl_sync_task+0xf1/0x240 [zfs]
[   59.633959]  [<ffffffffa0586700>] ? spa_keystore_wkey_hold_dd+0x1b0/0x1b0 [zfs]
[   59.633959]  [<ffffffffa0585c50>] ? dsl_crypto_key_sync+0x1d0/0x1d0 [zfs]
[   59.633959]  [<ffffffffa0586700>] ? spa_keystore_wkey_hold_dd+0x1b0/0x1b0 [zfs]
[   59.633959]  [<ffffffffa058a835>] ? dsl_crypto_recv_key+0x35/0x40 [zfs]
[   59.633959]  [<ffffffffa055e933>] ? dmu_recv_stream+0x363/0x13c0 [zfs]
[   59.633959]  [<ffffffffa059bc39>] ? dsl_sync_task+0x1f9/0x240 [zfs]
[   59.633959]  [<ffffffffa0556580>] ? receive_cksum+0x30/0x30 [zfs]
[   59.633959]  [<ffffffffa0557100>] ? dmu_recv_begin_sync+0xb80/0xb80 [zfs]
[   59.633959]  [<ffffffffa0556580>] ? receive_cksum+0x30/0x30 [zfs]
[   59.633959]  [<ffffffffa055e516>] ? dmu_recv_begin+0x196/0x250 [zfs]
[   59.633959]  [<ffffffffa06136a5>] ? zfs_ioc_recv_impl+0x275/0xfa0 [zfs]
[   59.633959]  [<ffffffff81161805>] ? zone_statistics+0x85/0x90
[   59.633959]  [<ffffffff8114bc1f>] ? get_page_from_freelist+0x57f/0x910
[   59.633959]  [<ffffffffa06149b4>] ? zfs_ioc_recv_new+0x344/0x3d0 [zfs]
[   59.633959]  [<ffffffffa047251c>] ? nvlist_copy_pairs.isra.52+0x6c/0x80 [znvpair]
[   59.633959]  [<ffffffffa04022cd>] ? spl_kmem_alloc_impl+0xbd/0x160 [spl]
[   59.633959]  [<ffffffffa04706e5>] ? nv_mem_zalloc.isra.12+0x25/0x30 [znvpair]
[   59.633959]  [<ffffffffa0615bac>] ? zfsdev_ioctl+0x38c/0x790 [zfs]
[   59.633959]  [<ffffffff811c1b2f>] ? do_vfs_ioctl+0x2cf/0x4b0
[   59.633959]  [<ffffffff810593c7>] ? __do_page_fault+0x177/0x410
[   59.633959]  [<ffffffff811c1d91>] ? SyS_ioctl+0x81/0xa0
[   59.633959]  [<ffffffff81526d58>] ? async_page_fault+0x28/0x30
[   59.633959]  [<ffffffff81524d0d>] ? system_call_fast_compare_end+0x10/0x15
[   59.633959] Rebooting in 1 seconds..

We are failing here:

2074         /*
2075          * Useraccounting is not portable and must be done with the keys loaded.
2076          * Therefore, whenever we do any kind of receive the useraccounting
2077          * must not be present.
2078          */
2079         ASSERT0(os->os_flags & OBJSET_FLAG_USERACCOUNTING_COMPLETE);
2080         ASSERT0(os->os_flags & OBJSET_FLAG_USEROBJACCOUNTING_COMPLETE);

@tcaputi
Copy link
Contributor Author

tcaputi commented Feb 17, 2018

@loli10K I see what the issue is and should have a fix soon.

@phreaker0
Copy link

I just imported a dd copy from my broken pool (#6916) and it worked fine with this patch.

Currently, ZIL claiming dirties objsets which causes
dsl_pool_sync() to attempt to perform user accounting on
them. This causes problems for encrypted datasets that were
raw received before the system went offline since they
cannot perform user accounting until they have their keys
loaded. This triggers an ASSERT in zio_encrypt(). Since
encryption was added, the code now depends on the fact that
data should only be written when objsets are owned. This
patch adds a check in dmu_objset_do_userquota_updates()
to ensure that useraccounting is only done when the objsets
are actually owned for write. As part of this work, the
zfsvfs and zvol code was updated so that it no longer lies
about owning objsets readonly.

Fixes: openzfs#6916

Signed-off-by: Tom Caputi <tcaputi@datto.com>
if (error)
return (error);

error = zfsvfs_parse_options(zm->mnt_data, &zfsvfs->z_vfs);
error = zfsvfs_create(osname, vfs->vfs_readonly, &zfsvfs);
if (error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error path and the next one appear to leak vfs.

*/
if (!readonly &&
dmu_objset_incompatible_encryption_version(zfsvfs->z_os))
return (SET_ERROR(EROFS));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, this check is no longer needed since it's now handled properly in dmu_objset_own_impl() since we're now tell the truth about the dataset being r/w.

@tcaputi tcaputi force-pushed the more_encryption_fixes branch from ed94075 to f26b531 Compare February 20, 2018 18:17
@behlendorf behlendorf merged commit 163a8c2 into openzfs:master Feb 21, 2018
@codecov
Copy link

codecov bot commented Feb 21, 2018

Codecov Report

Merging #7163 into master will increase coverage by <.01%.
The diff coverage is 86.95%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7163      +/-   ##
==========================================
+ Coverage   76.35%   76.35%   +<.01%     
==========================================
  Files         327      327              
  Lines      103775   103773       -2     
==========================================
+ Hits        79235    79240       +5     
+ Misses      24540    24533       -7
Flag Coverage Δ
#kernel 76.18% <86.95%> (+0.02%) ⬆️
#user 65.64% <0%> (-0.01%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b30ee6...f26b531. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

recovery import (zpool import -F) issue with encrypted pool
5 participants