Encryption and Raw Send Stability Improvements #6864

Closed
tcaputi wants to merge 5 commits into master from crypt_disk_fmt2

Conversation

@tcaputi (Contributor) commented Nov 13, 2017

The current on-disk format for encrypted datasets protects
not only the encrypted and authenticated blocks themselves,
but also the order and interpretation of those blocks. To
make this work while maintaining the ability to do raw
sends, each indirect bp maintains a secure checksum of all
the MACs in the block below it, along with a few other
fields that determine how the data is interpreted.

Unfortunately, the current on-disk format erroneously
includes some fields which are not portable and thus cannot
support raw sends. It is also not possible to easily work
around this issue due to a separate and much smaller bug
which causes indirect blocks for encrypted dnodes to not
be compressed, conflicting with the first bug. In
addition, raw send streams do not currently include
dn_maxblkid, which is needed in order to ensure that we are
correctly maintaining the portable objset MAC.

This patch zeroes out the offending fields when computing the
bp MAC (as should have been done from the start) and registers
an errata for the on-disk format bug. We detect the errata by
adding a "version" field to newly created DSL Crypto Keys. We
allow datasets without a version (version 0) to be mounted
read-only so that they can easily be migrated. We also now
include dn_maxblkid in raw send streams to ensure the MAC can
be maintained correctly.
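
For context, a minimal sketch of the core idea, using a simplified
field list and an illustrative function name (the real logic lives
in module/zfs/zio_crypt.c):

    /*
     * Sketch: clear the blk_prop fields that can legitimately differ
     * on the receiving side of a raw send, so that the MAC only ever
     * covers portable state.
     */
    static void
    bp_zero_nonportable_fields(blkptr_t *bp)
    {
            if (BP_IS_HOLE(bp)) {
                    /* hole_birth may populate an otherwise-empty bp */
                    bp->blk_prop = 0ULL;
                    return;
            }
            BP_SET_DEDUP(bp, 0);    /* dedup is a per-pool decision */
            BP_SET_CHECKSUM(bp, 0); /* checksum type may change on recv */
    }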

Note that this fix has not yet been finalized; do not use it until it has been tested, reviewed, and merged unless you are OK with losing your data.

Signed-off-by: Tom Caputi <tcaputi@datto.com>

How Has This Been Tested?

I have added a new test for raw sends that essentially stresses as many edge cases as I could think of. In addition, I have manually tested that the recovery process laid out in zfsonlinux/zfsonlinux.github.com#35 works as advertised, and that both old and new datasets function predictably.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

case ZPOOL_ERRATA_ZOL_6845_ENCRYPTION:
(void) printf(gettext("action: To correct the issue "
"revert to an earlier version and backup\n\tand "
"destroy any existing encrypted datasets before "
Member

I think existing is redundant (same in show_import()).

Contributor

Ditto, see show_import comment.

"corrected. Revert to an eariler"
"version and\n\tbackup and destroy any "
"existing encrypted datasets\n\tbefore "
"updating.\n"));
Member

This doesn't seem consistent with the PR description: can these old encrypted datasets be mounted read-only? If so, it shouldn't be necessary to revert to an earlier version.

Contributor

This message needs to be updated; version zero encrypted datasets can be mounted read-only. How about something like this, which is in line with the information posted about the errata:

action: The pool can be imported but due to an on-disk incompatibility existing
        encrypted datasets can only be mounted read-only.  After import affected
        datasets can be converted to the new version by using 'zfs send' and
        'zfs receive'.

@@ -219,7 +219,8 @@ typedef struct dmu_replay_record {
 	uint8_t drr_flags;
 	uint32_t drr_raw_bonuslen;
 	uint64_t drr_toguid;
-	/* only nonzero for raw streams */
+	/* only (possibly) nonzero for raw streams */
+	uint64_t drr_maxblkid;
Member

We're OK with moving the location of the following fields because we're breaking send/recv compatibility anyway? How is the send stream incompatibility detected on the receiving side? E.g., a new send stream feature flag?

Contributor

@tcaputi can you comment on this? Since this send stream format was never part of any tagged release, we have some flexibility about changing it.

case ZPOOL_ERRATA_ZOL_6845_ENCRYPTION:
(void) printf(gettext(" action: Existing "
"encrypted datasets contain an on-disk "
"incompaitibility which\n\tneeds to be "
Contributor

s/incompaitibility/incompatibility,/

ASSERT3U(version, ==, ZIO_CRYPT_KEY_CURRENT_VERSION);

/*
* The hole_birth feature might set these fields even if this is bp
Contributor

s/this is bp/this bp/

Contributor

@tcaputi please fix.

@tcaputi tcaputi force-pushed the crypt_disk_fmt2 branch 4 times, most recently from 426a446 to 64148a5 Compare December 3, 2017 17:38
@tcaputi changed the title from "Fix for #6845" to "Encryption and Raw Send Stability Improvements" Dec 4, 2017
"corrected. Revert to an eariler"
"version and\n\tbackup and destroy any "
"existing encrypted datasets\n\tbefore "
"updating.\n"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message needs to up updated, version zero encrypted datasets can be mounted read-only. How about something like this which is inline with the information posted about the errata.

action: The pool can be imported but due to an on-disk incompatibility existing
        encrypted datasets can only be mounted read-only.  After import affected
        datasets can be be converted to the new version by using 'zfs send' and
        'zfs receive'.

case ZPOOL_ERRATA_ZOL_6845_ENCRYPTION:
(void) printf(gettext("action: To correct the issue "
"revert to an earlier version and backup\n\tand "
"destroy any existing encrypted datasets before "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, see show_import comment.

@@ -217,6 +217,7 @@ boolean_t dmu_objset_userobjused_enabled(objset_t *os);
 boolean_t dmu_objset_userobjspace_upgradable(objset_t *os);
 void dmu_objset_userobjspace_upgrade(objset_t *os);
 boolean_t dmu_objset_userobjspace_present(objset_t *os);
+boolean_t dmu_objset_bad_encryption_version(objset_t *os);
Contributor

Rather than dmu_objset_bad_encryption_version(), which I admit is descriptive, let's make the generic interface return the version number even if we don't expect to have to bump this version again. A convenience wrapper function can be added to check if that version is compatible, or not, and include an explanation why. Calling it bad is misleading.
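
A rough sketch of what that interface could look like; the names
dmu_objset_crypt_version() and os_crypt_version are hypothetical
placeholders, not the final API:

    /* generic interface: report the DSL Crypto Key version */
    uint64_t
    dmu_objset_crypt_version(objset_t *os)
    {
            return (os->os_crypt_version);  /* hypothetical field */
    }

    /*
     * Convenience wrapper: version 0 keys MAC'd non-portable fields
     * (the ZOL-6845 errata), so they only support read-only mounts.
     */
    boolean_t
    dmu_objset_crypt_version_supported(objset_t *os)
    {
            return (dmu_objset_crypt_version(os) ==
                ZIO_CRYPT_KEY_CURRENT_VERSION);
    }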

module/zfs/dmu.c Outdated
dnode_new_blkid(dn, maxblkid, tx, B_FALSE);
rw_exit(&dn->dn_struct_rwlock);
dnode_rele(dn, FTAG);
return (err);
Contributor

nit: functionally it doesn't matter, but it'd be nice to explicitly return (0) here; the other newly added crypto functions written in the same style arguably should do the same.

Contributor Author

I copied this from dmu_object_set_nlevels() and dmu_object_set_blocksize(). Do you want me to change these too?

Contributor

Yes, please do. It looks like you got a little unlucky and happened to pick a non-typical example function. The code isn't as consistent as one might like about this.
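
For reference, a sketch of the function under discussion with the
explicit success return; this is a simplified reconstruction around
the quoted lines, not the exact patch:

    int
    dmu_object_set_maxblkid(objset_t *os, uint64_t object,
        uint64_t maxblkid, dmu_tx_t *tx)
    {
            dnode_t *dn;
            int err = dnode_hold(os, object, FTAG, &dn);

            if (err)
                    return (err);
            rw_enter(&dn->dn_struct_rwlock, RW_WRITER);
            dnode_new_blkid(dn, maxblkid, tx, B_FALSE);
            rw_exit(&dn->dn_struct_rwlock);
            dnode_rele(dn, FTAG);
            return (0);     /* explicit, rather than a stale err */
    }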

@@ -663,6 +663,9 @@ dmu_objset_own_impl(dsl_dataset_t *ds, dmu_objset_type_t type,
 		return (SET_ERROR(EINVAL));
 	} else if (!readonly && dsl_dataset_is_snapshot(ds)) {
 		return (SET_ERROR(EROFS));
+	} else if (!readonly && decrypt &&
+	    dsl_dir_bad_encryption_version(ds->ds_dir)) {
Contributor

nit: what do you think about using the dmu_objset_* version of the function here, since we've just gotten the objset_t with dmu_objset_from_ds()?

Contributor Author

Not sure how I feel about that. It's easy enough to add a wrapper, but the same could be said for many other things, such as the call to dsl_dataset_is_snapshot() 3 lines above.

Contributor

Sure, I don't feel strongly about this; I'm happy with whatever you prefer.

@@ -1952,6 +1970,12 @@ dsl_crypto_recv_key_check(void *arg, dmu_tx_t *tx)
 		goto error;
 	}
 
+	ret = nvlist_lookup_uint64(nvl, DSL_CRYPTO_KEY_VERSION, &version);
+	if (ret != 0 || version != ZIO_CRYPT_KEY_CURRENT_VERSION) {
+		/* don't receive old on-disk formats */
Contributor

Please include an explanation of why this old format can't be received. Referencing the errata would be a reasonable way to handle this.

@@ -2254,6 +2291,14 @@ dsl_crypto_populate_key_nvlist(dsl_dataset_t *ds, nvlist_t **nvl_out)
 	if (ret != 0)
 		goto error;
 
+	ret = zap_lookup(mos, dckobj, DSL_CRYPTO_KEY_VERSION, 8, 1, &version);
+	if (ret != 0 || version != ZIO_CRYPT_KEY_CURRENT_VERSION) {
+		/* We don't support raw sends of legacy on-disk formats */
Contributor

Can you expand on why this isn't supported? I'd also pull the block comment up above the zap_lookup() call.
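
A sketch of what the requested change could look like; the comment
wording and the ENOTSUP error value are illustrative:

    /*
     * Version 0 keys MAC'd several non-portable on-disk fields (see
     * errata ZPOOL_ERRATA_ZOL_6845_ENCRYPTION), so a raw stream
     * generated from them could never be authenticated correctly on
     * the receiving side. Refuse to populate the key nvlist.
     */
    ret = zap_lookup(mos, dckobj, DSL_CRYPTO_KEY_VERSION, 8, 1, &version);
    if (ret != 0 || version != ZIO_CRYPT_KEY_CURRENT_VERSION) {
            ret = SET_ERROR(ENOTSUP);
            goto error;
    }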

@@ -644,7 +644,8 @@ tests = ['rsend_001_pos', 'rsend_002_pos', 'rsend_003_pos', 'rsend_004_pos',
     'send-c_lz4_disabled', 'send-c_recv_lz4_disabled',
     'send-c_mixed_compression', 'send-c_stream_size_estimate', 'send-cD',
     'send-c_embedded_blocks', 'send-c_resume', 'send-cpL_varied_recsize',
-    'send-c_recv_dedup', 'send_encrypted_heirarchy', 'send_freeobjects']
+    'send-c_recv_dedup', 'send_encrypted_files', 'send_encrypted_heirarchy',
+    'send_freeobjects']
Contributor

A test case needs to be added to check for this errata. You can create a tiny encrypted version 0 pool and include it with the ZTS for testing purposes. The test case should:

  • Verify version=0 filesystems can be mounted read-only but not read-write.
  • Verify version=0 volumes can be opened read-only but not read-write.
  • Verify the recommended zfs send | zfs recv conversion process works as intended.

Contributor Author

I was thinking about this. Would I just put the pool in the zpool_import test dir?

Contributor

Yes, a similar thing is already done in the zpool import tests.

@tcaputi (Contributor, Author) commented Dec 5, 2017

@behlendorf: GitHub won't let me comment on some of your notes...

@tcaputi please fix.

What exactly is wrong here?

@tcaputi can you comment on this? Since this send stream format was never part of any tagged release, we have some flexibility about changing it.

The short version is that we needed to send the maxblkid before, but we weren't. In testing this didn't matter, since we just created new files with new data and sent them, so the maxblkid was always the same on both sides. The problem comes when you create a large file, truncate it, and then send it: the sender's dn_maxblkid shrinks, but without the field in the stream the receiver has no way to know that.
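
A sketch of how the receive side can apply the new field; the
rwa/drro names follow the dmu_send.c receive path, but the exact
placement is illustrative:

    /* raw streams carry the sender's maxblkid; propagate it so the
     * checksum-of-MACs tree matches on both sides */
    if (rwa->raw) {
            err = dmu_object_set_maxblkid(rwa->os, drro->drr_object,
                drro->drr_maxblkid, tx);
            if (err != 0)
                    return (SET_ERROR(EINVAL));
    }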

@tcaputi tcaputi force-pushed the crypt_disk_fmt2 branch 2 times, most recently from 635a1ce to 6b58637 Compare December 6, 2017 01:14
@codecov bot commented Dec 7, 2017

Codecov Report

Merging #6864 into master will increase coverage by 0.1%.
The diff coverage is 81.48%.

@@            Coverage Diff            @@
##           master    #6864     +/-   ##
=========================================
+ Coverage   75.26%   75.37%   +0.1%     
=========================================
  Files         296      296             
  Lines       95453    95332    -121     
=========================================
+ Hits        71845    71853      +8     
+ Misses      23608    23479    -129
Flag      Coverage Δ
#kernel   74.63% <82.4%> (+0.04%) ⬆️
#user     67.76% <57.87%> (+0.3%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1ce23dc...6b58637.

@qinq-net commented Dec 11, 2017

So will I be able to send my currently encrypted datasets from today's master branch to these improvements, or will I have to sync the files in userspace? My root and /home filesystems are encrypted datasets, and attempts to send them nearly always create targets that return internal error: Invalid exchange when mounted. Will I be able to use send to migrate the old datasets, and how should I perform such a migration?

@tcaputi (Contributor, Author) commented Dec 11, 2017

@qinq-net
Unfortunately there were some issues with the on-disk format of encrypted datasets which caused several errors (such as the ones you have seen). This patch will allow you to read these datasets and back them up with a non-raw zfs send. If you zfs receive them into an encrypted dataset, this will re-encrypt the data using the new format, which (hopefully) shouldn't have any further issues. I would not recommend doing this at least until the patch has been merged (we are still working on one last bug at the moment).

@prometheanfire (Contributor):

I'm going to be testing a fresh pool from this patchset soonish

@tcaputi (Contributor, Author) commented Dec 16, 2017

@prometheanfire We are aware of one more issue with raw sends which we are working on (which is why this hasn't been merged yet). If it happens, it should manifest as the zfs recv command failing with a strange message about the stream being invalid. Once that is solved we will also review the on-disk format change one last time to ensure we are happy with it. As of now, we are NOT committing to this change just yet, so use the patch at your own risk.

@prometheanfire (Contributor):

ya, I'm aware that ymmv and all that, I can survive for a little bit without a send (or recover via other methods).

@@ -187,6 +187,12 @@
 	(MIN(zfs_key_max_salt_uses, ZFS_KEY_MAX_SALT_USES_DEFAULT))
 unsigned long zfs_key_max_salt_uses = ZFS_KEY_MAX_SALT_USES_DEFAULT;
 
+typedef struct blkptr_auth_buf {
+	uint64_t bab_prop;			/* blk_prop - portable mask */
+	uint8_t bab_mac[ZIO_DATA_MAC_LEN];	/* MAC from blk_cksum */
Contributor

This line had a whitespace error (a stray CRLF, i.e. a trailing ^M).
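
The final version of this change also computes these MACs in a fixed
byte order, so big- and little-endian systems produce identical
results. A sketch of the idea; BP_GET_PROP_PORTABLE is a hypothetical
macro standing in for the real non-portable-field masking:

    blkptr_auth_buf_t bab;

    /* canonicalize to little-endian before MACing so raw sends
     * between hosts of different endianness still authenticate */
    bab.bab_prop = LE_64(BP_GET_PROP_PORTABLE(bp));
    /* for encrypted bps the MAC is stashed in blk_cksum */
    bcopy(&bp->blk_cksum.zc_word[2], bab.bab_mac, ZIO_DATA_MAC_LEN);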

@prometheanfire (Contributor) commented Dec 17, 2017

The system I'm building and testing on may have problems booting since I'm testing with passphrase encryption. The only way I know of to load the key wrapped by a passphrase will also mount all the datasets, which is generally not desired in early boot. Is there a way to load the key via passphrase without mounting the datasets?

edit: looks like I can do zpool import -N -R /chroot dataset && zfs load-key -a or the like; seems like an easy change to make if needed.

@tcaputi (Contributor, Author) commented Dec 17, 2017

zfs load-key should do exactly what you're asking for

@tcaputi (Contributor, Author) commented Dec 17, 2017

There is also zpool import -l

@prometheanfire (Contributor):

Ya, figured that out. At the moment I have to manually load the key in dracut and recover to move on, but I do have a system booted at least.

zfs load-key -a
systemctl start sysroot.mount
systemctl restart initrd-switch-root.service

Then I'm good.

As far as actually using the patch, working so far...

@prometheanfire (Contributor):

Not sure if it's related, but on shutdown I get the following. I only have a picture, no raw text unfortunately.

https://photos.app.goo.gl/YxI7YK81eWm5reRz2

@sjau commented Feb 2, 2018

@tcaputi

Well, I couldn't try raw sending yet since only my home server is running this patch as of now, but I'll apply it to my notebook on the weekend.

@behlendorf behlendorf closed this in ae76f45 Feb 2, 2018
behlendorf pushed a commit that referenced this pull request Feb 2, 2018
When performing zil_claim() at pool import time, it is
important that encrypted datasets set os_next_write_raw
before writing to the zil_header_t. This prevents the code
from attempting to re-authenticate the objset_phys_t when
it writes it out, which is unnecessary because the
zil_header_t is not protected by either objset MAC and
impossible since the keys aren't loaded yet. Unfortunately,
one of the code paths did not set this flag, which causes
failed ASSERTs during 'zpool import -F'. This patch corrects
this issue.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #6864
Closes #6916
behlendorf pushed a commit that referenced this pull request Feb 2, 2018
Currently, when a raw zfs send file includes a DRR_OBJECT record
that would decrease the number of levels of an existing object,
the object is reallocated with dmu_object_reclaim() which
creates the new dnode using the old object's nlevels. For non-raw
sends this doesn't really matter, but raw sends require that
nlevels on the receive side match that of the send side so that
the checksum-of-MAC tree can be properly maintained. This patch
corrects the issue by freeing the object completely before
allocating it again in this case.

This patch also corrects several issues with dnode_hold_impl()
and related functions that prevented dnodes (particularly
multi-slot dnodes) from being reallocated properly due to
the fact that existing dnodes were not being fully cleaned up
when they were freed.

This patch adds a test to make sure that zfs recv functions
properly with incremental streams containing dnodes of different
sizes.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6821
Closes #6864
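
A sketch of the shape of that fix in the receive path; the control
flow is simplified and the exact surrounding code is assumed:

    /* a raw receive must not let dmu_object_reclaim() carry the old
     * dnode's nlevels forward; free the object outright and recreate
     * it so the new nlevels takes effect */
    if (rwa->raw && drro->drr_nlevels < dn->dn_nlevels) {
            err = dmu_free_long_object(rwa->os, drro->drr_object);
            if (err != 0)
                    return (SET_ERROR(EINVAL));
            /* make sure the free has synced out before the object
             * number is claimed again */
            txg_wait_synced(dmu_objset_pool(rwa->os), 0);
    }
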
behlendorf pushed a commit that referenced this pull request Feb 2, 2018
Currently, os_next_write_raw is a single boolean used for determining
whether or not the next call to dmu_objset_sync() should write out
the objset_phys_t as a raw buffer. Since the boolean is not associated
with a txg, the work simply happens during the next txg, which is not
necessarily the correct one. In the current implementation this issue
was misdiagnosed, resulting in a small hack in dmu_objset_sync() which
seemed to resolve the problem.

This patch changes os_next_write_raw to be an array of booleans, one
for each txg in TXG_SIZE, and removes the hack.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #6864
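
A sketch of the per-txg flag described above; indexing with TXG_MASK
is assumed from how other per-txg state is handled in ZFS:

    /* producer: mark that this txg's objset write must be raw */
    os->os_next_write_raw[tx->tx_txg & TXG_MASK] = B_TRUE;

    /* consumer, in dmu_objset_sync(): check and clear the flag for
     * the txg actually being synced */
    if (os->os_next_write_raw[tx->tx_txg & TXG_MASK]) {
            os->os_next_write_raw[tx->tx_txg & TXG_MASK] = B_FALSE;
            /* ... write the objset_phys_t as a raw buffer ... */
    }
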
@behlendorf (Contributor):

Thank you for everyone's help on this! This PR is ready to go and has been merged.

@qinq-net commented Feb 4, 2018

@tcaputi

I noticed this patch was merged and have applied it to my (testing) production system. I smoothly upgraded my datasets. After upgrading I tried raw sends. A full send looks fine and no longer suffers from the kernel oops, but datasets received via incremental raw send sometimes still fail to mount with Invalid exchange, especially large ones. I have two datasets that fail to mount after an incremental raw send, both about 22 GiB.

By the way, I sent those datasets using something like zfs send -vwR rpool@snapshot | zfs recv -u tank/BACKUP/rpool. The send itself finished smoothly.

@tcaputi (Contributor, Author) commented Feb 4, 2018

@qinq-net

Hmmmm. Do you have any steps to reproduce this? Or sample send files that cause the issue (that you don't mind sharing with me)? We had done fairly extensive testing for these kinds of issues before merging, but it's definitely possible that another case slipped by.

Can you also confirm that BOTH the send and receive side are actually running the new code? You can do this by running cat /sys/module/zfs/version and checking that the code at the end matches the commit hash.

@qinq-net commented Feb 6, 2018

@tcaputi
It seems to have been a problem with old datasets and incomplete patches. I have now migrated them manually with userspace tools such as rsync -aAxXH, which solved the problem.

@lnicola (Contributor) commented Feb 9, 2018

@tcaputi I'm not sure this PR is at fault, but there might be some fallout from it:

#6366 (comment)
#7147

The second one might be related to #7059, which is older than this PR.

I have the encryption feature enabled, but inactive. I'm using large_dnodes, however.

@tcaputi (Contributor, Author) commented Feb 9, 2018

@lnicola I commented on each of the individual issues.

@qinq-net:

@tcaputi Unfortunately the Invalid exchange errno occurs again when using incremental raw send. I actually have two main encrypted datasets, one for the /home filesystem and one for the / filesystem. This time /home works fine but / refuses to mount. I upgraded my Gentoo base system between the snapshots, which replaced a huge number of files. Maybe we can try installing an old release of some Linux distribution in an encrypted dataset and upgrading it between snapshots to try to reproduce this issue.

@tcaputi (Contributor, Author) commented Feb 11, 2018

@qinq-net If you could give me some steps to reproduce the problem and confirm that you are running this patch on at least the receive side, that would be very helpful. Since this is a closed PR, we should probably open a new issue for it once you have that. I'll see what I can do on Monday.

Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this pull request Feb 13, 2018
The on-disk format for encrypted datasets protects not only
the encrypted and authenticated blocks themselves, but also
the order and interpretation of these blocks. In order to
make this work while maintaining the ability to do raw
sends, the indirect bps maintain a secure checksum of all
the MACs in the block below it along with a few other
fields that determine how the data is interpreted.

Unfortunately, the current on-disk format erroneously
includes some fields which are not portable and thus cannot
support raw sends. It is not possible to easily work around
this issue due to a separate and much smaller bug which
causes indirect blocks for encrypted dnodes to not be
compressed, which conflicts with the previous bug. In
addition, the current code generates incompatible on-disk
formats on big endian and little endian systems due to an
issue with how block pointers are authenticated. Finally,
raw send streams do not currently include dn_maxblkid when
sending both the metadnode and normal dnodes which are
needed in order to ensure that we are correctly maintaining
the portable objset MAC.

This patch zeroes out the offending fields when computing
the bp MAC and ensures that these MACs are always
calculated in little endian order (regardless of the host
system's byte order). This patch also registers an errata
for the old on-disk format, which we detect by adding a
"version" field to newly created DSL Crypto Keys. We allow
datasets without a version (version 0) to only be mounted
for read so that they can easily be migrated. We also now
include dn_maxblkid in raw send streams to ensure the MAC
can be maintained correctly.

This patch also contains minor bug fixes and cleanups.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes openzfs#6845
Closes openzfs#6864
Closes openzfs#7052
Nasf-Fan pushed three more commits to Nasf-Fan/zfs that referenced this pull request Feb 13, 2018, carrying the same zil_claim, dnode reallocation, and os_next_write_raw commit messages quoted above.
@tcaputi (Contributor, Author) commented Feb 16, 2018

@qinq-net I think I have a fix for your problems. I will be making a PR soon.

dweeezil added a commit to dweeezil/zfs that referenced this pull request Aug 27, 2018
This is a port of 047116a - Raw sends must be able to decrease nlevels,
to the zfs-0.7-stable branch.  It includes the various fixes to the
problem of receiving incremental streams which include reallocated dnodes
in which the number of dnode slots has changed but excludes the parts
which are related to raw streams.

From 047116a:

    Currently, when a raw zfs send file includes a
    DRR_OBJECT record that would decrease the number of
    levels of an existing object, the object is reallocated
    with dmu_object_reclaim() which creates the new dnode
    using the old object's nlevels. For non-raw sends this
    doesn't really matter, but raw sends require that
    nlevels on the receive side match that of the send
    side so that the checksum-of-MAC tree can be properly
    maintained. This patch corrects the issue by freeing
    the object completely before allocating it again in
    this case.

    This patch also corrects several issues with
    dnode_hold_impl() and related functions that prevented
    dnodes (particularly multi-slot dnodes) from being
    reallocated properly due to the fact that existing
    dnodes were not being fully cleaned up when they
    were freed.

    This patch adds a test to make sure that zfs recv
    functions properly with incremental streams containing
    dnodes of different sizes.

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes openzfs#6821
Closes openzfs#6864

Should close openzfs#6366
dweeezil added a commit to dweeezil/zfs that referenced this pull request Aug 28, 2018
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Aug 28, 2018
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Aug 28, 2018
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Aug 30, 2018
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Aug 30, 2018
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Aug 30, 2018
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 5, 2018

Each of these commits carries the same port commit message quoted above; the later ones also fold in a one-liner fix from loli10K for a test failure (openzfs#7792) and note that this is the first of 3 related patches ported to the zfs-0.7-release branch of ZoL.