Fix send/recv lost spill block

When receiving an object in a send stream, the receive_object()
function must determine if it is an existing or new object.  This
is normally straightforward since that object number will usually
not be allocated, and therefore it must be a new object.

However, when the object exists there are two possible scenarios.

1) The object may have been freed and an entirely new object
   allocated, in which case it needs to be reallocated to free
   any attached spill block and to set the new attributes (e.g.
   block size, bonus size, etc.).  Or,

2) The object's attributes, like block size, were modified at
   the source but it is the same original object, in which case
   only those attributes should be updated and everything else
   preserved.

The issue is that this determination is accomplished using a set
of heuristics from the OBJECT record.  Unfortunately, these fields
aren't sufficient to always distinguish between these two cases.

As a result, a change in an object's block size will result in it
being reallocated.  As part of this reallocation, any spill block
associated with the object will be freed.

When performing a normal send/recv, this issue will most likely
manifest itself as a file with missing xattrs.  This is because
when the xattr=sa property is set, xattrs can be stored in the
spill block that was lost.

If this issue occurs when performing a raw send, the missing
spill block will trigger an authentication error.  This error will
prevent the receiving side from accessing the damaged dnode block.
Furthermore, if the first dnode block is damaged in this way, it
will be impossible to mount the received snapshot.

This change resolves the issue by updating the sender to always
include a SPILL record for each OBJECT record with a spill block.
This allows the missing spill block to be recreated if it's freed
during receive_object().

The major advantage of this approach is that it is backwards
compatible with existing versions of 'zfs receive'.   This means
there's no need to add an incompatible feature flag which is only
understood by the latest versions.  Older versions of the software
which already know how to handle spill blocks will do the right thing.

The downside to this approach is that it can increase the size of
the stream due to the additional spill blocks.  Additionally, since
new spill blocks will be written, the received snapshot will consume
more capacity.  These drawbacks can be largely mitigated by using
the large dnode feature, which reduces the need for spill blocks.

Both the send_realloc_files and send_realloc_encrypted_files ZTS
test cases were updated to create xattrs in order to force spill
blocks.  As part of validating an incremental receive, the contents
of all received xattrs are verified against the source snapshot.

OpenZFS-issue: https://www.illumos.org/issues/9952
FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233277

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#6224
behlendorf committed Apr 26, 2019
1 parent b43a27f commit 0974327
Showing 16 changed files with 344 additions and 39 deletions.

include/sys/dmu.h (3 changes: 2 additions & 1 deletion)
@@ -420,7 +420,8 @@ int dmu_object_reclaim(objset_t *os, uint64_t object, dmu_object_type_t ot,
int blocksize, dmu_object_type_t bonustype, int bonuslen, dmu_tx_t *txp);
int dmu_object_reclaim_dnsize(objset_t *os, uint64_t object,
dmu_object_type_t ot, int blocksize, dmu_object_type_t bonustype,
-int bonuslen, int dnodesize, dmu_tx_t *txp);
+int bonuslen, int dnodesize, boolean_t keep_spill, dmu_tx_t *tx);
+int dmu_object_rm_spill(objset_t *os, uint64_t object, dmu_tx_t *tx);

/*
* Free an object from this objset.

include/sys/dmu_impl.h (1 change: 1 addition & 0 deletions)
@@ -265,6 +265,7 @@ typedef struct dmu_sendarg {
objset_t *dsa_os;
zio_cksum_t dsa_zc;
uint64_t dsa_toguid;
+uint64_t dsa_fromtxg;
int dsa_err;
dmu_pendop_t dsa_pending_op;
uint64_t dsa_featureflags;

include/sys/dnode.h (7 changes: 4 additions & 3 deletions)
@@ -267,8 +267,8 @@ typedef struct dnode_phys {
};
} dnode_phys_t;

-#define DN_SPILL_BLKPTR(dnp) (blkptr_t *)((char *)(dnp) + \
-(((dnp)->dn_extra_slots + 1) << DNODE_SHIFT) - (1 << SPA_BLKPTRSHIFT))
+#define DN_SPILL_BLKPTR(dnp) ((blkptr_t *)((char *)(dnp) + \
+(((dnp)->dn_extra_slots + 1) << DNODE_SHIFT) - (1 << SPA_BLKPTRSHIFT)))

struct dnode {
/*
@@ -420,7 +420,8 @@ void dnode_sync(dnode_t *dn, dmu_tx_t *tx);
void dnode_allocate(dnode_t *dn, dmu_object_type_t ot, int blocksize, int ibs,
dmu_object_type_t bonustype, int bonuslen, int dn_slots, dmu_tx_t *tx);
void dnode_reallocate(dnode_t *dn, dmu_object_type_t ot, int blocksize,
-dmu_object_type_t bonustype, int bonuslen, int dn_slots, dmu_tx_t *tx);
+dmu_object_type_t bonustype, int bonuslen, int dn_slots,
+boolean_t keep_spill, dmu_tx_t *tx);
void dnode_free(dnode_t *dn, dmu_tx_t *tx);
void dnode_byteswap(dnode_phys_t *dnp);
void dnode_buf_byteswap(void *buf, size_t size);

include/sys/zfs_ioctl.h (10 changes: 7 additions & 3 deletions)
@@ -101,12 +101,13 @@ typedef enum drr_headertype {
/* flag #18 is reserved for a Delphix feature */
#define DMU_BACKUP_FEATURE_LARGE_BLOCKS (1 << 19)
#define DMU_BACKUP_FEATURE_RESUMING (1 << 20)
-/* flag #21 is reserved for a Delphix feature */
+/* flag #21 is reserved for the redacted send/receive feature */
#define DMU_BACKUP_FEATURE_COMPRESSED (1 << 22)
#define DMU_BACKUP_FEATURE_LARGE_DNODE (1 << 23)
#define DMU_BACKUP_FEATURE_RAW (1 << 24)
/* flag #25 is reserved for the ZSTD compression feature */
#define DMU_BACKUP_FEATURE_HOLDS (1 << 26)
+#define DMU_BACKUP_FEATURE_SPILL_BLOCK (1 << 27)

/*
* Mask of all supported backup features
@@ -116,7 +117,8 @@ typedef enum drr_headertype {
DMU_BACKUP_FEATURE_EMBED_DATA | DMU_BACKUP_FEATURE_LZ4 | \
DMU_BACKUP_FEATURE_RESUMING | DMU_BACKUP_FEATURE_LARGE_BLOCKS | \
DMU_BACKUP_FEATURE_COMPRESSED | DMU_BACKUP_FEATURE_LARGE_DNODE | \
-DMU_BACKUP_FEATURE_RAW | DMU_BACKUP_FEATURE_HOLDS)
+DMU_BACKUP_FEATURE_RAW | DMU_BACKUP_FEATURE_HOLDS | \
+DMU_BACKUP_FEATURE_SPILL_BLOCK)

/* Are all features in the given flag word currently supported? */
#define DMU_STREAM_SUPPORTED(x) (!((x) & ~DMU_BACKUP_FEATURE_MASK))
@@ -131,7 +133,7 @@ typedef enum dmu_send_resume_token_version {
*
* 64 56 48 40 32 24 16 8 0
* +-------+-------+-------+-------+-------+-------+-------+-------+
-* | reserved | feature-flags |C|S|
+* | reserved | feature-flags |C|S|
* +-------+-------+-------+-------+-------+-------+-------+-------+
*
* The low order two bits indicate the header type: SUBSTREAM (0x1)
@@ -167,9 +169,11 @@ typedef enum dmu_send_resume_token_version {
*/
#define DRR_CHECKSUM_DEDUP (1<<0) /* not used for DRR_SPILL blocks */
#define DRR_RAW_BYTESWAP (1<<1)
+#define DRR_SPILL_BLOCK (1<<2)

#define DRR_IS_DEDUP_CAPABLE(flags) ((flags) & DRR_CHECKSUM_DEDUP)
#define DRR_IS_RAW_BYTESWAPPED(flags) ((flags) & DRR_RAW_BYTESWAP)
+#define DRR_HAS_SPILL_BLOCK(flags) ((flags) & DRR_SPILL_BLOCK)

/* deal with compressed drr_write replay records */
#define DRR_WRITE_COMPRESSED(drrw) ((drrw)->drr_compressiontype != 0)

man/man5/zfs-module-parameters.5 (20 changes: 20 additions & 0 deletions)
@@ -2349,6 +2349,26 @@ must be at least twice the maximum block size in use.
Default value: \fB16,777,216\fR.
.RE

+.sp
+.ne 2
+.na
+\fBzfs_send_spill_block_bit\fR (int)
+.ad
+.RS 12n
+Allow the DRR_SPILL_BLOCK bit to be set in the OBJECT records of non-raw
+send streams. When enabled, the receiving system must include support for
+this feature.
+.sp
+When disabled, additional DRR_SPILL records are included in order to recreate
+a spill block if it was incorrectly removed. This may occur when certain
+attributes of the object change (block size, bonus size, etc.). These send
+streams are backwards compatible with previous versions of the ZFS software.
+.sp
+This feature will be enabled by default in a future release.
+.sp
+Use \fB1\fR for yes and \fB0\fR for no (default).
+.RE
+
.sp
.ne 2
.na

module/zfs/dbuf.c (2 changes: 1 addition & 1 deletion)
@@ -2466,7 +2466,7 @@ dbuf_assign_arcbuf(dmu_buf_impl_t *db, arc_buf_t *buf, dmu_tx_t *tx)
ASSERT(db->db_level == 0);
ASSERT3U(dbuf_is_metadata(db), ==, arc_is_metadata(buf));
ASSERT(buf != NULL);
-ASSERT(arc_buf_lsize(buf) == db->db.db_size);
+ASSERT3U(arc_buf_lsize(buf), ==, db->db.db_size);
ASSERT(tx->tx_txg != 0);

arc_return_buf(buf, db);

module/zfs/dmu_object.c (31 changes: 28 additions & 3 deletions)
@@ -24,6 +24,7 @@
* Copyright 2014 HybridCluster. All rights reserved.
*/

+#include <sys/dbuf.h>
#include <sys/dmu.h>
#include <sys/dmu_objset.h>
#include <sys/dmu_tx.h>
@@ -304,13 +305,13 @@ dmu_object_reclaim(objset_t *os, uint64_t object, dmu_object_type_t ot,
int blocksize, dmu_object_type_t bonustype, int bonuslen, dmu_tx_t *tx)
{
return (dmu_object_reclaim_dnsize(os, object, ot, blocksize, bonustype,
-bonuslen, DNODE_MIN_SIZE, tx));
+bonuslen, DNODE_MIN_SIZE, 0, tx));
}

int
dmu_object_reclaim_dnsize(objset_t *os, uint64_t object, dmu_object_type_t ot,
int blocksize, dmu_object_type_t bonustype, int bonuslen, int dnodesize,
-dmu_tx_t *tx)
+boolean_t keep_spill, dmu_tx_t *tx)
{
dnode_t *dn;
int dn_slots = dnodesize >> DNODE_SHIFT;
@@ -327,7 +328,30 @@ dmu_object_reclaim_dnsize(objset_t *os, uint64_t object, dmu_object_type_t ot,
if (err)
return (err);

-dnode_reallocate(dn, ot, blocksize, bonustype, bonuslen, dn_slots, tx);
+dnode_reallocate(dn, ot, blocksize, bonustype, bonuslen, dn_slots,
+keep_spill, tx);

dnode_rele(dn, FTAG);
return (err);
+}
+
+int
+dmu_object_rm_spill(objset_t *os, uint64_t object, dmu_tx_t *tx)
+{
+dnode_t *dn;
+int err;
+
+err = dnode_hold_impl(os, object, DNODE_MUST_BE_ALLOCATED, 0,
+FTAG, &dn);
+if (err)
+return (err);
+
+rw_enter(&dn->dn_struct_rwlock, RW_WRITER);
+if (dn->dn_phys->dn_flags & DNODE_FLAG_SPILL_BLKPTR) {
+dbuf_rm_spill(dn, tx);
+dnode_rm_spill(dn, tx);
+}
+rw_exit(&dn->dn_struct_rwlock);
+
+dnode_rele(dn, FTAG);
+return (err);
@@ -489,6 +513,7 @@ EXPORT_SYMBOL(dmu_object_claim);
EXPORT_SYMBOL(dmu_object_claim_dnsize);
EXPORT_SYMBOL(dmu_object_reclaim);
EXPORT_SYMBOL(dmu_object_reclaim_dnsize);
+EXPORT_SYMBOL(dmu_object_rm_spill);
EXPORT_SYMBOL(dmu_object_free);
EXPORT_SYMBOL(dmu_object_next);
EXPORT_SYMBOL(dmu_object_zapify);

module/zfs/dmu_recv.c (46 changes: 38 additions & 8 deletions)
@@ -271,6 +271,10 @@ dmu_recv_begin_check(void *arg, dmu_tx_t *tx)
if (!spa_feature_is_enabled(dp->dp_spa, SPA_FEATURE_ENCRYPTION))
return (SET_ERROR(ENOTSUP));

+/* raw receives require spill block allocation flag */
+if (!(featureflags & DMU_BACKUP_FEATURE_SPILL_BLOCK))
+return (SET_ERROR(ENOTSUP));
+
/* embedded data is incompatible with encryption and raw recv */
if (featureflags & DMU_BACKUP_FEATURE_EMBED_DATA)
return (SET_ERROR(EINVAL));
@@ -835,7 +839,8 @@ struct receive_writer_arg {
/* A map from guid to dataset to help handle dedup'd streams. */
avl_tree_t *guid_to_ds_map;
boolean_t resumable;
-boolean_t raw;
+boolean_t raw; /* DMU_BACKUP_FEATURE_RAW set */
+boolean_t spill; /* DMU_BACKUP_FEATURE_SPILL_BLOCK set */
uint64_t last_object;
uint64_t last_offset;
uint64_t max_object; /* highest object ID referenced in stream */
@@ -1151,10 +1156,19 @@ receive_object(struct receive_writer_arg *rwa, struct drr_object *drro,
drro->drr_raw_bonuslen)
return (SET_ERROR(EINVAL));
} else {
-if (drro->drr_flags != 0 || drro->drr_raw_bonuslen != 0 ||
-drro->drr_indblkshift != 0 || drro->drr_nlevels != 0 ||
-drro->drr_nblkptr != 0)
+/*
+* The DRR_SPILL_BLOCK flag is only valid when the
+* DMU_BACKUP_FEATURE_SPILL_BLOCK feature is enabled.
+*/
+if (((drro->drr_flags & ~(DRR_SPILL_BLOCK))) ||
+(!rwa->spill && (drro->drr_flags & DRR_SPILL_BLOCK))) {
return (SET_ERROR(EINVAL));
+}
+
+if (drro->drr_raw_bonuslen != 0 || drro->drr_nblkptr != 0 ||
+drro->drr_indblkshift != 0 || drro->drr_nlevels != 0) {
+return (SET_ERROR(EINVAL));
+}
}

err = dmu_object_info(rwa->os, drro->drr_object, &doi);
@@ -1312,7 +1326,7 @@ receive_object(struct receive_writer_arg *rwa, struct drr_object *drro,
}

if (object == DMU_NEW_OBJECT) {
-/* currently free, want to be allocated */
+/* Currently free, wants to be allocated */
err = dmu_object_claim_dnsize(rwa->os, drro->drr_object,
drro->drr_type, drro->drr_blksz,
drro->drr_bonustype, drro->drr_bonuslen,
@@ -1321,11 +1335,19 @@ receive_object(struct receive_writer_arg *rwa, struct drr_object *drro,
drro->drr_blksz != doi.doi_data_block_size ||
drro->drr_bonustype != doi.doi_bonus_type ||
drro->drr_bonuslen != doi.doi_bonus_size) {
-/* currently allocated, but with different properties */
+/* Currently allocated, but with different properties */
err = dmu_object_reclaim_dnsize(rwa->os, drro->drr_object,
drro->drr_type, drro->drr_blksz,
drro->drr_bonustype, drro->drr_bonuslen,
-dn_slots << DNODE_SHIFT, tx);
+dn_slots << DNODE_SHIFT,
+rwa->spill ? DRR_HAS_SPILL_BLOCK(drro->drr_flags) : 0, tx);
+} else if (rwa->spill && (DRR_HAS_SPILL_BLOCK(drro->drr_flags) == 0)) {
+/*
+* Currently allocated, the existing version of this object
+* may reference a spill block that is no longer allocated
+* at the source and needs to be freed.
+*/
+err = dmu_object_rm_spill(rwa->os, drro->drr_object, tx);
}

if (err != 0) {
@@ -1699,9 +1721,16 @@ receive_spill(struct receive_writer_arg *rwa, struct drr_spill *drrs,
return (err);
}

-if (db_spill->db_size < drrs->drr_length)
+/*
+* Spill blocks may both grow and shrink. When a change in size
+* occurs any existing dbuf must be updated to match the logical
+* size of the provided arc_buf_t.
+*/
+if (db_spill->db_size != drrs->drr_length) {
+dmu_buf_will_fill(db_spill, tx);
VERIFY(0 == dbuf_spill_set_blksz(db_spill,
drrs->drr_length, tx));
+}

if (rwa->byteswap && !arc_is_encrypted(abuf) &&
arc_get_compression(abuf) == ZIO_COMPRESS_OFF) {
@@ -2575,6 +2604,7 @@ dmu_recv_stream(dmu_recv_cookie_t *drc, vnode_t *vp, offset_t *voffp,
rwa->byteswap = drc->drc_byteswap;
rwa->resumable = drc->drc_resumable;
rwa->raw = drc->drc_raw;
+rwa->spill = !!(featureflags & DMU_BACKUP_FEATURE_SPILL_BLOCK);
rwa->os->os_raw_receive = drc->drc_raw;

(void) thread_create(NULL, 0, receive_writer_thread, rwa, 0, curproc,