File incorrectly zeroed when receiving incremental stream that toggles -L

Background:

By increasing the recordsize property above the default of 128KB, a
filesystem may have "large" blocks.  By default, a send stream of such a
filesystem does not contain large WRITE records; instead, it decreases
objects' block sizes to 128KB and splits the large blocks into 128KB
blocks, allowing the large-block filesystem to be received by a system
that does not support the `large_blocks` feature.  A send stream
generated by `zfs send -L` (or `--large-block`) preserves the large
block size on the receiving system, by using large WRITE records.

When receiving an incremental send stream for a filesystem with large
blocks, if the send stream's -L flag was toggled, a bug is encountered
in which the file's contents are incorrectly zeroed out.  The contents
of any blocks that were not modified by this send stream will be lost.
"Toggled" means that the previous send used `-L`, but this incremental
does not use `-L` (-L to no-L); or that the previous send did not use
`-L`, but this incremental does use `-L` (no-L to -L).

Changes:

This commit addresses the problem with several changes to the semantics
of zfs send/receive:

1. "-L to no-L" incrementals are rejected.  If the previous send used
`-L`, but this incremental does not use `-L`, the `zfs receive` will
fail with this error message:

    incremental send stream requires -L (--large-block), to match
    previous receive.

2. "no-L to -L" incrementals are handled correctly, preserving the
smaller (128KB) block size of any already-received files that used large
blocks on the sending system but were split by `zfs send` without the
`-L` flag.

3. A new send stream format flag is added, `SWITCH_TO_LARGE_BLOCKS`.
This feature indicates that we can correctly handle "no-L to -L"
incrementals.  This flag is currently not set on any send streams.  In
the future, we intend for incremental send streams of snapshots that
have large blocks to use `-L` by default, and these streams will also
have the `SWITCH_TO_LARGE_BLOCKS` feature set. This ensures that streams
from the default use of `zfs send` won't encounter the bug mentioned
above, because they can't be received by software with the bug.
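
The receive-side policy described in items 1 and 2 can be summarized in a minimal sketch. The enum and function names below are hypothetical and chosen only for illustration; they are not identifiers from the ZFS source, and the real logic lives in the receive code path.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; these are not ZFS identifiers. */
typedef enum {
	RECV_UNCHANGED,		/* -L setting matches the previous send */
	RECV_REJECT,		/* "-L to no-L": receive fails with an error */
	RECV_KEEP_SMALL_BLOCKS	/* "no-L to -L": existing files keep 128KB blocks */
} recv_action_t;

/*
 * Decide how an incremental stream is treated, given whether the
 * previous send and this incremental were generated with -L.
 */
static recv_action_t
incremental_action(bool prev_used_L, bool incr_uses_L)
{
	if (prev_used_L == incr_uses_L)
		return (RECV_UNCHANGED);
	return (prev_used_L ? RECV_REJECT : RECV_KEEP_SMALL_BLOCKS);
}
```

Only the two "toggled" combinations take a special path; streams that keep the same `-L` setting as the previous send are unaffected.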

Implementation notes:

To facilitate accessing the ZPL's generation number,
`zfs_space_delta_cb()` has been renamed to `zpl_get_file_info()` and
restructured to fill in a struct with ZPL-specific info including owner
and generation.

In the "no-L to -L" case, if this is a compressed send stream (from
`zfs send -cL`), large WRITE records that are being written to small
(128KB) blocksize files need to be decompressed so that they can be
written split up into multiple blocks.  The zio pipeline will recompress
each smaller block individually.
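
As a rough illustration of the splitting step, the sketch below walks an already-decompressed large record and issues one write per small block. The callback type, function names, and the `SMALL_BLKSZ` constant are assumptions made for this example; the actual receive code uses the DMU interfaces and is structured differently.

```c
#include <stddef.h>
#include <stdint.h>

#define	SMALL_BLKSZ	(128 * 1024)	/* 128KB, the default recordsize */

/* Hypothetical stand-in for "write one block"; not a ZFS interface. */
typedef void write_block_fn(uint64_t offset, const uint8_t *data,
    size_t len, void *arg);

/*
 * Split a decompressed large record into writes of at most blksz bytes.
 * Returns the number of writes issued.
 */
static size_t
split_large_write(uint64_t offset, const uint8_t *data, size_t len,
    size_t blksz, write_block_fn *write_block, void *arg)
{
	size_t nblocks = 0;

	if (blksz == 0)
		return (0);
	for (size_t done = 0; done < len; done += blksz) {
		size_t chunk = (len - done < blksz) ? len - done : blksz;
		write_block(offset + done, data + done, chunk, arg);
		nblocks++;
	}
	return (nblocks);
}

/* Example callback: counts bytes instead of writing them. */
static void
count_bytes(uint64_t offset, const uint8_t *data, size_t len, void *arg)
{
	(void) offset;
	(void) data;
	*(size_t *)arg += len;
}
```

With a 1MB record and 128KB blocks this issues eight writes. Recompression is deliberately left out of the sketch, since the commit message states that the zio pipeline recompresses each smaller block.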

A new test case, `send-L_toggle`, is added, which tests the "no-L to -L"
case and verifies that we get an error for the "-L to no-L" case.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #6224 
Closes #10383
ahrens authored Jun 9, 2020
1 parent 6722be2 commit 7bcb7f0
Showing 15 changed files with 500 additions and 165 deletions.
4 changes: 2 additions & 2 deletions cmd/zhack/zhack.c
@@ -103,8 +103,8 @@ fatal(spa_t *spa, void *tag, const char *fmt, ...)

 /* ARGSUSED */
 static int
-space_delta_cb(dmu_object_type_t bonustype, void *data,
-    uint64_t *userp, uint64_t *groupp, uint64_t *projectp)
+space_delta_cb(dmu_object_type_t bonustype, const void *data,
+    zfs_file_info_t *zoi)
 {
 	/*
 	 * Is it a valid type of object to track?
13 changes: 10 additions & 3 deletions include/sys/dmu.h
@@ -1013,10 +1013,17 @@ extern int dmu_snapshot_realname(objset_t *os, char *name, char *real,
 extern int dmu_dir_list_next(objset_t *os, int namelen, char *name,
     uint64_t *idp, uint64_t *offp);

-typedef int objset_used_cb_t(dmu_object_type_t bonustype,
-    void *bonus, uint64_t *userp, uint64_t *groupp, uint64_t *projectp);
+typedef struct zfs_file_info {
+	uint64_t zfi_user;
+	uint64_t zfi_group;
+	uint64_t zfi_project;
+	uint64_t zfi_generation;
+} zfs_file_info_t;
+
+typedef int file_info_cb_t(dmu_object_type_t bonustype, const void *data,
+    struct zfs_file_info *zoi);
 extern void dmu_objset_register_type(dmu_objset_type_t ost,
-    objset_used_cb_t *cb);
+    file_info_cb_t *cb);
 extern void dmu_objset_set_user(objset_t *os, void *user_ptr);
 extern void *dmu_objset_get_user(objset_t *os);

2 changes: 2 additions & 0 deletions include/sys/dmu_objset.h
@@ -254,6 +254,8 @@ boolean_t dmu_objset_projectquota_enabled(objset_t *os);
 boolean_t dmu_objset_projectquota_present(objset_t *os);
 boolean_t dmu_objset_projectquota_upgradable(objset_t *os);
 void dmu_objset_id_quota_upgrade(objset_t *os);
+int dmu_get_file_info(objset_t *os, dmu_object_type_t bonustype,
+    const void *data, zfs_file_info_t *zfi);

 int dmu_fsname(const char *snapname, char *buf);

1 change: 1 addition & 0 deletions include/sys/fs/zfs.h
@@ -1336,6 +1336,7 @@ typedef enum {
 	ZFS_ERR_EXPORT_IN_PROGRESS,
 	ZFS_ERR_BOOKMARK_SOURCE_NOT_ANCESTOR,
 	ZFS_ERR_STREAM_TRUNCATED,
+	ZFS_ERR_STREAM_LARGE_BLOCK_MISMATCH,
 } zfs_errno_t;

 /*
18 changes: 17 additions & 1 deletion include/sys/zfs_ioctl.h
@@ -107,6 +107,22 @@ typedef enum drr_headertype {
 #define DMU_BACKUP_FEATURE_RAW (1 << 24)
 /* flag #25 is reserved for the ZSTD compression feature */
 #define DMU_BACKUP_FEATURE_HOLDS (1 << 26)
+/*
+ * The SWITCH_TO_LARGE_BLOCKS feature indicates that we can receive
+ * incremental LARGE_BLOCKS streams (those with WRITE records of >128KB) even
+ * if the previous send did not use LARGE_BLOCKS, and thus its large blocks
+ * were split into multiple 128KB WRITE records. (See
+ * flush_write_batch_impl() and receive_object()). Older software that does
+ * not support this flag may encounter a bug when switching to large blocks,
+ * which causes files to incorrectly be zeroed.
+ *
+ * This flag is currently not set on any send streams. In the future, we
+ * intend for incremental send streams of snapshots that have large blocks to
+ * use LARGE_BLOCKS by default, and these streams will also have the
+ * SWITCH_TO_LARGE_BLOCKS feature set. This ensures that streams from the
+ * default use of "zfs send" won't encounter the bug mentioned above.
+ */
+#define DMU_BACKUP_FEATURE_SWITCH_TO_LARGE_BLOCKS (1 << 27)

 /*
  * Mask of all supported backup features
@@ -116,7 +132,7 @@ typedef enum drr_headertype {
     DMU_BACKUP_FEATURE_RESUMING | DMU_BACKUP_FEATURE_LARGE_BLOCKS | \
     DMU_BACKUP_FEATURE_COMPRESSED | DMU_BACKUP_FEATURE_LARGE_DNODE | \
     DMU_BACKUP_FEATURE_RAW | DMU_BACKUP_FEATURE_HOLDS | \
-    DMU_BACKUP_FEATURE_REDACTED)
+    DMU_BACKUP_FEATURE_REDACTED | DMU_BACKUP_FEATURE_SWITCH_TO_LARGE_BLOCKS)

 /* Are all features in the given flag word currently supported? */
 #define DMU_STREAM_SUPPORTED(x) (!((x) & ~DMU_BACKUP_FEATURE_MASK))
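
Why adding a bit to the feature mask keeps buggy receivers away from such streams can be shown with a small stand-alone sketch. The three bit values are copied from the zfs_ioctl.h hunk above; the shortened macro names, the two masks, and `stream_supported()` are simplifications assumed for this example (the real mask contains many more features).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as they appear in the zfs_ioctl.h hunk. */
#define	FEATURE_RAW			(1ULL << 24)
#define	FEATURE_HOLDS			(1ULL << 26)
#define	FEATURE_SWITCH_TO_LARGE_BLOCKS	(1ULL << 27)

/* Hypothetical, reduced masks: "old" software predates this commit. */
#define	OLD_MASK	(FEATURE_RAW | FEATURE_HOLDS)
#define	NEW_MASK	(OLD_MASK | FEATURE_SWITCH_TO_LARGE_BLOCKS)

/* Same test as DMU_STREAM_SUPPORTED(): no bit outside the mask may be set. */
static bool
stream_supported(uint64_t stream_features, uint64_t supported_mask)
{
	return ((stream_features & ~supported_mask) == 0);
}
```

A receiver whose mask lacks bit 27 reports a stream carrying `SWITCH_TO_LARGE_BLOCKS` as unsupported, so it never reaches the code path that zeroes files.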
29 changes: 14 additions & 15 deletions include/sys/zfs_quota.h
@@ -24,23 +24,22 @@

 #include <sys/dmu.h>
 #include <sys/fs/zfs.h>
-#include <sys/zfs_vfsops.h>

-extern int zfs_space_delta_cb(dmu_object_type_t bonustype, void *data,
-    uint64_t *userp, uint64_t *groupp, uint64_t *projectp);
+struct zfsvfs;
+struct zfs_file_info_t;

-extern int zfs_userspace_one(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
-    const char *domain, uint64_t rid, uint64_t *valuep);
-extern int zfs_userspace_many(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
-    uint64_t *cookiep, void *vbuf, uint64_t *bufsizep);
-extern int zfs_set_userquota(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
-    const char *domain, uint64_t rid, uint64_t quota);
+extern int zpl_get_file_info(dmu_object_type_t,
+    const void *, struct zfs_file_info *);

-extern boolean_t zfs_id_overobjquota(zfsvfs_t *zfsvfs, uint64_t usedobj,
-    uint64_t id);
-extern boolean_t zfs_id_overblockquota(zfsvfs_t *zfsvfs, uint64_t usedobj,
-    uint64_t id);
-extern boolean_t zfs_id_overquota(zfsvfs_t *zfsvfs, uint64_t usedobj,
-    uint64_t id);
+extern int zfs_userspace_one(struct zfsvfs *, zfs_userquota_prop_t,
+    const char *, uint64_t, uint64_t *);
+extern int zfs_userspace_many(struct zfsvfs *, zfs_userquota_prop_t,
+    uint64_t *, void *, uint64_t *);
+extern int zfs_set_userquota(struct zfsvfs *, zfs_userquota_prop_t,
+    const char *, uint64_t, uint64_t);
+
+extern boolean_t zfs_id_overobjquota(struct zfsvfs *, uint64_t, uint64_t);
+extern boolean_t zfs_id_overblockquota(struct zfsvfs *, uint64_t, uint64_t);
+extern boolean_t zfs_id_overquota(struct zfsvfs *, uint64_t, uint64_t);

 #endif
6 changes: 6 additions & 0 deletions lib/libzfs/libzfs_sendrecv.c
@@ -4802,6 +4802,12 @@ zfs_receive_one(libzfs_handle_t *hdl, int infd, const char *tosnap,
 			    ioctl_err == ECKSUM);
 			(void) zfs_error(hdl, EZFS_BADSTREAM, errbuf);
 			break;
+		case ZFS_ERR_STREAM_LARGE_BLOCK_MISMATCH:
+			zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
+			    "incremental send stream requires -L "
+			    "(--large-block), to match previous receive."));
+			(void) zfs_error(hdl, EZFS_BADSTREAM, errbuf);
+			break;
 		case ENOTSUP:
 			zfs_error_aux(hdl, dgettext(TEXT_DOMAIN,
 			    "pool must be upgraded to receive this stream."));
2 changes: 1 addition & 1 deletion module/os/freebsd/zfs/zfs_vfsops.c
@@ -2202,7 +2202,7 @@ zfs_init(void)
 	 */
 	zfs_vnodes_adjust();

-	dmu_objset_register_type(DMU_OST_ZFS, zfs_space_delta_cb);
+	dmu_objset_register_type(DMU_OST_ZFS, zpl_get_file_info);

 	zfsvfs_taskq = taskq_create("zfsvfs", 1, minclsyspri, 0, 0, 0);
 }
2 changes: 1 addition & 1 deletion module/os/linux/zfs/zfs_vfsops.c
@@ -2131,7 +2131,7 @@ zfs_init(void)
 {
 	zfsctl_init();
 	zfs_znode_init();
-	dmu_objset_register_type(DMU_OST_ZFS, zfs_space_delta_cb);
+	dmu_objset_register_type(DMU_OST_ZFS, zpl_get_file_info);
 	register_filesystem(&zpl_fs_type);
 }

49 changes: 28 additions & 21 deletions module/zfs/dmu_objset.c
@@ -1728,19 +1728,29 @@ dmu_objset_is_dirty(objset_t *os, uint64_t txg)
 	return (!multilist_is_empty(os->os_dirty_dnodes[txg & TXG_MASK]));
 }

-static objset_used_cb_t *used_cbs[DMU_OST_NUMTYPES];
+static file_info_cb_t *file_cbs[DMU_OST_NUMTYPES];

 void
-dmu_objset_register_type(dmu_objset_type_t ost, objset_used_cb_t *cb)
+dmu_objset_register_type(dmu_objset_type_t ost, file_info_cb_t *cb)
 {
-	used_cbs[ost] = cb;
+	file_cbs[ost] = cb;
 }
+
+int
+dmu_get_file_info(objset_t *os, dmu_object_type_t bonustype, const void *data,
+    zfs_file_info_t *zfi)
+{
+	file_info_cb_t *cb = file_cbs[os->os_phys->os_type];
+	if (cb == NULL)
+		return (EINVAL);
+	return (cb(bonustype, data, zfi));
+}

 boolean_t
 dmu_objset_userused_enabled(objset_t *os)
 {
 	return (spa_version(os->os_spa) >= SPA_VERSION_USERSPACE &&
-	    used_cbs[os->os_phys->os_type] != NULL &&
+	    file_cbs[os->os_phys->os_type] != NULL &&
 	    DMU_USERUSED_DNODE(os) != NULL);
 }

@@ -1754,7 +1764,7 @@ dmu_objset_userobjused_enabled(objset_t *os)
 boolean_t
 dmu_objset_projectquota_enabled(objset_t *os)
 {
-	return (used_cbs[os->os_phys->os_type] != NULL &&
+	return (file_cbs[os->os_phys->os_type] != NULL &&
 	    DMU_PROJECTUSED_DNODE(os) != NULL &&
 	    spa_feature_is_enabled(os->os_spa, SPA_FEATURE_PROJECT_QUOTA));
 }
@@ -2089,9 +2099,6 @@ dmu_objset_userquota_get_ids(dnode_t *dn, boolean_t before, dmu_tx_t *tx)
 	objset_t *os = dn->dn_objset;
 	void *data = NULL;
 	dmu_buf_impl_t *db = NULL;
-	uint64_t *user = NULL;
-	uint64_t *group = NULL;
-	uint64_t *project = NULL;
 	int flags = dn->dn_id_flags;
 	int error;
 	boolean_t have_spill = B_FALSE;
@@ -2145,23 +2152,23 @@ dmu_objset_userquota_get_ids(dnode_t *dn, boolean_t before, dmu_tx_t *tx)
 		return;
 	}

-	if (before) {
-		ASSERT(data);
-		user = &dn->dn_olduid;
-		group = &dn->dn_oldgid;
-		project = &dn->dn_oldprojid;
-	} else if (data) {
-		user = &dn->dn_newuid;
-		group = &dn->dn_newgid;
-		project = &dn->dn_newprojid;
-	}
-
 	/*
 	 * Must always call the callback in case the object
 	 * type has changed and that type isn't an object type to track
 	 */
-	error = used_cbs[os->os_phys->os_type](dn->dn_bonustype, data,
-	    user, group, project);
+	zfs_file_info_t zfi;
+	error = file_cbs[os->os_phys->os_type](dn->dn_bonustype, data, &zfi);
+
+	if (before) {
+		ASSERT(data);
+		dn->dn_olduid = zfi.zfi_user;
+		dn->dn_oldgid = zfi.zfi_group;
+		dn->dn_oldprojid = zfi.zfi_project;
+	} else if (data) {
+		dn->dn_newuid = zfi.zfi_user;
+		dn->dn_newgid = zfi.zfi_group;
+		dn->dn_newprojid = zfi.zfi_project;
+	}

 	/*
 	 * Preserve existing uid/gid when the callback can't determine
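
The register-then-dispatch pattern that `dmu_objset_register_type()` and `dmu_get_file_info()` follow can be tried outside the kernel with the stand-alone sketch below. All types and names here are simplified stand-ins assumed for illustration (plain `int` in place of the DMU enums, a toy callback in place of `zpl_get_file_info()`); only the shape of the struct and the NULL check mirror the diff.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for zfs_file_info_t; only the shape matters here. */
typedef struct file_info {
	uint64_t fi_user;
	uint64_t fi_group;
	uint64_t fi_project;
	uint64_t fi_generation;
} file_info_t;

typedef int file_info_cb_t(int bonustype, const void *data, file_info_t *fi);

enum { OST_NONE, OST_ZFS, OST_NUMTYPES };	/* hypothetical objset types */

/* One callback slot per objset type, NULL until registered. */
static file_info_cb_t *file_cbs[OST_NUMTYPES];

static void
register_type(int ost, file_info_cb_t *cb)
{
	file_cbs[ost] = cb;
}

/* Dispatch to the callback registered for this objset type, if any. */
static int
get_file_info(int ost, int bonustype, const void *data, file_info_t *fi)
{
	file_info_cb_t *cb = file_cbs[ost];
	if (cb == NULL)
		return (EINVAL);
	return (cb(bonustype, data, fi));
}

/* Toy callback: treats `data` as four consecutive uint64_t values. */
static int
toy_get_file_info(int bonustype, const void *data, file_info_t *fi)
{
	const uint64_t *v = data;

	(void) bonustype;
	fi->fi_user = v[0];
	fi->fi_group = v[1];
	fi->fi_project = v[2];
	fi->fi_generation = v[3];
	return (0);
}
```

Returning one struct instead of three out-pointers is what lets a single callback also report the generation number without changing every caller's argument list again.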