
implement corruption correcting recv #9372

Merged
merged 1 commit into from
Jul 28, 2022

alek-p (Contributor) commented Sep 27, 2019

This patch implements a new type of zfs receive: corrective receive (-c). This type of recv is used to heal corrupted data when a replica of the data already exists (in the form of a send file, for example).
Metadata cannot be healed using a corrective receive.

This patch enables us to receive a send stream into an existing snapshot for the purpose of correcting data corruption.

This is the updated version of the patch in #9323

Motivation and Context

In the past, in the rare cases where ZFS experienced permanent data corruption, full recovery of the affected dataset(s) was not always possible, even when replicas existed.
This patch makes recovery from permanent data corruption possible.

Description

For every write record in the send stream, we read the corresponding block from disk; if that read fails with a checksum error, we overwrite the block with the data from the send stream.
After the data is healed, we reread the block to make sure it is healed and remove the healed blocks from the corruption lists shown by zpool status.

To make sure we have correctly matched the data in the send stream to the right dataset to heal, there is a restriction that the GUID of the snapshot being received into must match the GUID in the send stream. Several snapshots are likely to refer to the same potentially corrupted data, so there may be many snapshots satisfying this condition that are able to heal a single block.

The other thing to point out is that we can only correct data. Specifically, we are only able to heal records of type DRR_WRITE.
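The per-record flow described above can be sketched as a small user-space model. It is not the dmu_recv.c implementation: the toy in-memory disk, the Adler-32-style toy_cksum(), and the names corrective_recv(), read_ok(), struct record and struct disk are all hypothetical. The sketch shows the GUID restriction, skipping non-WRITE records, healing only when a read fails its checksum, checking that the stream data has the checksum the block is supposed to have, and rereading after the rewrite.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLKSZ 16	/* toy block size; real ZFS blocks are 512 B or larger */
#define NBLK 4

/* Hypothetical record types standing in for DRR_WRITE and everything else. */
enum rec_type { REC_WRITE, REC_OTHER };

struct record {
	enum rec_type type;
	uint64_t blkid;
	uint8_t data[BLKSZ];
};

/* Toy "disk": block contents plus the checksum recorded at write time. */
struct disk {
	uint8_t blk[NBLK][BLKSZ];
	uint32_t cksum[NBLK];
};

/* Adler-32-style toy checksum; a mismatch models an ECKSUM read. */
static uint32_t
toy_cksum(const uint8_t *p, size_t n)
{
	uint32_t a = 1, b = 0;

	for (size_t i = 0; i < n; i++) {
		a = (a + p[i]) % 65521;
		b = (b + a) % 65521;
	}
	return ((b << 16) | a);
}

static int
read_ok(const struct disk *d, uint64_t blkid)
{
	return (toy_cksum(d->blk[blkid], BLKSZ) == d->cksum[blkid]);
}

/* Returns the number of healed blocks, or -1 on a snapshot GUID mismatch. */
static int
corrective_recv(struct disk *d, uint64_t target_guid, uint64_t stream_guid,
    const struct record *recs, size_t nrecs)
{
	int healed = 0;

	if (target_guid != stream_guid)
		return (-1);
	for (size_t i = 0; i < nrecs; i++) {
		const struct record *r = &recs[i];

		if (r->type != REC_WRITE || r->blkid >= NBLK)
			continue;	/* only data (WRITE) records can heal */
		if (read_ok(d, r->blkid))
			continue;	/* healthy block: no extra write */
		if (toy_cksum(r->data, BLKSZ) != d->cksum[r->blkid])
			continue;	/* stream data would not fix this block */
		memcpy(d->blk[r->blkid], r->data, BLKSZ);
		if (read_ok(d, r->blkid))	/* reread to confirm the heal */
			healed++;
	}
	return (healed);
}
```

With one corrupted block, a matching GUID heals exactly that block, and a second pass heals nothing because every block now reads cleanly.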

To help with the review, see my OpenZFS Developer Summit 2019 talk for more context on this work:
video: https://www.youtube.com/watch?v=JldbtDATrOo
slides: https://drive.google.com/file/d/1Ysc_3bJWmsJCETFNTRCzyvpseDpzjjf2/view

How Has This Been Tested?

I've been running unit tests very similar to the test that I've added to the zfs-tests suite.

Future Work

Since a DRR_SPILL record (like DRR_WRITE) also contains all of the data needed to recreate the damaged block, a future project could add support for healing DRR_SPILL records.
The next logical extension, as part two of this work, is to provide a way for a corrupted pool to tell a backup system to generate a minimal send stream that can then be used to heal the corrupted pool.
The interface could be something like the following, but maybe there are better suggestions?

# dump the spa error-list entries that belong to this snapshot, plus the snapshot guid
zfs send -C data/fs@snap > /tmp/errlist

# on the replica system, generate a healing send file based on the error list
zfs send -cc /tmp/errlist backup_data > /tmp/healing_sendfile

# heal our data with the minimal healing send file
zfs recv -c data/fs@snap < /tmp/healing_sendfile
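The core of the proposed minimal healing stream is a filter: keep only the WRITE records that cover a block named in the error list. The error-list format is not specified above, so struct err_entry (object, block id), struct write_rec and filter_healing_records() are hypothetical simplifications; a real bookmark also carries the objset and indirection level. The linear search keeps the sketch short; a real generator would likely sort or hash the error list.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error-list entry: which block of which object is damaged. */
struct err_entry {
	uint64_t object;
	uint64_t blkid;
};

/* Hypothetical WRITE record as seen by the replica-side generator. */
struct write_rec {
	uint64_t object;
	uint64_t blkid;
};

static int
in_errlist(const struct err_entry *errs, size_t nerrs, uint64_t object,
    uint64_t blkid)
{
	for (size_t i = 0; i < nerrs; i++) {
		if (errs[i].object == object && errs[i].blkid == blkid)
			return (1);
	}
	return (0);
}

/*
 * Copy into out[] only the records that cover a damaged block and return
 * how many were kept. out[] must have room for nrecs entries.
 */
static size_t
filter_healing_records(const struct write_rec *recs, size_t nrecs,
    const struct err_entry *errs, size_t nerrs, struct write_rec *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < nrecs; i++) {
		if (in_errlist(errs, nerrs, recs[i].object, recs[i].blkid))
			out[kept++] = recs[i];
	}
	return (kept);
}
```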

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf behlendorf added the Status: Design Review Needed Architecture or design is under discussion label Sep 30, 2019
@ahrens ahrens mentioned this pull request Oct 1, 2019
@alek-p alek-p added the Status: Work in Progress Not yet ready for general review label Oct 2, 2019
megari (Contributor) commented Oct 7, 2019

This feature is indeed really nice to have. However, I am curious whether it would, even theoretically, be possible to heal metadata using a corrective receive. Support for that would definitely be a killer feature.

alek-p (Contributor Author) commented Oct 7, 2019

I agree that it would be great to be able to heal metadata, but as far as I know, there isn't enough information in the send file to do that.
We are only able to heal records of type DRR_WRITE and DRR_SPILL since those are the only ones (again afaik) that contain all of the data needed to recreate damaged blocks.

@alek-p alek-p added Component: Send/Recv "zfs send/recv" feature and removed Status: Work in Progress Not yet ready for general review labels Nov 2, 2019
alek-p (Contributor Author) commented Nov 5, 2019

I've fixed the re-encryption code so this is ready for review now.

codecov bot commented Nov 7, 2019

Codecov Report

Merging #9372 into master will decrease coverage by <1%.
The diff coverage is 70%.

@@           Coverage Diff            @@
##           master    #9372    +/-   ##
========================================
- Coverage      80%      79%   -<1%     
========================================
  Files         384      384            
  Lines      121788   122069   +281     
========================================
- Hits        96900    96897     -3     
- Misses      24888    25172   +284
Flag Coverage Δ
#kernel 80% <72%> (ø) ⬇️
#user 67% <15%> (ø) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a340316...d3ec54e.

@ahrens ahrens added the Type: Feature Feature request or new feature label Nov 13, 2019
@alek-p alek-p requested review from behlendorf, ahrens, a user and tcaputi January 27, 2020 22:37
@alek-p alek-p added the Status: Code Review Needed Ready for review and testing label Jan 27, 2020
Comment on lines 5227 to 5228
"key must be loaded to do a non-raw correc"
"tive recv on an encrypted dataset."));
Member

For human readability of the source code, I'd recommend line-breaking the string after a space.
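Because C concatenates adjacent string literals at compile time, where a literal is split has no effect on the resulting message; splitting after a space only makes each source line read as whole words. A minimal illustration (the variable names are hypothetical):

```c
#include <string.h>

/* Split mid-word: compiles to the same string, but is harder to read/grep. */
static const char *split_mid_word =
    "key must be loaded to do a non-raw correc"
    "tive recv on an encrypted dataset.";

/* Split after a space: identical result, each line holds whole words. */
static const char *split_after_space =
    "key must be loaded to do a non-raw "
    "corrective recv on an encrypted dataset.";
```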

Comment on lines 5270 to 5271
"corrective receive was not able to recon"
"struct the data needed for healing."));
Member

For human readability of the source code, I'd recommend line-breaking the string after a space.

Comment on lines +188 to +612

/*
 * Removes all of the recv healed errors from both on-disk error logs
 */
static void
spa_remove_healed_errors(spa_t *spa, avl_tree_t *s, avl_tree_t *l, dmu_tx_t *tx)
{
	char name[NAME_MAX_LEN];
	spa_error_entry_t *se;
	void *cookie = NULL;

	ASSERT(MUTEX_HELD(&spa->spa_errlog_lock));

	while ((se = avl_destroy_nodes(&spa->spa_errlist_healed,
	    &cookie)) != NULL) {
		remove_error_from_list(spa, s, &se->se_bookmark);
		remove_error_from_list(spa, l, &se->se_bookmark);
		bookmark_to_name(&se->se_bookmark, name, sizeof (name));
		kmem_free(se, sizeof (spa_error_entry_t));
		(void) zap_remove(spa->spa_meta_objset,
		    spa->spa_errlog_last, name, tx);
		(void) zap_remove(spa->spa_meta_objset,
		    spa->spa_errlog_scrub, name, tx);
	}
}

/*
 * Stash away healed bookmarks to remove them from the on-disk error logs
 * later in spa_remove_healed_errors().
 */
void
spa_remove_error(spa_t *spa, zbookmark_phys_t *zb)
{
	char name[NAME_MAX_LEN];

	bookmark_to_name(zb, name, sizeof (name));

	spa_add_healed_error(spa, spa->spa_errlog_last, zb);
	spa_add_healed_error(spa, spa->spa_errlog_scrub, zb);
}
Member

I'm not entirely convinced of the utility of this, because (AIUI) we only remove errors that were reported against the snapshot that we are doing the healing receive into. A block with a checksum error will often be referenced by multiple datasets (due to snapshots), and an error will be reported on any datasets via which we access the bad block. Typical cases are:

  • running a scrub, in which case the error will be reported against the first snapshot that references it
  • reading from a filesystem, in which case the error will be reported against the filesystem.

In either case, the dataset that we are receiving into may be a different dataset than the one that the error was reported against, in which case the "remove healed errors" logic accomplishes nothing.

Running a scrub after the healing receive (as recommended in the manpage additions) is really the only way to get an updated list of errors.

That said, with the addition of #9175, we could potentially remove all relevant error reports.
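To make the concern concrete, here is a sketch assuming the on-disk error log keys each entry by a name built from the bookmark's (objset, object, level, blkid) fields, in the spirit of bookmark_to_name() in spa_errlog.c; the exact format string and the names struct bookmark and bm_to_name() are assumptions. Because the objset is part of the key, an error recorded against the filesystem's objset does not match the key generated while healing the same block through a snapshot's objset, so removing the latter leaves the former in place.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the four fields of a ZFS block bookmark. */
struct bookmark {
	uint64_t objset;
	uint64_t object;
	int64_t level;
	uint64_t blkid;
};

/* Build an error-log key from a bookmark (format assumed, hex fields). */
static void
bm_to_name(const struct bookmark *zb, char *buf, size_t len)
{
	(void) snprintf(buf, len, "%llx:%llx:%llx:%llx",
	    (unsigned long long)zb->objset, (unsigned long long)zb->object,
	    (unsigned long long)zb->level, (unsigned long long)zb->blkid);
}
```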

Contributor Author

I can remove the error cleanup easily if we think that's cleaner.
I figured a relatively common use case for healing recv would be trying to heal using the snapshot that a scrub has IDed as corrupted. We then take this snapshot from the remote side and use it for healing. In this scenario, I figured it would make sense to then remove the errlog entry associated with the fixed blocks.

Member

I see, that does seem like a realistic use case.

Comment on lines 2047 to 2048
int buf_size = MIN(drrw->drr_logical_size, 32);
void *buf = kmem_alloc(buf_size, KM_SLEEP);
Member

When would drr_logical_size be less than 32? I think it couldn't since the minimum block size is 512. Why are we reading 32 bytes? Why do we need to kmem_alloc 32 bytes, vs allocating on the stack? Why not just one byte? Do we want to check that drr_logical_size is the same as the object's block size (which I think it might not be if we are toggling the --large-block flag).

Comment on lines 2060 to 2061
* We only try to heal when dmu_read() returns a ECKSUMs.
* Other errors (even EIO) get returned to caller
Member

This explains what is happening, which the code also makes fairly obvious. It would be helpful for the comment to explain why we want to do this. For example, EIO indicates that the device is not present/accessible, so writing to it will likely fail. And if the block is healthy, we don't want the added i/o cost, and/or we don't want to overwrite stuff unnecessarily.
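The rationale suggested in this review can be captured in a small decision function. This is a hypothetical sketch, not the dmu_recv.c code: ECKSUM is ZFS-specific and gets an arbitrary stand-in value here, and heal_action_for() and enum heal_action are illustrative names.

```c
#include <errno.h>

/* Arbitrary stand-in for the ZFS-specific ECKSUM errno. */
#ifndef ECKSUM
#define ECKSUM 10001
#endif

enum heal_action {
	HEAL_SKIP,	/* read succeeded: avoid the extra write I/O */
	HEAL_REWRITE,	/* data is reachable but wrong: rewrite it */
	HEAL_FAIL	/* e.g. EIO: device likely unreachable, writes would fail */
};

/* Decide what to do with the result of reading the existing block. */
static enum heal_action
heal_action_for(int read_err, int *ret)
{
	*ret = 0;
	if (read_err == 0)
		return (HEAL_SKIP);
	if (read_err == ECKSUM)
		return (HEAL_REWRITE);
	*ret = read_err;	/* every other error goes back to the caller */
	return (HEAL_FAIL);
}
```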

return (SET_ERROR(EACCES));
}

err = zio_do_crypt_abd(B_TRUE, &dck->dck_key,
Member

[nit] there's an extra space (2 spaces) after the first comma:

,  &

Comment on lines 2079 to 2080
return (do_corrective_recv(rwa, drrw->drr_object, abuf,
drrw->drr_logical_size, bp, blkid, drrw->drr_offset));
Member

It might simplify the error handling in do_corrective_recv() if we expected it to never consume the abuf, in which case we would do if (err == 0) dmu_return_arcbuf(abuf); here.

dsl_dataset_rele_flags(ds, DS_HOLD_FLAG_DECRYPT, FTAG);
dsl_pool_config_exit(dp, FTAG);

if (err != 0 || no_crypt) {
Member

If err == 0 but no_crypt != 0, I think that we are expected to consume arc_buf (per our caller's requirements). However, in practice I don't think that we can get no_crypt != 0, since we are not dealing with DMU_OT_[INTENT_LOG,DNODE]. I think we could instead ASSERT0(no_crypt) and simply return (err) below.

Comment on lines 1453 to 1431
if (BP_GET_COMPRESS(bp) != ZIO_COMPRESS_OFF) {
/* Recompress the data */
if (buf != NULL)
abd_free(abd);
size = zio_compress_data(BP_GET_COMPRESS(bp), abd, buf,
lsize);
}
Member

I'm pretty sure that compression happens before encryption. Compressing the encrypted data will rarely yield any savings. Assuming I'm right, it seems that these code paths have not been tested. It would be good to add some tests to the test suite to exercise them. e.g.:

  • uncompressed stream heals compressed block
  • unencrypted stream heals encrypted block
  • stream w/different compression heals differently-compressed block
  • uncompressed (& unencrypted) stream heals compressed & encrypted block

Contributor Author

fixed
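A toy experiment shows why compressing after encrypting rarely helps: ciphertext looks random, so a compressor finds nothing to exploit. The run-length encoder and the xorshift keystream below are hypothetical stand-ins chosen for brevity, not ZFS's compression algorithms or its AES-based encryption, and the XOR keystream is not secure.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy run-length encoder: size of the (count, byte) output, count <= 255. */
static size_t
rle_size(const uint8_t *p, size_t n)
{
	size_t out = 0, i = 0;

	while (i < n) {
		size_t run = 1;

		while (i + run < n && run < 255 && p[i + run] == p[i])
			run++;
		out += 2;	/* one count byte plus one value byte */
		i += run;
	}
	return (out);
}

/* Toy "cipher": XOR with an xorshift32 keystream. NOT real encryption. */
static void
toy_encrypt(uint8_t *p, size_t n, uint32_t seed)
{
	uint32_t x = seed;

	for (size_t i = 0; i < n; i++) {
		x ^= x << 13;
		x ^= x >> 17;
		x ^= x << 5;
		p[i] ^= (uint8_t)x;
	}
}
```

A 4 KiB block of zeros shrinks to 34 bytes under the toy RLE, while the same block after toy encryption expands instead of shrinking, which is why a write pipeline would compress first and encrypt second.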

Comment on lines 1462 to 1452
io = zio_rewrite(NULL, rwa->os->os_spa, 0, bp, abd,
size, NULL, NULL, ZIO_PRIORITY_SYNC_WRITE, flags, &zb);

/* compute new bp checksum value and make sure it matches the old one */
zio_checksum_compute(io, BP_GET_CHECKSUM(bp), abd, size);
if (size != BP_GET_PSIZE(bp) ||
!ZIO_CHECKSUM_EQUAL(bp_cksum, io->io_bp->blk_cksum)) {
Member

This might be simplified by using zio_checksum_error_impl(), in which case you could verify the checksum before creating the zio and not have to save the bp_cksum.

Contributor Author

alek-p Apr 9, 2020

I'm not sure that using this rather complicated function is simpler than stashing the BP and using the macro
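The size and checksum check in that snippet can be illustrated with Fletcher-4, one of the checksum algorithms ZFS supports (the one checksum=on selects). This is a simplified, native-byte-order sketch: cksum_t, fletcher4() and rebuilt_block_matches() are hypothetical names rather than the zio_checksum_compute() / ZIO_CHECKSUM_EQUAL() path the patch uses, and it ignores byte-swapped blocks and other checksum types.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Four 64-bit running sums, mirroring a 4-word block checksum. */
typedef struct {
	uint64_t word[4];
} cksum_t;

/* Fletcher-4 over native-order 32-bit words; size is a multiple of 4. */
static void
fletcher4(const void *buf, size_t size, cksum_t *zcp)
{
	const uint8_t *p = buf;
	uint64_t a = 0, b = 0, c = 0, d = 0;

	for (size_t i = 0; i + 4 <= size; i += 4) {
		uint32_t w;

		memcpy(&w, p + i, sizeof (w));
		a += w;
		b += a;
		c += b;
		d += c;
	}
	zcp->word[0] = a;
	zcp->word[1] = b;
	zcp->word[2] = c;
	zcp->word[3] = d;
}

/* Accept rebuilt data only if both its size and checksum match the bp. */
static int
rebuilt_block_matches(const void *data, size_t size, size_t psize,
    const cksum_t *expected)
{
	cksum_t actual;

	if (size != psize)
		return (0);
	fletcher4(data, size, &actual);
	return (memcmp(&actual, expected, sizeof (actual)) == 0);
}
```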

alek-p (Contributor Author) commented Jan 29, 2020

Thanks for the review, Matt! I'll start working through these comments this weekend.

Comment on lines 59 to 64
datasetexists $TESTPOOL/$TESTFS1 && \
log_must zfs destroy -r $TESTPOOL/$TESTFS1
datasetexists $TESTPOOL/$TESTFS2 && \
log_must zfs destroy -r $TESTPOOL/$TESTFS2
datasetexists $TESTPOOL/testfs3 && \
log_must zfs destroy -r $TESTPOOL/testfs3

We're already destroying the pool, this isn't necessary.

Comment on lines 66 to 68
for f in $ibackup $backup; do
[[ -f $f ]] && log_must rm -f $f
done

Why not just rm -f $ibackup $raw_backup $backup?

typeset snap2="$TESTPOOL/$TESTFS1@snap2"
typeset file="/$TESTPOOL/$TESTFS1/$TESTFILE0"

log_must zpool destroy $TESTPOOL

Can you use a different pool name and leave $TESTPOOL, so you don't have to worry about destroying and recreating it? Otherwise, this should be a bit more tolerant, with poolexists $TESTPOOL && destroy_pool $TESTPOOL here and in cleanup, and cleanup should recreate the pool more like it is set up in setup.ksh to keep surprises to a minimum for the next test (when added).

alek-p (Contributor Author) commented Apr 9, 2020

Thanks for the review, Ryan. I need to expand the testing to include spill block healing and will try to incorporate your feedback then.

alek-p (Contributor Author) commented May 15, 2020

I've rebased this code. It's close to ready, but it still needs more work with regard to how abd/zio are handled. There are some leaks present in the current version...

@alek-p alek-p added Status: Work in Progress Not yet ready for general review and removed Status: Code Review Needed Ready for review and testing labels May 15, 2020
codecov-commenter commented May 29, 2020

Codecov Report

Merging #9372 (54451af) into master (161ed82) will increase coverage by 4.41%.
The diff coverage is 69.27%.

❗ Current head 54451af differs from pull request most recent head b9fce85. Consider uploading reports for the commit b9fce85 to get more accurate results

@@            Coverage Diff             @@
##           master    #9372      +/-   ##
==========================================
+ Coverage   75.17%   79.59%   +4.41%     
==========================================
  Files         402      395       -7     
  Lines      128071   125378    -2693     
==========================================
+ Hits        96283    99789    +3506     
+ Misses      31788    25589    -6199     
Flag Coverage Δ
kernel 80.39% <69.14%> (+1.63%) ⬆️
user 64.92% <15.54%> (+17.49%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
module/zfs/zio.c 88.77% <ø> (+3.17%) ⬆️
module/zfs/dmu.c 86.49% <50.00%> (+2.76%) ⬆️
module/zfs/dmu_recv.c 74.01% <60.88%> (+5.34%) ⬆️
lib/libzfs/libzfs_sendrecv.c 76.66% <87.80%> (+12.33%) ⬆️
module/zfs/spa_errlog.c 91.27% <92.68%> (-2.62%) ⬇️
cmd/zfs/zfs_main.c 82.82% <100.00%> (+1.21%) ⬆️
lib/libzfs_core/libzfs_core.c 84.86% <100.00%> (+1.66%) ⬆️
module/zfs/spa.c 87.34% <100.00%> (+2.86%) ⬆️
module/zfs/zfs_ioctl.c 86.39% <100.00%> (+1.06%) ⬆️
include/os/linux/zfs/sys/trace_acl.h 33.33% <0.00%> (-33.34%) ⬇️
... and 263 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 34aa0f0...b9fce85.

tony-zfs (Contributor)

Hi @alek-p, regarding the memory leaks: receive_process_write_record() (with rwa->heal set) returns EAGAIN, but it does not add the receive_record_arg structure to the rwa->write_batch list. The kmem_free() was also removed from the non-EAGAIN path for non-corrective receives.

I modified the writer thread to circumvent this situation in a local branch:

diff --git a/module/zfs/dmu_recv.c b/module/zfs/dmu_recv.c
index 7d29547..816632d 100644
--- a/module/zfs/dmu_recv.c
+++ b/module/zfs/dmu_recv.c
@@ -2955,9 +2955,12 @@ receive_writer_thread(void *arg)
                 * raw->write_batch), and will be used again, so we don't
                 * free it.
                 */
-               if (err != EAGAIN) {
+               if (rwa->heal) {
+                       kmem_free(rrd, sizeof (*rrd));
+               } else if (err != EAGAIN) {
                        if (rwa->err == 0)
                                rwa->err = err;
+                       kmem_free(rrd, sizeof (*rrd));
                }
        }
        kmem_free(rrd, sizeof (*rrd));
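The ownership rule behind this fix can be modeled in a small user-space sketch. Everything here (struct rec, struct writer, process_write(), handle_record(), flush_batch(), and returning ENOSPC when the batch is full) is a hypothetical stand-in, not dmu_recv.c code; it only illustrates that each record must be freed exactly once, either by the writer loop or by whoever flushes the batch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_BATCH 16

/* Hypothetical stand-ins for receive_record_arg and the writer state. */
struct rec {
	int id;
};

struct writer {
	bool heal;			/* corrective receive in progress? */
	struct rec *batch[MAX_BATCH];	/* records owned by the pending batch */
	int nbatch;
	int live;			/* outstanding allocations, to spot leaks */
};

static struct rec *
rec_alloc(struct writer *w, int id)
{
	struct rec *r = malloc(sizeof (*r));

	if (r != NULL) {
		r->id = id;
		w->live++;
	}
	return (r);
}

static void
rec_free(struct writer *w, struct rec *r)
{
	free(r);
	w->live--;
}

/* Healing returns EAGAIN without stashing, as described in the comment. */
static int
process_write(struct writer *w, struct rec *r)
{
	if (w->heal)
		return (EAGAIN);	/* healed in place, record NOT stashed */
	if (w->nbatch == MAX_BATCH)
		return (ENOSPC);	/* hypothetical failure: not stashed */
	w->batch[w->nbatch++] = r;
	return (EAGAIN);		/* batch now owns r */
}

/* Writer-loop body following the ownership rule of the proposed fix. */
static void
handle_record(struct writer *w, struct rec *r)
{
	int err = process_write(w, r);

	if (w->heal)
		rec_free(w, r);
	else if (err != EAGAIN)
		rec_free(w, r);
}

static void
flush_batch(struct writer *w)
{
	for (int i = 0; i < w->nbatch; i++)
		rec_free(w, w->batch[i]);
	w->nbatch = 0;
}
```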

alek-p (Contributor Author) commented Jun 17, 2020

Thanks for looking into this, Tony! AFAIK the only thing left now is making sure the testing is robust enough and includes spill record healing.

@alek-p alek-p added Status: Code Review Needed Ready for review and testing and removed Status: Work in Progress Not yet ready for general review labels Jun 26, 2020
alek-p (Contributor Author) commented Jun 26, 2020

I've had a hard time getting spill blocks generated, so I did a manual test by running the added corrective recv test with the following patch applied:

diff --git a/module/zfs/sa.c b/module/zfs/sa.c
index 977e729fe..8e4a8c388 100644
--- a/module/zfs/sa.c
+++ b/module/zfs/sa.c
@@ -1016,6 +1016,7 @@ sa_setup(objset_t *os, uint64_t sa_obj, sa_attr_reg_t *reg_attrs, int count,
        sa = kmem_zalloc(sizeof (sa_os_t), KM_SLEEP);
        mutex_init(&sa->sa_lock, NULL, MUTEX_NOLOCKDEP, NULL);
        sa->sa_master_obj = sa_obj;
+       sa->sa_force_spill = B_TRUE;

        os->os_sa = sa;
        mutex_enter(&sa->sa_lock);

I think this is ready for the next round of reviews.

@alek-p alek-p removed the request for review from tcaputi June 26, 2020 17:05
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes openzfs#9372
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes openzfs#9372
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes openzfs#9372
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes openzfs#9372
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes openzfs#9372
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes openzfs#9372
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes openzfs#9372
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes openzfs#9372
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes openzfs#9372
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes openzfs#9372
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes openzfs#9372
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes openzfs#9372
andrewc12 pushed a commit to andrewc12/openzfs that referenced this pull request Sep 23, 2022
This type of recv is used to heal corrupted data when a replica
of the data already exists (in the form of a send file for example).
With the provided send stream, corrective receive will read from
disk blocks described by the WRITE records. When any of the reads
come back with ECKSUM we use the data from the corresponding WRITE
record to rewrite the corrupted block.

Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Paul Zuchowski <pzuchowski@datto.com>
Signed-off-by: Alek Pinchuk <apinchuk@axcient.com>
Closes openzfs#9372
@pepsinio

Would this fix make it to release soon?

@GregorKopka
Contributor

> Would this fix make it to release soon?

@behlendorf ?

@behlendorf
Contributor

This feature will make it into the OpenZFS 2.2 release.

@pepsinio

This is a huge improvement. It is the last missing piece for my use case: offsite backup, no RAIDZ, and a low-throughput link that makes it difficult to recreate datasets in case of errors. I am quite certain I am not the only one in this boat.

@FlorianHeigl

@pepsinio you're not the only one in that boat. I have been following this topic since before the PR was opened. I have no WAN use case, but a general understanding that this is a major resilience feature that stabilizes many related bits and bytes.

behlendorf added a commit that referenced this pull request Jun 30, 2023
New features:
- Fully adaptive ARC eviction (#14359)
- Block cloning (#13392)
- Scrub error log (#12812, #12355)
- Linux container support (#14070, #14097, #12263)
- BLAKE3 Checksums (#12918)
- Corrective "zfs receive" (#9372)

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023
New features:
- Fully adaptive ARC eviction (openzfs#14359)
- Block cloning (openzfs#13392)
- Scrub error log (openzfs#12812, openzfs#12355)
- Linux container support (openzfs#14070, openzfs#14097, openzfs#12263)
- BLAKE3 Checksums (openzfs#12918)
- Corrective "zfs receive" (openzfs#9372)

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Labels
Component: Send/Recv "zfs send/recv" feature Status: Accepted Ready to integrate (reviewed, tested) Type: Feature Feature request or new feature