-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Verify dRAID empty sectors #12857
Verify dRAID empty sectors #12857
Conversation
e401764
to
d2a8b2a
Compare
Verify that all empty sectors are zero filled before using them to calculate parity. Failure to do so can result in incorrect parity columns being generated and written to disk if the contents of an empty sector are non-zero. This was possible because the checksum only protects the data portions of the buffer, not the empty sector padding. This issue has been addressed by updating raidz_parity_verify() to check that all dRAID empty sectors are zero filled. Any sectors which are non-zero will be fixed, repair IO issued, and a checksum error logged. They can then be safely used to verify the parity. This specific type of damage is unlikely to occur since it requires a disk to have silently returned bad data, for an empty sector, while performing a scrub. However, if a pool were to have been damaged in this way, scrubbing the pool with this change applied will repair both the empty sector and parity columns as long as the data checksum is valid. Checksum errors will be reported in the `zpool status` output for any repairs which are made. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
d2a8b2a
to
1a659cd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works well with the update to FreeBSD's abd_allocate_zero_scatter()
since that now calls bzero()
and is used when allocating the dRAID map for writes. This is also make sure that any previous FreeBSD dRAID writes that were not zero'ing out the skip sectors will do so during a scrub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Verify that all empty sectors are zero filled before using them to calculate parity. Failure to do so can result in incorrect parity columns being generated and written to disk if the contents of an empty sector are non-zero. This was possible because the checksum only protects the data portions of the buffer, not the empty sector padding. This issue has been addressed by updating raidz_parity_verify() to check that all dRAID empty sectors are zero filled. Any sectors which are non-zero will be fixed, repair IO issued, and a checksum error logged. They can then be safely used to verify the parity. This specific type of damage is unlikely to occur since it requires a disk to have silently returned bad data, for an empty sector, while performing a scrub. However, if a pool were to have been damaged in this way, scrubbing the pool with this change applied will repair both the empty sector and parity columns as long as the data checksum is valid. Checksum errors will be reported in the `zpool status` output for any repairs which are made. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#12857
Verify that all empty sectors are zero filled before using them to calculate parity. Failure to do so can result in incorrect parity columns being generated and written to disk if the contents of an empty sector are non-zero. This was possible because the checksum only protects the data portions of the buffer, not the empty sector padding. This issue has been addressed by updating raidz_parity_verify() to check that all dRAID empty sectors are zero filled. Any sectors which are non-zero will be fixed, repair IO issued, and a checksum error logged. They can then be safely used to verify the parity. This specific type of damage is unlikely to occur since it requires a disk to have silently returned bad data, for an empty sector, while performing a scrub. However, if a pool were to have been damaged in this way, scrubbing the pool with this change applied will repair both the empty sector and parity columns as long as the data checksum is valid. Checksum errors will be reported in the `zpool status` output for any repairs which are made. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#12857
Verify that all empty sectors are zero filled before using them to calculate parity. Failure to do so can result in incorrect parity columns being generated and written to disk if the contents of an empty sector are non-zero. This was possible because the checksum only protects the data portions of the buffer, not the empty sector padding. This issue has been addressed by updating raidz_parity_verify() to check that all dRAID empty sectors are zero filled. Any sectors which are non-zero will be fixed, repair IO issued, and a checksum error logged. They can then be safely used to verify the parity. This specific type of damage is unlikely to occur since it requires a disk to have silently returned bad data, for an empty sector, while performing a scrub. However, if a pool were to have been damaged in this way, scrubbing the pool with this change applied will repair both the empty sector and parity columns as long as the data checksum is valid. Checksum errors will be reported in the `zpool status` output for any repairs which are made. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#12857
Verify that all empty sectors are zero filled before using them to calculate parity. Failure to do so can result in incorrect parity columns being generated and written to disk if the contents of an empty sector are non-zero. This was possible because the checksum only protects the data portions of the buffer, not the empty sector padding. This issue has been addressed by updating raidz_parity_verify() to check that all dRAID empty sectors are zero filled. Any sectors which are non-zero will be fixed, repair IO issued, and a checksum error logged. They can then be safely used to verify the parity. This specific type of damage is unlikely to occur since it requires a disk to have silently returned bad data, for an empty sector, while performing a scrub. However, if a pool were to have been damaged in this way, scrubbing the pool with this change applied will repair both the empty sector and parity columns as long as the data checksum is valid. Checksum errors will be reported in the `zpool status` output for any repairs which are made. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #12857
Verify that all empty sectors are zero filled before using them to calculate parity. Failure to do so can result in incorrect parity columns being generated and written to disk if the contents of an empty sector are non-zero. This was possible because the checksum only protects the data portions of the buffer, not the empty sector padding. This issue has been addressed by updating raidz_parity_verify() to check that all dRAID empty sectors are zero filled. Any sectors which are non-zero will be fixed, repair IO issued, and a checksum error logged. They can then be safely used to verify the parity. This specific type of damage is unlikely to occur since it requires a disk to have silently returned bad data, for an empty sector, while performing a scrub. However, if a pool were to have been damaged in this way, scrubbing the pool with this change applied will repair both the empty sector and parity columns as long as the data checksum is valid. Checksum errors will be reported in the `zpool status` output for any repairs which are made. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#12857
Verify that all empty sectors are zero filled before using them to calculate parity. Failure to do so can result in incorrect parity columns being generated and written to disk if the contents of an empty sector are non-zero. This was possible because the checksum only protects the data portions of the buffer, not the empty sector padding. This issue has been addressed by updating raidz_parity_verify() to check that all dRAID empty sectors are zero filled. Any sectors which are non-zero will be fixed, repair IO issued, and a checksum error logged. They can then be safely used to verify the parity. This specific type of damage is unlikely to occur since it requires a disk to have silently returned bad data, for an empty sector, while performing a scrub. However, if a pool were to have been damaged in this way, scrubbing the pool with this change applied will repair both the empty sector and parity columns as long as the data checksum is valid. Checksum errors will be reported in the `zpool status` output for any repairs which are made. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#12857
This is not associated with a specific upstream commit but apparently a local diff applied as part of: commit e92ffd9b626833ebdbf2742c8ffddc6cd94b963e Merge: 3c3df3660072 17b2ae0 Author: Martin Matuska <mm@FreeBSD.org> Date: Sat Jan 22 23:05:15 2022 +0100 zfs: merge openzfs/zfs@17b2ae0b2 (master) into main Notable upstream pull request merges: openzfs#12766 Fix error propagation from lzc_send_redacted openzfs#12805 Updated the lz4 decompressor openzfs#12851 FreeBSD: Provide correct file generation number openzfs#12857 Verify dRAID empty sectors openzfs#12874 FreeBSD: Update argument types for VOP_READDIR openzfs#12896 Reduce number of arc_prune threads openzfs#12934 FreeBSD: Fix zvol_*_open() locking openzfs#12947 lz4: Cherrypick fix for CVE-2021-3520 openzfs#12961 FreeBSD: Fix leaked strings in libspl mnttab openzfs#12964 Fix handling of errors from dmu_write_uio_dbuf() on FreeBSD openzfs#12981 Introduce a flag to skip comparing the local mac when raw sending openzfs#12985 Avoid memory allocations in the ARC eviction thread Obtained from: OpenZFS OpenZFS commit: 17b2ae0
Motivation and Context
This issue was discovered when investigating a checksum error
which was reported against the wrong leaf vdev when manually
testing corrupting a vdev by overwriting it with
dd
.Description
Verify that all empty sectors are zero filled before using them to
calculate parity. Failure to do so can result in incorrect parity
columns being generated and written to disk if the contents of an
empty sector are non-zero. This was possible because the checksum
only protects the data portions of the buffer, not the empty sector
padding.
This issue has been addressed by updating
raidz_parity_verify()
tocheck that all dRAID empty sectors are zero filled. Any sectors
which are non-zero will be fixed, repair IO issued, and a checksum
error logged. They can then be safely used to verify the parity.
This specific type of damage is unlikely to occur since it requires
a disk to have silently returned bad data, for an empty sector, while
performing a scrub. However, if a pool were to have been damaged
in this way, scrubbing the pool with this change applied will repair
both the empty sector and parity columns as long as the data columns
are intact as verify by the checksum. Checksum errors will be reported
in the
zpool status
output for any repairs which are made.How Has This Been Tested?
Manually constructed a pool, selectively damaged an empty sector,
scrubbed the pool, and verified via instrumentation that:
The following steps can be used.
It also happens to be the case the existing
zpool_events_error.ksh
test case reliably hits this issue when extended to verify dRAID pools.
The blocks are reliably laid out on disk such that we're highly likely
to overwrite an empty sector, and the test case strictly checks for an
exact number of checksum errors to be reported against specified
vdevs.
Furthermore, the existing
redundancy_draid*
tests provide additionalcoverage for these code path. I've run the "redundancy" test group 100
times and observed no failures. Between these tests case, and the manual
testing which was done, I don't think there's a need to add a new test
case.
Types of changes
Checklist:
Signed-off-by
.