Improved zpool status output, list all the dataset affected by errlog. #9175

TulsiJain · 2019-08-18T01:20:09Z

Motivation and Context

This is a substantial improvement in diagnosability and recovery. Currently, determining which datasets are affected by corruption can be a heavily manual and tedious process. An automated solution would greatly improve the quality of life for support and escalation team members. This project involves changes to the zpool status command, libzfs, the ioctl interface, the error reporting logic, the ondisk format, the clone promotion logic, the filesystem deletion logic, and more.

Description

The primary difficulty in reporting the list of affected snapshots is that since the error was initially found, the snapshot that the error originally occurred in may have been deleted. To solve this issue, the proposed plan is to add the ID of the head dataset of the original snapshot in which the error was detected to the stored error report. Then any time a filesystem is deleted, we delete the errors associated with it. Any time a clone promote occurs, we modify reports associated with the original head to refer to the new head. We will need to restructure the stored error reports to be primarily keyed by this head ID, add more information about the block in which the error occurred (birth time), as well as some information about the error itself. A new feature flag will be required to mark the presence of this change, as well as promotion and backwards compatibility logic.

Once this information is stored, we can find the set of datasets affected by an error by walking back the list of snapshots in the given head until we find one with the appropriate birth txg, and then BFSing forwards through the snapshots of the clone family, terminating a branch early if the block was replaced in a given snapshot. Then we can report this information back to libzfs, and further back into the zpool status command, where it is displayed in a useful format.

How Has This Been Tested?

Added new test cases.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the ZFS on Linux code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
All new and existing tests passed.
All commit messages are properly formatted and contain Signed-off-by.

codecov · 2019-08-18T10:21:21Z

Codecov Report

Merging #9175 into master will decrease coverage by 0.12%.
The diff coverage is 75.81%.

@@            Coverage Diff             @@
##           master    #9175      +/-   ##
==========================================
- Coverage   79.31%   79.18%   -0.13%     
==========================================
  Files         400      400              
  Lines      121942   122239     +297     
==========================================
+ Hits        96716    96796      +80     
- Misses      25226    25443     +217

Flag	Coverage Δ
#kernel	`79.7% <75.73%> (-0.15%)`	⬇️
#user	`66.83% <21.9%> (-0.26%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f09fda5...92793c1. Read the comment docs.

Signed-off-by: TulsiJain <tulsi.jain@delphix.com>

codecov · 2019-08-21T07:22:56Z

Codecov Report

Merging #9175 into master will decrease coverage by 6.29%.
The diff coverage is 74.63%.

@@            Coverage Diff            @@
##           master    #9175     +/-   ##
=========================================
- Coverage   79.11%   72.82%   -6.3%     
=========================================
  Files         400      371     -29     
  Lines      122000   119727   -2273     
=========================================
- Hits        96524    87193   -9331     
- Misses      25476    32534   +7058

Flag	Coverage Δ
#kernel	`72.4% <70.41%> (-7.32%)`	⬇️
#user	`63.68% <80.95%> (-3.3%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 325d288...fd550d9. Read the comment docs.

allanjude · 2019-09-10T01:00:21Z

man/man5/zpool-features.5

+This feature enables the upgraded version of errlog. This required on disk
+error format change. Now, errorlog of each head dataset is stored
+seprately in zap object and primarily keyed by this head Id.
+By this feature every dataset affected by an error block is listed under


I think this sounds better as:
"With this feature enabled, every dataset affected by an error block is listed in the output of
\fBzpool status\fR.

allanjude · 2019-09-10T01:00:48Z

man/man5/zpool-features.5

+By this feature every dataset affected by an error block is listed under
+\fBzpool status\fR command.
+
+The feature can only be set at the time of pool creation.


This feature can only be enabled at pool creation time.

allanjude · 2019-09-10T01:02:28Z

module/zcommon/zfeature_common.c

+	{
+	zfeature_register(SPA_FEATURE_HEAD_ERRLOG,
+	    "com.delphix:head_errlog", "head_errlog",
+	    "Improved zpool status command output.",


The feature is actually the logging of the errors to the ZAP, so maybe:
"Introduce an on-disk block error log"

gamanakis · 2021-12-01T11:32:09Z

Started a new effort to merge this in #12812. I will incorporate the changes proposed here in the man pages.

Currently, determining which datasets are affected by corruption is a manual process. The primary difficulty in reporting the list of affected snapshots is that since the error was initially found, the snapshot where the error originally occurred in, may have been deleted. To solve this issue, we add the ID of the head dataset of the original snapshot which the error was detected in, to the stored error report. Then any time a filesystem is deleted, the errors associated with it are deleted as well. Any time a clone promote occurs, we modify reports associated with the original head to refer to the new head. The stored error reports are identified by this head ID, the birth time of the block which the error occurred in, as well as some information about the error itself are also stored. Once this information is stored, we can find the set of datasets affected by an error by walking back the list of snapshots in the given head until we find one with the appropriate birth txg, and then traverse through the snapshots of the clone family, terminating a branch if the block was replaced in a given snapshot. Then we report this information back to libzfs, and to the zpool status command, where it is displayed as follows: pool: test state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A scan: scrub repaired 0B in 00:00:00 with 800 errors on Fri Dec 3 08:27:57 2021 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 sdb ONLINE 0 0 1.58K errors: Permanent errors have been detected in the following files: test@1:/test.0.0 /test/test.0.0 /test/1clone/test.0.0 A new feature flag is introduced to mark the presence of this change, as well as promotion and backwards compatibility logic. This is an updated version of openzfs#9175. Signed-off-by: TulsiJain <tulsi.jain@delphix.com> Signed-off-by: George Amanakis <gamanakis@gmail.com>

Currently, determining which datasets are affected by corruption is a manual process. The primary difficulty in reporting the list of affected snapshots is that since the error was initially found, the snapshot where the error originally occurred in, may have been deleted. To solve this issue, we add the ID of the head dataset of the original snapshot which the error was detected in, to the stored error report. Then any time a filesystem is deleted, the errors associated with it are deleted as well. Any time a clone promote occurs, we modify reports associated with the original head to refer to the new head. The stored error reports are identified by this head ID, the birth time of the block which the error occurred in, as well as some information about the error itself are also stored. Once this information is stored, we can find the set of datasets affected by an error by walking back the list of snapshots in the given head until we find one with the appropriate birth txg, and then traverse through the snapshots of the clone family, terminating a branch if the block was replaced in a given snapshot. Then we report this information back to libzfs, and to the zpool status command, where it is displayed as follows: pool: test state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A scan: scrub repaired 0B in 00:00:00 with 800 errors on Fri Dec 3 08:27:57 2021 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 sdb ONLINE 0 0 1.58K errors: Permanent errors have been detected in the following files: test@1:/test.0.0 /test/test.0.0 /test/1clone/test.0.0 A new feature flag is introduced to mark the presence of this change, as well as promotion and backwards compatibility logic. This is an updated version of openzfs#9175. Rebase required fixing the tests, updating the ABI of libzfs, and updating the man pages. Signed-off-by: TulsiJain <tulsi.jain@delphix.com> Signed-off-by: George Amanakis <gamanakis@gmail.com>

Currently, determining which datasets are affected by corruption is a manual process. The primary difficulty in reporting the list of affected snapshots is that since the error was initially found, the snapshot where the error originally occurred in, may have been deleted. To solve this issue, we add the ID of the head dataset of the original snapshot which the error was detected in, to the stored error report. Then any time a filesystem is deleted, the errors associated with it are deleted as well. Any time a clone promote occurs, we modify reports associated with the original head to refer to the new head. The stored error reports are identified by this head ID, the birth time of the block which the error occurred in, as well as some information about the error itself are also stored. Once this information is stored, we can find the set of datasets affected by an error by walking back the list of snapshots in the given head until we find one with the appropriate birth txg, and then traverse through the snapshots of the clone family, terminating a branch if the block was replaced in a given snapshot. Then we report this information back to libzfs, and to the zpool status command, where it is displayed as follows: pool: test state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A scan: scrub repaired 0B in 00:00:00 with 800 errors on Fri Dec 3 08:27:57 2021 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 sdb ONLINE 0 0 1.58K errors: Permanent errors have been detected in the following files: test@1:/test.0.0 /test/test.0.0 /test/1clone/test.0.0 A new feature flag is introduced to mark the presence of this change, as well as promotion and backwards compatibility logic. This is an updated version of openzfs#9175. Rebase required fixing the tests, updating the ABI of libzfs, updating the man pages, fixing bugs, fixing the error returns, and updating the old on-disk error logs to the new format when activating the feature. Co-authored-by: TulsiJain <tulsi.jain@delphix.com> Signed-off-by: George Amanakis <gamanakis@gmail.com>

* Improve the inline descriptions of the ARC module parameters These are displayed as the descriptions of the sysctl's on FreeBSD Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Allan Jude <allan@klarasystems.com> Closes openzfs#13334 * linux: module: weld all but spl.ko into zfs.ko Originally it was thought it would be useful to split up the kmods by functionality. This would allow external consumers to only load what was needed. However, in practice we've never had a case where this functionality would be needed, and conversely managing multiple kmods can be awkward. Therefore, this change merges all but the spl.ko kmod in to a single zfs.ko kmod. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13274 * scripts: zfs.sh: remove cat Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13274 * scripts: zfs.sh: make usage make sense We don't pass the arguments as arguments Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13274 * scripts: zfs.sh: unload zfs with dependencies Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13274 * linux: module: uninstall legacy modules on (un)installation This can be reverted once we're sure nobody's using them anymore (post-3.0 release?) Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13274 * zpool_history_unpack: return correct errno on nvlist_unpack failure Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Signed-off-by: WHR <msl0000023508@gmail.com> Closes openzfs#13321 * Document zfs inherit -S's interaction with noninheritable properties Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#11894 Closes openzfs#13335 * rpm -> deb doesn't fail when optional packages are missing Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: szubersk <szuberskidamian@gmail.com> Closes openzfs#13331 Closes openzfs#13336 * man: ... -> … again zfs-program.8 is left, but that's literal Lua syntax Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13255 * FreeBSD: Fix translation from ABD to physical pages In hypothetical case of non-linear ABD with single segment, multiple to page size but not aligned to it, vdev_geom_fill_unmap_cb() could fill one page less into bio_ma array. I am not sure it is exploitable, but better to be safe than sorry. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reported-by: Mark Johnston <markj@FreeBSD.org> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Closes openzfs#13345 * Corrected oversight in ZERO_RANGE behavior It turns out, no, in fact, ZERO_RANGE and PUNCH_HOLE do have differing semantics in some ways - in particular, one requires KEEP_SIZE, and the other does not. Also added a zero-range test to catch this, corrected a flaw that made the punch-hole test succeed vacuously, and a typo in file_write. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes openzfs#13329 Closes openzfs#13338 * contrib: dracut: parse-zfs: drop initqueue-finished for i/f The switch was released in dracut 009 in March 2011, we can safely get rid of the compatibility hook Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13291 * contrib: dracut: parse-zfs: stop pretending we support FILESYSTEM= It was added in the original ae26d04 ("Add dracut support") commit in 2011, and was then broken a bit later with the advent of dracut-zfs-generator, or maybe earlier as part of other churn Either way, it's broken, and has been in 2.0+ as well, and no-one complained. Stop pretending we support it at all Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13291 * contrib; dracut: centralise root= parsing, actually support root=s So far, everything parsed root= manually, which meant that while zfs-parse.sh was updated, and supposedly supported + -> ' ' conversion, it meant nothing Instead, centralise parsing, and allow: root= root=zfs root=zfs: root=zfs:AUTO root=ZFS=data/set root=zfs:data/set root=zfs:ZFS=data/set (as a side-effect; allowed but undocumented) rootfstype=zfs AND root=data/set <=> root=data/set rootfstype=zfs AND root= <=> root=zfs:AUTO So rootfstype=zfs /also/ behaves as expected, and + decoding works Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13291 * contrib; dracut: flatten zfs-load-key, simplify zfs-env-bootfs Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13291 * contrib: dracut: zfs-lib: simplify ask_for_password The only user is mount-zfs.sh (non-systemd systems), so reduce it to what it needs Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13291 * contrib: dracut: zfs-lib: remove find_bootfs Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13291 * contrib: dracut: inline single-use import_pool, move single-use ask_for_password Also don't set ROOTFS_MOUNTED; the final mention was removed in dracut 011 from July 2011 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13291 * contrib: dracut: don't require essentials to be under the same encroot Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13291 * contrib: dracut: zfs-{rollback,snapshot}-bootfs: order after key loading This fixes at least one race I got with an encrypted root Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13291 * contrib: dracut: zfs-needshutdown: don't list Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13291 * Add dracut.zfs.7 Thorough documentation with a dracut.bootup(7)-style flowchart, dracut.cmdline(7)-style cmdline listing, and per-file docs like the old README Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13291 * contrib: dracut: remove getargbool polyfill It was originally released in dracut 008 in February 2011; we can probably drop it now Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13291 * Strengthen Linux kernel capabilities detection - Add `CONFIG_BLOCK` Linux config requirement to `ZFS_AC_KERNEL_CONFIG_DEFINED`. OpenZFS won't compile without that block device support due to large amount of functional dependencies on it. - Remove dependency on `groups_alloc()` in `ZFS_AC_KERNEL_SRC_GROUP_INFO_GID` to circumvent the missing stub in Linux 4.X kernel headers. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: szubersk <szuberskidamian@gmail.com> Closes openzfs#13351 * scripts: zfs.sh: explicitly ignore unloaded modules when unloading Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13356 * scripts: zfs.sh: explicitly unload all modules via rmmod modprobe -r only works for depmodded modules, but this also means we have to re-iterate legacy modules, and in the right order Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13356 * linux: module: zfs: sysfs: constify types and attrs Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13357 * Linux 5.18 compat: kobj_type.default_attrs replaced with default_groups Upstream-commit: cdb4f26a63c391317e335e6e683a614358e70aeb ("kobject: kobj_type: remove default_attrs") Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13357 * zvol_wait: Ignore locked zvols "When an encrypted zvol is locked the zfs-volume-wait service does not start. The /sbin/zvol_wait should not wait for links when the volume has property keystatus=unavailable." -- https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/1888405 Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Damian Szuberski <szuberskidamian@gmail.com> Thanks: James Dingwall <james-launchpad@dingwall.me.uk> Signed-off-by: Richard Laager <rlaager@wiktel.com> Closes openzfs#10662 * man: zfs-send.8: fix -X synopses and description Also clean up the horrendously verbose -X handling in zfs_main() Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13352 * tests: cli_user: zfs_001_neg: print the problematic lines Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13352 * Improve zpool status output, list all affected datasets Currently, determining which datasets are affected by corruption is a manual process. The primary difficulty in reporting the list of affected snapshots is that since the error was initially found, the snapshot where the error originally occurred in, may have been deleted. To solve this issue, we add the ID of the head dataset of the original snapshot which the error was detected in, to the stored error report. Then any time a filesystem is deleted, the errors associated with it are deleted as well. Any time a clone promote occurs, we modify reports associated with the original head to refer to the new head. The stored error reports are identified by this head ID, the birth time of the block which the error occurred in, as well as some information about the error itself are also stored. Once this information is stored, we can find the set of datasets affected by an error by walking back the list of snapshots in the given head until we find one with the appropriate birth txg, and then traverse through the snapshots of the clone family, terminating a branch if the block was replaced in a given snapshot. Then we report this information back to libzfs, and to the zpool status command, where it is displayed as follows: pool: test state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A scan: scrub repaired 0B in 00:00:00 with 800 errors on Fri Dec 3 08:27:57 2021 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 sdb ONLINE 0 0 1.58K errors: Permanent errors have been detected in the following files: test@1:/test.0.0 /test/test.0.0 /test/1clone/test.0.0 A new feature flag is introduced to mark the presence of this change, as well as promotion and backwards compatibility logic. This is an updated version of openzfs#9175. Rebase required fixing the tests, updating the ABI of libzfs, updating the man pages, fixing bugs, fixing the error returns, and updating the old on-disk error logs to the new format when activating the feature. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Co-authored-by: TulsiJain <tulsi.jain@delphix.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes openzfs#9175 Closes openzfs#12812 * Improve log spacemap load time Previous flushing algorithm limited only total number of log blocks to the minimum of 256K and 4x number of metaslabs in the pool. As result, system with 1500 disks with 1000 metaslabs each, touching several new metaslabs each TXG could grow spacemap log to huge size without much benefits. We've observed one of such systems importing pool for about 45 minutes. This patch improves the situation from five sides: - By limiting maximum period for each metaslab to be flushed to 1000 TXGs, that effectively limits maximum number of per-TXG spacemap logs to load to the same number. - By making flushing more smooth via accounting number of metaslabs that were touched after the last flush and actually need another flush, not just ms_unflushed_txg bump. - By applying zfs_unflushed_log_block_pct to the number of metaslabs that were touched after the last flush, not all metaslabs in the pool. - By aggressively prefetching per-TXG spacemap logs up to 16 TXGs in advance, making log spacemap load process for wide HDD pool CPU-bound, accelerating it by many times. - By reducing zfs_unflushed_log_block_max from 256K to 128K, reducing single-threaded by nature log processing time from ~10 to ~5 minutes. As further optimization we could skip bumping ms_unflushed_txg for metaslabs not touched since the last flush, but that would be an incompatible change, requiring new pool feature. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes openzfs#12789 * autoconf: Pretend `CONFIG_MODULES` is always on - Unconditionally inject `CONFIG_MODULES` make variable and `#define CONFIG_MODULES` to Kbuild in `ZFS_LINUX_COMPILE` autoconf function to emulate loadable kernel modules support. This allows OpenZFS to perform Linux checks despite `CONFIG_MODULES=n` in the actual Linux config. - Add `ZFS_AC_KERNEL_CONFIG_MODULES` check which encompasses the logic from `ZFS_AC_KERNEL_TEST_MODULE` with additional diagnostic messages to the user - Removed `ZFS_AC_KERNEL_TEST_MODULE` as it merely duplicates every check in `ZFS_AC_KERNEL_CONFIG_DEFINED` - Moved `ZFS_AC_MODULE_SYMVERS` after `ZFS_AC_KERNEL_CONFIG_DEFINED` so the user has a chance to see the proper diagnostic from the steps before. A workaround for Linux's ``` commit 3e3005df73b535cb849cf4ec8075d6aa3c460f68 Author: Masahiro Yamada <masahiroy@kernel.org> Date: Wed Mar 31 22:38:03 2021 +0900 kbuild: unify modules(_install) for in-tree and external modules If you attempt to build or install modules ('make modules(_install)' with CONFIG_MODULES disabled, you will get a clear error message, but nothing for external module builds. Factor out the modules and modules_install rules into the common part, so you will get the same error message when you try to build external modules with CONFIG_MODULES=n. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> ``` Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: szubersk <szuberskidamian@gmail.com> Closes openzfs#10832 Closes openzfs#13361 * PPC get_user workaround Linux 5.12 PPC 5.12 get_user() and __copy_from_user_inatomic() inline helpers very indirectly include a reference to the GPL'd array mmu_feature_keys[] and fails to build. Workaround this by using copy_from_user() and throwing EFAULT for any calls to __copy_from_user_inatomic(). This is a workaround until a fix for Linux commit 7613f5a66becfd0e43a0f34de8518695888f5458 "powerpc/64s/kuap: Use mmu_has_feature()" is fully addressed. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Authored-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: szubersk <szuberskidamian@gmail.com> Closes openzfs#11958 Closes openzfs#12590 Closes openzfs#13367 * zfs: holds: general cleanup Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13373 * zfs: holds: dequadratify Before: 15 0m0.177s 30 0m0.653s 45 0m1.289s 60 0m2.129s 75 0m3.264s 90 0m4.397s 100 0m5.996s 117 0m8.552s After: 30 0m0.053s 117 0m0.125s Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Closes openzfs#13372 Closes openzfs#13373 * Linux 5.18 compat: replace __set_page_dirty_nobuffers Replace __set_page_dirty_nobuffers with filemap_dirty_folio. Upstream-commit: 6b1f86f8e9c7f9de7ca1cb987b2cf25e99b1ae3a ("Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache ") Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Authored-by: Satadru Pramanik <satadru@gmail.com> Signed-off-by: Satadru Pramanik <satadru@gmail.com> Closes openzfs#13325 Closes openzfs#13380 * Fix O_APPEND for Linux 3.15 and older kernels When using a Linux kernel which predates the iov_iter interface the O_APPEND flag should be applied in zpl_aio_write() via the call to generic_write_checks(). The updated pos variable was incorrectly ignored resulting in the current offset being used. This issue should only realistically impact the RHEL/CentOS 7.x kernels which are based on Linux 3.10. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#13370 Closes openzfs#13377 Co-authored-by: Allan Jude <allan@klarasystems.com> Co-authored-by: наб <nabijaczleweli@nabijaczleweli.xyz> Co-authored-by: Low-power <msl0000023508@gmail.com> Co-authored-by: Damian Szuberski <szuberskidamian@gmail.com> Co-authored-by: Alexander Motin <mav@FreeBSD.org> Co-authored-by: Rich Ercolani <214141+rincebrain@users.noreply.github.com> Co-authored-by: Richard Laager <rlaager@wiktel.com> Co-authored-by: George Amanakis <gamanakis@gmail.com> Co-authored-by: Satadru Pramanik <satadru@gmail.com> Co-authored-by: Brian Behlendorf <behlendorf1@llnl.gov>

Currently, determining which datasets are affected by corruption is a manual process. The primary difficulty in reporting the list of affected snapshots is that since the error was initially found, the snapshot where the error originally occurred in, may have been deleted. To solve this issue, we add the ID of the head dataset of the original snapshot which the error was detected in, to the stored error report. Then any time a filesystem is deleted, the errors associated with it are deleted as well. Any time a clone promote occurs, we modify reports associated with the original head to refer to the new head. The stored error reports are identified by this head ID, the birth time of the block which the error occurred in, as well as some information about the error itself are also stored. Once this information is stored, we can find the set of datasets affected by an error by walking back the list of snapshots in the given head until we find one with the appropriate birth txg, and then traverse through the snapshots of the clone family, terminating a branch if the block was replaced in a given snapshot. Then we report this information back to libzfs, and to the zpool status command, where it is displayed as follows: pool: test state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A scan: scrub repaired 0B in 00:00:00 with 800 errors on Fri Dec 3 08:27:57 2021 config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 sdb ONLINE 0 0 1.58K errors: Permanent errors have been detected in the following files: test@1:/test.0.0 /test/test.0.0 /test/1clone/test.0.0 A new feature flag is introduced to mark the presence of this change, as well as promotion and backwards compatibility logic. This is an updated version of openzfs#9175. Rebase required fixing the tests, updating the ABI of libzfs, updating the man pages, fixing bugs, fixing the error returns, and updating the old on-disk error logs to the new format when activating the feature. Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Co-authored-by: TulsiJain <tulsi.jain@delphix.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes openzfs#9175 Closes openzfs#12812

TulsiJain force-pushed the head_errlog branch from d33a1bc to 92793c1 Compare August 18, 2019 01:56

TulsiJain force-pushed the head_errlog branch 2 times, most recently from cd2d875 to 3a926ff Compare August 19, 2019 23:05

Improved zpool status output, list all affected datasets

fd550d9

Signed-off-by: TulsiJain <tulsi.jain@delphix.com>

TulsiJain force-pushed the head_errlog branch from 3a926ff to fd550d9 Compare August 20, 2019 20:42

behlendorf added the Status: Code Review Needed Ready for review and testing label Aug 30, 2019

allanjude reviewed Sep 10, 2019

View reviewed changes

behlendorf added Status: Design Review Needed Architecture or design is under discussion Status: Inactive Not being actively updated Type: Feature Feature request or new feature and removed Status: Code Review Needed Ready for review and testing labels Jan 10, 2020

ahrens mentioned this pull request Jan 28, 2020

implement corruption correcting recv #9372

Merged

12 tasks

ahrens assigned mmaybee Jun 10, 2021

ahrens mentioned this pull request Nov 9, 2021

Teach zpool scrub to scrub only blocks in error log #12355

Closed

13 tasks

gamanakis mentioned this pull request Dec 1, 2021

Improved zpool status output, list all affected datasets #12812

Merged

13 tasks

behlendorf closed this in 0409d33 Apr 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improved zpool status output, list all the dataset affected by errlog. #9175

Improved zpool status output, list all the dataset affected by errlog. #9175

TulsiJain commented Aug 18, 2019 •

edited

Loading

codecov bot commented Aug 18, 2019

codecov bot commented Aug 21, 2019 •

edited

Loading

allanjude Sep 10, 2019

allanjude Sep 10, 2019

allanjude Sep 10, 2019

gamanakis commented Dec 1, 2021

Improved zpool status output, list all the dataset affected by errlog. #9175

Improved zpool status output, list all the dataset affected by errlog. #9175

Conversation

TulsiJain commented Aug 18, 2019 • edited Loading

Motivation and Context

Description

How Has This Been Tested?

Types of changes

Checklist:

codecov bot commented Aug 18, 2019

Codecov Report

codecov bot commented Aug 21, 2019 • edited Loading

Codecov Report

allanjude Sep 10, 2019

Choose a reason for hiding this comment

allanjude Sep 10, 2019

Choose a reason for hiding this comment

allanjude Sep 10, 2019

Choose a reason for hiding this comment

gamanakis commented Dec 1, 2021

TulsiJain commented Aug 18, 2019 •

edited

Loading

codecov bot commented Aug 21, 2019 •

edited

Loading