-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix MMP write frequency for large pools #7289
Conversation
When a single pool contains more vdevs than the CONFIG_HZ for for the kernel the mmp thread will not delay properly. Switch to using cv_timedwait_sig_hires() to handle higher resolution delays. This issue was reported on Arch Linux where HZ defaults to only 100 and thus could be fairly easily reproduced with a reasonably large pool. Most distribution kernels set CONFIG_HZ=250 or CONFIG_HZ=1000 thus are unlikely to be impacted. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#7205
@tonyhutter @ofaaland we should also consider this one for 0.7.7. |
Codecov Report
@@ Coverage Diff @@
## master #7289 +/- ##
==========================================
- Coverage 76.5% 76.4% -0.11%
==========================================
Files 328 328
Lines 104005 104005
==========================================
- Hits 79572 79461 -111
- Misses 24433 24544 +111
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good find!
When a single pool contains more vdevs than the CONFIG_HZ for for the kernel the mmp thread will not delay properly. Switch to using cv_timedwait_sig_hires() to handle higher resolution delays. This issue was reported on Arch Linux where HZ defaults to only 100 and this could be fairly easily reproduced with a reasonably large pool. Most distribution kernels set CONFIG_HZ=250 or CONFIG_HZ=1000 and thus are unlikely to be impacted. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#7205 Closes openzfs#7289
When a single pool contains more vdevs than the CONFIG_HZ for for the kernel the mmp thread will not delay properly. Switch to using cv_timedwait_sig_hires() to handle higher resolution delays. This issue was reported on Arch Linux where HZ defaults to only 100 and this could be fairly easily reproduced with a reasonably large pool. Most distribution kernels set CONFIG_HZ=250 or CONFIG_HZ=1000 and thus are unlikely to be impacted. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#7205 Closes openzfs#7289
When a single pool contains more vdevs than the CONFIG_HZ for for the kernel the mmp thread will not delay properly. Switch to using cv_timedwait_sig_hires() to handle higher resolution delays. This issue was reported on Arch Linux where HZ defaults to only 100 and this could be fairly easily reproduced with a reasonably large pool. Most distribution kernels set CONFIG_HZ=250 or CONFIG_HZ=1000 and thus are unlikely to be impacted. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#7205 Closes openzfs#7289
This is a squashed patchset for zfs-0.7.7. The individual commits are in the tonyhutter:zfs-0.7.7-hutter branch. I squashed the commits so that buildbot wouldn't have to run against each one, and because github/builbot seem to have a maximum limit of 30 commits they can test from a PR. - Fix MMP write frequency for large pools openzfs#7205 openzfs#7289 - Handle zio_resume and mmp => off openzfs#7286 - Fix zfs-kmod builds when using rpm >= 4.14 openzfs#7284 - zdb and inuse tests don't pass with real disks openzfs#6939 openzfs#7261 - Take user namespaces into account in policy checks openzfs#6800 openzfs#7270 - Detect long config lock acquisition in mmp openzfs#7212 - Linux 4.16 compat: get_disk_and_module() openzfs#7264 - Change checksum & IO delay ratelimit values openzfs#7252 - Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176 - Fix some typos openzfs#7237 - Fix zpool(8) list example to match actual format openzfs#7244 - Add SMART self-test results to zpool status -c openzfs#7178 - Add scrub after resilver zed script openzfs#4662 openzfs#7086 - Fix free memory calculation on v3.14+ openzfs#7170 - Report duration and error in mmp_history entries openzfs#7190 - Do not initiate MMP writes while pool is suspended openzfs#7182 - Linux 4.16 compat: use correct *_dec_and_test() - Allow modprobe to fail when called within systemd openzfs#7174 - Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193 - Correct count_uberblocks in mmp.kshlib openzfs#7191 - Fix config issues: frame size and headers openzfs#7169 - Clarify zinject(8) explanation of -e openzfs#7172 - OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio openzfs#7168 - 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154 - contrib/initramfs: add missing conf.d/zfs openzfs#7158 - mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155 - Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054 - Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464 - Fix zdb -E segfault openzfs#7099 - Fix zdb -R decompression openzfs#7099 openzfs#4984 - Fix racy assignment of zcb.zcb_haderrors openzfs#7099 - Fix zle_decompress out of bound access openzfs#7099 - Fix zdb -c traverse stop on damaged objset root openzfs#7099 - Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148 - Linux 4.16 compat: inode_set_iversion() openzfs#7148 - OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable openzfs#7141 - Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135 - Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101 - Bug fix in qat_compress.c for vmalloc addr check openzfs#7125 - Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074 openzfs#7100 - Emit an error message before MMP suspends pool openzfs#7048 - ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977 - Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963 - Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753 - Add support for "--enable-code-coverage" option openzfs#6670 - Make "-fno-inline" compile option more accessible openzfs#6605 - Add configure option to enable gcov analysis openzfs#6642 - Implement --enable-debuginfo to force debuginfo openzfs#2734 - Make --enable-debug fail when given bogus args openzfs#2734 Signed-off-by: Tony Hutter <hutter2@llnl.gov> Requires-spl: refs/pull/690/head
This is a squashed patchset for zfs-0.7.7. The individual commits are in the tonyhutter:zfs-0.7.7-hutter branch. I squashed the commits so that buildbot wouldn't have to run against each one, and because github/builbot seem to have a maximum limit of 30 commits they can test from a PR. - Fix MMP write frequency for large pools openzfs#7205 openzfs#7289 - Handle zio_resume and mmp => off openzfs#7286 - Fix zfs-kmod builds when using rpm >= 4.14 openzfs#7284 - zdb and inuse tests don't pass with real disks openzfs#6939 openzfs#7261 - Take user namespaces into account in policy checks openzfs#6800 openzfs#7270 - Detect long config lock acquisition in mmp openzfs#7212 - Linux 4.16 compat: get_disk_and_module() openzfs#7264 - Change checksum & IO delay ratelimit values openzfs#7252 - Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176 - Fix some typos openzfs#7237 - Fix zpool(8) list example to match actual format openzfs#7244 - Add SMART self-test results to zpool status -c openzfs#7178 - Add scrub after resilver zed script openzfs#4662 openzfs#7086 - Fix free memory calculation on v3.14+ openzfs#7170 - Report duration and error in mmp_history entries openzfs#7190 - Do not initiate MMP writes while pool is suspended openzfs#7182 - Linux 4.16 compat: use correct *_dec_and_test() - Allow modprobe to fail when called within systemd openzfs#7174 - Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193 - Correct count_uberblocks in mmp.kshlib openzfs#7191 - Fix config issues: frame size and headers openzfs#7169 - Clarify zinject(8) explanation of -e openzfs#7172 - OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio openzfs#7168 - 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154 - contrib/initramfs: add missing conf.d/zfs openzfs#7158 - mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155 - Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054 - Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464 - Fix zdb -E segfault openzfs#7099 - Fix zdb -R decompression openzfs#7099 openzfs#4984 - Fix racy assignment of zcb.zcb_haderrors openzfs#7099 - Fix zle_decompress out of bound access openzfs#7099 - Fix zdb -c traverse stop on damaged objset root openzfs#7099 - Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148 - Linux 4.16 compat: inode_set_iversion() openzfs#7148 - OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable openzfs#7141 - Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135 - Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101 - Bug fix in qat_compress.c for vmalloc addr check openzfs#7125 - Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074 openzfs#7100 - Emit an error message before MMP suspends pool openzfs#7048 - ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977 - Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963 - Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753 - Add support for "--enable-code-coverage" option openzfs#6670 - Make "-fno-inline" compile option more accessible openzfs#6605 - Add configure option to enable gcov analysis openzfs#6642 - Implement --enable-debuginfo to force debuginfo openzfs#2734 - Make --enable-debug fail when given bogus args openzfs#2734 Signed-off-by: Tony Hutter <hutter2@llnl.gov> Requires-spl: refs/pull/690/head
When a single pool contains more vdevs than the CONFIG_HZ for for the kernel the mmp thread will not delay properly. Switch to using cv_timedwait_sig_hires() to handle higher resolution delays. This issue was reported on Arch Linux where HZ defaults to only 100 and this could be fairly easily reproduced with a reasonably large pool. Most distribution kernels set CONFIG_HZ=250 or CONFIG_HZ=1000 and thus are unlikely to be impacted. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#7205 Closes openzfs#7289
This is a squashed patchset for zfs-0.7.7. The individual commits are in the tonyhutter:zfs-0.7.7-hutter branch. I squashed the commits so that buildbot wouldn't have to run against each one, and because github/builbot seem to have a maximum limit of 30 commits they can test from a PR. - Fix MMP write frequency for large pools openzfs#7205 openzfs#7289 - Handle zio_resume and mmp => off openzfs#7286 - Fix zfs-kmod builds when using rpm >= 4.14 openzfs#7284 - zdb and inuse tests don't pass with real disks openzfs#6939 openzfs#7261 - Take user namespaces into account in policy checks openzfs#6800 openzfs#7270 - Detect long config lock acquisition in mmp openzfs#7212 - Linux 4.16 compat: get_disk_and_module() openzfs#7264 - Change checksum & IO delay ratelimit values openzfs#7252 - Increment zil_itx_needcopy_bytes properly openzfs#6988 openzfs#7176 - Fix some typos openzfs#7237 - Fix zpool(8) list example to match actual format openzfs#7244 - Add SMART self-test results to zpool status -c openzfs#7178 - Add scrub after resilver zed script openzfs#4662 openzfs#7086 - Fix free memory calculation on v3.14+ openzfs#7170 - Report duration and error in mmp_history entries openzfs#7190 - Do not initiate MMP writes while pool is suspended openzfs#7182 - Linux 4.16 compat: use correct *_dec_and_test() - Allow modprobe to fail when called within systemd openzfs#7174 - Add SMART attributes for SSD and NVMe openzfs#7183 openzfs#7193 - Correct count_uberblocks in mmp.kshlib openzfs#7191 - Fix config issues: frame size and headers openzfs#7169 - Clarify zinject(8) explanation of -e openzfs#7172 - OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio openzfs#7168 - 'zfs receive' fails with "dataset is busy" openzfs#7129 openzfs#7154 - contrib/initramfs: add missing conf.d/zfs openzfs#7158 - mmp should use a fixed tag for spa_config locks openzfs#6530 openzfs#7155 - Handle zap_add() failures in mixed case mode openzfs#7011 openzfs#7054 - Fix zdb -ed on objset for exported pool openzfs#7099 openzfs#6464 - Fix zdb -E segfault openzfs#7099 - Fix zdb -R decompression openzfs#7099 openzfs#4984 - Fix racy assignment of zcb.zcb_haderrors openzfs#7099 - Fix zle_decompress out of bound access openzfs#7099 - Fix zdb -c traverse stop on damaged objset root openzfs#7099 - Linux 4.11 compat: avoid refcount_t name conflict openzfs#7148 - Linux 4.16 compat: inode_set_iversion() openzfs#7148 - OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable openzfs#7141 - Remove deprecated zfs_arc_p_aggressive_disable openzfs#7135 - Fix default libdir for Debian/Ubuntu openzfs#7083 openzfs#7101 - Bug fix in qat_compress.c for vmalloc addr check openzfs#7125 - Fix systemd_ RPM macros usage on Debian-based distributions openzfs#7074 openzfs#7100 - Emit an error message before MMP suspends pool openzfs#7048 - ZTS: Fix create-o_ashift test case openzfs#6924 openzfs#6977 - Fix --with-systemd on Debian-based distributions (openzfs#6963) openzfs#6591 openzfs#6963 - Remove vn_rename and vn_remove dependency openzfs/spl#648 openzfs#6753 - Fix "--enable-code-coverage" debug build openzfs#6674 - Update codecov.yml openzfs#6669 - Add support for "--enable-code-coverage" option openzfs#6670 - Make "-fno-inline" compile option more accessible openzfs#6605 - Add configure option to enable gcov analysis openzfs#6642 - Implement --enable-debuginfo to force debuginfo openzfs#2734 - Make --enable-debug fail when given bogus args openzfs#2734 Signed-off-by: Tony Hutter <hutter2@llnl.gov> Requires-spl: refs/pull/690/head
When a single pool contains more vdevs than the CONFIG_HZ for for the kernel the mmp thread will not delay properly. Switch to using cv_timedwait_sig_hires() to handle higher resolution delays. This issue was reported on Arch Linux where HZ defaults to only 100 and this could be fairly easily reproduced with a reasonably large pool. Most distribution kernels set CONFIG_HZ=250 or CONFIG_HZ=1000 and thus are unlikely to be impacted. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov> Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #7205 Closes #7289
Description
When a single pool contains more vdevs than the CONFIG_HZ for for the kernel the mmp thread will not delay properly. Switch to using cv_timedwait_sig_hires() to handle higher resolution delays.
This issue was reported on Arch Linux where HZ defaults to only 100 and thus could be fairly easily reproduced with a reasonably large pool. Most distribution kernels set CONFIG_HZ=250 or CONFIG_HZ=1000 thus are unlikely to be impacted.
Motivation and Context
Issue #7205
How Has This Been Tested?
Locally created a pool with 2000 vdevs on a system with CONFIG_HZ=1000, enabled
multihost
and observed an IO rate of ~2000 IO/s (one per-vdev per second). With out the patch the mmp thread would spin.Types of changes
Checklist:
Signed-off-by
.