Resilver performance tuning #14428

behlendorf · 2023-01-25T01:10:38Z

Motivation and Context

Update the default values for two resilver tunables to maximize performance. These changes do increase the possible memory footprint, but my feeling is that the performance improvements are worth the tradeoff. Note that neither the allowed number of outstanding I/Os (zfs_vdev_scrub_max_active) nor the non-interactive I/O tunables have changed, so this should not affect interactive performance.

Description

97cda28 - Increase default zfs_rebuild_vdev_limit to 64MB

When testing distributed rebuild performance with more capable hardware, it was observed that increasing the zfs_rebuild_vdev_limit to 64M reduced the rebuild time by 17%. Beyond 64MB there was some improvement (~2%), but it was not significant when weighed against the increased memory usage.

467fd50 - Increase default zfs_scan_vdev_limit to 16MB

For HDD-based pools the default zfs_scan_vdev_limit of 4M per vdev can significantly limit the maximum scrub performance. Increasing the default to 16M can double the scrub speed, from 80 MB/s per disk to 160 MB/s per disk.
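
To put the quoted per-disk rates in perspective, here is a back-of-the-envelope sketch of how scrub throughput translates into wall-clock time. It assumes each disk holds a fixed amount of allocated data that is read at a constant rate, and it treats MB as 10^6 bytes; real scrubs also depend on fragmentation, pool fill level and competing I/O, so the numbers are illustrative only. The helper `scrub_hours()` is hypothetical and not part of OpenZFS.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: idealised scrub time in hours for one disk,
 * assuming `allocated_bytes` of data is read at a constant
 * `bytes_per_sec` with no other bottlenecks in the system.
 */
static double
scrub_hours(uint64_t allocated_bytes, uint64_t bytes_per_sec)
{
	assert(bytes_per_sec > 0);
	return ((double)allocated_bytes / (double)bytes_per_sec / 3600.0);
}
```

For 16 TB (16 × 10^12 bytes) of allocated data per disk, `scrub_hours(16000000000000ULL, 80000000ULL)` gives roughly 55.6 hours at 80 MB/s, while `scrub_hours(16000000000000ULL, 160000000ULL)` gives roughly 27.8 hours at 160 MB/s.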

How Has This Been Tested?

Local scrub, sequential resilver, and sequential rebuild tests were run using an HDD-based dRAID pool (draid2:11d:106c:2s-0). Our updated test results show that allowing additional memory to be used for the scan/rebuild queues can significantly improve performance. Earlier performance testing, done with less capable hardware, did not reveal this because other bottlenecks in the system dominated.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

behlendorf · 2023-01-25T18:47:55Z

@akashb-22 you may be interested in this PR as well. Our testing matches what you were seeing: we're able to get the vast majority of the performance improvement with these two changes to the default tunings.

amotin · 2023-01-25T20:15:33Z

As I've said in private, considering how high these values are getting, we had better limit all of them to a reasonable fraction of the ARC. Otherwise, on small systems they may end up using too much memory. Even if such systems should be properly configured, there should be some safety belts.

behlendorf · 2023-01-25T20:58:28Z

Agreed! I've updated the PR accordingly.

jumbi77 · 2023-01-25T21:05:37Z

In case this gets upstreamed, we should also update the module parameter description for zfs_scan_vdev_limit and maybe add a new section for zfs_rebuild_vdev_limit.

For HDD-based pools the default zfs_scan_vdev_limit of 4M per vdev can significantly limit the maximum scrub performance. Increasing the default to 16M can double the scrub speed from 80 MB/s per disk to 160 MB/s per disk. This does increase the memory footprint during scrub/resilver, but given the performance win this is a reasonable trade-off. Memory usage is capped at 1/4 of arc_c_max. Note that the number of outstanding I/Os has not changed and is still limited by zfs_vdev_scrub_max_active.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
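
The cap described in this commit message can be illustrated with a small sketch. It assumes the pool-wide scan budget is the per-disk limit multiplied by a disk count and then clamped to 1/4 of arc_c_max, and that "16M" means 16 × 2^20 bytes. The names `scan_inflight_max()` and `scan_vdev_limit`, and the way disks are counted, are assumptions for illustration rather than the actual dsl_scan.c code.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1ULL << 20)

/* Hypothetical stand-in for the zfs_scan_vdev_limit tunable (16 MiB). */
static uint64_t scan_vdev_limit = 16 * MiB;

/*
 * Hypothetical sketch: the scan's in-flight byte budget grows with the
 * number of disks, but is never allowed to exceed 1/4 of arc_c_max,
 * which acts as a safety belt on small-memory systems.
 */
static uint64_t
scan_inflight_max(uint64_t arc_c_max, uint64_t ndisks)
{
	assert(ndisks > 0);
	uint64_t want = scan_vdev_limit * ndisks;
	uint64_t cap = arc_c_max / 4;

	return ((want < cap) ? want : cap);
}
```

For example, with 106 disks the uncapped request is 1696 MiB. An 8 GiB arc_c_max (cap 2048 MiB) leaves it unchanged, whereas a 2 GiB arc_c_max clamps it to 512 MiB.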

When testing distributed rebuild performance with more capable hardware, it was observed that increasing the zfs_rebuild_vdev_limit to 64M reduced the rebuild time by 17%. Beyond 64MB there was some improvement (~2%), but it was not significant when weighed against the increased memory usage. Memory usage is capped at 1/4 of arc_c_max. Additionally, vr_bytes_inflight_max has been moved so it's updated per-metaslab to allow the size to be adjusted while a rebuild is running.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
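
The point about vr_bytes_inflight_max can be sketched as follows: if the in-flight budget is recomputed at the start of every metaslab rather than once when the rebuild begins, a change to the tunable made mid-rebuild takes effect at the next metaslab. Everything here is a simplified, hypothetical model (`rebuild_state_t`, `rebuild_vdev_limit`, `rebuild_begin_metaslab()`), and scaling the per-device limit by the number of child devices is an assumption about how the budget grows.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1ULL << 20)

/*
 * Hypothetical stand-in for the zfs_rebuild_vdev_limit tunable
 * (64 MiB default); an administrator may change it at any time.
 */
static uint64_t rebuild_vdev_limit = 64 * MiB;

/* Hypothetical, heavily simplified rebuild state. */
typedef struct {
	uint64_t bytes_inflight_max;
} rebuild_state_t;

/*
 * Called at the start of each metaslab. Re-reading rebuild_vdev_limit
 * here, instead of only once when the rebuild starts, means a tunable
 * change is picked up at the next metaslab boundary. The budget is
 * assumed to scale with the number of child devices and is capped at
 * 1/4 of arc_c_max.
 */
static void
rebuild_begin_metaslab(rebuild_state_t *vr, uint64_t arc_c_max,
    uint64_t nchildren)
{
	assert(nchildren > 0);
	uint64_t want = rebuild_vdev_limit * nchildren;
	uint64_t cap = arc_c_max / 4;

	vr->bytes_inflight_max = (want < cap) ? want : cap;
	/* ... issue rebuild I/O for this metaslab within the budget ... */
}
```

Lowering `rebuild_vdev_limit` between two calls changes `bytes_inflight_max` for the following metaslab without restarting the rebuild.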

behlendorf · 2023-01-26T00:17:37Z

Refreshed with a minor update to the rebuild code to divide the maximum of 1/4 of arc_c_max by the number of top-level vdevs in the pool.
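
A worked example of this refresh: dividing the 1/4-of-arc_c_max cap by the number of top-level vdevs keeps the combined budget within that quarter of the ARC even if every top-level vdev is rebuilding at once. The function name `rebuild_inflight_max()` and the per-child scaling of the per-device limit are assumptions for illustration, not the merged vdev_rebuild.c code.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)

/*
 * Hypothetical helper: per-top-level-vdev rebuild budget. The quarter of
 * arc_c_max is shared evenly among `ntop` top-level vdevs, so even if all
 * of them rebuild at once the combined budget stays within arc_c_max / 4.
 */
static uint64_t
rebuild_inflight_max(uint64_t arc_c_max, uint64_t ntop,
    uint64_t nchildren, uint64_t vdev_limit)
{
	assert(ntop > 0);
	uint64_t cap = (arc_c_max / 4) / ntop;
	uint64_t want = vdev_limit * nchildren;

	return ((want < cap) ? want : cap);
}
```

With arc_c_max = 32 GiB and 4 top-level vdevs, each vdev is capped at 2 GiB. A 106-child dRAID vdev at 64 MiB per child would request 6784 MiB and is clamped to 2048 MiB, and 4 × 2 GiB = 8 GiB, exactly 32 GiB / 4.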

When testing distributed rebuild performance with more capable hardware, it was observed that increasing the zfs_rebuild_vdev_limit to 64M reduced the rebuild time by 17%. Beyond 64MB there was some improvement (~2%), but it was not significant when weighed against the increased memory usage. Memory usage is capped at 1/4 of arc_c_max. Additionally, vr_bytes_inflight_max has been moved so it's updated per-metaslab to allow the size to be adjusted while a rebuild is running.

Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #14428

For HDD-based pools the default zfs_scan_vdev_limit of 4M per vdev can significantly limit the maximum scrub performance. Increasing the default to 16M can double the scrub speed from 80 MB/s per disk to 160 MB/s per disk. This does increase the memory footprint during scrub/resilver, but given the performance win this is a reasonable trade-off. Memory usage is capped at 1/4 of arc_c_max. Note that the number of outstanding I/Os has not changed and is still limited by zfs_vdev_scrub_max_active.

Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#14428

behlendorf added the "Type: Performance" and "Status: Code Review Needed" labels Jan 25, 2023

behlendorf force-pushed the resilver-tuning branch from 97cda28 to c48ed10 January 25, 2023 18:39

behlendorf requested a review from amotin January 25, 2023 18:40

behlendorf force-pushed the resilver-tuning branch from c48ed10 to b4aad96 January 25, 2023 20:57

behlendorf force-pushed the resilver-tuning branch from b4aad96 to 07f3fd0 January 25, 2023 21:00

tonynguien approved these changes Jan 25, 2023

behlendorf added 2 commits January 25, 2023 16:15

behlendorf force-pushed the resilver-tuning branch from 07f3fd0 to 3084208 January 26, 2023 00:16

amotin approved these changes Jan 26, 2023

behlendorf added the "Status: Accepted" label and removed the "Status: Code Review Needed" label Jan 27, 2023

akashb-22 approved these changes Jan 27, 2023

behlendorf closed this in c0aea7c Jan 27, 2023
