Optimize RAIDZ expansion #16819

Merged: 1 commit into openzfs:master, Dec 6, 2024

Conversation

@amotin (Member) commented Nov 29, 2024

  • Instead of copying one ashift-sized block per ZIO, copy as much contiguous data as is available, up to 16MB per old vdev. To avoid data moves, use gang ABDs so that read ZIOs can directly fill buffers for write ZIOs (see the sketch after this list). ABDs have much smaller overhead than ZIOs in both memory usage and processing time, and big I/Os do not depend on I/O aggregation and scheduling to reach decent performance on HDDs.
  • Reduce raidz_expand_max_copy_bytes to 16MB on 32-bit platforms.
  • Use a 32-bit range tree when possible (practically always now) to slightly reduce memory usage.
  • Use ZIO_PRIORITY_REMOVAL for the early stages of expansion, same as for the main ones.
  • Fix rate overflows in zpool status reporting.
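A minimal sketch of the gang-ABD flow from the first bullet, for illustration only: the function and helper names (expand_copy_chunk, issue_read, issue_write) and the per-child chunking are hypothetical stand-ins rather than names from this patch, while abd_alloc_gang(), abd_alloc_for_io(), abd_gang_add(), and abd_free() are existing OpenZFS ABD primitives.

```c
/*
 * Hedged sketch, not the actual patch: chain per-child read buffers
 * into one gang ABD so that a single large write ZIO consumes the
 * read buffers directly, without copying them into a staging area.
 */
#include <sys/abd.h>

extern void issue_read(int child, abd_t *buf);	/* hypothetical helper */
extern void issue_write(abd_t *data);		/* hypothetical helper */

static void
expand_copy_chunk(int old_children, size_t chunk_size)
{
	abd_t *gang = abd_alloc_gang();

	for (int c = 0; c < old_children; c++) {
		/* One buffer per old child, filled by its read ZIO. */
		abd_t *buf = abd_alloc_for_io(chunk_size, B_FALSE);
		issue_read(c, buf);
		/* B_TRUE: free this child when the gang ABD is freed. */
		abd_gang_add(gang, buf, B_TRUE);
	}

	/* One big write over the whole gang; no data movement needed. */
	issue_write(gang);
	abd_free(gang);
}
```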

Closes #15680

How Has This Been Tested?

With these changes, expanding a RAIDZ1 from 4 to 5 children, I am able to reach a 6-12GB/s rate on SSDs and ~500MB/s on HDDs; both are limited by the devices rather than by the CPU.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

@snajpa (Contributor) commented Nov 29, 2024

> Fix rate overflows in zpool status reporting.

so that'll probably fix #16803, right? I'm wondering whether it wouldn't be better to have that as a separate commit, which could be backported to older releases without the rest of the change?

@amotin added the "Status: Code Review Needed" (Ready for review and testing) label Nov 30, 2024
@amotin (Member, Author) commented Nov 30, 2024

> so that'll probably fix #16803, right? I'm wondering whether it wouldn't be better to have that as a separate commit, which could be backported to older releases without the rest of the change?

@snajpa It won't. This should fix the improperly reported expansion rate and estimated completion time; it has nothing to do with the percentage completed. And I am not sure a couple-line change for a cosmetic issue is worth its own commit, though it might be.
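To illustrate the class of overflow being fixed (a generic sketch, not the actual patch; the real reporting code works from different fields): scaling copied bytes to a per-second rate by multiplying by NANOSEC wraps a uint64_t once roughly 18 GB has been copied, which a RAIDZ expansion exceeds almost immediately.

```c
#include <stdint.h>

#define	NANOSEC	1000000000ULL

/*
 * Overflow-prone form: copied * NANOSEC exceeds UINT64_MAX once
 * copied passes 2^64 / 10^9 bytes (about 18.4 GB), so the reported
 * rate wraps to garbage.  elapsed_ns is assumed nonzero here.
 */
static uint64_t
rate_naive(uint64_t copied, uint64_t elapsed_ns)
{
	return (copied * NANOSEC / elapsed_ns);
}

/*
 * Dividing first keeps the intermediate value small, trading away
 * sub-second precision for correctness.
 */
static uint64_t
rate_safe(uint64_t copied, uint64_t elapsed_ns)
{
	uint64_t secs = elapsed_ns / NANOSEC;
	return (copied / (secs != 0 ? secs : 1));
}
```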

@snajpa (Contributor) commented Nov 30, 2024

> And I am not sure a couple-line change for a cosmetic issue is worth its own commit, though it might be.

If those are going to get us new open tickets, I'd say it might be worth it :D

@behlendorf (Contributor) commented

This should, however, address #15680.

@liyimeng commented Dec 3, 2024

Will this be part of the 2.3 release? :)

@behlendorf added the "Status: Accepted" (Ready to integrate; reviewed, tested) label and removed the "Status: Code Review Needed" label Dec 6, 2024
@behlendorf merged commit a44eaf1 into openzfs:master Dec 6, 2024
24 checks passed
@amotin deleted the raidz_rt branch December 6, 2024 16:51
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Dec 6, 2024:

 - Instead of copying one ashift-sized block per ZIO, copy as much
contiguous data as is available, up to 16MB per old vdev.  To avoid
data moves, use gang ABDs so that read ZIOs can directly fill
buffers for write ZIOs.  ABDs have much smaller overhead than ZIOs
in both memory usage and processing time, and big I/Os do not depend
on I/O aggregation and scheduling to reach decent performance on
HDDs.
 - Reduce raidz_expand_max_copy_bytes to 16MB on 32-bit platforms.
 - Use a 32-bit range tree when possible (practically always now) to
slightly reduce memory usage.
 - Use ZIO_PRIORITY_REMOVAL for the early stages of expansion, same
as for the main ones.
 - Fix rate overflows in `zpool status` reporting.

With these changes, expanding a RAIDZ1 from 4 to 5 children, I am
able to reach a 6-12GB/s rate on SSDs and ~500MB/s on HDDs; both
are limited by the devices rather than by the CPU.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes openzfs#15680
Closes openzfs#16819