Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize several operations during zpool import #11470

Closed
wants to merge 3 commits into from

Conversation

asomers
Copy link
Contributor

@asomers asomers commented Jan 16, 2021

Motivation and Context

zpool import is too slow for large pools on spinning disks. It can sometimes take over an hour. Most of that time is occupied by simply waiting for serialized disk I/O. The worst offender is metaslab_init, which must be repeated hundreds of times for each top-level vdev.

Description

This PR removes the worst bottlenecks during the import process:

  • Parallelizes vdev_load by using a taskqueue to load each top-level vdev.
  • Reads all four disk labels concurrently in vdev_label_read_config, similarly to what's already done in vdev_geom_read_config.
  • Parallelizes vdev_validate by using a taskqueue for all vdevs.

How Has This Been Tested?

Performance tested using various large pools on FreeBSD 13. Regression tested using the FreeBSD ZFS test suite. Shortens the zpool import time by about 6x for large pools. When combined by the changes in PRs #11469 and #11467 , shortens the import time by about 8x.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

asomers added a commit to asomers/freebsd-src that referenced this pull request Jan 19, 2021
metaslab_init is the slowest part of importing a mature pool, and it
must be repeated hundreds of times for each top-level vdev.  But its
speed is dominated by a few serialized disk accesses.  That can lead to
import times of > 1 hour for pools with many top-level vdevs on spinny
disks.

Speed up the import by using a taskqueue to parallelize vdev_load across
all top-level vdevs.

openzfs/zfs#11470

Sponsored by:   Axcient
asomers added a commit to asomers/freebsd-src that referenced this pull request Jan 19, 2021
This is similar to what we already do in vdev_geom_read_config.

openzfs/zfs#11470

Sponsored by:   Axcient
asomers added a commit to asomers/freebsd-src that referenced this pull request Jan 19, 2021
The runtime of vdev_validate is dominated by the disk accesses in
vdev_label_read_config.  Speed it up by validating all vdevs in
parallel.

openzfs/zfs#11470

Sponsored by:   Axcient
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Jan 20, 2021
@behlendorf
Copy link
Contributor

behlendorf commented Jan 20, 2021

Great idea! But just in case you hadn't noticed this does appear to introduce a locking problem with the spa config lock.

http://build.zfsonlinux.org/builders/CentOS%207%20x86_64%20%28TEST%29/builds/14599/steps/shell_4/logs/console

@asomers
Copy link
Contributor Author

asomers commented Jan 20, 2021

Great idea! But just in case you hadn't noticed this does appear to introduce a locking problem with the spa config lock.

http://build.zfsonlinux.org/builders/CentOS%207%20x86_64%20%28TEST%29/builds/14599/steps/shell_4/logs/console

Yep. I didn't notice that at first because assertions were accidentally disabled in my build. I'm testing a fix now. AFAICT it's safe for multiple threads to call vdev_validate simultaneously, because each only access's one struct vdev. It doesn't try to modify the parents or children.

@asomers
Copy link
Contributor Author

asomers commented Jan 25, 2021

Of the test cases that failed, none of them failed on more than one run. So I think they're all intermittent. They are:
zfs_destroy_dev_removal_condense
l2arc/persist_l2arc_006_pos
cli_root/zfs_send/zfs_send-b
cli_root/zfs_send/zfs_send_003_pos
cli_root/zfs_send/zfs_send_005_pos
cli_root/zfs_send/zfs_send_encrypted
cli_root/zfs_send/zfs_send_raw
pyzfs/pyzfs_unittest
redacted_send/redacted_negative
removal/removal_with_send

Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement! I agree the failures look unrelated but let's be sure. If you rebase this on the the latest master and force update the PR you should be a clean run including on FreeBSD HEAD.

@asomers
Copy link
Contributor Author

asomers commented Jan 25, 2021

Here we go!

@behlendorf behlendorf requested a review from sdimitro January 25, 2021 22:32
@behlendorf
Copy link
Contributor

@asomers thanks. Since this is all related any objections to us squashing this in to a single commit when merged. Keeping it split up is fine with me too as long as we squash the top two bug fixes f1221c0 in f11a497 the relevant previous commit.

@asomers
Copy link
Contributor Author

asomers commented Jan 25, 2021

Yes, I think it should be split into three commits. But the final commit actually needs to be squashed to the first, not the third. Would you like me to do it now, or wait until CI finishes?

@behlendorf
Copy link
Contributor

@asomers it'd be great if you could sort it out now so each of the commits really can stand by itself.

metaslab_init is the slowest part of importing a mature pool, and it
must be repeated hundreds of times for each top-level vdev.  But its
speed is dominated by a few serialized disk accesses.  That can lead to
import times of > 1 hour for pools with many top-level vdevs on spinny
disks.

Speed up the import by using a taskqueue to parallelize vdev_load across
all top-level vdevs.

This also requires adding mutex protection to
metaslab_class_t.mc_historgram.  The mc_histogram fields were
unprotected when that code was first written in "Illumos 4976-4984 -
metaslab improvements" (OpenZFS
f3a7f66).  The lock wasn't added until
3dfb57a, though it's unclear exactly
which fields it's supposed to protect.  In any case, it wasn't until
vdev_load was parallelized that any code attempted concurrent access to
those fields.

Sponsored by:	Axcient
Signed-off-by:	Alan Somers <asomers@gmail.com>
This is similar to what we already do in vdev_geom_read_config.

Sponsored by:	Axcient
Signed-off-by:	Alan Somers <asomers@gmail.com>
The runtime of vdev_validate is dominated by the disk accesses in
vdev_label_read_config.  Speed it up by validating all vdevs in
parallel.

Sponsored by:	Axcient
Signed-off-by:	Alan Somers <asomers@gmail.com>
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Jan 26, 2021
behlendorf pushed a commit that referenced this pull request Jan 27, 2021
This is similar to what we already do in vdev_geom_read_config.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11470
behlendorf pushed a commit that referenced this pull request Jan 27, 2021
The runtime of vdev_validate is dominated by the disk accesses in
vdev_label_read_config.  Speed it up by validating all vdevs in
parallel using a taskq.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes #11470
@asomers asomers deleted the vdev_load branch January 27, 2021 16:21
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
metaslab_init is the slowest part of importing a mature pool, and it
must be repeated hundreds of times for each top-level vdev.  But its
speed is dominated by a few serialized disk accesses.  That can lead to
import times of > 1 hour for pools with many top-level vdevs on spinny
disks.

Speed up the import by using a taskqueue to parallelize vdev_load across
all top-level vdevs.

This also requires adding mutex protection to
metaslab_class_t.mc_historgram.  The mc_histogram fields were
unprotected when that code was first written in "Illumos 4976-4984 -
metaslab improvements" (OpenZFS
f3a7f66).  The lock wasn't added until
3dfb57a, though it's unclear exactly
which fields it's supposed to protect.  In any case, it wasn't until
vdev_load was parallelized that any code attempted concurrent access to
those fields.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes openzfs#11470
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
This is similar to what we already do in vdev_geom_read_config.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes openzfs#11470
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
The runtime of vdev_validate is dominated by the disk accesses in
vdev_label_read_config.  Speed it up by validating all vdevs in
parallel using a taskq.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes openzfs#11470
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
metaslab_init is the slowest part of importing a mature pool, and it
must be repeated hundreds of times for each top-level vdev.  But its
speed is dominated by a few serialized disk accesses.  That can lead to
import times of > 1 hour for pools with many top-level vdevs on spinny
disks.

Speed up the import by using a taskqueue to parallelize vdev_load across
all top-level vdevs.

This also requires adding mutex protection to
metaslab_class_t.mc_historgram.  The mc_histogram fields were
unprotected when that code was first written in "Illumos 4976-4984 -
metaslab improvements" (OpenZFS
f3a7f66).  The lock wasn't added until
3dfb57a, though it's unclear exactly
which fields it's supposed to protect.  In any case, it wasn't until
vdev_load was parallelized that any code attempted concurrent access to
those fields.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes openzfs#11470
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
This is similar to what we already do in vdev_geom_read_config.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes openzfs#11470
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
The runtime of vdev_validate is dominated by the disk accesses in
vdev_label_read_config.  Speed it up by validating all vdevs in
parallel using a taskq.

Sponsored by: Axcient
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alan Somers <asomers@gmail.com>
Closes openzfs#11470
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants