Increase default zfs_rebuild_vdev_limit to 64MB
When testing distributed rebuild performance with more capable
hardware it was observed that increasing the zfs_rebuild_vdev_limit
to 64MB reduced the rebuild time by 17%.  Beyond 64MB there was
some improvement (~2%) but it was not significant when weighed
against the increased memory usage. Memory usage is capped at 1/4
of arc_c_max.

Additionally, vr_bytes_inflight_max has been moved so it's updated
per-metaslab to allow the size to be adjusted while a rebuild is
running.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
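
As an illustration of the cap described above (not part of this commit), the sketch below recomputes the in-flight byte budget the same way the new per-metaslab expression does: the per-leaf limit is scaled by the top-level vdev's child count, floored at 1 MiB, and capped at 1/4 of arc_c_max. The arc_c_max value and child count used here are hypothetical.

#include <stdio.h>
#include <stdint.h>

#define MIN(a, b)	((a) < (b) ? (a) : (b))
#define MAX(a, b)	((a) > (b) ? (a) : (b))

int
main(void)
{
	uint64_t zfs_rebuild_vdev_limit = 64ULL << 20;	/* new 64 MiB default */
	uint64_t vdev_children = 106;		/* hypothetical dRAID child count */
	uint64_t arc_c_max = 4ULL << 30;	/* hypothetical 4 GiB ARC maximum */

	/* Per-leaf limit scaled by the vdev's children, floored at 1 MiB. */
	uint64_t uncapped = MAX(1ULL << 20,
	    zfs_rebuild_vdev_limit * vdev_children);

	/* Capped at 1/4 of arc_c_max: 1 GiB here versus 6784 MiB uncapped. */
	uint64_t inflight_max = MIN(arc_c_max / 4, uncapped);

	printf("uncapped = %llu MiB, capped = %llu MiB\n",
	    (unsigned long long)(uncapped >> 20),
	    (unsigned long long)(inflight_max >> 20));
	return (0);
}

On a wide pool like this the arc_c_max/4 cap, not the per-child product, sets the budget; recomputing it per-metaslab also means an adjusted zfs_rebuild_vdev_limit can take effect while a rebuild is running.
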
behlendorf committed Jan 25, 2023
1 parent cd74e5e commit 07f3fd0
Showing 2 changed files with 16 additions and 10 deletions.
2 changes: 1 addition & 1 deletion man/man4/zfs.4
@@ -1769,7 +1769,7 @@ completes in order to verify the checksums of all blocks which have been
resilvered.
This is enabled by default and strongly recommended.
.
-.It Sy zfs_rebuild_vdev_limit Ns = Ns Sy 33554432 Ns B Po 32 MiB Pc Pq u64
+.It Sy zfs_rebuild_vdev_limit Ns = Ns Sy 67108864 Ns B Po 64 MiB Pc Pq u64
Maximum amount of I/O that can be concurrently issued for a sequential
resilver per leaf device, given in bytes.
.
24 changes: 15 additions & 9 deletions module/zfs/vdev_rebuild.c
@@ -34,6 +34,7 @@
#include <sys/zio.h>
#include <sys/dmu_tx.h>
#include <sys/arc.h>
+#include <sys/arc_impl.h>
#include <sys/zap.h>

/*
@@ -116,13 +117,12 @@ static uint64_t zfs_rebuild_max_segment = 1024 * 1024;
* segment size is also large (zfs_rebuild_max_segment=1M). This helps keep
* the queue depth short.
*
- * 32MB was selected as the default value to achieve good performance with
- * a large 90-drive dRAID HDD configuration (draid2:8d:90c:2s). A sequential
- * rebuild was unable to saturate all of the drives using smaller values.
- * With a value of 32MB the sequential resilver write rate was measured at
- * 800MB/s sustained while rebuilding to a distributed spare.
+ * 64MB was observed to deliver the best performance and was set as the default.
+ * Testing was performed with a 106-drive dRAID HDD pool (draid2:11d:106c)
+ * and a rebuild rate of 1.2GB/s was measured to the distributed spare.
+ * Smaller values were unable to fully saturate the available pool I/O.
*/
-static uint64_t zfs_rebuild_vdev_limit = 32 << 20;
+static uint64_t zfs_rebuild_vdev_limit = 64 << 20;

/*
* Automatically start a pool scrub when the last active sequential resilver
@@ -786,9 +786,6 @@ vdev_rebuild_thread(void *arg)
vr->vr_pass_bytes_scanned = 0;
vr->vr_pass_bytes_issued = 0;

-vr->vr_bytes_inflight_max = MAX(1ULL << 20,
-    zfs_rebuild_vdev_limit * vd->vdev_children);

uint64_t update_est_time = gethrtime();
vdev_rebuild_update_bytes_est(vd, 0);

@@ -804,6 +801,15 @@
metaslab_t *msp = vd->vdev_ms[i];
vr->vr_scan_msp = msp;

+/*
+ * Calculate the max number of in-flight bytes for top vdev
+ * scanning operations (minimum 1MB / maximum 1/4 of arc_c_max).
+ * Limits for the issuing phase are done per top-level vdev and
+ * are handled separately.
+ */
+vr->vr_bytes_inflight_max = MIN(arc_c_max / 4, MAX(1ULL << 20,
+    zfs_rebuild_vdev_limit * vd->vdev_children));

/*
* Removal of vdevs from the vdev tree may eliminate the need
* for the rebuild, in which case it should be canceled. The
