Improve resilver ETAs
When resilvering, the estimated time remaining is calculated using
the average issue rate over the current pass.  A pass currently
starts when a scan is started, or when it is restarted because the
pool was exported/imported.

For dRAID pools in particular this can result in wildly optimistic
estimates, since the issue rate is very high while non-degraded
regions of the pool are scanned.  Once repair I/O starts being
issued, performance drops to a realistic number, but the estimated
performance is still significantly skewed.
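
As a rough illustration of the skew described above, the estimate
divides the bytes left to issue by the pass-averaged issue rate.
This is a simplified sketch, not the actual zpool code: the helper
name eta_secs() and its parameters are made up for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of zpool): seconds remaining, given the
 * total bytes to issue, the bytes issued so far, and the bytes issued
 * and seconds elapsed in the current pass.
 */
static uint64_t
eta_secs(uint64_t total, uint64_t issued, uint64_t pass_issued,
    uint64_t pass_secs)
{
	/* Average issue rate over the current pass, in bytes per second. */
	uint64_t rate = pass_issued / (pass_secs > 0 ? pass_secs : 1);

	/* No usable rate, or inconsistent counters: no estimate. */
	if (rate == 0 || total < issued)
		return (UINT64_MAX);

	return ((total - issued) / rate);
}
```

With made-up figures of 10000 GiB to issue, 6000 GiB issued in 600
seconds while scanning and 100 GiB issued in the following 1000
seconds of repair I/O, averaging over the whole pass gives roughly 17
minutes remaining, while averaging over the repair phase alone gives
roughly 11 hours.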

To address this we redefine a pass such that it starts after a
scanning phase completes so the issue rate is more reflective of
recent performance.  Additionally, the zfs_scan_report_txgs
module option can be set to reset the pass statistics more often.
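
The periodic reset can be pictured as follows.  This is a minimal
sketch under assumptions: scan_stats_t and maybe_reset_pass() are
hypothetical stand-ins for the real scan state, and the sketch only
shows the idea of folding the current pass into a running total
every N txgs so the next estimate reflects recent performance.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-scan counters; not the ZFS structs. */
typedef struct scan_stats {
	uint64_t issued_before_pass;	/* bytes issued in earlier passes */
	uint64_t pass_issued;		/* bytes issued in the current pass */
	uint64_t pass_start;		/* time the current pass began */
} scan_stats_t;

/*
 * Start a new pass when txg is a multiple of report_txgs.  A value of
 * zero disables the periodic reset.  Returns 1 if a new pass started.
 */
static int
maybe_reset_pass(scan_stats_t *ss, uint64_t txg, uint64_t report_txgs,
    uint64_t now)
{
	if (report_txgs == 0 || txg % report_txgs != 0)
		return (0);

	/* Keep the overall progress, then restart the pass counters. */
	ss->issued_before_pass += ss->pass_issued;
	ss->pass_issued = 0;
	ss->pass_start = now;
	return (1);
}
```

Carrying the pass total forward before clearing it keeps overall
progress intact while the rate is computed only from the new pass.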

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
behlendorf committed Jan 20, 2023
1 parent 7197494 commit 3db2f02
Showing 4 changed files with 33 additions and 2 deletions.
2 changes: 1 addition & 1 deletion cmd/zpool/zpool_main.c
@@ -7619,7 +7619,7 @@ print_scan_scrub_resilver_status(pool_scan_stat_t *ps)

if (pause == 0) {
if (total_secs_left != UINT64_MAX &&
-issue_rate >= 10 * 1024 * 1024) {
+issue_rate >= 10 * 1024 * 1024 && ps->pss_processed > 0) {
(void) printf(gettext(", %s to go\n"), time_buf);
} else {
(void) printf(gettext(", no estimated "
7 changes: 7 additions & 0 deletions man/man4/zfs.4
@@ -1890,6 +1890,13 @@ I/O.
In this case (unless the metadata scan is done) we stop issuing verification I/O
and start scanning metadata again until we get to the hard limit.
.
+.It Sy zfs_scan_report_txgs Ns = Ns Sy 0 Ns | Ns 1 Pq uint
+When reporting resilver throughput and estimated completion time use the
+performance observed over roughly the last
+.Sy zfs_scan_report_txgs
+TXGs.
+When set to zero performance is calculated over the time between checkpoints.
+.
.It Sy zfs_scan_strict_mem_lim Ns = Ns Sy 0 Ns | Ns 1 Pq int
Enforce tight memory limits on pool scans when a sequential scan is in progress.
When disabled, the memory limit may be exceeded by fast disks.
25 changes: 25 additions & 0 deletions module/zfs/dsl_scan.c
@@ -131,6 +131,15 @@ static uint64_t dsl_scan_count_data_disks(vdev_t *vd);
extern uint_t zfs_vdev_async_write_active_min_dirty_percent;
static int zfs_scan_blkstats = 0;

+/*
+ * 'zpool status' uses bytes processed per pass to report throughput and
+ * estimate time remaining. We define a pass to start when the scanning
+ * phase completes for a sequential resilver. Optionally, this value
+ * may be used to reset the pass statistics every N txgs to provide an
+ * estimated completion time based on currently observed performance.
+ */
+static uint_t zfs_scan_report_txgs = 0;
+
/*
* By default zfs will check to ensure it is not over the hard memory
* limit before each txg. If finer-grained control of this is needed
@@ -604,6 +613,8 @@ dsl_scan_init(dsl_pool_t *dp, uint64_t txg)
}

spa_scan_stat_init(spa);
+vdev_scan_stat_init(spa->spa_root_vdev);
+
return (0);
}

@@ -763,6 +774,7 @@ dsl_scan_setup_sync(void *arg, dmu_tx_t *tx)
scn->scn_last_checkpoint = 0;
scn->scn_checkpointing = B_FALSE;
spa_scan_stat_init(spa);
+vdev_scan_stat_init(spa->spa_root_vdev);

if (DSL_SCAN_IS_SCRUB_RESILVER(scn)) {
scn->scn_phys.scn_ddt_class_max = zfs_scrub_ddt_class_max;
@@ -3652,6 +3664,16 @@ dsl_scan_sync(dsl_pool_t *dp, dmu_tx_t *tx)
return;
}

+/*
+ * Disabled by default, set zfs_scan_report_txgs to report
+ * average performance over the last zfs_scan_report_txgs TXGs.
+ */
+if (!dsl_scan_is_paused_scrub(scn) && zfs_scan_report_txgs != 0 &&
+tx->tx_txg % zfs_scan_report_txgs == 0) {
+scn->scn_issued_before_pass += spa->spa_scan_pass_issued;
+spa_scan_stat_init(spa);
+}
+
/*
* It is possible to switch from unsorted to sorted at any time,
* but afterwards the scan will remain sorted unless reloaded from
@@ -3780,6 +3802,9 @@ dsl_scan_sync(dsl_pool_t *dp, dmu_tx_t *tx)
if (scn->scn_is_sorted) {
scn->scn_checkpointing = B_TRUE;
scn->scn_clearing = B_TRUE;
+scn->scn_issued_before_pass +=
+spa->spa_scan_pass_issued;
+spa_scan_stat_init(spa);
}
zfs_dbgmsg("scan complete for %s txg %llu",
spa->spa_name,
1 change: 0 additions & 1 deletion module/zfs/spa_misc.c
@@ -2556,7 +2556,6 @@ spa_scan_stat_init(spa_t *spa)
spa->spa_scan_pass_scrub_spent_paused = 0;
spa->spa_scan_pass_exam = 0;
spa->spa_scan_pass_issued = 0;
-vdev_scan_stat_init(spa->spa_root_vdev);
}

/*