backupccl: rework split and scatter mechanism #63471

Merged 2 commits on Apr 26, 2021

Commits on Apr 22, 2021

  1. backupccl: don't include empty import entries during restore

    This commit filters out importSpans that have no associated data.
    
    File spans' start keys are set to the .Next() of the EndKey of the
    previous file span. This was at least the case for the backup that is
    currently being run against restore2TB. This led to many importSpan
    entries that contained no files and covered an empty key range. The
    presence of these import spans did not affect restore performance
    much, but they did cause unnecessary splits and scatters, which
    further contributed to the elevated NotLeaseHolderErrors (NLHEs) that
    were seen.
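
As a rough illustration of the filtering described above, the following Go sketch drops entries that reference no files. The `importEntry` type, its fields, and the `dropEmptyEntries` function are hypothetical names invented for this example; they are not the actual backupccl types, and plain strings stand in for keys.

```go
package main

// importEntry is a hypothetical stand-in for an import span: a key range
// [Start, End) plus the backup files that hold data for that range.
type importEntry struct {
	Start, End string
	Files      []string
}

// dropEmptyEntries returns only the entries that reference at least one
// file. Entries without files cover an empty key range, so splitting and
// scattering them would be wasted work.
func dropEmptyEntries(entries []importEntry) []importEntry {
	kept := make([]importEntry, 0, len(entries))
	for _, e := range entries {
		if len(e.Files) == 0 {
			continue // nothing to import for this range
		}
		kept = append(kept, e)
	}
	return kept
}
```

With this sketch, an entry that sits in the gap between one file's EndKey and the next file's start key carries no files and is dropped before any split or scatter would be issued for it.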
    
    This commit generally calms down the leaseholder errors, but does not
    eliminate them entirely. Further investigation is still needed. It also
    does not seem to affect RESTORE performance (in terms of the speed of
    the job).
    
    Release note: None
    pbardea committed Apr 22, 2021
    58dacfb

Commits on Apr 23, 2021

  1. backupccl: rework split and scatter mechanism

    This change reworks how RESTORE splits and scatters in two ways:
    
    1.
    This commit updates the split and scatter mechanism that restore uses
    to split at the key of the next chunk/entry rather than at the key of
    the current chunk/entry.
    
    This resolves a long-standing TODO to make RESTORE's split and scatter
    perform the split at the key of the next chunk/entry.
    Previously, we would split at the start key of the span, and then
    scatter that span.
    
    Consider a chunk with entries that we want to split and scatter. If we
    split at the start of each entry and scatter the entry we just split off
    (the right-hand side of the split), we'll be continuously scattering the
    suffix keyspace of the chunk. It's preferable to split out a prefix
    first (by splitting at the key of the next chunk) and scatter the
    prefix.
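
The difference between the two strategies can be sketched in Go as follows. This is a simplified model under stated assumptions, not the processor code: `keySpan` and both functions are hypothetical names, plain strings stand in for keys, and the last chunk is assumed to be bounded by the end key of the overall restore span.

```go
package main

// keySpan is a half-open key range [Start, End); a hypothetical type for
// this sketch only.
type keySpan struct{ Start, End string }

// scatterAfterSplitAtCurrent models the old behavior: split at the start key
// of each chunk, then scatter the right-hand side of that split. No later
// split exists yet, so the scattered range runs to the end of the keyspace.
func scatterAfterSplitAtCurrent(starts []string, end string) []keySpan {
	scattered := make([]keySpan, 0, len(starts))
	for _, s := range starts {
		scattered = append(scattered, keySpan{Start: s, End: end})
	}
	return scattered
}

// scatterAfterSplitAtNext models the new behavior: split at the start key of
// the next chunk, then scatter the prefix that was just split off.
func scatterAfterSplitAtNext(starts []string, end string) []keySpan {
	scattered := make([]keySpan, 0, len(starts))
	for i, s := range starts {
		next := end // assumption: the last chunk has no successor
		if i+1 < len(starts) {
			next = starts[i+1]
		}
		scattered = append(scattered, keySpan{Start: s, End: next})
	}
	return scattered
}
```

For chunk start keys a, c, f and an end key z, the first function reports scattering [a, z), [c, z), [f, z), so the suffix is moved repeatedly. The second reports [a, c), [c, f), [f, z), so each prefix is scattered exactly once.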
    
    We additionally refactor the split and scatter processor interface.
    Individual entries are now only split, not scattered.
    
    2.
    Individual entries are no longer scattered. They used to be scattered
    with the randomizeLeases option set to false, but there was no
    indication that this was beneficial to performance, so it was removed
    for simplicity.
    
    Release note: None
    pbardea committed Apr 23, 2021
    bae5e13