backupccl: rework split and scatter mechanism #63471

pbardea · 2021-04-12T13:41:08Z

backupccl: don't include empty import entries during restore
This commit filters out importSpans that have no associated data.

File spans' start keys are set to be the .Next() of the EndKey of the
previous file span. This was at least the case on the backup that is
currently being run against restore2TB. This lead to a lot of importSpan
entries that contained no files and would cover an empty key range. The
precense of these import spans did not effect the performance of
restores that much, but they did cause unnecessary splits and scatters
which further contributed to the elevated NLHEs that were seen.

This commit generally calms down the leaseholder errors, but do not
eliminate them entirely. Future investigation is still needed. It also
does not seem to have any performance impact on RESTORE performance (in
terms of speed of the job).

Release note: None

backupccl: rework split and scatter mechanism
This change reworks how RESTORE splits and scatter's in 2 ways:

This commits updates the split and scatter mechanism that restore uses
to split at the key of the next chunk/entry rather than at the current
chunk/entry.

This resolves a long-standing TODO that updates the split and scattering
of RESTORE to perform a split at the key of the next chunk/entry.
Previously, we would split at a the start key of the span, and then
scatter that span.

Consider a chunk with entries that we want to split and scatter. If we
split at the start of each entry and scatter the entry we just split off
(the right hand side of the split), we'll be continuously scattering the
suffix keyspace of the chunk. It's preferrable to split out a prefix
first (by splitting at the key of the next chunk) and scattering the
prefix.

We additionally refactor the split and scatter processor interface and
stop scattering the individual entries, but rather just split them.

Individual entries are no longer scattered. They used to be scattered
with the randomizeLeases option set to false, but there was not any
indication that this was beneficial to performance, so it was removed
for simplicity.

Release note: None

cockroach-teamcity · 2021-04-12T13:41:15Z

This change is

pbardea · 2021-04-19T16:21:11Z

Adding a few results from benchmarks for this diff.

AddSSTable Request Rate

Before:

After:

Note that we still see some nodes (unexpectedly) finish processing their export requests much sooner than other nodes. This manifests as restores slowing down towards the end. This will be addressed separately.

AdminSplit Request Rate

Before:

After:

This commit filters out importSpans that have no associated data. File spans' start keys are set to be the .Next() of the EndKey of the previous file span. This was at least the case on the backup that is currently being run against restore2TB. This lead to a lot of importSpan entries that contained no files and would cover an empty key range. The precense of these import spans did not effect the performance of restores that much, but they did cause unnecessary splits and scatters which further contributed to the elevated NLHEs that were seen. This commit generally calms down the leaseholder errors, but do not eliminate them entirely. Future investigation is still needed. It also does not seem to have any performance impact on RESTORE performance (in terms of speed of the job). Release note: None

adityamaru

Reviewed 2 of 2 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt and @pbardea)

pkg/ccl/backupccl/split_and_scatter_processor_test.go, line 174 at r2 (raw file):

				2: 2, // Entry 3, Entry 8
				3: 1, // Entry 4
				// 4: 0 // Entry 4 gets redirected to stream 0 since stream 4 does not exist.

nit: this should be Entry 5 gets redirected to stream 0, but you have CI passing so no need to re-push 🙂

This change reworks how RESTORE splits and scatter's in 2 ways: 1. This commits updates the split and scatter mechanism that restore uses to split at the key of the next chunk/entry rather than at the current chunk/entry. This resolves a long-standing TODO that updates the split and scattering of RESTORE to perform a split at the key of the next chunk/entry. Previously, we would split at a the start key of the span, and then scatter that span. Consider a chunk with entries that we want to split and scatter. If we split at the start of each entry and scatter the entry we just split off (the right hand side of the split), we'll be continuously scattering the suffix keyspace of the chunk. It's preferrable to split out a prefix first (by splitting at the key of the next chunk) and scattering the prefix. We additionally refactor the split and scatter processor interface and stop scattering the individual entries, but rather just split them. 2. Individual entries are no longer scattered. They used to be scattered with the randomizeLeases option set to false, but there was not any indication that this was beneficial to performance, so it was removed for simplicity. Release note: None

shermanCRL · 2021-04-26T13:48:59Z

Backportable to 21.1?

pbardea · 2021-04-26T13:55:04Z

@shermanCRL I believe this is. I also think #64067 could be a good candidate to backport. Will wrangle them up today.

bors r=adityamaru

craig · 2021-04-26T15:03:14Z

Build failed (retrying...):

GitHub CI (Cockroach)

craig · 2021-04-26T16:35:19Z

Build succeeded:

GitHub CI (Cockroach)

64656: release-21.1: improve restore performance consistency r=dt a=pbardea Backport: * 2/2 commits from "backupccl: rework split and scatter mechanism" (#63471) * 1/1 commits from "backupccl: fix a bug in routing scattered ranges" (#63875) * 1/1 commits from "backupccl: create more chunks on larger clusters" (#64067) Please see individual PRs for details. /cc @cockroachdb/release Added do-not-merge label until release-21.1 opens for 21.1.1 backports once the release is out. Co-authored-by: Paul Bardea <pbardea@gmail.com>

pbardea changed the title ~~backupccl: split at next chunk during split and scatter~~ [wip] backupccl: split at next chunk during split and scatter Apr 12, 2021

pbardea force-pushed the split-next branch 2 times, most recently from 0f88201 to 0383802 Compare April 16, 2021 20:34

pbardea changed the title ~~[wip] backupccl: split at next chunk during split and scatter~~ backupccl: rework split and scatter mechanism Apr 16, 2021

pbardea force-pushed the split-next branch from 0383802 to c9faf22 Compare April 19, 2021 03:10

pbardea force-pushed the split-next branch 6 times, most recently from ef15083 to 9a8e318 Compare April 22, 2021 17:25

pbardea force-pushed the split-next branch 2 times, most recently from 33e3447 to 51c0b85 Compare April 22, 2021 22:32

pbardea requested review from dt and adityamaru April 22, 2021 22:32

pbardea marked this pull request as ready for review April 22, 2021 22:32

pbardea requested a review from a team April 22, 2021 22:32

adityamaru approved these changes Apr 23, 2021

View reviewed changes

pbardea force-pushed the split-next branch from 51c0b85 to bae5e13 Compare April 23, 2021 15:26

craig bot merged commit 2dbd33c into cockroachdb:master Apr 26, 2021

pbardea mentioned this pull request May 4, 2021

release-21.1: improve restore performance consistency #64656

Merged

This was referenced May 4, 2021

release-20.2: restore performance consistency improvements #64659

Merged

restore: investigation of performance bottlenecks #64937

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

backupccl: rework split and scatter mechanism #63471

backupccl: rework split and scatter mechanism #63471

pbardea commented Apr 12, 2021 •

edited

Loading

cockroach-teamcity commented Apr 12, 2021

pbardea commented Apr 19, 2021 •

edited

Loading

adityamaru left a comment

shermanCRL commented Apr 26, 2021

pbardea commented Apr 26, 2021

craig bot commented Apr 26, 2021

craig bot commented Apr 26, 2021

backupccl: rework split and scatter mechanism #63471

backupccl: rework split and scatter mechanism #63471

Conversation

pbardea commented Apr 12, 2021 • edited Loading

cockroach-teamcity commented Apr 12, 2021

pbardea commented Apr 19, 2021 • edited Loading

AddSSTable Request Rate

AdminSplit Request Rate

adityamaru left a comment

Choose a reason for hiding this comment

shermanCRL commented Apr 26, 2021

pbardea commented Apr 26, 2021

craig bot commented Apr 26, 2021

craig bot commented Apr 26, 2021

pbardea commented Apr 12, 2021 •

edited

Loading

pbardea commented Apr 19, 2021 •

edited

Loading