bulk: use expiration time to disable automatic merging in import, backfill, restore #38079
Conversation
Force-pushed from 16086a3 to fa5c6ca
Force-pushed from fa5c6ca to 3dae6e2
The code here looks mostly complete, but it's unclear to me whether it is implementing the exact approach we want. There were a few ideas floated around in #37697, so I understand why there might be some confusion. I'm also a bit confused about what we agreed upon, so input from @dt would be appreciated.
Also, after this change, it looks like SchemaChanger.backfillIndexes will be the last remaining user of Gossip.DisableMerges (other than the pre-migration paths). It would be nice to make a similar change there in a different PR so that we can eventually go through and remove the code entirely.
The other big point I'd leave here is that there are a decent number of new conditions introduced in this PR. It's critical that this kind of stuff gets well commented when introduced or it becomes significantly more difficult to understand out of context when working on tangentially related changes. It's usually easy enough to parse what a block of code is doing, but in a large code base, it can become harder to understand why a block of code is doing what it's doing. For instance, why does condition X allow us to skip operation Y? Why would we ever want to perform operation Y? Leaving descriptive comments about this can save a lot of time in the future.
Reviewed 1 of 1 files at r1, 5 of 5 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @jeffrey-xiao)
pkg/ccl/backupccl/restore.go, line 1079 at r2 (raw file):
```go
}
// TODO(jeffreyxiao): Remove this check in 20.1.
```
Explain this in a comment. Why do we only disable merging if the cluster version is above this level?
pkg/ccl/backupccl/restore.go, line 1255 at r2 (raw file):
```go
}
// Unsplit ranges after restoration is complete.
```
In #37697 (comment) it sounded like we were moving towards an approach where we would never manually unsplit the ranges after the restore. Instead, we would set a reasonable TTL on the split points to give the restore enough time to start importing into the split ranges and simply let the split points expire. Where did we land on that? What would be a reasonable TTL? cc. @dt
EDIT: I see below that we do something like this for IMPORT. Why is RESTORE different?
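For concreteness, a rough sketch of the TTL'ed-split approach being floated here. This is hedged: `db`, `splitKeys`, and the AdminSplit-with-expiration signature are assumptions pieced together from this thread (the one-hour figure is only settled later), and cockroach's internal client, roachpb, and hlc packages are assumed in scope.

```go
// splitWithTTL is a hypothetical sketch, not the PR's code: pin each split
// for an hour and let the sticky bit lapse on its own, instead of running
// an O(n) unsplit pass at the end of the bulk operation.
func splitWithTTL(ctx context.Context, db *client.DB, splitKeys []roachpb.Key) error {
	expirationTime := db.Clock().Now().Add(time.Hour.Nanoseconds(), 0)
	for _, key := range splitKeys {
		if err := db.AdminSplit(ctx, key, key, expirationTime); err != nil {
			return err
		}
	}
	// No unsplit step: once expirationTime passes, the merge queue is free
	// to merge these ranges back together.
	return nil
}
```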
pkg/ccl/importccl/import_stmt.go, line 786 at r2 (raw file):
```go
}
stickyBitEnabled := r.settings.Version.IsActive(cluster.VersionStickyBit)
```
Same thing here. Give these descriptive comments about why we're only doing this when this version is inactive.
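For illustration, a hedged sketch of the kind of comment being requested, borrowing the mixed-version explanation given further down this thread; the wording is an assumption, not the PR's code:

```go
// stickyBitEnabled is true only once every node in the cluster understands
// expiration times ("sticky bits") on splits. Below VersionStickyBit we
// leave splits eligible for automatic merging rather than half-apply the
// new mechanism, so behavior stays consistent in a mixed-version cluster.
stickyBitEnabled := r.settings.Version.IsActive(cluster.VersionStickyBit)
```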
pkg/sql/backfill.go, line 1101 at r2 (raw file):
```go
return err
}
// Unsplit ranges after backfill and validation are complete.
```
Same point as above. Is this still the approach we want?
pkg/storage/replica_command.go, line 384 at r1 (raw file):
```go
) (roachpb.AdminUnsplitResponse, *roachpb.Error) {
	var reply roachpb.AdminUnsplitResponse
	var lastErr error
```
Instead of copying all of this from AdminSplit, can we pull out a utility function? At some point, we might need the same logic structure for AdminMerge as well.
pkg/storage/replica_command.go, line 469 at r1 (raw file):
```go
	return txn.Run(ctx, b)
})
// The ConditionFailedError can occur because the descriptors acting as
```
We might as well mirror the same logic as that in adminSplitWithDescriptor and put this inside of an `; err != nil {` block.
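That is, something shaped like the following hedged sketch; the surrounding plumbing (`r.store.DB()`, the batch contents) is an assumption for illustration, not the PR's exact code:

```go
if err := r.store.DB().Txn(ctx, func(ctx context.Context, txn *client.Txn) error {
	b := txn.NewBatch()
	// ... CPuts that update the range descriptor, using a descriptor read
	// outside the transaction as the expected value ...
	return txn.Run(ctx, b)
}); err != nil {
	// Because the expected values were read outside the txn, a racing
	// descriptor change surfaces here as a ConditionFailedError; handling
	// it in this block keeps the error logic next to the write, as in
	// adminSplitWithDescriptor.
	return err
}
```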
Force-pushed from c6d16f9 to e88dbc8
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @jeffrey-xiao, and @nvanbenschoten)
pkg/ccl/backupccl/restore.go, line 1255 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
In #37697 (comment) it sounded like we were moving towards an approach where we would never manually unsplit the ranges after the restore. Instead, we would set a reasonable TTL on the split points to give the restore enough time to start importing into the split ranges and simply let the split points expire. Where did we land on that? What would be a reasonable TTL? cc. @dt
EDIT: I see below that we do something like this for IMPORT. Why is RESTORE different?
Yeah, I was also under the impression that TTL'ed splits would mean we didn't need an unsplit step (and that was a big reason I was in favor of TTL'ed splits -- I didn't want to add another O(n) step to bulk ops where n can be pretty big).
pkg/ccl/importccl/sst_writer_proc.go, line 155 at r4 (raw file):
```go
expirationTime := hlc.Timestamp{}
if stickyBitEnabled {
	expirationTime = sp.db.Clock().Now().Add(time.Hour.Nanoseconds(), 0)
}
```
what harm does setting this do on older versions?
pkg/sql/backfill.go, line 1101 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Same point as above. Is this still the approach we want?
actually, in this case, these are not even the right spans at all -- IIRC these are the PK spans that we read from in order to generate and write index spans -- the splits we make (well, we don't yet but we should) would be in the index, not these, and we don't have a list of them (so it is like direct IMPORT).
Force-pushed from e88dbc8 to a03c883
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @nvanbenschoten)
pkg/ccl/backupccl/restore.go, line 1255 at r2 (raw file):
Previously, dt (David Taylor) wrote…
Yeah, I was also under the impression that TTL'ed splits would mean we didn't need an unsplit step (and that was a big reason I was in favor of TTL'ed splits -- I didn't want to add another O(n) step to bulk ops where n can be pretty big).
So should we just set a TTL on the splits and not unsplit them at the end of the operation? What should the TTL be?
pkg/ccl/importccl/sst_writer_proc.go, line 155 at r4 (raw file):
Previously, dt (David Taylor) wrote…
what harm does setting this do on older versions?
On older versions stickyBitEnabled would be false, so the split is always considered for automatic merging if the merge queue is enabled. The behavior should be consistent in a mixed-version cluster.
Force-pushed from ea76964 to 776f510
bulk-io parts LGTM (leaving core parts to core reviewers)
Reviewed 3 of 5 files at r5, 3 of 5 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, @jeffrey-xiao, and @nvanbenschoten)
pkg/ccl/backupccl/restore.go, line 1255 at r2 (raw file):
Previously, jeffrey-xiao (Jeffrey Xiao) wrote…
So should we just set a TTL on the splits and not unsplit them at the end of the operation? What should the TTL be?
yep, 1h is fine.
Reviewed 1 of 6 files at r3, 5 of 5 files at r5, 5 of 5 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @jeffrey-xiao)
pkg/storage/replica_command.go, line 414 at r5 (raw file):
```go
}

func (r *Replica) executeCommand(ctx context.Context, command func() error) *roachpb.Error {
```
This is nice, but I think we can improve it a little more. To start, I'd rename it to something like executeAdminCommandWithDescriptor; there are plenty of other commands that would never want to call this.
We can then give it a bit of justification in a comment. Why would anyone call this? What's the point of the retry loop? To answer that, we'll end up saying something about how the function is wrapping a read-modify-write operation for RangeDescriptors in a retry loop. Retry loops around these kinds of operations are pretty common because the write will often perform a CAS-like operation to avoid write skew. We see this all the time when dealing with atomics.
Once we talk about that, it may become evident that a more useful/understandable signature is something like:
```go
func(ctx context.Context, updateDesc func(*roachpb.RangeDescriptor) error) *roachpb.Error
```
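Concretely, a hedged sketch of that shape. The retry options, the exact retryable-error set, and the error-unwrapping are simplifying assumptions, not the code as merged:

```go
// executeAdminCommandWithDescriptor wraps a CAS-like read-modify-write of
// the range descriptor in a retry loop: updateDesc reads the current
// descriptor and writes the new one with a CPut against the value it read,
// so a concurrent split/merge/replica change makes the CPut fail exactly
// like a lost compare-and-swap, and we simply re-read and try again.
func (r *Replica) executeAdminCommandWithDescriptor(
	ctx context.Context, updateDesc func(*roachpb.RangeDescriptor) error,
) *roachpb.Error {
	retryOpts := base.DefaultRetryOptions()
	retryOpts.MaxRetries = 10
	var lastErr error
	for retryable := retry.StartWithCtx(ctx, retryOpts); retryable.Next(); {
		// Re-read the descriptor on every attempt; r.Desc() reflects any
		// racing change that caused the previous attempt to fail.
		lastErr = updateDesc(r.Desc())
		if _, ok := errors.Cause(lastErr).(*roachpb.ConditionFailedError); !ok {
			break
		}
	}
	return roachpb.NewError(lastErr)
}
```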
This commit extracts the retry logic in AdminSplit into a helper function and uses it for AdminUnsplit.

Release note: None
bulk: use expiration time to disable automatic merging in import, backfill, restore

The existing mechanism to prevent the merge queue from automatically merging splits created in import, backfill, and restore was to gossip the table keys that the merge queue should ignore when scanning replicas. Now that there is support for specifying an expiration time at a range level, we can use that instead of the gossip mechanism.

All splits created during backfill, restore, and import use an expiration time of an hour. The rationale for using an expiration time rather than unsplitting the ranges at the end of the operation is that adding an additional O(n) cost to bulk operations is not ideal when n can be large.

Release note: None
Force-pushed from 776f510 to b7e0c23
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @nvanbenschoten)
TFTRs! bors r+
38079: bulk: use expiration time to disable automatic merging in import, backfill, restore r=jeffrey-xiao a=jeffrey-xiao

The existing mechanism to prevent the merge queue from automatically merging splits created in import, backfill, and restore was to gossip the table keys that the merge queue should ignore when scanning replicas. Now that there is support for specifying an expiration time at a range level, we can use that instead of the gossip mechanism. All splits created during backfill, restore, and import use an expiration time of an hour. The rationale for using an expiration time rather than unsplitting the ranges at the end of the operation is that adding an additional O(n) cost to bulk operations is not ideal when n can be large.

Fixes #37697.

@dt I don't have context on how long import jobs take, so an hour might be too conservative.

38295: storage: fix flake in Test{Conditional,Init}PutUpdatesTSCacheOnError r=nvanbenschoten a=nvanbenschoten

Fixes #38256.

A request was slipping in between the manual clock update and the first request the test sent with an unspecified timestamp. This commit fixes the issue by explicitly specifying the timestamp.

Release note: None

Co-authored-by: Jeffrey Xiao <jeffrey.xiao1998@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Build succeeded