Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bulk: use expiration time to disable automatic merging in import, backfill, restore #38079

Merged
merged 2 commits into from
Jun 19, 2019

Conversation

jeffrey-xiao
Copy link
Contributor

@jeffrey-xiao jeffrey-xiao commented Jun 6, 2019

The existing mechanism to prevent the merge queue from automatically merging splits created in import, backfill, and restore was to gossip the table keys that the merge queue should ignore when scanning
replicas. Now that there is support for specifying an expiration time at a range level, we can use that instead of the gossip mechanism.

All splits created during backfill, restore, and import use an expiration time of an hour. The rationale behind using an expiration time rather than unsplitting the ranges at the end of the operation is because adding an additional O(n) cost to bulk operations is not ideal when n can be large.

Fixes #37697.

@dt I don't have context on how long import jobs take, so an hour might be too conservative.

@jeffrey-xiao jeffrey-xiao requested review from a team June 6, 2019 21:31
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jeffrey-xiao jeffrey-xiao force-pushed the import-use-manual-splitting branch 3 times, most recently from 16086a3 to fa5c6ca Compare June 7, 2019 15:33
@jeffrey-xiao jeffrey-xiao requested a review from a team June 7, 2019 15:33
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code here looks mostly complete, but it's unclear to me whether it is implementing the exact approach we want. There were a few ideas floated around in #37697, so I understand why there might be some confusion. I'm also a bit confused about what we agreed upon, so input from @dt would be appreciated.

Also, after this change, it looks like SchemaChanger.backfillIndexes will be the last remaining users of Gossip.DisableMerges (other than the pre-migration paths). It would be nice to make a similar change there in a different PR so that we can eventually go through and remove the code entirely.

The other big point I'd leave here is that there are a decent number of new conditions introduced in this PR. It's critical that this kind of stuff gets well commented when introduced or it becomes significantly more difficult to understand out of context when working on tangentially related changes. It's usually easy enough to parse what a block of code is doing, but in a large code base, it can become harder to understand why a block of code is doing what it's doing. For instance, why does condition X allow us to skip operation Y? Why would we ever want to perform operation Y? Leaving descriptive comments about this can save a lot of time in the future.

Reviewed 1 of 1 files at r1, 5 of 5 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @jeffrey-xiao)


pkg/ccl/backupccl/restore.go, line 1079 at r2 (raw file):

	}

	// TODO(jeffreyxiao): Remove this check in 20.1.

Explain this in a comment. Why do we only disable merging if the cluster version is above this level?


pkg/ccl/backupccl/restore.go, line 1255 at r2 (raw file):

	}

	// Unsplit ranges after restoration is complete.

In #37697 (comment) it sounded like we were moving towards an approach where we would never manually unsplit the ranges after the restore. Instead, we would set a reasonable TTL on the split points to give the restore enough time to start importing into the split ranges and simply let the split points expire. Where did we land on that? What would be a reasonable TTL? cc. @dt

EDIT: I see below that we do something like this for IMPORT. Why is RESTORE different?


pkg/ccl/importccl/import_stmt.go, line 786 at r2 (raw file):

	}

	stickyBitEnabled := r.settings.Version.IsActive(cluster.VersionStickyBit)

Same thing here. Give these descriptive comments about why we're only doing this when this version is inactive.


pkg/sql/backfill.go, line 1101 at r2 (raw file):

		return err
	}
	// Unsplit ranges after backfill and validation are complete.

Same point as above. Is this still the approach we want?


pkg/storage/replica_command.go, line 384 at r1 (raw file):

) (roachpb.AdminUnsplitResponse, *roachpb.Error) {
	var reply roachpb.AdminUnsplitResponse
	var lastErr error

Instead of copying all of this from AdminSplit, can we pull out a utility function? At some point, we might need the same logic structure for AdminMerge as well.


pkg/storage/replica_command.go, line 469 at r1 (raw file):

		return txn.Run(ctx, b)
	})
	// The ConditionFailedError can occur because the descriptors acting as

We might as well mirror the same logic as that in adminSplitWithDescriptor and put this inside of a ; err != nil { block.

@jeffrey-xiao jeffrey-xiao force-pushed the import-use-manual-splitting branch 2 times, most recently from c6d16f9 to e88dbc8 Compare June 17, 2019 16:34
Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @jeffrey-xiao, and @nvanbenschoten)


pkg/ccl/backupccl/restore.go, line 1255 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

In #37697 (comment) it sounded like we were moving towards an approach where we would never manually unsplit the ranges after the restore. Instead, we would set a reasonable TTL on the split points to give the restore enough time to start importing into the split ranges and simply let the split points expire. Where did we land on that? What would be a reasonable TTL? cc. @dt

EDIT: I see below that we do something like this for IMPORT. Why is RESTORE different?

Yeah, I was also under the impression that TTL'ed splits would mean we didn't need an unsplit step (and that was a big reason I was in favor of TTL'ed splits -- I didn't want to add another O(n) step to bulk ops where n can be pretty big).


pkg/ccl/importccl/sst_writer_proc.go, line 155 at r4 (raw file):

		expirationTime := hlc.Timestamp{}
		if stickyBitEnabled {
			expirationTime = sp.db.Clock().Now().Add(time.Hour.Nanoseconds(), 0)

what harm does setting this do on older versions?


pkg/sql/backfill.go, line 1101 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Same point as above. Is this still the approach we want?

actually, in this case, these are not even the right spans at all -- IIRC these are the PK spans that we read from in order to generate and write index spans -- the splits, we make (well, we don't yet but we should) would be in the index, not these, and we don't have a list of them (so it is like direct IMPORT).

Copy link
Contributor Author

@jeffrey-xiao jeffrey-xiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @nvanbenschoten)


pkg/ccl/backupccl/restore.go, line 1255 at r2 (raw file):

Previously, dt (David Taylor) wrote…

Yeah, I was also under the impression that TTL'ed splits would mean we didn't need an unsplit step (and that was a big reason I was in favor of TTL'ed splits -- I didn't want to add another O(n) step to bulk ops where n can be pretty big).

So should we just a TTL on the splits and not unsplit them at the end of the operation? What should the TTL be?


pkg/ccl/importccl/sst_writer_proc.go, line 155 at r4 (raw file):

Previously, dt (David Taylor) wrote…

what harm does setting this do on older versions?

On older versions stickyBitEnabled would be false, so the split is always considered for automatic merging if the merge queue is enabled. The behavior should be consistent in a mixed version cluster.

@jeffrey-xiao jeffrey-xiao requested a review from dt June 17, 2019 19:20
@jeffrey-xiao jeffrey-xiao force-pushed the import-use-manual-splitting branch 2 times, most recently from ea76964 to 776f510 Compare June 18, 2019 21:47
Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bulk-io parts LGTM (leaving core parts to core reviewers)

Reviewed 3 of 5 files at r5, 3 of 5 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, @jeffrey-xiao, and @nvanbenschoten)


pkg/ccl/backupccl/restore.go, line 1255 at r2 (raw file):

Previously, jeffrey-xiao (Jeffrey Xiao) wrote…

So should we just a TTL on the splits and not unsplit them at the end of the operation? What should the TTL be?

yep, 1h is fine.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 6 files at r3, 5 of 5 files at r5, 5 of 5 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @jeffrey-xiao)


pkg/storage/replica_command.go, line 414 at r5 (raw file):

}

func (r *Replica) executeCommand(ctx context.Context, command func() error) *roachpb.Error {

This is nice, but I think we can improve it a little more. To start, I'd rename it to something like executeAdminCommandWithDescriptor. There are plenty of other commands that would never want to call this.

We can then give it a bit of justification in a comment. Why would anyone call this? What's the point of the retry loop? To answer that, we'll end up saying something about how the function is wrapping a read-modify-write operation for RangeDescriptors in a retry loop. Retry loops around these kinds of operations are pretty common because the write will often perform a CAS-like operation to avoid write skew. We see this all the time when dealing with atomics.

Once we talk about that, it may become evident that a more useful/understandable signature is something like:

func(ctx context.Context, updateDesc func(*roachpb.RangeDescriptor) error) *roachpb.Error {

This commit extracts the retry logic in AdminSplit into a helper
function and uses for AdminUnsplit.

Release note: None
…kfill, restore

The existing mechanism to prevent the merge queue from automatically
merging splits created in import, backfill, and restore was to gossip
the table keys that the merge queue should ignore when scanning
replicas. Now that there is support for specifying an expiration time at
a range level, we can use that instead of the gossip mechanism.

All splits created during backfill, restore, and import use an
expiration time of an hour. The rationale behind using an expiration
time rather than unsplitting the ranges at the end of the operation is
because adding an additional O(n) cost to bulk operations is not ideal
when n can be large.

Release note: None
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @nvanbenschoten)

@jeffrey-xiao
Copy link
Contributor Author

TFTRs!

bors r+

craig bot pushed a commit that referenced this pull request Jun 19, 2019
38079: bulk: use expiration time to disable automatic merging in import, backfill, restore r=jeffrey-xiao a=jeffrey-xiao

The existing mechanism to prevent the merge queue from automatically merging splits created in import, backfill, and restore was to gossip the table keys that the merge queue should ignore when scanning
replicas. Now that there is support for specifying an expiration time at a range level, we can use that instead of the gossip mechanism.

All splits created during backfill, restore, and import use an expiration time of an hour. The rationale behind using an expiration time rather than unsplitting the ranges at the end of the operation is because adding an additional O(n) cost to bulk operations is not ideal when n can be large.

Fixes #37697.

@dt I don't have context on how long import jobs take, so an hour might be too conservative.

38295: storage: fix flake in Test{Conditional,Init}PutUpdatesTSCacheOnError r=nvanbenschoten a=nvanbenschoten

Fixes #38256.

A request was slipping in between the manual clock update and the first
request the test sent with an unspecified timestamp. This commit fixes
the issue by explicitly specifying the timestamp.

Release note: None

Co-authored-by: Jeffrey Xiao <jeffrey.xiao1998@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig
Copy link
Contributor

craig bot commented Jun 19, 2019

Build succeeded

@craig craig bot merged commit b7e0c23 into cockroachdb:master Jun 19, 2019
@jeffrey-xiao jeffrey-xiao deleted the import-use-manual-splitting branch June 19, 2019 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

import: importing should use manual splitting with sticky bits
4 participants