bulk: use expiration time to disable automatic merging in import, backfill, restore #38079
Conversation
Force-pushed from 16086a3 to fa5c6ca
Force-pushed from fa5c6ca to 3dae6e2
The code here looks mostly complete, but it's unclear to me whether it is implementing the exact approach we want. There were a few ideas floated around in #37697, so I understand why there might be some confusion. I'm also a bit confused about what we agreed upon, so input from @dt would be appreciated.
Also, after this change, it looks like SchemaChanger.backfillIndexes will be the last remaining user of Gossip.DisableMerges (other than the pre-migration paths). It would be nice to make a similar change there in a different PR so that we can eventually go through and remove the code entirely.
The other big point I'd leave here is that there are a decent number of new conditions introduced in this PR. It's critical that this kind of stuff gets well commented when introduced or it becomes significantly more difficult to understand out of context when working on tangentially related changes. It's usually easy enough to parse what a block of code is doing, but in a large code base, it can become harder to understand why a block of code is doing what it's doing. For instance, why does condition X allow us to skip operation Y? Why would we ever want to perform operation Y? Leaving descriptive comments about this can save a lot of time in the future.
Reviewed 1 of 1 files at r1, 5 of 5 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @jeffrey-xiao)
pkg/ccl/backupccl/restore.go, line 1079 at r2 (raw file):
```go
}
// TODO(jeffreyxiao): Remove this check in 20.1.
```
Explain this in a comment. Why do we only disable merging if the cluster version is above this level?
pkg/ccl/backupccl/restore.go, line 1255 at r2 (raw file):
```go
}
// Unsplit ranges after restoration is complete.
```
In #37697 (comment) it sounded like we were moving towards an approach where we would never manually unsplit the ranges after the restore. Instead, we would set a reasonable TTL on the split points to give the restore enough time to start importing into the split ranges and simply let the split points expire. Where did we land on that? What would be a reasonable TTL? cc. @dt
EDIT: I see below that we do something like this for IMPORT. Why is RESTORE different?
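For concreteness, a rough sketch of the TTL'ed-split approach being floated here. This is hedged: `db`, `splitKeys`, and the AdminSplit-with-expiration signature are assumptions pieced together from this thread (the one-hour figure is only settled later), and cockroach's internal client, roachpb, and hlc packages are assumed in scope.

```go
// splitWithTTL is a hypothetical sketch, not the PR's code: pin each split
// for an hour and let the sticky bit lapse on its own, instead of running
// an O(n) unsplit pass at the end of the bulk operation.
func splitWithTTL(ctx context.Context, db *client.DB, splitKeys []roachpb.Key) error {
	expirationTime := db.Clock().Now().Add(time.Hour.Nanoseconds(), 0)
	for _, key := range splitKeys {
		if err := db.AdminSplit(ctx, key, key, expirationTime); err != nil {
			return err
		}
	}
	// No unsplit step: once expirationTime passes, the merge queue is free
	// to merge these ranges back together.
	return nil
}
```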
pkg/ccl/importccl/import_stmt.go, line 786 at r2 (raw file):
```go
}
stickyBitEnabled := r.settings.Version.IsActive(cluster.VersionStickyBit)
```
Same thing here. Give these descriptive comments about why we're only doing this when this version is inactive.
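For illustration, a hedged sketch of the kind of comment being requested, borrowing the mixed-version explanation given further down this thread; the wording is an assumption, not the PR's code:

```go
// stickyBitEnabled is true only once every node in the cluster understands
// expiration times ("sticky bits") on splits. Below VersionStickyBit we
// leave splits eligible for automatic merging rather than half-apply the
// new mechanism, so behavior stays consistent in a mixed-version cluster.
stickyBitEnabled := r.settings.Version.IsActive(cluster.VersionStickyBit)
```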
pkg/sql/backfill.go, line 1101 at r2 (raw file):
```go
return err
}
// Unsplit ranges after backfill and validation are complete.
```
Same point as above. Is this still the approach we want?
pkg/storage/replica_command.go, line 384 at r1 (raw file):
```go
) (roachpb.AdminUnsplitResponse, *roachpb.Error) {
	var reply roachpb.AdminUnsplitResponse
	var lastErr error
```
Instead of copying all of this from AdminSplit, can we pull out a utility function? At some point, we might need the same logic structure for AdminMerge as well.
pkg/storage/replica_command.go, line 469 at r1 (raw file):
```go
	return txn.Run(ctx, b)
})
// The ConditionFailedError can occur because the descriptors acting as
```
We might as well mirror the same logic as that in adminSplitWithDescriptor and put this inside of an `; err != nil {` block.
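That is, something shaped like the following hedged sketch; the surrounding plumbing (`r.store.DB()`, the batch contents) is an assumption for illustration, not the PR's exact code:

```go
if err := r.store.DB().Txn(ctx, func(ctx context.Context, txn *client.Txn) error {
	b := txn.NewBatch()
	// ... CPuts that update the range descriptor, using a descriptor read
	// outside the transaction as the expected value ...
	return txn.Run(ctx, b)
}); err != nil {
	// Because the expected values were read outside the txn, a racing
	// descriptor change surfaces here as a ConditionFailedError; handling
	// it in this block keeps the error logic next to the write, as in
	// adminSplitWithDescriptor.
	return err
}
```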
Force-pushed from c6d16f9 to e88dbc8
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @jeffrey-xiao, and @nvanbenschoten)
pkg/ccl/backupccl/restore.go, line 1255 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
In #37697 (comment) it sounded like we were moving towards an approach where we would never manually unsplit the ranges after the restore. Instead, we would set a reasonable TTL on the split points to give the restore enough time to start importing into the split ranges and simply let the split points expire. Where did we land on that? What would be a reasonable TTL? cc. @dt
EDIT: I see below that we do something like this for IMPORT. Why is RESTORE different?
Yeah, I was also under the impression that TTL'ed splits would mean we didn't need an unsplit step (and that was a big reason I was in favor of TTL'ed splits -- I didn't want to add another O(n) step to bulk ops where n can be pretty big).
pkg/ccl/importccl/sst_writer_proc.go, line 155 at r4 (raw file):
```go
expirationTime := hlc.Timestamp{}
if stickyBitEnabled {
	expirationTime = sp.db.Clock().Now().Add(time.Hour.Nanoseconds(), 0)
}
```
what harm does setting this do on older versions?
pkg/sql/backfill.go, line 1101 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Same point as above. Is this still the approach we want?
actually, in this case, these are not even the right spans at all -- IIRC these are the PK spans that we read from in order to generate and write index spans -- the splits we make (well, we don't yet but we should) would be in the index, not these, and we don't have a list of them (so it is like direct IMPORT).
Force-pushed from e88dbc8 to a03c883
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @nvanbenschoten)
pkg/ccl/backupccl/restore.go, line 1255 at r2 (raw file):
Previously, dt (David Taylor) wrote…
Yeah, I was also under the impression that TTL'ed splits would mean we didn't need an unsplit step (and that was a big reason I was in favor of TTL'ed splits -- I didn't want to add another O(n) step to bulk ops where n can be pretty big).
So should we just set a TTL on the splits and not unsplit them at the end of the operation? What should the TTL be?
pkg/ccl/importccl/sst_writer_proc.go, line 155 at r4 (raw file):
Previously, dt (David Taylor) wrote…
what harm does setting this do on older versions?
On older versions stickyBitEnabled would be false, so the split is always considered for automatic merging if the merge queue is enabled. The behavior should be consistent in a mixed-version cluster.
Force-pushed from ea76964 to 776f510
bulk-io parts LGTM (leaving core parts to core reviewers)
Reviewed 3 of 5 files at r5, 3 of 5 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, @jeffrey-xiao, and @nvanbenschoten)
pkg/ccl/backupccl/restore.go, line 1255 at r2 (raw file):
Previously, jeffrey-xiao (Jeffrey Xiao) wrote…
So should we just set a TTL on the splits and not unsplit them at the end of the operation? What should the TTL be?
yep, 1h is fine.
Reviewed 1 of 6 files at r3, 5 of 5 files at r5, 5 of 5 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @jeffrey-xiao)
pkg/storage/replica_command.go, line 414 at r5 (raw file):
```go
}

func (r *Replica) executeCommand(ctx context.Context, command func() error) *roachpb.Error {
```
This is nice, but I think we can improve it a little more. To start, I'd rename it to something like executeAdminCommandWithDescriptor; there are plenty of other commands that would never want to call this.
We can then give it a bit of justification in a comment. Why would anyone call this? What's the point of the retry loop? To answer that, we'll end up saying something about how the function is wrapping a read-modify-write operation for RangeDescriptors in a retry loop. Retry loops around these kinds of operations are pretty common because the write will often perform a CAS-like operation to avoid write skew. We see this all the time when dealing with atomics.
Once we talk about that, it may become evident that a more useful/understandable signature is something like:
```go
func(ctx context.Context, updateDesc func(*roachpb.RangeDescriptor) error) *roachpb.Error
```
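Concretely, a hedged sketch of that shape. The retry options, the exact retryable-error set, and the error-unwrapping are simplifying assumptions, not the code as merged:

```go
// executeAdminCommandWithDescriptor wraps a CAS-like read-modify-write of
// the range descriptor in a retry loop: updateDesc reads the current
// descriptor and writes the new one with a CPut against the value it read,
// so a concurrent split/merge/replica change makes the CPut fail exactly
// like a lost compare-and-swap, and we simply re-read and try again.
func (r *Replica) executeAdminCommandWithDescriptor(
	ctx context.Context, updateDesc func(*roachpb.RangeDescriptor) error,
) *roachpb.Error {
	retryOpts := base.DefaultRetryOptions()
	retryOpts.MaxRetries = 10
	var lastErr error
	for retryable := retry.StartWithCtx(ctx, retryOpts); retryable.Next(); {
		// Re-read the descriptor on every attempt; r.Desc() reflects any
		// racing change that caused the previous attempt to fail.
		lastErr = updateDesc(r.Desc())
		if _, ok := errors.Cause(lastErr).(*roachpb.ConditionFailedError); !ok {
			break
		}
	}
	return roachpb.NewError(lastErr)
}
```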
This commit extracts the retry logic in AdminSplit into a helper function and uses it for AdminUnsplit.

Release note: None
bulk: use expiration time to disable automatic merging in import, backfill, restore

The existing mechanism to prevent the merge queue from automatically merging splits created in import, backfill, and restore was to gossip the table keys that the merge queue should ignore when scanning replicas. Now that there is support for specifying an expiration time at a range level, we can use that instead of the gossip mechanism.

All splits created during backfill, restore, and import use an expiration time of an hour. The rationale for using an expiration time rather than unsplitting the ranges at the end of the operation is that adding an additional O(n) cost to bulk operations is not ideal when n can be large.

Release note: None
Force-pushed from 776f510 to b7e0c23
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @nvanbenschoten)
TFTRs! bors r+
38079: bulk: use expiration time to disable automatic merging in import, backfill, restore r=jeffrey-xiao a=jeffrey-xiao

The existing mechanism to prevent the merge queue from automatically merging splits created in import, backfill, and restore was to gossip the table keys that the merge queue should ignore when scanning replicas. Now that there is support for specifying an expiration time at a range level, we can use that instead of the gossip mechanism. All splits created during backfill, restore, and import use an expiration time of an hour. The rationale for using an expiration time rather than unsplitting the ranges at the end of the operation is that adding an additional O(n) cost to bulk operations is not ideal when n can be large.

Fixes #37697.

@dt I don't have context on how long import jobs take, so an hour might be too conservative.

38295: storage: fix flake in Test{Conditional,Init}PutUpdatesTSCacheOnError r=nvanbenschoten a=nvanbenschoten

Fixes #38256.

A request was slipping in between the manual clock update and the first request the test sent with an unspecified timestamp. This commit fixes the issue by explicitly specifying the timestamp.

Release note: None

Co-authored-by: Jeffrey Xiao <jeffrey.xiao1998@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Build succeeded