kvserver: introduce a Raft-based transport for closedts #59566
Conversation
No tests yet.
Force-pushed from 29c1092 to be71108, then from be71108 to 263c0c3.
Nice! There's a lot here, but the new logic really isn't too complex. It's much easier to reason about than the old system.
Reviewed 34 of 34 files at r7.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)
pkg/kv/kvserver/replica_application_result.go, line 235 at r7 (raw file):
log.VEventf(ctx, 2, "retry: proposalIllegalLeaseIndex")
maxLeaseIndex, pErr := r.propose(ctx, p, tok.Move(ctx))
Do you think this has similar buffer aliasing issues to what we saw a few days ago?
pkg/kv/kvserver/replica_application_state_machine.go, line 431 at r7 (raw file):
}
if log.V(4) {
log.Infof(ctx, "processing command %x: maxLeaseIndex=%d closedts: %s idx: %d",
If you want to keep this, move the cmd.ent.Index before maxLeaseIndex and name it raftIndex=%d. Also, s/: /=/.
pkg/kv/kvserver/replica_application_state_machine.go, line 440 at r7 (raw file):
// reject a command, all will.
if !b.r.shouldApplyCommand(ctx, cmd, &b.state) {
log.VEventf(ctx, 1, "applying command %x with forced error: %s", cmd.raftCmd.ClosedTimestamp, cmd.forcedErr)
Did you mean to keep this?
pkg/kv/kvserver/replica_application_state_machine.go, line 448 at r7 (raw file):
cmd.raftCmd.ClosedTimestamp.Reset()
} else {
// !!! Find a way to assert that this command is not writing below the replica's closed ts.
Is this addressed?
pkg/kv/kvserver/replica_application_state_machine.go, line 826 at r7 (raw file):
if cts := cmd.raftCmd.ClosedTimestamp; !cts.IsEmpty() {
if cts.Less(b.state.ClosedTimestamp) {
return wrapWithNonDeterministicFailure(errors.Errorf(
Any reason not to just fatal here? Is the new error handling needed?
pkg/kv/kvserver/replica_application_state_machine.go, line 891 at r7 (raw file):
r.mu.state.RaftAppliedIndex = b.state.RaftAppliedIndex
r.mu.state.LeaseAppliedIndex = b.state.LeaseAppliedIndex
closedTimestampUpdated := !r.mu.state.ClosedTimestamp.Equal(b.state.ClosedTimestamp)
Any reason not to just blindly write the update? If you need closedTimestampUpdated for later, consider:
closedTimestampUpdated := r.mu.state.ClosedTimestamp.Forward(b.state.ClosedTimestamp)
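The Forward idiom suggested above can be sketched as follows. This is a minimal stand-in type, not the real hlc.Timestamp: the point is only that Forward advances the receiver monotonically and reports whether it changed, so the "blind write" and the "did it update" check collapse into one call.

```go
package main

import "fmt"

// ts is a simplified stand-in for hlc.Timestamp (wall time only).
type ts struct{ wall int64 }

func (t ts) Less(o ts) bool { return t.wall < o.wall }

// Forward advances t to o if o is larger, returning whether t changed.
// This mirrors the suggested idiom:
//   closedTimestampUpdated := r.mu.state.ClosedTimestamp.Forward(b.state.ClosedTimestamp)
func (t *ts) Forward(o ts) bool {
	if t.Less(o) {
		*t = o
		return true
	}
	return false
}

func main() {
	state := ts{wall: 100}
	updated := state.Forward(ts{wall: 150}) // advances, returns true
	fmt.Println(updated, state.wall)        // true 150
	updated = state.Forward(ts{wall: 120})  // regression is a no-op
	fmt.Println(updated, state.wall)        // false 150
}
```

Because Forward never regresses the receiver, blindly calling it is safe even when the batch's closed timestamp is stale.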
pkg/kv/kvserver/replica_proposal.go, line 117 at r7 (raw file):
// booleans.
Request *roachpb.BatchRequest
// leaseStatus represents the lease under which the Request was evaluated and
nit: add an empty line above this comment.
pkg/kv/kvserver/replica_proposal.go, line 117 at r7 (raw file):
// booleans.
Request *roachpb.BatchRequest
// leaseStatus represents the lease under which the Request was evaluated and
What does this contain for lease requests? Consider spelling that out.
pkg/kv/kvserver/replica_proposal.go, line 171 at r7 (raw file):
}
func (proposal *ProposalData) alreadyP() {
!!!
pkg/kv/kvserver/replica_proposal.go, line 414 at r7 (raw file):
r.concMgr.OnRangeLeaseUpdated(newLease.Sequence, iAmTheLeaseHolder)
r.mu.proposalBuf.OnLeaseChangeLocked(iAmTheLeaseHolder, r.mu.state.ClosedTimestamp)
Give this a comment.
pkg/kv/kvserver/replica_proposal_buf.go, line 114 at r7 (raw file):
}
// propBuf is a multi-producer, single-consumer buffer for Raft proposals. The
This comment could use some updates to indicate the new role that the proposal buffer plays.
pkg/kv/kvserver/replica_proposal_buf.go, line 143 at r7 (raw file):
arr propBufArray
clock *hlc.Clock
Should we put accessors to the clock and settings on the proposer interface so we don't need to plumb the dependencies into here?
pkg/kv/kvserver/replica_proposal_buf.go, line 148 at r7 (raw file):
// currently-evaluating requests.
evalTracker tracker.Tracker
// closedTSNanos is the largest "closed timestamp" - i.e. the largest
"closedTSNanos"
Also, should this be renamed to "assignedClosedTS" or something to indicate that this is not a real closed timestamp yet?
pkg/kv/kvserver/replica_proposal_buf.go, line 153 at r7 (raw file):
// any more.
//
// We only close physical timestamps, ignoring the logical part. If we say
Stale paragraph.
pkg/kv/kvserver/replica_proposal_buf.go, line 217 at r7 (raw file):
enqueueUpdateCheck()
closeTimestampPolicy() roachpb.RangeClosedTimestampPolicy
// raftAppliedClosedTimestamp returns the closed timestamp from the ReplicaState.
If we just need this to check if the timestamp is empty, I'd slim down the interface. Consider raftTransportClosedTimestampsInUse() bool or something like that. No need to let future changes do something stupid.
pkg/kv/kvserver/replica_proposal_buf.go, line 271 at r7 (raw file):
will be assigned which is assigned here
Something's off here.
pkg/kv/kvserver/replica_proposal_buf.go, line 273 at r7 (raw file):
// closed_timestamp. max_lease_index will be assigned which is assigned here
// when the command is sequenced in the buffer. closed_timestamp will be
// assigned later. It is also expected that the byte slice has sufficient
Clarify when "later" is.
pkg/kv/kvserver/replica_proposal_buf.go, line 517 at r7 (raw file):
}
now := b.clock.PhysicalNow()
Why PhysicalNow? That seems wrong, doesn't it? Also, let's pull this logic out into some other method so that we can wrap it with a test.
pkg/kv/kvserver/replica_proposal_buf.go, line 620 at r7 (raw file):
// could, in principle, but we'd have to make a copy of the encoded command
// as to not modify the copy that's already stored in the local replica's
// command cache.
s/command cache/raft entry cache/
pkg/kv/kvserver/replica_proposal_buf.go, line 687 at r7 (raw file):
}
// OnLeaseChange is called when a new lease is applied to this range. The closed
Move this above EvaluatingRequestsCount.
pkg/kv/kvserver/replica_proposal_buf.go, line 688 at r7 (raw file):
// OnLeaseChange is called when a new lease is applied to this range. The closed
// timestamp tracked by the propBuf is updated accordingly, such that the leases
Isn't this lease stuff wrong? That's not what we're passing anymore.
pkg/kv/kvserver/replica_proposal_buf.go, line 729 at r7 (raw file):
// aware of all the reads performed by the transferer.
isBrandNewLeaseRequest := false
isLeaseExtension := false
Do we need this variable?
pkg/kv/kvserver/replica_proposal_buf.go, line 736 at r7 (raw file):
// former is more up to date, having been modified by the evaluation.
newLease := p.command.ReplicatedEvalResult.State.Lease
isLeaseExtension = newLease.Sequence != leaseReq.PrevLease.Sequence
Isn't this backwards? Lease extensions have the same sequence number.
Also, should we use p.leaseStatus in place of leaseReq.PrevLease?
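The sequence-number point above can be sketched as a tiny predicate. The types below are hypothetical stand-ins for the real roachpb.Lease; the invariant shown is the one the reviewer states: an extension reuses the previous lease's sequence number, while a new lease acquisition bumps it.

```go
package main

import "fmt"

// lease is a simplified stand-in for roachpb.Lease.
type lease struct{ sequence int64 }

// isLeaseExtension mirrors the corrected check: extensions keep the
// previous sequence, so equality (not inequality) identifies them.
func isLeaseExtension(prev, next lease) bool {
	return next.sequence == prev.sequence
}

func main() {
	prev := lease{sequence: 5}
	fmt.Println(isLeaseExtension(prev, lease{sequence: 5})) // true: extension
	fmt.Println(isLeaseExtension(prev, lease{sequence: 6})) // false: new lease
}
```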
pkg/kv/kvserver/replica_proposal_buf.go, line 741 at r7 (raw file):
}
isBrandNewLeaseRequest = true
closedTSTarget = newLease.Start.ToTimestamp()
Make a note of why we completely overwrite the old closedTSTarget. Something about how it's meaningless on a non-leaseholder.
pkg/kv/kvserver/replica_proposal_buf.go, line 746 at r7 (raw file):
lb := b.evalTracker.LowerBound(ctx)
if !lb.IsEmpty() {
closedTSTarget.Backward(lb.FloorPrev())
Make a note about why Prev and also why FloorPrev (to get rid of logical ticks).
pkg/kv/kvserver/replica_proposal_buf.go, line 760 at r7 (raw file):
closedTSTarget.Backward(p.leaseStatus.Expiration())
}
b.closedTS.Forward(closedTSTarget)
Give this a comment.
pkg/kv/kvserver/replica_proposal_buf.go, line 817 at r7 (raw file):
// the propBuf.
func (b *propBuf) EvaluatingRequestsCount() int {
b.p.locker().Lock()
Can we use the rlocker?
pkg/kv/kvserver/replica_proposal_buf.go, line 856 at r7 (raw file):
func (t *TrackedRequestToken) doneLocked(ctx context.Context) {
if t.done {
// t.b == nil means that Reset() was called on this token.
Does Reset still exist?
pkg/kv/kvserver/replica_proposal_buf.go, line 863 at r7 (raw file):
}
t.done = true
if t.b != nil {
Is this condition possible?
pkg/kv/kvserver/replica_proposal_buf.go, line 887 at r7 (raw file):
// returns the minimum timestamp at which this request must write. The tracked
// request is identified by its tentative write timestamp. After calling this,
// the caller must bump the write timestamp to at least the returned minTS.
Doesn't it need to bump to a time above the minTS? Or does this behave differently from the old tracker?
pkg/kv/kvserver/replica_proposal_buf.go, line 910 at r7 (raw file):
// TrackEvaluatingRequest. See comments over there for when this needs to be
// called.
func (b *propBuf) AbandonEvaluatingRequest(ctx context.Context, tok tracker.RemovalToken) {
Is this still in use?
pkg/kv/kvserver/replica_raft.go, line 120 at r7 (raw file):
}
proposal.finishApplication(ctx, pr)
stray line.
pkg/kv/kvserver/replica_write.go, line 91 at r7 (raw file):
defer untrack(ctx, 0, 0, 0) // covers all error returns below
writeTS := ba.WriteTimestamp()
nit: any reason to pull this all the way out here? Should we just pass ba.WriteTimestamp() directly to TrackEvaluatingRequest?
pkg/kv/kvserver/store_split.go, line 125 at r7 (raw file):
}
rsl.SetClosedTimestamp(ctx, readWriter, closedTS)
Do we need error handling here? And a comment?
pkg/kv/kvserver/store_split.go, line 163 at r7 (raw file):
rightRepl.mu.Lock()
rightRepl.mu.initialMaxClosed = initialMaxClosed
// !!! initialize propBuf.closedTS here? Don't think so, splitPostApply() will do it through leasePostApply().
Yeah, I don't think so either.
pkg/kv/kvserver/testing_knobs.go, line 242 at r7 (raw file):
// worrying about Raft).
AllowLeaseRequestProposalsWhenNotLeader bool
// DontCloseTimestamps inhibits the propBuf's closing of timestamps.
So all entries will have an empty closed timestamp field? Maybe spell that out explicitly.
pkg/kv/kvserver/batcheval/cmd_subsume.go, line 159 at r7 (raw file):
}
// !!! FreezeStart should be bumped to the closed timestamp
I think it should just go away. This was mostly a mistake. We can fix things to serve reads under the frozen range's closed timestamp, but this whole "serving reads under the range's freeze timestamp" business is not great. See this new TODO: https://github.com/cockroachdb/cockroach/pull/60521/files#diff-b080aecf7d979c0b5a4c43973db709d0d866f137cb4e961ebfc0d979ad22c003R296.
pkg/kv/kvserver/kvserverpb/proposer_kv.go, line 26 at r7 (raw file):
ClosedTimestamp: hlc.Timestamp{
WallTime: math.MaxInt64,
Logical: math.MaxInt32,
If we promise never to include logical ticks in the closed timestamp, maybe this can go away and we can save a few bytes of unnecessary space in the pre-allocation buffer. What do you think?
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 244 at r7 (raw file):
// well as that uproots whatever ordering was originally envisioned.
//
// This field is set through RaftCommandFooter hackery.
You call it "hackery", I call it "style".
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 270 at r7 (raw file):
map<string, string> trace_data = 16;
// The closed timestamp carried by this command. Once a follower is told to
nit: consider moving this right below max_lease_index, since there's a duality to these two fields.
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 272 at r7 (raw file):
// The closed timestamp carried by this command. Once a follower is told to
// apply this command, it knows that there will be no further writes at
// timestamps <= closed_timestamp_nanos. Note that the command itself might
s/closed_timestamp_nanos/closed_timestamp/
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 276 at r7 (raw file):
// used after this command is applied.
//
// Note that this represents only the physical part of a timestamp; the
Stale paragraph.
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 302 at r7 (raw file):
// ClosedTimestampFooter is similar to RaftCommandFooter, allowing the proposal
// buffer to fill in the max_lease_index field after most of the proto has been
"closed_timestamp field"
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 307 at r7 (raw file):
util.hlc.Timestamp closed_timestamp = 17 [(gogoproto.nullable) = false];
}
nit: strange extra lines at the bottom of this file.
pkg/kv/kvserver/stateloader/stateloader.go, line 171 at r7 (raw file):
}
if state.UsingAppliedStateKey {
rai, lai := state.RaftAppliedIndex, state.LeaseAppliedIndex
nit: follow this pattern, so:
rai, lai, ct := state.RaftAppliedIndex, state.LeaseAppliedIndex, state.ClosedTimestamp
if err := rsl.SetRangeAppliedState(ctx, readWriter, rai, lai, ms, ct); err != nil {
pkg/kv/kvserver/stateloader/stateloader.go, line 302 at r7 (raw file):
readWriter storage.ReadWriter,
appliedIndex, leaseAppliedIndex uint64,
closedTimestamp hlc.Timestamp,
Move this arg down to keep the four fields in the same order everywhere.
pkg/storage/enginepb/mvcc3.proto, line 208 at r7 (raw file):
// closed_timestamp is the largest timestamp that is known to have been
// closed. This means that the current leaseholder (if any) and any future
"closed as of this applied index"
Force-pushed from 263c0c3 to a8329ba.
Now the last two commits are new here. The rest are #59571, #59693 and #60573.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)
pkg/kv/kvserver/replica_application_result.go, line 235 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do you think this has similar buffer aliasing issues to what we saw a few days ago?
I don't think so, because r.propose allocates a new buffer and re-marshals everything. Right?
pkg/kv/kvserver/replica_application_state_machine.go, line 431 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If you want to keep this, move the cmd.ent.Index before maxLeaseIndex and name it raftIndex=%d. Also, s/: /=/.
done
pkg/kv/kvserver/replica_application_state_machine.go, line 440 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Did you mean to keep this?
removed
pkg/kv/kvserver/replica_application_state_machine.go, line 448 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is this addressed?
it is now
pkg/kv/kvserver/replica_application_state_machine.go, line 826 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Any reason not to just fatal here? Is the new error handling needed?
I'm glad you think it's not. fataling
pkg/kv/kvserver/replica_application_state_machine.go, line 891 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Any reason not to just blindly write the update? If you need closedTimestampUpdated for later, consider:
closedTimestampUpdated := r.mu.state.ClosedTimestamp.Forward(b.state.ClosedTimestamp)
done
pkg/kv/kvserver/replica_proposal.go, line 117 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: add an empty line above this comment.
done
pkg/kv/kvserver/replica_proposal.go, line 117 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What does this contain for lease requests? Consider spelling that out.
see now
pkg/kv/kvserver/replica_proposal.go, line 171 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
!!!
gone
pkg/kv/kvserver/replica_proposal.go, line 414 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Give this a comment.
see now
pkg/kv/kvserver/replica_proposal_buf.go, line 114 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This comment could use some updates to indicate the new role that the proposal buffer plays.
done
pkg/kv/kvserver/replica_proposal_buf.go, line 143 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we put accessors to the clock and settings on the proposer interface so we don't need to plumb the dependencies into here?
I think binding them early is much better. These things are clearly node-wide, not per-range.
pkg/kv/kvserver/replica_proposal_buf.go, line 148 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"closedTSNanos"
Also, should this be renamed to "assignedClosedTS" or something to indicate that this is not a real closed timestamp yet?
renamed
Although it is a very real closed timestamp - arguably more real than the other one :)
pkg/kv/kvserver/replica_proposal_buf.go, line 153 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Stale paragraph.
removed
pkg/kv/kvserver/replica_proposal_buf.go, line 217 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If we just need this to check if the timestamp is empty, I'd slim down the interface. Consider raftTransportClosedTimestampsInUse() bool or something like that. No need to let future changes do something stupid.
done
pkg/kv/kvserver/replica_proposal_buf.go, line 271 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
will be assigned which is assigned here
Something's off here.
done
pkg/kv/kvserver/replica_proposal_buf.go, line 273 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Clarify when "later" is.
done
pkg/kv/kvserver/replica_proposal_buf.go, line 517 at r7 (raw file):
Why PhysicalNow? That seems wrong, doesn't it?
Why? You mean as opposed to clock.Now()? We only care about the physical part cause that's what the policies operate on, right?
Also, let's pull this logic out into some other method so that we can wrap it with a test.
Done, and added a test.
pkg/kv/kvserver/replica_proposal_buf.go, line 620 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/command cache/raft entry cache/
done
pkg/kv/kvserver/replica_proposal_buf.go, line 687 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Move this above EvaluatingRequestsCount.
done
pkg/kv/kvserver/replica_proposal_buf.go, line 688 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Isn't this lease stuff wrong? That's not what we're passing anymore.
see now
pkg/kv/kvserver/replica_proposal_buf.go, line 729 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do we need this variable?
done
pkg/kv/kvserver/replica_proposal_buf.go, line 736 at r7 (raw file):
Isn't this backwards? Lease extensions have the same sequence number.
done
Also, should we use p.leaseStatus in place of leaseReq.PrevLease?
done, good idea
pkg/kv/kvserver/replica_proposal_buf.go, line 741 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Make a note of why we completely overwrite the old closedTSTarget. Something about how it's meaningless on a non-leaseholder.
see now
pkg/kv/kvserver/replica_proposal_buf.go, line 746 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Make a note about why Prev and also why FloorPrev (to get rid of logical ticks).
done
pkg/kv/kvserver/replica_proposal_buf.go, line 760 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Give this a comment.
done
pkg/kv/kvserver/replica_proposal_buf.go, line 817 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Can we use the rlocker?
yes, done
pkg/kv/kvserver/replica_proposal_buf.go, line 856 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Does Reset still exist?
no, deleted
pkg/kv/kvserver/replica_proposal_buf.go, line 863 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is this condition possible?
no, deleted
pkg/kv/kvserver/replica_proposal_buf.go, line 887 at r7 (raw file):
Doesn't it need to bump to a time above the minTS?
No, it needs to bump to minTS, which already incorporates a Next() call on the closed timestamp.
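The invariant described here can be sketched in a few lines. The names are hypothetical simplifications (the real logic lives in the propBuf's tracker): because the tracker returns closed.Next(), a writer that bumps to exactly minTS already lands strictly above the closed timestamp, with no extra Next() needed on the caller's side.

```go
package main

import "fmt"

// ts is a simplified wall-clock-only timestamp.
type ts struct{ wall int64 }

func (t ts) Next() ts       { return ts{wall: t.wall + 1} }
func (t ts) Less(o ts) bool { return t.wall < o.wall }

// minProposalTS sketches the invariant under discussion: the tracker
// returns the minimum timestamp a new write may use, already one tick
// above the closed timestamp, so callers bump *to* it, not above it.
func minProposalTS(closed ts) ts { return closed.Next() }

func main() {
	closed := ts{wall: 100}
	minTS := minProposalTS(closed)
	writeTS := ts{wall: 90} // tentative write timestamp below minTS
	if writeTS.Less(minTS) {
		writeTS = minTS // bump to minTS, not minTS.Next()
	}
	fmt.Println(closed.Less(writeTS)) // true: the write lands above the closed ts
}
```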
pkg/kv/kvserver/replica_proposal_buf.go, line 910 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is this still in use?
no, deleted
pkg/kv/kvserver/replica_raft.go, line 120 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
stray line.
done
pkg/kv/kvserver/replica_write.go, line 91 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: any reason to pull this all the way out here? Should we just pass ba.WriteTimestamp() directly to TrackEvaluatingRequest?
done
pkg/kv/kvserver/store_split.go, line 125 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do we need error handling here? And a comment?
see now. Were you looking for more?
pkg/kv/kvserver/store_split.go, line 163 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Yeah, I don't think so either.
done
pkg/kv/kvserver/testing_knobs.go, line 242 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
So all entries will have an empty closed timestamp field? Maybe spell that out explicitly.
done
pkg/kv/kvserver/batcheval/cmd_subsume.go, line 159 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I think it should just go away. This was mostly a mistake. We can fix things to serve reads under the frozen range's closed timestamp, but this whole "serving reads under the range's freeze timestamp" business is not great. See this new TODO: https://github.com/cockroachdb/cockroach/pull/60521/files#diff-b080aecf7d979c0b5a4c43973db709d0d866f137cb4e961ebfc0d979ad22c003R296.
yeah
pkg/kv/kvserver/kvserverpb/proposer_kv.go, line 26 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If we promise never to include logical ticks in the closed timestamp, maybe this can go away and we can save a few bytes of unnecessary space in the pre-allocation buffer. What do you think?
I'm not impressed by saving space in this buffer; I don't think that's a factor. The logical ticks do maybe cost some bandwidth and proto unmarshalling, so I'm not in love with them - that's why all the closed timestamp policy computations are done without logical ticks, and also why I do FloorPrev() on what comes out of the tracker (and also cause I don't want to introduce ugly-looking logical=maxint timestamps). The only thing that can introduce closed ts with logical ticks is the lease start time. And I guess there it makes sense to keep them, to properly align the closed ts with the lease start. What do you think?
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 270 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: consider moving this right below max_lease_index, since there's a duality to these two fields.
done
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 272 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/closed_timestamp_nanos/closed_timestamp/
done
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 276 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Stale paragraph.
done
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 302 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"closed_timestamp file"
done
pkg/kv/kvserver/kvserverpb/proposer_kv.proto, line 307 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: strange extra lines at the bottom of this file.
done
pkg/kv/kvserver/stateloader/stateloader.go, line 171 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: follow this pattern, so:
rai, lai, ct := state.RaftAppliedIndex, state.LeaseAppliedIndex, state.ClosedTimestamp
if err := rsl.SetRangeAppliedState(ctx, readWriter, rai, lai, ms, ct); err != nil {
done
pkg/kv/kvserver/stateloader/stateloader.go, line 302 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Move this arg down to keep the four fields in the same order everywhere.
done
pkg/storage/enginepb/mvcc3.proto, line 208 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"closed as of this applied index"
done
Force-pushed from a8329ba to cd2c3ce.
Getting much closer!
Reviewed 7 of 10 files at r9, 5 of 8 files at r10, 15 of 17 files at r11, 2 of 8 files at r12, 35 of 35 files at r14, 11 of 11 files at r15.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)
pkg/kv/kvserver/replica_application_result.go, line 235 at r7 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I don't think so, because r.propose allocates a new buffer and re-marshals everything. Right?
Oh, you're right. I was thinking that this was calling into proposalBuf.Insert.
pkg/kv/kvserver/replica_follower_read.go, line 153 at r15 (raw file):
defer r.mu.RUnlock()
// TODO(andrei): Make sure that this synchronizes with the closed timestamps
// side-transport once the side-transport is written.
👍
pkg/kv/kvserver/replica_proposal_buf.go, line 143 at r7 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I think binding them early is much better. These things are clearly node-wide, not per-range.
If so, then let's pull these two fields right up before p proposer.
pkg/kv/kvserver/replica_proposal_buf.go, line 217 at r7 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
done
Looks like you missed the first word in the comment.
pkg/kv/kvserver/replica_proposal_buf.go, line 517 at r7 (raw file):
Why? You mean as opposed to clock.Now()? We only care about the physical part cause that's what the policies operate on, right?
But PhysicalNow doesn't take into consideration clock updates from other nodes. PhysicalNow bypasses the entire HLC clock. If we know that a follower is leading our clock by 50ms, it would be better to account for that in this target.
EDIT: to clarify in case this is where the confusion is: hlc.PhysicalNow() != hlc.Now().WallTime.
pkg/kv/kvserver/replica_proposal_buf.go, line 620 at r7 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
done
Try that one more time 😃
pkg/kv/kvserver/replica_proposal_buf.go, line 736 at r7 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Isn't this backwards? Lease extensions have the same sequence number.
done
Also, should we use p.leaseStatus in place of leaseReq.PrevLease?
done, good idea
I think you wanted oldLease := p.leaseStatus.Lease
pkg/kv/kvserver/replica_proposal_buf.go, line 887 at r7 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Doesn't it need to bump to a time above the minTS?
No, it needs to bump to minTS, which already incorporates a Next() call on the closed timestamp.
Do we like that? On the surface, I do. But then it means that we have to bump to the timestamp returned from this method but then we have to bump above any timestamp returned from the timestamp cache. I think the reason it was the other way before was to maintain that symmetry - so that the closed timestamp was similar to an entry in the timestamp cache.
I guess the difficulty here is that we're putting .Next() into the evalTracker, right?
pkg/kv/kvserver/replica_proposal_buf.go, line 689 at r14 (raw file):
targetDuration := closedts.TargetDuration.Get(&b.settings.SV)
closedTSTarget = hlc.Timestamp{WallTime: now - targetDuration.Nanoseconds()}
case roachpb.LEAD_FOR_GLOBAL_READS:
Let's take this out for now so that we can solve the fallout of it independently from this PR. Maybe just comment it out and make the first case case roachpb.LAG_BY_CLUSTER_SETTING, roachpb.LEAD_FOR_GLOBAL_READS.
pkg/kv/kvserver/replica_proposal_buf_test.go, line 71 at r14 (raw file):
func (t testProposerRaft) Step(raftpb.Message) error {
// TODO(andrei, nvanbenschoten): Capture the message and test against it.
What do you think about adding some tests that address this TODO by capturing MsgProps, decoding them, and asserting that their MinLeaseIndex and ClosedTimestamps are as expected? I think this is the kind of testing we're currently missing with closed timestamps in general.
pkg/kv/kvserver/replica_raftstorage.go, line 1010 at r14 (raw file):
// wholesale replace r.mu.state.
r.mu.state = s
// If we're the leaseholder, we might have learned that we got the lease
Isn't this handled by the call to leasePostApplyLocked above?
pkg/kv/kvserver/store_split.go, line 125 at r7 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
see now. Were you looking for more?
No, looks good.
pkg/kv/kvserver/batcheval/cmd_subsume.go, line 164 at r15 (raw file):
closedTS := cArgs.EvalCtx.FrozenClosedTimestamp(ctx)
if !closedTS.IsEmpty() {
reply.ClosedTimestamp = closedTS
reply.ClosedTimestamp = cArgs.EvalCtx.FrozenClosedTimestamp(ctx)
pkg/kv/kvserver/kvserverpb/proposer_kv.go, line 26 at r7 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I'm not impressed by saving space in this buffer; I don't think that's a factor. The logical ticks do maybe cost some bandwidth and proto unmarshalling, so I'm not in love with them - that's why all the closed timestamp policy computations are done without logical ticks, and also why I do FloorPrev() on what comes out of the tracker (and also cause I don't want to introduce ugly-looking logical=maxint timestamps). The only thing that can introduce closed ts with logical ticks is the lease start time. And I guess there it makes sense to keep them, to properly align the closed ts with the lease start. What do you think?
SGTM. The lease start time is a good point.
pkg/kv/kvserver/stateloader/stateloader.go, line 489 at r14 (raw file):
}
func (rsl StateLoader) SetClosedTimestamp(
I'm guessing you'll need a comment on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now based on #59571 and #60634. Removed the dependency on #60573.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)
pkg/kv/kvserver/replica_proposal_buf.go, line 143 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If so, then let's pull these two fields right up before p proposer.
done
pkg/kv/kvserver/replica_proposal_buf.go, line 217 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Looks like you missed the first word in the comment.
done
pkg/kv/kvserver/replica_proposal_buf.go, line 517 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Why? You mean as opposed to clock.Now()? We only care about the physical part cause that's what the policies operate on, right?
But PhysicalNow doesn't take into consideration clock updates from other nodes. PhysicalNow bypasses the entire HLC clock. If we know that a follower is leading our clock by 50ms, it would be better to account for that in this target.
EDIT: to clarify in case this is where the confusion is: hlc.PhysicalNow() != hlc.Now().WallTime.
ah, indeed. done.
pkg/kv/kvserver/replica_proposal_buf.go, line 620 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Try that one more time 😃
was close tho
pkg/kv/kvserver/replica_proposal_buf.go, line 736 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I think you wanted oldLease := p.leaseStatus.Lease
done
pkg/kv/kvserver/replica_proposal_buf.go, line 887 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do we like that? On the surface, I do. But then it means that we have to bump to the timestamp returned from this method but then we have to bump above any timestamp returned from the timestamp cache. I think the reason it was the other way before was to maintain that symmetry - so that the closed timestamp was similar to an entry in the timestamp cache.
I guess the difficulty here is that we're putting .Next() into the evalTracker, right?
I think the old Tracker was also returning the minimum ts at which the new request can evaluate, not the highest ts at which it cannot. See the Next() call here, and also the comment on Track():

    minProp := t.mu.next.Next()

So it seems to me that the Next() in applyTimestampCache() is unnecessary when it applies to minReadTS (cockroach/pkg/kv/kvserver/replica_tscache.go, line 298 in 616ce3c):

    nextRTS := rTS.Next()

I don't care too much about the symmetry you're talking about (particularly since it doesn't look like it existed). I think it's natural for the timestamp cache to contain the read timestamp, just like the tracker contains the closed timestamps, and for callers to deal with the required Next() at different levels. What does bother me, however, is the way in which we pass these closed timestamps into applyTimestampCache, which kinda funges the reason for a push. I think we should keep the application of the tscache separate from the application of the closed ts.
pkg/kv/kvserver/replica_proposal_buf.go, line 689 at r14 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Let's take this out for now so that we can solve the fallout of it independently from this PR. Maybe just comment it out and make the first case:

    case roachpb.LAG_BY_CLUSTER_SETTING, roachpb.LEAD_FOR_GLOBAL_READS
done
pkg/kv/kvserver/replica_proposal_buf_test.go, line 71 at r14 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What do you think about adding some tests that address this TODO by capturing MsgProps, decoding them, and asserting that their MinLeaseIndex and ClosedTimestamps are as expected? I think this is the kind of testing we're currently missing with closed timestamps in general.
working on it
pkg/kv/kvserver/replica_raftstorage.go, line 1010 at r14 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Isn't this handled by the call to
leasePostApplyLocked
above?
rebased on #60634 to get rid of the need to do this
pkg/kv/kvserver/batcheval/cmd_subsume.go, line 164 at r15 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
reply.ClosedTimestamp = cArgs.EvalCtx.FrozenClosedTimestamp(ctx)
done
pkg/kv/kvserver/stateloader/stateloader.go, line 489 at r14 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm guessing you'll need a comment on this.
done
cd2c3ce
to
618b2b6
Compare
mod the additional testing in replica_proposal_buf_test.go. I'll push through #59571 tomorrow to get it out of your way. Thanks for all the work you're putting into this.
Reviewed 1 of 4 files at r17, 15 of 17 files at r18, 31 of 33 files at r19, 11 of 11 files at r20.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)
pkg/kv/kvserver/replica_proposal_buf.go, line 143 at r7 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
done
Sorry, I was unclear. The evalTracker was in a fine location. I meant the clock and the settings, which are both injected dependencies.
pkg/kv/kvserver/replica_proposal_buf.go, line 689 at r14 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
done
👍 thanks.
618b2b6
to
b29dba2
Compare
Added some unit tests to the propBuf. PTAL.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)
pkg/kv/kvserver/replica_proposal_buf.go, line 143 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Sorry, I was unclear. The evalTracker was in a fine location. I meant the clock and the settings, which are both injected dependencies.
moved the settings over, but the evalTracker also belongs here with the more static things. Particularly now since I've made Init() take it in as a param (for the benefit of tests).
b29dba2
to
f768e85
Compare
Reviewed 1 of 4 files at r22, 15 of 17 files at r23, 32 of 34 files at r24, 11 of 11 files at r25.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15 and @andreimatei)
pkg/kv/kvserver/replica_proposal_buf_test.go, line 203 at r25 (raw file):
    r, ok := ba.GetArg(roachpb.RequestLease)
    if ok {
        lease = &r.(*roachpb.RequestLeaseRequest).Lease
nit: I think the leaseStatus for a RequestLeaseRequest will be different. It should be LeaseStatus{Lease: lease}.
pkg/kv/kvserver/replica_proposal_buf_test.go, line 693 at r25 (raw file):
    }

    func TestProposalBufferClosedTimestamp(t *testing.T) {
Very nice test! Mind giving it a short comment?
pkg/kv/kvserver/replica_proposal_buf_test.go, line 701 at r25 (raw file):
    clock := hlc.NewClock(mc.UnixNano, time.Nanosecond)
    st := cluster.MakeTestingClusterSettings()
    var tracker mockTracker
nit: any reason not to pull this into the loop next to testProposer{ ... } and then construct it with the lowerBound directly?
f768e85
to
14b56a2
Compare
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15, @andreimatei, and @nvanbenschoten)
pkg/kv/kvserver/replica_proposal_buf_test.go, line 203 at r25 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: I think the leaseStatus for a RequestLeaseRequest will be different. It should be LeaseStatus{Lease: lease}.
as discussed, pc.lease is right here
pkg/kv/kvserver/replica_proposal_buf_test.go, line 693 at r25 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Very nice test! Mind giving it a short comment?
done; moved up a comment from inside the test
pkg/kv/kvserver/replica_proposal_buf_test.go, line 701 at r25 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: any reason not to pull this into the loop next to testProposer{ ... } and then construct it with the lowerBound directly?
done
Discovered while investigating a test failure in cockroachdb#59566. In 278a21b, we shifted from talking about read and write requests to locking and non-locking requests when deciding whether a request could be served on a follower. This prevented locking scans and gets from being served on followers. However, it began letting lone HeartbeatTxn and EndTxn requests past the old `!IsReadOnly()` check. Luckily, these were still prevented from being served on followers because they are only sent in read-write transactions, which were also prevented from performing follower reads.

Yesterday, in 0ac8ab9, we lifted this second limitation, allowing read-write transactions to perform follower reads for non-locking batches. However, this no longer prevented HeartbeatTxn and EndTxn requests from being routed and served on follower replicas. This resulted in a pretty disastrous situation where, in very rare cases, a follower was proposing a write under a lease that it did not own. Luckily, new assertions added in cockroachdb#59566 caught this.

This commit fixes this oversight by re-introducing "read-only" as a condition for serving follower reads.

Release note: None
60758: sql: Added new tables and columns in pg_catalog r=rafiss a=RichardJCai

Taking over #59498 since @mnovelodou is out and we want to get this in before stability. Previously, some tables and columns that exist in Postgres were missing from CockroachDB's pg_catalog. This was inadequate because it may cause errors in tools that expect these columns to exist. To address this, this patch contains the result of a diff tool to add what is missing.

Release note (sql change): Added missing tables and columns to pg_catalog.

New columns in table pg_attribute: atthasmissing
New columns in table pg_class: relrowsecurity relforcerowsecurity relispartition relispopulated relreplident relrewrite
New columns in table pg_collation: collprovider collversion collisdeterministic
New columns in table pg_index: indnkeyatts
New columns in table pg_proc: prokind prosupport
New columns in table pg_aggregate: aggmfinalmodify aggfinalmodify

New tables:
pg_replication_origin_roname_index pg_config pg_largeobject_loid_pn_index pg_ts_config pg_operator_oid_index pg_conversion_oid_index pg_depend_depender_index pg_stat_user_functions pg_index_indrelid_index pg_group pg_db_role_setting_databaseid_rol_index pg_statio_user_tables pg_partitioned_table_partrelid_index pg_publication_oid_index pg_publication pg_publication_tables pg_stat_xact_user_tables pg_ts_config_map_index pg_replication_origin_roiident_index pg_stat_progress_vacuum pg_file_settings pg_statio_all_indexes pg_shseclabel_object_index pg_enum_oid_index pg_statistic_relid_att_inh_index pg_largeobject_metadata_oid_index pg_stat_archiver pg_statio_sys_sequences pg_collation_oid_index pg_subscription pg_namespace_oid_index pg_amop pg_stat_database_conflicts pg_tablespace_oid_index pg_class_oid_index pg_range_rngtypid_index pg_description_o_c_o_index pg_opclass_oid_index pg_sequence_seqrelid_index pg_trigger_oid_index pg_timezone_abbrevs pg_ts_parser_oid_index pg_transform pg_extension_oid_index pg_statio_user_sequences pg_shdepend_reference_index
pg_foreign_data_wrapper_oid_index pg_ts_config_oid_index pg_ts_dict_oid_index pg_init_privs_o_c_o_index pg_user_mappings pg_default_acl_role_nsp_obj_index pg_stat_slru pg_constraint_contypid_index pg_stat_progress_create_index pg_transform_type_lang_index pg_authid_oid_index pg_shmem_allocations pg_statio_sys_tables pg_statio_all_sequences pg_policy_oid_index pg_shdepend_depender_index pg_attribute_relid_attnum_index pg_event_trigger_oid_index pg_amproc pg_cast_oid_index pg_constraint_conparentid_index pg_statio_sys_indexes pg_conversion_default_index pg_statistic_ext pg_shadow pg_trigger_tgconstraint_index pg_ts_template pg_cast_source_target_index pg_type_oid_index pg_amproc_fam_proc_index pg_aggregate_fnoid_index pg_stat_progress_basebackup pg_default_acl_oid_index pg_foreign_server_oid_index pg_statistic_ext_data_stxoid_index pg_transform_oid_index pg_language_oid_index pg_db_role_setting pg_amop_opr_fam_index pg_user_mapping_oid_index pg_hba_file_rules pg_am_oid_index pg_statio_all_tables pg_statio_user_indexes pg_stat_progress_analyze pg_replication_origin pg_depend_reference_index pg_stat_bgwriter pg_attrdef_adrelid_adnum_index pg_stat_sys_indexes pg_statistic_ext_oid_index pg_foreign_table_relid_index pg_user_mapping_user_server_index pg_seclabel_object_index pg_subscription_rel_srrelid_srsubid_index pg_publication_rel_prrelid_prpubid_index pg_inherits_parent_index pg_ts_dict pg_stat_user_tables pg_stat_progress_cluster pg_stat_xact_all_tables pg_stat_database pg_shdescription_o_c_index pg_publication_rel_oid_index pg_largeobject pg_publication_rel pg_rewrite_oid_index pg_stat_all_indexes pg_amop_fam_strat_index pg_proc_oid_index pg_database_oid_index pg_sequences pg_subscription_oid_index pg_enum_typid_sortorder_index pg_ts_parser pg_ts_config_map pg_attrdef_oid_index pg_timezone_names pg_ts_template_oid_index pg_statistic_ext_relid_index pg_index_indexrelid_index pg_amop_oid_index pg_amproc_oid_index pg_rules pg_opfamily pg_stat_xact_sys_tables
pg_policies pg_constraint_oid_index pg_stat_sys_tables pg_stat_xact_user_functions pg_available_extension_versions pg_stat_all_tables pg_auth_members_role_member_index pg_auth_members_member_role_index pg_inherits_relid_seqno_index pg_opfamily_oid_index pg_class_tblspc_relfilenode_index pg_cursors pg_stat_gssapi pg_stat_ssl pg_stat_user_indexes

Fixes #58001

60765: kv: don't serve non-locking, read-write requests on followers r=nvanbenschoten a=nvanbenschoten

Discovered while investigating a test failure in #59566. In 278a21b, we shifted from talking about read and write requests to locking and non-locking requests when deciding whether a request could be served on a follower. This prevented locking scans and gets from being served on followers. However, it began letting lone HeartbeatTxn and EndTxn requests past the old `!IsReadOnly()` check. Luckily, these were still prevented from being served on followers because they are only sent in read-write transactions, which were also prevented from performing follower reads.

Yesterday, in 0ac8ab9, we lifted this second limitation, allowing read-write transactions to perform follower reads for non-locking batches. However, this no longer prevented HeartbeatTxn and EndTxn requests from being routed and served on follower replicas. This resulted in a pretty disastrous situation where, in very rare cases, a follower was proposing a write under a lease that it did not own. Luckily, new assertions added in #59566 caught this.

This commit fixes this oversight by re-introducing "read-only" as a condition for serving follower reads.

Release note: None

60781: cluster: use WaitConditionNextExit r=rickystewart a=tbg

There is a chance that this will address an issue that causes spurious failures on CI, where it looks like we're janking the file system out from under a running process. For context, see: #58955

Release note: None

Co-authored-by: richardjcai <caioftherichard@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
14b56a2
to
1bc95ef
Compare
Looks like you need to make generate after the recent rebase. pkg/clusterversion/key_string.go is out of date.
doing it
…On Fri, Feb 19, 2021 at 7:16 PM Nathan VanBenschoten < ***@***.***> wrote:
Looks like you need to make generate after the recent rebase.
pkg/clusterversion/key_string.go is out of date.
1bc95ef
to
3456954
Compare
This patch introduces a replacement for the existing closed timestamp mechanism / transport. The new mechanism is gated by a cluster version.

Raft commands now carry increasing closed timestamps generated by the propBuf, which uses the recent request Tracker for synchronizing with in-flight requests (i.e. not closing timestamps below them). Raft commands get a closed ts field, and the range state gets the field as well.

The propBuf pays attention to the range's closed timestamp policy for deciding whether to close lagging or leading timestamps.

Fixes cockroachdb#57395, cockroachdb#57396
Touches cockroachdb#57405

Release note: None
This patch deals with what happens to the RHS's timestamp cache on a merge. Before this patch, we were either not touching the cache at all, when the leases of the LHS and RHS were collocated at merge time, or we were bumping the RHS's ts cache up to the freeze point otherwise (because, in that case, the RHS's ts cache info has been lost). This patch goes further: now we'll bump the RHS's ts cache up to the RHS's closed timestamp, on the argument that the RHS's closed timestamp is lost.

This patch is needed by the effort to make closed timestamps per-range instead of per-store, and also to have future-time closed timestamps. Today, the new ts cache bump is not necessary for a fairly subtle reason: if the pre-merge leases are collocated, then the closed ts of the RHS is not "lost" because it's the same as the one of the LHS. If the leases are not collocated, the freeze time of the RHS is certainly above its closed ts. So, in either case, the current code doesn't lead to the possibility of accepting writes post-merge that invalidate previous follower reads.

The RHS's closed timestamp is plumbed from the freeze to the merge through the Subsume response.

Release note: None
3456954
to
8738d72
Compare
bors r+
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)
Build succeeded.
This patch introduces a replacement for the existing closed timestamp
mechanism / transport. The new mechanism is gated by a cluster version.
Raft commands now carry increasing closed timestamps generated by the
propBuf by using the recent request Tracker for synchronizing with
in-flight requests (i.e. not closing timestamps below them).
Raft commands get a closed ts field, and the range state gets the field
as well.
The propBuf pays attention to the range's closed timestamp policy for
deciding whether to close lagging or leading timestamps.
Release note: None