Skip to content

Commit

Permalink
kvserver: rationalize proposal timestamp
Browse files Browse the repository at this point in the history
This patch reworks the ReplicatedEvalResult.WriteTimestamp field
(*).  Before this patch, WriteTimestamp was always coming from
ba.WriteTimestamp(), which is either a transaction's write timestamp or,
for non-txn requests, the batch's read timestamp or, for non-MVCC
requests, some random clock value. Below Raft, the field is used for
updating the followers' clocks, and also to check the request against
the GC threshold. This patch sets the WriteTimestamp differently for
IntentWrite requests than other requests:
- for regular writes, the field remains ba.WriteTimestamp()
- for other proposals, the field is a clock reading on the proposer

Some requests (e.g. LeaseTransfers) need a clock signal to travel with
their proposal, and now they get it (see cockroachdb#62569).

[*] An alternative to split the field into two was considered, but it's
hard to do now because of backwards compatibility. It can be done in the
next release, though, because now all the uses of the WriteTimestamp
field tolerate it being empty.

Fixes cockroachdb#62569

Release note: None
  • Loading branch information
andreimatei committed Apr 21, 2021
1 parent 3ee84bb commit 5334f33
Show file tree
Hide file tree
Showing 4 changed files with 22 additions and 7 deletions.
4 changes: 3 additions & 1 deletion pkg/kv/kvserver/kvserverpb/proposer_kv.pb.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 3 additions & 1 deletion pkg/kv/kvserver/kvserverpb/proposer_kv.proto
Original file line number Diff line number Diff line change
Expand Up @@ -127,7 +127,9 @@ message ReplicatedEvalResult {
bool is_lease_request = 6;
// The timestamp at which this command is writing. Used to verify the validity
// of the command against the GC threshold and to update the followers'
// clocks.
// clocks. If the request that produced this command is not an IntentWrite
// one, then the request's write timestamp is meaningless; for such request's,
// this field is simply a clock reading from the proposer.
util.hlc.Timestamp write_timestamp = 8 [(gogoproto.nullable) = false];
// The stats delta corresponding to the data in this WriteBatch. On
// a split, contains only the contributions to the left-hand side.
Expand Down
11 changes: 7 additions & 4 deletions pkg/kv/kvserver/replica_application_state_machine.go
Original file line number Diff line number Diff line change
Expand Up @@ -460,10 +460,13 @@ func (b *replicaAppBatch) Stage(cmdI apply.Command) (apply.CheckedCommand, error
// we can only assert on the leaseholder, as only that replica has
// cmd.proposal.Request filled in.
if cmd.IsLocal() && cmd.proposal.Request.IsIntentWrite() {
wts := cmd.proposal.Request.WriteTimestamp()
if wts.LessEq(b.state.RaftClosedTimestamp) {
return nil, makeNonDeterministicFailure("writing at %s below closed ts: %s (%s)",
wts, b.state.RaftClosedTimestamp.String(), cmd.proposal.Request.String())
wts := cmd.raftCmd.ReplicatedEvalResult.WriteTimestamp
if !wts.IsEmpty() && wts.LessEq(b.state.RaftClosedTimestamp) {
wts := wts // Shadow variable that escapes to the heap.
return nil, wrapWithNonDeterministicFailure(
errors.AssertionFailedf("writing at %s below closed ts: %s (%s)",
wts, b.state.RaftClosedTimestamp.String(), cmd.proposal.Request),
"attempting to write below closed timestamp")
}
}
log.Event(ctx, "applying command")
Expand Down
10 changes: 9 additions & 1 deletion pkg/kv/kvserver/replica_proposal.go
Original file line number Diff line number Diff line change
Expand Up @@ -826,7 +826,15 @@ func (r *Replica) evaluateProposal(
// Set the proposal's replicated result, which contains metadata and
// side-effects that are to be replicated to all replicas.
res.Replicated.IsLeaseRequest = ba.IsLeaseRequest()
res.Replicated.WriteTimestamp = ba.WriteTimestamp()
if ba.IsIntentWrite() {
res.Replicated.WriteTimestamp = ba.WriteTimestamp()
} else {
// For misc requests, use WriteTimestamp to propagate a clock signal. This
// is particularly important for lease transfers, as it assures that the
// follower getting the lease will have a clock above the start time of
// its lease.
res.Replicated.WriteTimestamp = r.store.Clock().Now()
}
res.Replicated.Delta = ms.ToStatsDelta()

// This is the result of a migration. See the field for more details.
Expand Down

0 comments on commit 5334f33

Please sign in to comment.