Skip to content

Commit

Permalink
kvserver: add assertions for invariants around up liveness records
Browse files Browse the repository at this point in the history
Now that we have cockroachdb#53842, we maintain the invariant that there always
exists a liveness record for any given node. We can now simplify our
handling of liveness records internally: where previously we had code to
handle the possibility of empty liveness records (we created a new one
on the fly), we change them to assertions that verify that empty
liveness records are no longer flying around in the system.

When retrieving the liveness record from our in-memory cache,
it was possible for us to not find anything due to gossip delays.
Instead of simply giving up then, now we can read the records directly
from KV (and evebtually update our caches to store this newly read
record). This PR introduces this mechanism through usage of
`getLivenessRecordFromKV`. We should note that the existing cache
structure within NodeLiveness is a look-aside cache, and that's not
changed. It would further simplify things if it was a look-through cache
where the update happened while fetching any record and failing to find
it, but we defer that to future work. A TODO outlining this will be
introduced in a future commit.

A note for ease of review: one structural change introduced in this diff
is breaking down `ErrNoLivenessRecord` into `ErrMissingLivenessRecord` and
`errLivenessRecordCacheMiss`. The former will be used in a future commit
to generate better hints for users (it'll only ever surface when
attempting to decommission/recommission non-existent nodes). The latter
is used to represent cache misses. This too will be improved in a future
commit, where instead of returning a specific error on cache access,
we'll return a boolean instead.

---

We don't intend to backport this to 20.2 due to the hazard described in
\cockroachdb#54216. We want this PR to bake on master and (possibly) trip up the
assertions added above if we've missed anything. They're the only ones
checking for the invariant we've introduced around liveness records.
That invariant will be depended on for long running migrations, so
better to shake things out early.

Release note: None
  • Loading branch information
irfansharif committed Sep 29, 2020
1 parent 48c955d commit 6d7d530
Show file tree
Hide file tree
Showing 4 changed files with 140 additions and 143 deletions.
7 changes: 2 additions & 5 deletions pkg/kv/kvserver/helpers_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -514,12 +514,9 @@ func (nl *NodeLiveness) SetDrainingInternal(
}

func (nl *NodeLiveness) SetDecommissioningInternal(
ctx context.Context,
nodeID roachpb.NodeID,
oldLivenessRec LivenessRecord,
targetStatus kvserverpb.MembershipStatus,
ctx context.Context, oldLivenessRec LivenessRecord, targetStatus kvserverpb.MembershipStatus,
) (changeCommitted bool, err error) {
return nl.setMembershipStatusInternal(ctx, nodeID, oldLivenessRec, targetStatus)
return nl.setMembershipStatusInternal(ctx, oldLivenessRec, targetStatus)
}

// GetCircuitBreaker returns the circuit breaker controlling
Expand Down
Loading

0 comments on commit 6d7d530

Please sign in to comment.