kv: disable observed timestamp captured before range merge from applying to merged range #73292
Comments
There may be a few different ways to solve this, and all have some drawbacks:
For this to work given the current lease transfer protection, we would need the LHS of the merge to both be:
Collocating the LHS to the RHS's node is necessary (as opposed to the other way around) because we'd be trying to piggyback the merge protection on top of the lease transfer protection, which is provided by the lease start time. The merge's LHS subsumes the RHS and retains its lease, so it's the LHS that we'd need to worry about. The first condition here is straightforward — it ensures that an observed timestamp served before the merge by the node after it holds the two leaseholders forms an upper bound with all local timestamps in the RHS range. The second condition is more subtle — it ensures that if the leaseholders were already collocated, then the lease transfer protection for the LHS inherits the lease transfer protection from the RHS. In other words, it ensures that an observed timestamp served before the merge and before the collocation by the node that later holds the two leaseholders cannot be assumed to form an upper bound with all local timestamps in the RHS range. Without this, we could see the following hazard:
This is roughly what I had in mind. If we remembered the largest freeze timestamp of any merge into a range, we could use this as a lower bound for all local uncertainty limits later assigned by that range. This would be symmetrical with our handling of new leases. Care would need to be taken to propagate this information to the RHS of a range split. This feels like the cleanest solution to me. It requires a small amount of extra persistent state, but it avoids coupling range merges to our leasing protocol.
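To make the idea concrete, here is a minimal Go sketch of that bookkeeping. It is a simplified model and not CockroachDB code: `Timestamp`, `RangeState`, `RecordMerge`, `Split`, and `LocalUncertaintyLimit` are hypothetical names, timestamps are plain integers rather than HLC timestamps, and it assumes the lease start time is already applied as a lower bound in the same way.

```go
package main

// Timestamp is a simplified stand-in for an HLC timestamp (hypothetical type).
type Timestamp int64

// RangeState models the per-range state relevant to local uncertainty limits.
// MaxMergeFreeze is the extra piece of persistent state proposed above.
type RangeState struct {
	LeaseStart     Timestamp // start time of the current lease
	MaxMergeFreeze Timestamp // largest freeze timestamp of any merge into this range
}

// RecordMerge ratchets MaxMergeFreeze forward when a RHS is merged in.
// It never moves the value backwards.
func (r *RangeState) RecordMerge(freeze Timestamp) {
	if freeze > r.MaxMergeFreeze {
		r.MaxMergeFreeze = freeze
	}
}

// Split returns the state of the new RHS range. The RHS inherits
// MaxMergeFreeze because it may contain data from an earlier merge.
func (r *RangeState) Split(rhsLeaseStart Timestamp) RangeState {
	return RangeState{LeaseStart: rhsLeaseStart, MaxMergeFreeze: r.MaxMergeFreeze}
}

// LocalUncertaintyLimit forwards the observed timestamp to both lower bounds
// (lease start and largest merge freeze) and caps it at the global limit.
func (r *RangeState) LocalUncertaintyLimit(observed, globalLimit Timestamp) Timestamp {
	limit := observed
	if limit < r.LeaseStart {
		limit = r.LeaseStart
	}
	if limit < r.MaxMergeFreeze {
		limit = r.MaxMergeFreeze
	}
	if limit > globalLimit {
		limit = globalLimit
	}
	return limit
}
```

In this model, an observed timestamp captured before a merge can never produce a local limit below the merge's freeze timestamp, so values written by the former RHS leaseholder up to the freeze remain inside the uncertainty window.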
The other drawback to this approach is that we would only pick up observed timestamps for a single range when visiting a node, instead of an observed timestamp for all current ranges on that node. So for instance, if we visited a node with r1 and r2 to read a key from r1, we wouldn't collect an observed timestamp which could be used to limit uncertainty if we later returned to read a key from r2. Hypothetically we could collect a separate observed timestamp from all ranges on a node when visiting it, but that would be very expensive.
I don't understand this solution. Do you mind spelling it out in more detail?
Previously a lease could be transferred using AdminTransferLease to a different node, but a call to transfer back to the original node was a no-op. This is done to prepare the work for the fix to cockroachdb#73292. Specifically the fix there is to self-transfer the lease to restart the lease timestamp and get correct protection on merges. A few unit tests needed to be modified to make this work, as they were assuming that AdminTransferLease was idempotent, when in fact it was only lucky that they worked. Release note: None
85d4627 introduced logic to handle the case where, after a lease change, observed timestamps captured on the incoming leaseholder's node cannot be used to ignore uncertainty for data written by the former leaseholder. This logic currently lives here:
cockroach/pkg/kv/kvserver/observedts/limit.go
Lines 27 to 40 in 7097a90
And is tested in TestRangeLocalUncertaintyLimitAfterNewLease.

While working on some refactors to fix #58459 and eventually #36431, I noticed a similar case after a range merge, where observed timestamps from before the merge cannot be used to ignore uncertainty for data written by the former RHS range. A comment describing this was added to #73244:
We currently have no protection here, and so we are susceptible to a stale read in this case.
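The gap can be illustrated with a small Go model. This is a sketch under stated assumptions, not the contents of `limit.go`: `Timestamp`, `localLimit`, and `isUncertain` are hypothetical names, timestamps are plain integers, and the lease protection is assumed to work by forwarding the observed timestamp to the lease start time and capping it at the global uncertainty limit.

```go
package main

// Timestamp is a simplified stand-in for an HLC timestamp (hypothetical type).
type Timestamp int64

// localLimit models the assumed lease protection: the observed timestamp is
// forwarded to the lease start time and capped at the global limit.
func localLimit(observed, leaseStart, globalLimit Timestamp) Timestamp {
	limit := observed
	if limit < leaseStart {
		limit = leaseStart
	}
	if limit > globalLimit {
		limit = globalLimit
	}
	return limit
}

// isUncertain reports whether a value above the read timestamp falls inside
// the uncertainty window and must therefore not be silently ignored.
func isUncertain(valueTS, readTS, limit Timestamp) bool {
	return valueTS > readTS && valueTS <= limit
}
```

Consider a transaction reading at timestamp 15 with a global limit of 100 that captured an observed timestamp of 20 on the node holding the LHS lease (lease start 10). The RHS leaseholder on another node writes a value at 30, and the ranges then merge without the LHS lease changing. `localLimit(20, 10, 100)` is 20, so `isUncertain(30, 15, 20)` is false and the former RHS value is ignored. Had the lease changed hands at, say, 35, the limit would be forwarded to 35 and the same value would be treated as uncertain.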
Epic CRDB-1514
Jira issue: CRDB-11522