Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-21.1: kvserver: fix write below closedts bug #63861

Merged
merged 6 commits into from
Apr 19, 2021

Conversation

andreimatei
Copy link
Contributor

Backport 6/6 commits from #63672.

/cc @cockroachdb/release


This patch fixes a bug in our closed timestamp management. This bug was
making it possible for a command to close a timestamp even though other
requests writing at lower timestamps are currently evaluating. The
problem was that we were assuming that, if a replica is proposing a new
lease, there can't be any requests in flight and every future write
evaluated on the range will wait for the new lease and the evaluate
above the lease start time. Based on that reasoning, the proposal buffer
was recording the lease start time as its assignedClosedTimestamp. This
was matching what it does for every write, where assignedClosedTimestamp
corresponds to the the closed timestamp carried by the command.

It turns out that the replica's reasoning was wrong. It is, in fact,
possible for writes to be evaluating on the range when the lease
acquisition is proposed. And these evaluations might be done at
timestamps below the would-be lease's start time. This happens when the
replica has already received a lease through a lease transfer. The
transfer must have applied after the previous lease expired and the
replica decided to start acquiring a new one.

This fixes one of the assertion failures seen in #62655.

Release note (bug fix): A bug leading to crashes with the message
"writing below closed ts" has been fixed.

Make it clear that an else case is, in fact, an else case.

Release note: None
Clarifiy a <= vs < in a closedts comment.

Release note: None
Make this testing filter run after a closed timestamp is assigned to the
command to be filtered.
I had to slightly change a test because, now, the filter doesn't run if
the raft group is nil.

Release note: None
The TestServer/TestCluster have facilities for querying a range's lease.
This patch addresses two problems with this functionality:
1) If the node being queried has a lease proposal in flight, the
respective functions would return it at the detriment of the current
lease. This behavior is dubious, and indeed not what a test that I'm
writing wants. This patch adds a function that returns both the current
and the prospective next lease.
2) The functions let you ask for the state of a particular node to be
queried. This was buggy, though, because the LeaseInfoRequest could
be transparently forwarded to a different node in case the queried node
doesn't have a replica or has a learner replica. That's confusing as
hell. This patch optionally dissallows that, giving the caller
confidence that they got the state from the node it was interested in.

Release note: None
This patch fixes a bug in our closed timestamp management. This bug was
making it possible for a command to close a timestamp even though other
requests writing at lower timestamps are currently evaluating. The
problem was that we were assuming that, if a replica is proposing a new
lease, there can't be any requests in flight and every future write
evaluated on the range will wait for the new lease and the evaluate
above the lease start time. Based on that reasoning, the proposal buffer
was recording the lease start time as its assignedClosedTimestamp. This
was matching what it does for every write, where assignedClosedTimestamp
corresponds to the the closed timestamp carried by the command.

It turns out that the replica's reasoning was wrong. It is, in fact,
possible for writes to be evaluating on the range when the lease
acquisition is proposed. And these evaluations might be done at
timestamps below the would-be lease's start time. This happens when the
replica has already received a lease through a lease transfer. The
transfer must have applied after the previous lease expired and the
replica decided to start acquiring a new one.

This fixes one of the assertion failures seen in cockroachdb#62655.

Release note (bug fix): A bug leading to crashes with the message
"writing below closed ts" has been fixed.
@andreimatei andreimatei requested a review from a team as a code owner April 19, 2021 18:09
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@andreimatei andreimatei merged commit 16e991c into cockroachdb:release-21.1 Apr 19, 2021
@andreimatei andreimatei deleted the backport21.1-63672 branch April 19, 2021 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants