-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: rationalize server-side refreshes and fix bugs #42969
storage: rationalize server-side refreshes and fix bugs #42969
Conversation
First commit is #42965 |
1a160ee
to
13319ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r1, 9 of 9 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
13319ba
to
41ef739
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 5 files at r1, 8 of 9 files at r2, 1 of 1 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/kv/dist_sender_server_test.go, line 2603 at r3 (raw file):
b.CPut("a", "cput", strToValue("orig")) b.Put("c", "put") err := txn.CommitInBatch(ctx, b)
What's up with this?
pkg/kv/dist_sender_server_test.go, line 2642 at r3 (raw file):
return txn.CommitInBatch(ctx, b) }, // Expect a TxcCoordSender retry, which should succeed.
TxnCoordSender
pkg/roachpb/api.proto, line 1806 at r3 (raw file):
// nowhere to defer the error to). Similarly, it's pointless to set the flag // on batches containing an EndTransaction - although we do allow it and it // effects sub-batches split off by the DistSender.
Why do we do it if it's pointless? I would say either less here or more.
pkg/storage/replica_evaluate.go, line 369 at r3 (raw file):
// If there was an EndTransaction in the batch that finalized the transaction, // the WriteTooOld status has been fully processed and we can discard the error. if baHeader.Txn != nil && baHeader.Txn.Status.IsFinalizedOrStaging() {
Are there any tests that hit the OrStaging
part of this condition?
pkg/storage/replica_evaluate.go, line 387 at r3 (raw file):
// retry (i.e. there were no refresh spans and the commit timestamp // wasn't leaked). if baHeader.Txn.Status == roachpb.COMMITTED {
Why do we need this? What fails without it? Should we also have it for transactions with a STAGING status?
pkg/storage/stores.go, line 175 at r3 (raw file):
log.Fatal(ctx, "batch request missing store ID") } if ba.Header.DeferWriteTooOldError && ba.Txn == nil {
This is a pretty high-level code-path to be dropping such a specific condition into. Should we add a BatchRequest.Validate
method instead? That can also check the conditions above.
41ef739
to
cf28026
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @bdarnell, and @nvanbenschoten)
pkg/kv/dist_sender_server_test.go, line 2603 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What's up with this?
done
pkg/kv/dist_sender_server_test.go, line 2642 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
TxnCoordSender
done
pkg/roachpb/api.proto, line 1806 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Why do we do it if it's pointless? I would say either less here or more.
See now. Also keep in mind that we don't really "do it", since this flag is never set :P
pkg/storage/replica_evaluate.go, line 369 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Are there any tests that hit the
OrStaging
part of this condition?
I'll add a test
pkg/storage/replica_evaluate.go, line 387 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Why do we need this? What fails without it? Should we also have it for transactions with a STAGING status?
prepended a commit removing this. PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 11 files at r4, 9 of 9 files at r5.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/roachpb/api.proto, line 1804 at r5 (raw file):
// // Non-transactional requests are not allowed to set this flag (since there's // nowhere to defer the error to). At the replica level, this flag is ignore
s/ignore/ignored/
pkg/storage/replica_evaluate.go, line 369 at r3 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I'll add a test
👀
pkg/storage/replica_write.go, line 437 at r5 (raw file):
break } log.Infof(ctx, "!!! local retries updating ba.Timestamp %s->%s", ba.Timestamp, wtoErr.ActualTimestamp)
!!!
pkg/storage/stores.go, line 175 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This is a pretty high-level code-path to be dropping such a specific condition into. Should we add a
BatchRequest.Validate
method instead? That can also check the conditions above.
I still think we should do something like this.
cf28026
to
65c5e33
Compare
65c5e33
to
6ce1e52
Compare
I've reworked this, PTAL. One thing I'm unsure about is what the story around non-transactional batches and timestamp bumps is. What's supposed to happen, for example, to a non-txn batch consisting of a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @nvanbenschoten)
pkg/roachpb/api.proto, line 1804 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/ignore/ignored/
done
pkg/storage/replica_evaluate.go, line 369 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
👀
👀 now
pkg/storage/replica_write.go, line 437 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
!!!
done
pkg/storage/stores.go, line 175 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I still think we should do something like this.
my silence merely meant that I had not gotten to it. Done.
6ce1e52
to
1d27ede
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I'm unsure about is what the story around non-transactional batches and timestamp bumps is. What's supposed to happen, for example, to a non-txn batch consisting of a Put+Scan, where the Put is pushed by the timestamp cache? The DistSender will split the two, and then the Scan will evaluate at... which timestamp?
I can now answer my question: a write running into the ts cache does not affect the timestamp of "non-write" requests from the same batch because all the non-writes have been split off by the DistSender and their sub-batches are not updated to the bumped timestamp. CPuts
are affected by the timestamp bump - so it's possible for a non-transactional batch to read some data at ts0 but execute a CPut at ts1. Seems funky, but we probably don't care about non-txn batches too much.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @nvanbenschoten)
65f0a6a
to
ae654e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r6, 13 of 13 files at r7, 15 of 15 files at r8.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/kv/dist_sender_server_test.go, line 2457 at r7 (raw file):
}, retryable: func(ctx context.Context, txn *client.Txn) error { if err := txn.Put(ctx, "aother", "another put"); err != nil {
s/aother/another/
pkg/kv/dist_sender_server_test.go, line 2463 at r7 (raw file):
b.Put("a", "final value") return txn.CommitInBatch(ctx, b) },
Add some comment here about why the parallel commit was allowed even though the timestamp was forwarded.
pkg/kv/txn_interceptor_span_refresher.go, line 168 at r7 (raw file):
br, pErr, largestRefreshTS := sr.sendLockedWithRefreshAttempts(ctx, ba, maxAttempts) if pErr != nil { // The server sometimes "performs refreshes" - it updates the transaction's
We only need to do this on the error path?
pkg/roachpb/batch.go, line 600 at r7 (raw file):
} // ValidateForEvaluation performs sanity checks on the batch.
Note where this validation is meant to take place. E.g. this is server-side validation.
pkg/storage/replica_evaluate.go, line 369 at r3 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
👀 now
👀'd. Very nice
pkg/storage/replica_evaluate.go, line 368 at r6 (raw file):
// EndTransaction that decided that it can essentially refresh to something // higher than baHeader.Timestamp because there were no refresh spans. br.Timestamp.Forward(baHeader.Txn.ReadTimestamp)
minor nit: I'd s/baHeader.Txn.ReadTimestamp/br.Txn.ReadTimestamp/
. You just set br.Txn
so that doesn't change any logic, but it seems a little closer to what you semantically mean here. It's not otherwise obvious that baHeader.Txn
was being mutated along the way.
pkg/storage/replica_evaluate.go, line 335 at r7 (raw file):
// If this write request also implies a read, it morally needs to be // "refreshed".
it morally needs to be "refreshed" if it hit a WriteTooOldError.
Just so we don't cause any confusion with the fact that !roachpb.NeedsRefresh(args)
for these requests.
pkg/storage/replica_evaluate.go, line 446 at r7 (raw file):
var pd result.Result cArgs := batcheval.CommandArgs{
Did you mean to keep this diff?
pkg/storage/replica_write.go, line 399 at r7 (raw file):
br, res, pErr = evaluateBatch(ctx, idKey, batch, rec, ms, ba, false /* readOnly */) if pErr == nil {
nit: if pErr == nil && opLogger != nil {
pkg/storage/replica_write.go, line 416 at r7 (raw file):
} func canDoServersideRetry(ctx context.Context, pErr *roachpb.Error, ba *roachpb.BatchRequest) bool {
I'd document this a bit. Also, make it clear that it can mutate ba
.
pkg/storage/replica_write.go, line 417 at r7 (raw file):
func canDoServersideRetry(ctx context.Context, pErr *roachpb.Error, ba *roachpb.BatchRequest) bool { if pErr == nil {
It seems like there's some redundancy between checking pErr
here and a few lines above. Also, is this method trying to determine whether we "can" do a server-side retry or whether we "should" do a server-side retry. Taking pErr
into account here implies the latter.
ae654e9
to
c2ffe71
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/dist_sender_server_test.go, line 2457 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/aother/another/
done
pkg/kv/dist_sender_server_test.go, line 2463 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Add some comment here about why the parallel commit was allowed even though the timestamp was forwarded.
added more words above
pkg/kv/txn_interceptor_span_refresher.go, line 168 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We only need to do this on the error path?
Yes. We could do it on success too if you prefer, at the expense of more lines of code.
As I'm trying to explain in the comment, the server does these refreshes only on commits. If the commit succeeded, there's nothing left for the client to do, so the state updates don't matter.
If the commit fails, the state does matter but only because of a pretty weak reason. That's why I wanted to keep the code changes to a minimum.
pkg/roachpb/batch.go, line 600 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Note where this validation is meant to take place. E.g. this is server-side validation.
done
pkg/storage/replica_evaluate.go, line 368 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
minor nit: I'd
s/baHeader.Txn.ReadTimestamp/br.Txn.ReadTimestamp/
. You just setbr.Txn
so that doesn't change any logic, but it seems a little closer to what you semantically mean here. It's not otherwise obvious thatbaHeader.Txn
was being mutated along the way.
done
pkg/storage/replica_evaluate.go, line 335 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
it morally needs to be "refreshed" if it hit a WriteTooOldError.
Just so we don't cause any confusion with the fact that
!roachpb.NeedsRefresh(args)
for these requests.
you meant add more words to the comment, right? Done.
pkg/storage/replica_evaluate.go, line 446 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Did you mean to keep this diff?
no. gone.
pkg/storage/replica_write.go, line 399 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit:
if pErr == nil && opLogger != nil {
done
pkg/storage/replica_write.go, line 416 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'd document this a bit. Also, make it clear that it can mutate
ba
.
done
pkg/storage/replica_write.go, line 417 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It seems like there's some redundancy between checking
pErr
here and a few lines above. Also, is this method trying to determine whether we "can" do a server-side retry or whether we "should" do a server-side retry. TakingpErr
into account here implies the latter.
Removed the check for pErr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 13 files at r10, 15 of 15 files at r11.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/kv/txn_interceptor_span_refresher.go, line 168 at r7 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Yes. We could do it on success too if you prefer, at the expense of more lines of code.
As I'm trying to explain in the comment, the server does these refreshes only on commits. If the commit succeeded, there's nothing left for the client to do, so the state updates don't matter.
If the commit fails, the state does matter but only because of a pretty weak reason. That's why I wanted to keep the code changes to a minimum.
Ack.
pkg/kv/txn_interceptor_span_refresher.go, line 171 at r11 (raw file):
// ReadTimestamp when the client said that there's nothing to refresh. In // these cases, we need to update our refreshedTimestamp too. This is pretty // inconsequential: the server only does this on an EndTransaction. If the
s/EndTransaction/EndTxn/
Do you mind searching this entire diff for that change?
c2ffe71
to
8b13e15
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained
pkg/kv/txn_interceptor_span_refresher.go, line 171 at r11 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/EndTransaction/EndTxn/
Do you mind searching this entire diff for that change?
done
8b13e15
to
0851180
Compare
I've reworked the patch again to take EndTxn out of the bumping timestamps business, which takes the patch to a new local maximum. See updated commit message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r12 is https://reviewable.io/reviews/cockroachdb/cockroach/43791
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 14 files at r14, 17 of 17 files at r15, 2 of 2 files at r16.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei)
pkg/roachpb/batch.go, line 490 at r16 (raw file):
// can sometimes have their read timestamp bumped on the server, which doesn't // work for read requests due to have the timestamp-aware latching works (i.e. a // read that acquired a latch @ ts10 can't simply be bumped to tsrequests 20
"tsrequests"
pkg/storage/replica_evaluate.go, line 368 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
done
Did we regress on this?
pkg/storage/replica_evaluate.go, line 312 at r16 (raw file):
if baHeader.Txn != nil { baHeader.Txn.WriteTimestamp.Forward(tErr.ActualTimestamp) log.VEventf(ctx, 2, "setting WTO. idx: %d, req: %s", index, args)
Are you meaning to keep this? If so, we should make it more readable.
pkg/storage/replica_write.go, line 260 at r16 (raw file):
// timestamp, let's evaluate the batch at the bumped timestamp. This will // allow it commit, and also it'll allow us to attempt the 1PC code path. maybeBumpReadTimestampToWriteTimestamp(ctx, ba)
What do you think about merging some of this logic with maybeStripInFlightWrites
and creating a more general server-side request update path? The reason for all of these kinds of updates is that we need to wait until we're routed to a Replica before we're able to make certain decisions. It seems like that kind of logic should then be centralized.
pkg/storage/replica_write.go, line 300 at r16 (raw file):
synthesizeEndTxnResponse := func() (_res onePCResult) { // Handle the case of a required one phase commit transaction. defer func() {
Why does this need to be a defer-ed function? This control flow seems much more difficult to understand than before. What do we gain from it?
pkg/storage/replica_write.go, line 332 at r16 (raw file):
} // 1PC execution was successful, let's synthesize a EndTxnResponse.
s/a/an/
pkg/storage/replica_write.go, line 481 at r16 (raw file):
func canDoServersideRetry(ctx context.Context, pErr *roachpb.Error, ba *roachpb.BatchRequest) bool { if pErr == nil {
I thought we had removed this.
pkg/storage/replica_write.go, line 526 at r16 (raw file):
// Note that this, like all the server-side bumping of the read timestamp, only // works for batches that exclusively contain writes; reads cannot be bumped // liked this because they've already acquired timestamp-aware latches.
"bumped liked this"
pkg/storage/replica_write.go, line 548 at r16 (raw file):
"ba.Txn.WriteTimestamp: %s", ts, txn.ReadTimestamp, txn.WriteTimestamp) } txn.ReadTimestamp = ts
This should be copy-on-write, right?
8089a94
to
7fa9bd6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed to prepend r17.
Also please take a look at the deleted tests in cmd_end_transaction_test.go and tell me if you feel like something else should have been done about them - EndTxn
itself no longer looks at the CanCommitAtHigherTimestamp
flag, and so all those tests now return retriable errors that are swallowed by the replica retry loop.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/roachpb/batch.go, line 490 at r16 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"tsrequests"
done
pkg/storage/replica_evaluate.go, line 368 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Did we regress on this?
i guess so, I don't know what happened. Fixed.
pkg/storage/replica_evaluate.go, line 312 at r16 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Are you meaning to keep this? If so, we should make it more readable.
removed
pkg/storage/replica_write.go, line 260 at r16 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What do you think about merging some of this logic with
maybeStripInFlightWrites
and creating a more general server-side request update path? The reason for all of these kinds of updates is that we need to wait until we're routed to a Replica before we're able to make certain decisions. It seems like that kind of logic should then be centralized.
As discussed, they unfortunately can't be called together, but I've moved them both to a new, dedicated, file.
pkg/storage/replica_write.go, line 300 at r16 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Why does this need to be a defer-ed function? This control flow seems much more difficult to understand than before. What do we gain from it?
ok see now
pkg/storage/replica_write.go, line 332 at r16 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/a/an/
done
pkg/storage/replica_write.go, line 481 at r16 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I thought we had removed this.
Done. Not sure what happened, but I went through and re-incorporated a couple of small changes from the previous review.
pkg/storage/replica_write.go, line 526 at r16 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"bumped liked this"
done
pkg/storage/replica_write.go, line 548 at r16 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This should be copy-on-write, right?
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 11 of 27 files at r17, 7 of 14 files at r19, 1 of 4 files at r20, 19 of 19 files at r21, 2 of 2 files at r22.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/storage/replica_batch_updates.go, line 26 at r22 (raw file):
// This files contains functions performing updates to a BatchRequest performed // on the server-side, specifically after the request has been routed to a // replica.
👍
Could you make a note about both why these updates exist (e.g. they allow the batch to be updated based on range split boundaries) and also make a note that all of these methods should be copy-on-write?
pkg/storage/replica_test.go, line 505 at r22 (raw file):
{ru: txnReqsNoRefresh, isTxn: true, exp1PC: true}, {ru: txnReqsNoRefresh, isTxn: true, isTSOff: true, exp1PC: true}, // !!!
!!!
pkg/storage/batcheval/cmd_end_transaction_test.go, line 304 at r22 (raw file):
commit: true, inFlightWrites: writes, noRefreshSpans: true,
Let's remove noRefreshSpans
if no test case sets it to true anymore and it's not considered during EndTxn evaluation.
d585ecc
to
7042c27
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, I've prepended a commit fixing txn.Update()
. PTAL
Everything else is unchanged, modulo the minor edits addressing your last comments.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/storage/replica_batch_updates.go, line 26 at r22 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
👍
Could you make a note about both why these updates exist (e.g. they allow the batch to be updated based on range split boundaries) and also make a note that all of these methods should be copy-on-write?
done
pkg/storage/replica_test.go, line 505 at r22 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
!!!
done
pkg/storage/batcheval/cmd_end_transaction_test.go, line 304 at r22 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Let's remove
noRefreshSpans
if no test case sets it to true anymore and it's not considered during EndTxn evaluation.
done
Updates to the WriteTooOld field were not commutative. This patch fixes that, by clarifying that the transaction with the higher ReadTimestamp gets to dictate the WriteTooOld value. I'm not sure what consequences this used to have, besides allowing for the confusing case where the server would receive a request with the WriteTooOld flag set, but with the ReadTimestamp==WriteTimestamp. A future commit introduces a sanity assertion that all the requests with the WTO flag have a bumped WriteTimestamp. Release note: None
This test was creating batches with different flags set. The combination of the WriteTooOld flag being set and the write timestamp not being bumped in relation to the read timestamp does not make sense. All such tests are removed, and the test spec is improved. A future commit introduces an assertion that such requests are indeed not received by servers. Release note: None
This patch fixes an incorrect comment on br.Timestamp that mistakenly indicated that the field is only relevant for non-transactional requests. It also makes the management of br.Timestamp in evaluateBatch() - it used to be pretty unscrutable. Release note: None
Release note: None
This patch creates a dedicated file for functions performing server-side modifications to a batch. The next commit will have a 2nd such function. Release note: None
Before this patch, we had several issues due to the server erroneously considering that it's OK to commit a transaction at a bumped timestamp. One of the issues was a lost update: a CPut could erroneously succeed even though there's been a more recent write. This was caused by faulty code in evaluateBatch() that was thinking that, just because an EndTxn claimed to have been able to commit a transaction, that means that any WriteTooOldError encountered previously by the batch was safe to discard. An EndTxn might consider that it can commit even if there had been previous write too old conditions if the NoRefreshSpans flag is set. The problems is that a CPut that had returned a WriteTooOldError also evaluated at the wrong read timestamp, and so its evaluation can't be relied on. Another issue is that, when the EndTxn code mentioned above considers that it's safe to commit at a bumped timestamp, it doesn't take into considerations that the EndTxn's batch might have performed reads (other than CPuts) that have been evaluated at a lower timestamp. This can happen, for example in the following scenario: - a txn sends a Put which gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The scan gets evaluated at the original timestamp, but then we commit at a bumped one because the NoRefreshSpans flag is set. The patch fixes the bugs by reworking how evaluation takes advantage of the fact that some requests have flexible timestamps. EndTxn no longer is in the business of committing at bumped timestamps, and its code is thus simplified. Instead, the replica's "local retries" loop takes over. The replica already had code handling non-transactional batches that evaluated them repeatedly in case of WriteTooOldErrors. This patch rationalizes and expands this code to deal with transactional batches too, and with pushes besides WriteTooOldErrors. This reevaluation loop now handles the cases in which the EndTxn used to bump the commit timestamp. The patch also fixes a third bug: the logic evaluateBatch() for resetting the WriteTooOld state after a successful EndTransaction was ignoring the STAGING state, meaning that the server would return a WriteTooOldError even though the transaction was committed. I'm not sure if this had dramatic consequences or was benign... Fixes cockroachdb#42849 Release note (bug fix): A bug causing lost update transaction anomalies was fixed.
Explain that it's about the quality of the serializability error that is produced. Release note: None
7042c27
to
5e5b0c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 30 files at r23, 7 of 14 files at r26, 1 of 5 files at r27, 19 of 19 files at r28, 2 of 2 files at r29.
Reviewable status:complete! 1 of 0 LGTMs obtained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bors r+
Reviewable status:
complete! 1 of 0 LGTMs obtained
42969: storage: rationalize server-side refreshes and fix bugs r=andreimatei a=andreimatei Before this patch, we had several issues due to the server erroneously considering that it's OK to commit a transaction at a bumped timestamp. One of the issues was a lost update: a CPut could erroneously succeed even though there's been a more recent write. This was caused by faulty code in evaluateBatch() that was thinking that, just because an EndTxn claimed to have been able to commit a transaction, that means that any WriteTooOldError encountered previously by the batch was safe to discard. An EndTxn might consider that it can commit even if there had been previous write too old conditions if the NoRefreshSpans flag is set. The problems is that a CPut that had returned a WriteTooOldError also evaluated at the wrong read timestamp, and so its evaluation can't be relied on. Another issue is that, when the EndTxn code mentioned above considers that it's safe to commit at a bumped timestamp, it doesn't take into considerations that the EndTxn's batch might have performed reads (other than CPuts) that have been evaluated at a lower timestamp. This can happen, for example in the following scenario: - a txn sends a Put which gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The scan gets evaluated at the original timestamp, but then we commit at a bumped one because the NoRefreshSpans flag is set. The patch fixes the bugs by reworking how evaluation takes advantage of the fact that some requests have flexible timestamps. EndTxn no longer is in the business of committing at bumped timestamps, and its code is thus simplified. Instead, the replica's "local retries" loop takes over. The replica already had code handling non-transactional batches that evaluated them repeatedly in case of WriteTooOldErrors. This patch rationalizes and expands this code to deal with transactional batches too, and with pushes besides WriteTooOldErrors. This reevaluation loop now handles the cases in which the EndTxn used to bump the commit timestamp. The patch also fixes a third bug: the logic evaluateBatch() for resetting the WriteTooOld state after a successful EndTransaction was ignoring the STAGING state, meaning that the server would return a WriteTooOldError even though the transaction was committed. I'm not sure if this had dramatic consequences or was benign... Fixes #42849 Release note (bug fix): A bug causing lost update transaction anomalies was fixed. 43915: cli: warn if trying to [rd]ecommision when already [rd]ecommissioned r=tbg a=knz Fixes #36624. First commit from #43908 Release note (cli change): The CLI commands `cockroach node decommission` and `cockroach node recommission` now produce a warning on the standard error if one of the node(s) specified is already (d/r)ecommissioned. 43989: colexec: fix an issue with builtin operators and a minor cleanup r=yuzefovich a=yuzefovich Flat bytes relies on coldata.Batch.SetLength call to maintain its invariant. We assume that it is always called before return the batch in which Bytes vector might have been modified. This was not the case for default builtin and substring operators, and the calls were added. Additionally, to be safe, similar calls have been in added in projection operators. In a few places where we were setting the length of an internal batch to 0 and then returning it, those were replaced with returning coldata.ZeroBatch. Fixes: #43656. Release note: None 44004: githooks: accept release note category 'security update' r=knz a=knz Forgot this in #43869 44008: re-enable: roachprod: Make multiple set [provider]-zones always geo-distribute nodes r=jlinder a=jlinder This re-enables commit d24e40e which was reverted in commit 63279f9. It was reverted due to roachtest automatically passing in a list of zones but only wanting the first zone to be used (#43898) which was fixed in f68c6d5 . Before: if multiple zones were set for a provider and --geo wasn't set, all hosts would be started in just one zone in one region. Why change? Because if multiple zones are set, the intention is that they be used. Now, --geo and --[provider]-zones work as follows for gcloud, aws and azure: 1. when geo and zones are not set, nodes are all placed in one of the default zones 2. when geo is set but zones aren't, nodes are spread evenly across the default zones 3. when zones are set, nodes are spread evenly across the specified zones Fixes #38542. Release note: None 44016: builtins: miscellaneous fixes for the to_hex builtin r=mjibson a=otan Resolves #41707. Release note (sql change, bug fix): * Added to_hex(string) -> string functionality. * Previously, `to_hex(-1)` would return `-1` instead of the negative hex representation (`FFFFFFFFFFFFFFFF`). This has been rectified in this PR. 44023: roachtest: bump minimum version of the sqlsmith roachtest r=yuzefovich a=rohany Fixes #43995. This PR bumps the minimum version of the SQLSmith roachtest to be v20.1.0. Release note: None Co-authored-by: Andrei Matei <andrei@cockroachlabs.com> Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: James H. Linder <jamesl@cockroachlabs.com> Co-authored-by: Oliver Tan <otan@cockroachlabs.com> Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
Build succeeded |
44350: storage: revert "storage: rationalize server-side refreshes and fix bugs" r=andreimatei a=andreimatei This reverts commit 1edb0d5. Revert the main commit from #42969 (storage: rationalize server-side refreshes and fix bugs). It seems to be causing consistency problems, as seen in #43110. Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
44381: storage: revert remaining bits of 42969 r=andreimatei a=andreimatei Revert everything from #42969 that wasn't reverted by #44350. Even after the partial revert in #44350, it appears the bug causing TPCC to fail (#43110) is still there. Release note: None Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
35294: roachprod: add a max-concurrency flag with a default of 32 r=nvanbenschoten a=ajwerner This PR adds a --max-concurrency flag to roachprod and defaults its value to 32. In large clusters doing many concurrent SSH operations can lead to unexpected behavior where the command fails to communicate with the SSH agent and leads to the user being prompted for their private key passphrase. Adding a limit prevents this behavior when interacting with a cluster of 256 nodes. Release note: None 44507: storage: rationalize server-side refreshes and fix bugs r=andreimatei a=andreimatei This is most #42969, which had been reverted. Different other parts of #42969 have been re-merged separately. --------- Before this patch, we had several issues due to the server erroneously considering that it's OK to commit a transaction at a bumped timestamp. One of the issues was a lost update: a CPut could erroneously succeed even though there's been a more recent write. This was caused by faulty code in evaluateBatch() that was thinking that, just because an EndTxn claimed to have been able to commit a transaction, that means that any WriteTooOldError encountered previously by the batch was safe to discard. An EndTxn might consider that it can commit even if there had been previous write too old conditions if the NoRefreshSpans flag is set. The problems is that a CPut that had returned a WriteTooOldError also evaluated at the wrong read timestamp, and so its evaluation can't be relied on. Another issue is that, when the EndTxn code mentioned above considers that it's safe to commit at a bumped timestamp, it doesn't take into considerations that the EndTxn's batch might have performed reads (other than CPuts) that have been evaluated at a lower timestamp. This can happen, for example in the following scenario: - a txn sends a Put which gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The scan gets evaluated at the original timestamp, but then we commit at a bumped one because the NoRefreshSpans flag is set. The patch fixes the bugs by reworking how evaluation takes advantage of the fact that some requests have flexible timestamps. EndTxn no longer is in the business of committing at bumped timestamps, and its code is thus simplified. Instead, the replica's "local retries" loop takes over. The replica already had code handling non-transactional batches that evaluated them repeatedly in case of WriteTooOldErrors. This patch rationalizes and expands this code to deal with transactional batches too, and with pushes besides WriteTooOldErrors. This reevaluation loop now handles the cases in which the EndTxn used to bump the commit timestamp. The patch also fixes a third bug: the logic evaluateBatch() for resetting the WriteTooOld state after a successful EndTransaction was ignoring the STAGING state, meaning that the server would return a WriteTooOldError even though the transaction was committed. I'm not sure if this had dramatic consequences or was benign... Fixes #42849 Release note (bug fix): A bug causing lost update transaction anomalies was fixed. Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com> Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Before this patch, we had several issues due to the server erroneously
considering that it's OK to commit a transaction at a bumped timestamp.
One of the issues was a lost update: a CPut could erroneously succeed
even though there's been a more recent write. This was caused by faulty
code in evaluateBatch() that was thinking that, just because an EndTxn
claimed to have been able to commit a transaction, that means that any
WriteTooOldError encountered previously by the batch was safe to
discard. An EndTxn might consider that it can commit even if there had
been previous write too old conditions if the NoRefreshSpans flag is
set. The problems is that a CPut that had returned a WriteTooOldError
also evaluated at the wrong read timestamp, and so its evaluation can't
be relied on.
Another issue is that, when the EndTxn code mentioned above considers
that it's safe to commit at a bumped timestamp, it doesn't take into
considerations that the EndTxn's batch might have performed reads (other
than CPuts) that have been evaluated at a lower timestamp. This can
happen, for example in the following scenario: - a txn sends a Put which
gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The
scan gets evaluated at the original timestamp, but then we commit at a
bumped one because the NoRefreshSpans flag is set.
The patch fixes the bugs by reworking how evaluation takes advantage of
the fact that some requests have flexible timestamps. EndTxn no longer
is in the business of committing at bumped timestamps, and its code is
thus simplified. Instead, the replica's "local retries" loop takes over.
The replica already had code handling non-transactional batches that
evaluated them repeatedly in case of WriteTooOldErrors. This patch
rationalizes and expands this code to deal with transactional batches
too, and with pushes besides WriteTooOldErrors. This reevaluation loop
now handles the cases in which the EndTxn used to bump the commit
timestamp.
The patch also fixes a third bug: the logic evaluateBatch() for
resetting the WriteTooOld state after a successful EndTransaction was
ignoring the STAGING state, meaning that the server would return a
WriteTooOldError even though the transaction was committed. I'm not sure
if this had dramatic consequences or was benign...
Fixes #42849
Release note (bug fix): A bug causing lost update transaction anomalies
was fixed.