-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: rationalize server-side refreshes and fix bugs #44507
Conversation
3a1a5da
to
9c96479
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hold off on this review. I've got some more tweaks I want to do.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
9c96479
to
c7c1a7e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL. Compared to the previous time you've seen this, there's now a small preamble refactor commit (the first commit), and in the main commit I've improved some stuff around server-side refreshes of 1PC attempts (we no longer needlessly attempt them when CanCommitAtHigherTimestamp
is not set. Also, not deferring WTOE on IsReadAndWrite()
is back; the previous iteration had removed it based on the fact that we don't defer anything anyway because the ba.DeferWriteTooOldError
flag is never set, but... I've put it back.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
c7c1a7e
to
cf94349
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 14 of 14 files at r3, 5 of 5 files at r4, 19 of 19 files at r5, 2 of 2 files at r6.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/kv/dist_sender_server_test.go, line 2071 at r5 (raw file):
}, retryable: func(ctx context.Context, txn *client.Txn) error { if err := txn.Put(ctx, "aother", "another put"); err != nil {
s/aother/another/
pkg/kv/dist_sender_server_test.go, line 2077 at r5 (raw file):
Consider adding:
// We expect the request to succeed after a server-side retry. txnCoordRetry: false,
pkg/roachpb/batch.go, line 579 at r3 (raw file):
if et, ok := req.(*EndTxnRequest); ok { h := req.Header() str = append(str, fmt.Sprintf("%s(commit:%t) [%s] tsflex:%t",
nit: did you consider putting this in the parenthesis? Something like EndTxn(commit:false, tsflex: false)
pkg/roachpb/batch.go, line 99 at r5 (raw file):
} // IsReadAndWrite returns true if the request both reads and writes
I think you added this back in the wrong file. It should be in api.go
.
pkg/storage/replica_evaluate.go, line 356 at r5 (raw file):
if baHeader.Txn != nil && baHeader.Txn.Status.IsCommittedOrStaging() { if writeTooOldState.err != nil { log.Fatalf(ctx, "committed txn with writeTooOld err: %s", writeTooOldState.err)
I'm confused. Isn't this exactly what we expect to see when we perform a server-side refresh during an EndTxn? What am I missing?
Also, we might as well restructure these two conditions to if writeTooOldState.err != nil { ... }
pkg/storage/replica_evaluate.go, line 424 at r5 (raw file):
var pd result.Result cArgs := batcheval.CommandArgs{
Why?
pkg/storage/replica_test.go, line 545 at r5 (raw file):
} // Emulate what a server actually does and bump the write timestamp when
Which ones? None of the test cases changed.
pkg/storage/replica_test.go, line 547 at r5 (raw file):
// Emulate what a server actually does and bump the write timestamp when // possible. This makes some batches with diverged read and write // timestamps to still pass isOnePhaseCommit().
s/to still//
pkg/storage/replica_write.go, line 380 at r1 (raw file):
} // newBatchedEngine creates and engine.Batch. Depending on whether rangefeeds
s/and/an/
pkg/storage/replica_write.go, line 310 at r5 (raw file):
res result.Result } synthesizeEndTxnResponse := func() onePCResult {
Could you double-check that this isn't causing anything new to escape to the heap? To be honest, this whole onePCResult
stuff seems hard to read to me, so I'd vote to avoid it if possible.
pkg/storage/replica_write.go, line 312 at r5 (raw file):
synthesizeEndTxnResponse := func() onePCResult { if pErr != nil { return onePCResult{success: false}
We're not setting pErr
here? Could you comment why?
pkg/storage/replica_write.go, line 318 at r5 (raw file):
otherwise we wouldn't have attempted 1PC
Is this true? What if the txn had read before and hit an uncertainty retry?
pkg/storage/replica_write.go, line 475 at r5 (raw file):
} // canDoServersideRetry looks at the error produced by evaluating ba and decides
nit: I know we talked about this before and I might have been the one to push you towards this, but "can" seems like the wrong word here. We often "can" perform a serverside retry - that doesn't mean that we "should".
5431fb8
to
8fb46b0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/dist_sender_server_test.go, line 2071 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/aother/another/
done
pkg/kv/dist_sender_server_test.go, line 2077 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Consider adding:
// We expect the request to succeed after a server-side retry. txnCoordRetry: false,
done
pkg/roachpb/batch.go, line 579 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: did you consider putting this in the parenthesis? Something like
EndTxn(commit:false, tsflex: false)
done
pkg/roachpb/batch.go, line 99 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I think you added this back in the wrong file. It should be in
api.go
.
done
pkg/storage/replica_evaluate.go, line 356 at r5 (raw file):
I'm confused. Isn't this exactly what we expect to see when we perform a server-side refresh during an EndTxn? What am I missing?
No... If there's an EndTxn
in the batch, then we can't return the WriteTooOld
flag... We never did.
I mean, we could, subject I guess to the same reasoning as returning a WriteTooOld
flag when a CPut encounters a write-too-old. No?
Also, we might as well restructure these two conditions to if writeTooOldState.err != nil { ... }
Done
pkg/storage/replica_evaluate.go, line 424 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Why?
reverted the change
pkg/storage/replica_test.go, line 545 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Which ones? None of the test cases changed.
well I think preventing the tests from changing was the point here. IsOnePhaseCommit()
used to indirectly check CanForwardCommitTimestampWithoutRefresh()
. Now it no longer does.
pkg/storage/replica_test.go, line 547 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/to still//
done
pkg/storage/replica_write.go, line 380 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/and/an/
done
pkg/storage/replica_write.go, line 310 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Could you double-check that this isn't causing anything new to escape to the heap? To be honest, this whole
onePCResult
stuff seems hard to read to me, so I'd vote to avoid it if possible.
I've extracted the 1PC evaluation to a new method; I think it's a lot more readable now. PTAL.
I've stared at the goescape output for a while; things look pretty good.
pkg/storage/replica_write.go, line 312 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We're not setting
pErr
here? Could you comment why?
done
pkg/storage/replica_write.go, line 318 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
otherwise we wouldn't have attempted 1PC
Is this true? What if the txn had read before and hit an uncertainty retry?
well I was relying on the splitting of reads and writes. But I got rid of this assertion.
pkg/storage/replica_write.go, line 475 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: I know we talked about this before and I might have been the one to push you towards this, but "can" seems like the wrong word here. We often "can" perform a serverside retry - that doesn't mean that we "should".
When shouldn't we?
8fb46b0
to
c40fb47
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 29 of 29 files at r7, 1 of 1 files at r8, 14 of 14 files at r9, 5 of 5 files at r10, 20 of 20 files at r11, 4 of 4 files at r12.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/storage/replica_evaluate.go, line 356 at r5 (raw file):
No... If there's an EndTxn in the batch, then we can't return the WriteTooOld flag... We never did.
Sure, I'm not saying we can, I'm just confused how the code is working. We're never setting writeTooOldState.err
to nil when we see an EndTxn that performs a server-side refresh, so what happens if it was set by a previous request in the same batch and then refreshed away during the evaluation of EndTxn
?
Oh, or did we get rid of server-side refreshes within cmd_end_transaction.go
? That must be it. I've looked at too many subtle variations of this change to have a sound mental model for how this works right now 😃
pkg/storage/replica_write.go, line 475 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
When shouldn't we?
When we succeeded in running the batch. But this is fine.
pkg/storage/replica_write.go, line 265 at r11 (raw file):
if isOnePhaseCommit(ba) { res := r.evaluate1PC(ctx, idKey, ba, spans) if res.success == onePCSucceeded {
nit: use a switch statement and exhaustively check all three cases.
pkg/storage/replica_write.go, line 312 at r11 (raw file):
func (r *Replica) evaluate1PC( ctx context.Context, idKey storagebase.CmdIDKey, ba *roachpb.BatchRequest, spans *spanset.SpanSet, ) (_res onePCResult) {
What's up with the underscore? I'd name it something else if you don't want it to clash with res
. Maybe onePCRes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/replica_evaluate.go, line 356 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
No... If there's an EndTxn in the batch, then we can't return the WriteTooOld flag... We never did.
Sure, I'm not saying we can, I'm just confused how the code is working. We're never setting
writeTooOldState.err
to nil when we see an EndTxn that performs a server-side refresh, so what happens if it was set by a previous request in the same batch and then refreshed away during the evaluation ofEndTxn
?Oh, or did we get rid of server-side refreshes within
cmd_end_transaction.go
? That must be it. I've looked at too many subtle variations of this change to have a sound mental model for how this works right now 😃
Right, this commit gets rid of refreshing in cmd_end_transaction.go
.
pkg/storage/replica_write.go, line 475 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
When we succeeded in running the batch. But this is fine.
ack
pkg/storage/replica_write.go, line 265 at r11 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: use a switch statement and exhaustively check all three cases.
done
pkg/storage/replica_write.go, line 312 at r11 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What's up with the underscore? I'd name it something else if you don't want it to clash with
res
. MaybeonePCRes
.
done
But I like the underscores to indicate that the named ret vals are only to be used in defers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r13.
Reviewable status:complete! 1 of 0 LGTMs obtained
a16fcb1
to
5480d25
Compare
Move blabber about engine.Batch creation to a dedicated function. It was obscuring what reads better as a tight retry loop. Release note: None
This test was creating batches with different flags set. The combination of the WriteTooOld flag being set and the write timestamp not being bumped in relation to the read timestamp does not make sense. All such tests are removed, and the test spec is improved. A future commit introduces an assertion that such requests are indeed not received by servers. Release note: None
Release note: None
This patch creates a dedicated file for functions performing server-side modifications to a batch. The next commit will have a 2nd such function. Release note: None
Before this patch, we had several issues due to the server erroneously considering that it's OK to commit a transaction at a bumped timestamp. One of the issues was a lost update: a CPut could erroneously succeed even though there's been a more recent write. This was caused by faulty code in evaluateBatch() that was thinking that, just because an EndTxn claimed to have been able to commit a transaction, that means that any WriteTooOldError encountered previously by the batch was safe to discard. An EndTxn might consider that it can commit even if there had been previous write too old conditions if the NoRefreshSpans flag is set. The problems is that a CPut that had returned a WriteTooOldError also evaluated at the wrong read timestamp, and so its evaluation can't be relied on. Another issue is that, when the EndTxn code mentioned above considers that it's safe to commit at a bumped timestamp, it doesn't take into considerations that the EndTxn's batch might have performed reads (other than CPuts) that have been evaluated at a lower timestamp. This can happen, for example in the following scenario: - a txn sends a Put which gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The scan gets evaluated at the original timestamp, but then we commit at a bumped one because the NoRefreshSpans flag is set. The patch fixes the bugs by reworking how evaluation takes advantage of the fact that some requests have flexible timestamps. EndTxn no longer is in the business of committing at bumped timestamps, and its code is thus simplified. Instead, the replica's "local retries" loop takes over. The replica already had code handling non-transactional batches that evaluated them repeatedly in case of WriteTooOldErrors. This patch rationalizes and expands this code to deal with transactional batches too, and with pushes besides WriteTooOldErrors. This reevaluation loop now handles the cases in which the EndTxn used to bump the commit timestamp. The patch also fixes a third bug: the logic evaluateBatch() for resetting the WriteTooOld state after a successful EndTransaction was ignoring the STAGING state, meaning that the server would return a WriteTooOldError even though the transaction was committed. I'm not sure if this had dramatic consequences or was benign... Fixes cockroachdb#42849 Release note (bug fix): A bug causing lost update transaction anomalies was fixed.
Explain that it's about the quality of the serializability error that is produced. Release note: None
5480d25
to
d5b8886
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bors r+
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
Canceled (will resume) |
35294: roachprod: add a max-concurrency flag with a default of 32 r=nvanbenschoten a=ajwerner This PR adds a --max-concurrency flag to roachprod and defaults its value to 32. In large clusters doing many concurrent SSH operations can lead to unexpected behavior where the command fails to communicate with the SSH agent and leads to the user being prompted for their private key passphrase. Adding a limit prevents this behavior when interacting with a cluster of 256 nodes. Release note: None 44507: storage: rationalize server-side refreshes and fix bugs r=andreimatei a=andreimatei This is most #42969, which had been reverted. Different other parts of #42969 have been re-merged separately. --------- Before this patch, we had several issues due to the server erroneously considering that it's OK to commit a transaction at a bumped timestamp. One of the issues was a lost update: a CPut could erroneously succeed even though there's been a more recent write. This was caused by faulty code in evaluateBatch() that was thinking that, just because an EndTxn claimed to have been able to commit a transaction, that means that any WriteTooOldError encountered previously by the batch was safe to discard. An EndTxn might consider that it can commit even if there had been previous write too old conditions if the NoRefreshSpans flag is set. The problems is that a CPut that had returned a WriteTooOldError also evaluated at the wrong read timestamp, and so its evaluation can't be relied on. Another issue is that, when the EndTxn code mentioned above considers that it's safe to commit at a bumped timestamp, it doesn't take into considerations that the EndTxn's batch might have performed reads (other than CPuts) that have been evaluated at a lower timestamp. This can happen, for example in the following scenario: - a txn sends a Put which gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The scan gets evaluated at the original timestamp, but then we commit at a bumped one because the NoRefreshSpans flag is set. The patch fixes the bugs by reworking how evaluation takes advantage of the fact that some requests have flexible timestamps. EndTxn no longer is in the business of committing at bumped timestamps, and its code is thus simplified. Instead, the replica's "local retries" loop takes over. The replica already had code handling non-transactional batches that evaluated them repeatedly in case of WriteTooOldErrors. This patch rationalizes and expands this code to deal with transactional batches too, and with pushes besides WriteTooOldErrors. This reevaluation loop now handles the cases in which the EndTxn used to bump the commit timestamp. The patch also fixes a third bug: the logic evaluateBatch() for resetting the WriteTooOld state after a successful EndTransaction was ignoring the STAGING state, meaning that the server would return a WriteTooOldError even though the transaction was committed. I'm not sure if this had dramatic consequences or was benign... Fixes #42849 Release note (bug fix): A bug causing lost update transaction anomalies was fixed. Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com> Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Build succeeded |
PR cockroachdb#44507 has left an unnecessary check in place. The check tolerates requests with the WriteTooOld flag set for leaf txn, even though the same commit makes sure that leaves reset that flag. Release note: None
PR cockroachdb#44507 has left an unnecessary check in place. The check tolerates requests with the WriteTooOld flag set for leaf txn, even though the same commit makes sure that leaves reset that flag. Release note: None
This is most #42969, which had been reverted. Different other parts of #42969 have been re-merged separately.
Before this patch, we had several issues due to the server erroneously
considering that it's OK to commit a transaction at a bumped timestamp.
One of the issues was a lost update: a CPut could erroneously succeed
even though there's been a more recent write. This was caused by faulty
code in evaluateBatch() that was thinking that, just because an EndTxn
claimed to have been able to commit a transaction, that means that any
WriteTooOldError encountered previously by the batch was safe to
discard. An EndTxn might consider that it can commit even if there had
been previous write too old conditions if the NoRefreshSpans flag is
set. The problems is that a CPut that had returned a WriteTooOldError
also evaluated at the wrong read timestamp, and so its evaluation can't
be relied on.
Another issue is that, when the EndTxn code mentioned above considers
that it's safe to commit at a bumped timestamp, it doesn't take into
considerations that the EndTxn's batch might have performed reads (other
than CPuts) that have been evaluated at a lower timestamp. This can
happen, for example in the following scenario: - a txn sends a Put which
gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The
scan gets evaluated at the original timestamp, but then we commit at a
bumped one because the NoRefreshSpans flag is set.
The patch fixes the bugs by reworking how evaluation takes advantage of
the fact that some requests have flexible timestamps. EndTxn no longer
is in the business of committing at bumped timestamps, and its code is
thus simplified. Instead, the replica's "local retries" loop takes over.
The replica already had code handling non-transactional batches that
evaluated them repeatedly in case of WriteTooOldErrors. This patch
rationalizes and expands this code to deal with transactional batches
too, and with pushes besides WriteTooOldErrors. This reevaluation loop
now handles the cases in which the EndTxn used to bump the commit
timestamp.
The patch also fixes a third bug: the logic evaluateBatch() for
resetting the WriteTooOld state after a successful EndTransaction was
ignoring the STAGING state, meaning that the server would return a
WriteTooOldError even though the transaction was committed. I'm not sure
if this had dramatic consequences or was benign...
Fixes #42849
Release note (bug fix): A bug causing lost update transaction anomalies
was fixed.