storage: rationalize server-side refreshes and fix bugs #44507

andreimatei · 2020-01-29T20:48:01Z

This is most of #42969, which had been reverted. Other parts of #42969 have been re-merged separately.

Before this patch, we had several issues due to the server erroneously
considering that it's OK to commit a transaction at a bumped timestamp.

One of the issues was a lost update: a CPut could erroneously succeed
even though there's been a more recent write. This was caused by faulty
code in evaluateBatch() that was thinking that, just because an EndTxn
claimed to have been able to commit a transaction, that means that any
WriteTooOldError encountered previously by the batch was safe to
discard. An EndTxn might consider that it can commit even if there had
been previous write too old conditions if the NoRefreshSpans flag is
set. The problem is that a CPut that had returned a WriteTooOldError
also evaluated at the wrong read timestamp, and so its evaluation can't
be relied on.
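
To make the lost update concrete, here is a toy sketch in Go. The `store`, `version`, and `cput` names are hypothetical and do not mirror the real MVCC code; the sketch only assumes a single key with timestamped versions. It shows that a condition checked at a stale read timestamp says nothing about the value that is visible at the bumped timestamp.

```go
package main

import "fmt"

// version is one MVCC version of a single key (toy model only).
type version struct {
	ts  int
	val string
}

// store keeps the versions of one key in increasing timestamp order.
type store struct{ versions []version }

// readAt returns the newest value with timestamp <= ts, or "" if none exists.
func (s *store) readAt(ts int) string {
	val := ""
	for _, v := range s.versions {
		if v.ts <= ts {
			val = v.val
		}
	}
	return val
}

// cput checks the condition at readTS. writeTooOld reports that a version
// newer than readTS exists, so the write would have to land at a higher
// timestamp than the one the condition was checked at.
func (s *store) cput(readTS int, expected string) (condOK, writeTooOld bool) {
	n := len(s.versions)
	writeTooOld = n > 0 && s.versions[n-1].ts > readTS
	return s.readAt(readTS) == expected, writeTooOld
}

func main() {
	// Another writer installed "b" at ts=5 after our txn started reading at ts=3.
	s := &store{versions: []version{{1, "a"}, {5, "b"}}}

	// At the stale read timestamp the condition holds, but the write is too old.
	// Discarding writeTooOld here and committing above ts=5 would overwrite "b".
	condOK, wto := s.cput(3, "a")
	fmt.Println("evaluated at ts=3:", condOK, wto)

	// Re-evaluating at the bumped timestamp exposes the failed condition.
	condOK, wto = s.cput(6, "a")
	fmt.Println("evaluated at ts=6:", condOK, wto)
}
```

In this model the only safe reaction to `writeTooOld` is to evaluate the CPut again at the higher timestamp, which is what the description argues for.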

Another issue is that, when the EndTxn code mentioned above considers
that it's safe to commit at a bumped timestamp, it doesn't take into
consideration that the EndTxn's batch might have performed reads (other
than CPuts) that have been evaluated at a lower timestamp. This can
happen, for example, in the following scenario:
- a txn sends a Put which gets bumped by the ts cache
- the txn then sends a Scan + EndTxn. The scan gets evaluated at the
  original timestamp, but then we commit at a bumped one because the
  NoRefreshSpans flag is set.
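
The scenario can be sketched with a small Go check. The function `readsValidAt` and the integer timestamps are assumptions made for illustration, not code from this patch. The idea is that reads performed at one timestamp can only be carried over to a later commit timestamp if nobody else wrote to the read keys in between.

```go
package main

import "fmt"

// readsValidAt reports whether reads performed at readTS are still valid for a
// commit at commitTS: true only if no other writer touched the read keys in
// the window (readTS, commitTS].
func readsValidAt(readTS, commitTS int, otherWrites []int) bool {
	for _, w := range otherWrites {
		if w > readTS && w <= commitTS {
			return false
		}
	}
	return true
}

func main() {
	readTS := 3               // timestamp at which the Scan was evaluated
	commitTS := 5             // write timestamp after the Put was bumped
	otherWrites := []int{4}   // a concurrent write to a scanned key

	// The Scan ran at the original timestamp, so committing at 5 is unsafe.
	fmt.Println("scan at original ts:", readsValidAt(readTS, commitTS, otherWrites))

	// Re-evaluating the whole batch at the bumped timestamp closes the window.
	fmt.Println("scan at bumped ts:", readsValidAt(commitTS, commitTS, otherWrites))
}
```

The first call returns false and the second returns true, which is why the fix evaluates the Scan and the EndTxn together at the same, bumped timestamp.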

The patch fixes the bugs by reworking how evaluation takes advantage of
the fact that some requests have flexible timestamps. EndTxn no longer
is in the business of committing at bumped timestamps, and its code is
thus simplified. Instead, the replica's "local retries" loop takes over.
The replica already had code handling non-transactional batches that
evaluated them repeatedly in case of WriteTooOldErrors. This patch
rationalizes and expands this code to deal with transactional batches
too, and with pushes besides WriteTooOldErrors. This reevaluation loop
now handles the cases in which the EndTxn used to bump the commit
timestamp.
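
A minimal sketch of such a reevaluation loop follows. All names here (`batch`, `evalFn`, `evaluateWithRetries`, `errWriteTooOld`) are hypothetical, the attempt limit is an assumption, and the sketch only covers the write-too-old case, not pushes. The point is that the whole batch is evaluated again at the higher timestamp instead of patching up the commit timestamp after the fact.

```go
package main

import (
	"errors"
	"fmt"
)

// errWriteTooOld stands in for a WriteTooOldError in this toy model.
var errWriteTooOld = errors.New("write too old")

// batch carries the evaluation timestamp and the flag that says whether the
// client allows the server to move that timestamp forward.
type batch struct {
	ts                         int
	canCommitAtHigherTimestamp bool
}

// evalFn evaluates the whole batch at ba.ts. On a write-too-old condition it
// returns errWriteTooOld plus the lowest timestamp at which a retry may work.
type evalFn func(ba batch) (retryTS int, err error)

// evaluateWithRetries re-runs the full batch at a bumped timestamp for as long
// as the error is retriable on the server and the batch permits the bump.
func evaluateWithRetries(ba batch, eval evalFn, maxAttempts int) (int, error) {
	lastErr := errors.New("no evaluation attempted")
	for i := 0; i < maxAttempts; i++ {
		retryTS, err := eval(ba)
		if err == nil {
			return ba.ts, nil
		}
		lastErr = err
		if !errors.Is(err, errWriteTooOld) || !ba.canCommitAtHigherTimestamp {
			break // the client has to handle this error
		}
		ba.ts = retryTS // every request in the batch is re-evaluated at this ts
	}
	return 0, lastErr
}

func main() {
	// A fake evaluation that fails below ts=5, as if a newer write existed.
	eval := func(ba batch) (int, error) {
		if ba.ts < 5 {
			return 5, errWriteTooOld
		}
		return 0, nil
	}

	ts, err := evaluateWithRetries(batch{ts: 3, canCommitAtHigherTimestamp: true}, eval, 3)
	fmt.Println("flexible timestamp:", ts, err)

	_, err = evaluateWithRetries(batch{ts: 3}, eval, 3)
	fmt.Println("fixed timestamp:", err)
}
```

With the flag set, the second attempt succeeds at timestamp 5; without it, the error is returned to the caller after the first attempt.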

The patch also fixes a third bug: the logic in evaluateBatch() for
resetting the WriteTooOld state after a successful EndTransaction was
ignoring the STAGING state, meaning that the server would return a
WriteTooOldError even though the transaction was committed. I'm not sure
if this had dramatic consequences or was benign...
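
The third bug boils down to which transaction statuses count as "the EndTxn succeeded". The sketch below uses a hypothetical `txnStatus` type with the four statuses PENDING, STAGING, COMMITTED, and ABORTED; the two predicate functions are illustrative names, not the code in evaluateBatch().

```go
package main

import "fmt"

// txnStatus is a toy stand-in for the transaction status enum.
type txnStatus int

const (
	pending txnStatus = iota
	staging
	committed
	aborted
)

// isCommittedOrStaging treats STAGING like COMMITTED.
func (s txnStatus) isCommittedOrStaging() bool {
	return s == committed || s == staging
}

// resetOnlyOnCommitted models the old check, which ignored STAGING.
func resetOnlyOnCommitted(s txnStatus) bool { return s == committed }

// resetOnCommittedOrStaging models the fixed check.
func resetOnCommittedOrStaging(s txnStatus) bool { return s.isCommittedOrStaging() }

func main() {
	// For a STAGING txn the old check keeps the WriteTooOld state around,
	// while the fixed check resets it.
	fmt.Println("old check, STAGING:", resetOnlyOnCommitted(staging))
	fmt.Println("new check, STAGING:", resetOnCommittedOrStaging(staging))
}
```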

Fixes #42849

Release note (bug fix): A bug causing lost update transaction anomalies
was fixed.

cockroach-teamcity · 2020-01-29T20:48:21Z

This change is

andreimatei

hold off on this review. I've got some more tweaks I want to do.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

andreimatei

PTAL. Compared to the previous time you've seen this, there's now a small preamble refactor commit (the first commit), and in the main commit I've improved some stuff around server-side refreshes of 1PC attempts (we no longer needlessly attempt them when CanCommitAtHigherTimestamp is not set). Also, not deferring WTOE on IsReadAndWrite() is back; the previous iteration had removed it based on the fact that we don't defer anything anyway because the ba.DeferWriteTooOldError flag is never set, but... I've put it back.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

nvanbenschoten

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 14 of 14 files at r3, 5 of 5 files at r4, 19 of 19 files at r5, 2 of 2 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)

pkg/kv/dist_sender_server_test.go, line 2071 at r5 (raw file):

			},
			retryable: func(ctx context.Context, txn *client.Txn) error {
				if err := txn.Put(ctx, "aother", "another put"); err != nil {

s/aother/another/

pkg/kv/dist_sender_server_test.go, line 2077 at r5 (raw file):
Consider adding:

// We expect the request to succeed after a server-side retry.
txnCoordRetry: false,

pkg/roachpb/batch.go, line 579 at r3 (raw file):

		if et, ok := req.(*EndTxnRequest); ok {
			h := req.Header()
			str = append(str, fmt.Sprintf("%s(commit:%t) [%s] tsflex:%t",

nit: did you consider putting this in the parenthesis? Something like EndTxn(commit:false, tsflex: false)

pkg/roachpb/batch.go, line 99 at r5 (raw file):

}

// IsReadAndWrite returns true if the request both reads and writes

I think you added this back in the wrong file. It should be in api.go.

pkg/storage/replica_evaluate.go, line 356 at r5 (raw file):

	if baHeader.Txn != nil && baHeader.Txn.Status.IsCommittedOrStaging() {
		if writeTooOldState.err != nil {
			log.Fatalf(ctx, "committed txn with writeTooOld err: %s", writeTooOldState.err)

I'm confused. Isn't this exactly what we expect to see when we perform a server-side refresh during an EndTxn? What am I missing?

Also, we might as well restructure these two conditions to if writeTooOldState.err != nil { ... }

pkg/storage/replica_evaluate.go, line 424 at r5 (raw file):

	var pd result.Result

	cArgs := batcheval.CommandArgs{

Why?

pkg/storage/replica_test.go, line 545 at r5 (raw file):

				}

				// Emulate what a server actually does and bump the write timestamp when

Which ones? None of the test cases changed.

pkg/storage/replica_test.go, line 547 at r5 (raw file):

				// Emulate what a server actually does and bump the write timestamp when
				// possible. This makes some batches with diverged read and write
				// timestamps to still pass isOnePhaseCommit().

s/to still//

pkg/storage/replica_write.go, line 380 at r1 (raw file):

}

// newBatchedEngine creates and engine.Batch. Depending on whether rangefeeds

s/and/an/

pkg/storage/replica_write.go, line 310 at r5 (raw file):

			res   result.Result
		}
		synthesizeEndTxnResponse := func() onePCResult {

Could you double-check that this isn't causing anything new to escape to the heap? To be honest, this whole onePCResult stuff seems hard to read to me, so I'd vote to avoid it if possible.

pkg/storage/replica_write.go, line 312 at r5 (raw file):

		synthesizeEndTxnResponse := func() onePCResult {
			if pErr != nil {
				return onePCResult{success: false}

We're not setting pErr here? Could you comment why?

pkg/storage/replica_write.go, line 318 at r5 (raw file):

otherwise we wouldn't have attempted 1PC

Is this true? What if the txn had read before and hit an uncertainty retry?

pkg/storage/replica_write.go, line 475 at r5 (raw file):

}

// canDoServersideRetry looks at the error produced by evaluating ba and decides

nit: I know we talked about this before and I might have been the one to push you towards this, but "can" seems like the wrong word here. We often "can" perform a serverside retry - that doesn't mean that we "should".

andreimatei

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

pkg/kv/dist_sender_server_test.go, line 2071 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/aother/another/

done

pkg/kv/dist_sender_server_test.go, line 2077 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Consider adding:

// We expect the request to succeed after a server-side retry.
txnCoordRetry: false,

done

pkg/roachpb/batch.go, line 579 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: did you consider putting this in the parenthesis? Something like EndTxn(commit:false, tsflex: false)

done

pkg/roachpb/batch.go, line 99 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think you added this back in the wrong file. It should be in api.go.

done

pkg/storage/replica_evaluate.go, line 356 at r5 (raw file):

I'm confused. Isn't this exactly what we expect to see when we perform a server-side refresh during an EndTxn? What am I missing?

No... If there's an EndTxn in the batch, then we can't return the WriteTooOld flag... We never did.
I mean, we could, subject I guess to the same reasoning as returning a WriteTooOld flag when a CPut encounters a write-too-old. No?

Also, we might as well restructure these two conditions to if writeTooOldState.err != nil { ... }

Done

pkg/storage/replica_evaluate.go, line 424 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why?

reverted the change

pkg/storage/replica_test.go, line 545 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Which ones? None of the test cases changed.

well I think preventing the tests from changing was the point here. IsOnePhaseCommit() used to indirectly check CanForwardCommitTimestampWithoutRefresh(). Now it no longer does.

pkg/storage/replica_test.go, line 547 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/to still//

done

pkg/storage/replica_write.go, line 380 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/and/an/

done

pkg/storage/replica_write.go, line 310 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Could you double-check that this isn't causing anything new to escape to the heap? To be honest, this whole onePCResult stuff seems hard to read to me, so I'd vote to avoid it if possible.

I've extracted the 1PC evaluation to a new method; I think it's a lot more readable now. PTAL.

I've stared at the goescape output for a while; things look pretty good.

pkg/storage/replica_write.go, line 312 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We're not setting pErr here? Could you comment why?

done

pkg/storage/replica_write.go, line 318 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

otherwise we wouldn't have attempted 1PC
Is this true? What if the txn had read before and hit an uncertainty retry?

well I was relying on the splitting of reads and writes. But I got rid of this assertion.

pkg/storage/replica_write.go, line 475 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: I know we talked about this before and I might have been the one to push you towards this, but "can" seems like the wrong word here. We often "can" perform a serverside retry - that doesn't mean that we "should".

When shouldn't we?

nvanbenschoten

Reviewed 29 of 29 files at r7, 1 of 1 files at r8, 14 of 14 files at r9, 5 of 5 files at r10, 20 of 20 files at r11, 4 of 4 files at r12.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)

pkg/storage/replica_evaluate.go, line 356 at r5 (raw file):

No... If there's an EndTxn in the batch, then we can't return the WriteTooOld flag... We never did.

Sure, I'm not saying we can, I'm just confused how the code is working. We're never setting writeTooOldState.err to nil when we see an EndTxn that performs a server-side refresh, so what happens if it was set by a previous request in the same batch and then refreshed away during the evaluation of EndTxn?

Oh, or did we get rid of server-side refreshes within cmd_end_transaction.go? That must be it. I've looked at too many subtle variations of this change to have a sound mental model for how this works right now 😃

pkg/storage/replica_write.go, line 475 at r5 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

When shouldn't we?

When we succeeded in running the batch. But this is fine.

pkg/storage/replica_write.go, line 265 at r11 (raw file):

	if isOnePhaseCommit(ba) {
		res := r.evaluate1PC(ctx, idKey, ba, spans)
		if res.success == onePCSucceeded {

nit: use a switch statement and exhaustively check all three cases.
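
For illustration, an exhaustive switch over a three-valued result might look like the sketch below. Only `onePCSucceeded` appears in the quoted code; the type name `onePCSuccess` and the other two constants are assumptions made up for this example.

```go
package main

import "fmt"

// onePCSuccess is a hypothetical three-valued outcome of a 1PC attempt.
type onePCSuccess int

const (
	onePCSucceeded onePCSuccess = iota
	onePCFailed   // assumed name: the attempt produced an error to return
	onePCFallback // assumed name: fall back to regular transactional evaluation
)

// describe handles every case explicitly, so a value added later without a
// matching case panics instead of silently taking the wrong branch.
func describe(s onePCSuccess) string {
	switch s {
	case onePCSucceeded:
		return "return the 1PC result"
	case onePCFailed:
		return "return the error"
	case onePCFallback:
		return "evaluate as a regular transactional batch"
	default:
		panic(fmt.Sprintf("unexpected onePCSuccess value: %d", s))
	}
}

func main() {
	for _, s := range []onePCSuccess{onePCSucceeded, onePCFailed, onePCFallback} {
		fmt.Println(describe(s))
	}
}
```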

pkg/storage/replica_write.go, line 312 at r11 (raw file):

func (r *Replica) evaluate1PC(
	ctx context.Context, idKey storagebase.CmdIDKey, ba *roachpb.BatchRequest, spans *spanset.SpanSet,
) (_res onePCResult) {

What's up with the underscore? I'd name it something else if you don't want it to clash with res. Maybe onePCRes.

andreimatei

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

pkg/storage/replica_evaluate.go, line 356 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

No... If there's an EndTxn in the batch, then we can't return the WriteTooOld flag... We never did.

Sure, I'm not saying we can, I'm just confused how the code is working. We're never setting writeTooOldState.err to nil when we see an EndTxn that performs a server-side refresh, so what happens if it was set by a previous request in the same batch and then refreshed away during the evaluation of EndTxn?

Oh, or did we get rid of server-side refreshes within cmd_end_transaction.go? That must be it. I've looked at too many subtle variations of this change to have a sound mental model for how this works right now 😃

Right, this commit gets rid of refreshing in cmd_end_transaction.go.

pkg/storage/replica_write.go, line 475 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

When we succeeded in running the batch. But this is fine.

ack

pkg/storage/replica_write.go, line 265 at r11 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: use a switch statement and exhaustively check all three cases.

done

pkg/storage/replica_write.go, line 312 at r11 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What's up with the underscore? I'd name it something else if you don't want it to clash with res. Maybe onePCRes.

done

But I like the underscores to indicate that the named ret vals are only to be used in defers.
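
A small Go sketch of that convention, with hypothetical names (`result`, `auditLog`, `evaluate`): the named return value carries an underscore prefix and is read only inside the deferred function, while the function body works with an ordinary local variable.

```go
package main

import "fmt"

// result is a toy return type for the example.
type result struct{ success bool }

// auditLog records what each call returned; it exists only for the example.
var auditLog []string

// evaluate names its return value _res. By convention the body never touches
// _res directly; only the deferred function reads it, after the return
// statement has assigned the final value.
func evaluate(ok bool) (_res result) {
	defer func() {
		auditLog = append(auditLog, fmt.Sprintf("success=%t", _res.success))
	}()

	res := result{success: ok}
	return res
}

func main() {
	evaluate(true)
	evaluate(false)
	fmt.Println(auditLog)
}
```

The deferred function observes the value that the `return` statement assigned, so the log entry always matches what the caller receives.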

nvanbenschoten

Reviewed 1 of 1 files at r13.
Reviewable status: complete! 1 of 0 LGTMs obtained

Move blabber about engine.Batch creation to a dedicated function. It was obscuring what reads better as a tight retry loop. Release note: None

This test was creating batches with different flags set. The combination of the WriteTooOld flag being set and the write timestamp not being bumped in relation to the read timestamp does not make sense. All such tests are removed, and the test spec is improved. A future commit introduces an assertion that such requests are indeed not received by servers. Release note: None

Release note: None

This patch creates a dedicated file for functions performing server-side modifications to a batch. The next commit will have a 2nd such function. Release note: None

Explain that it's about the quality of the serializability error that is produced. Release note: None

andreimatei

bors r+

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

craig · 2020-02-07T19:17:25Z

Canceled (will resume)

35294: roachprod: add a max-concurrency flag with a default of 32 r=nvanbenschoten a=ajwerner

This PR adds a --max-concurrency flag to roachprod and defaults its value to 32. In large clusters doing many concurrent SSH operations can lead to unexpected behavior where the command fails to communicate with the SSH agent and leads to the user being prompted for their private key passphrase. Adding a limit prevents this behavior when interacting with a cluster of 256 nodes. Release note: None

44507: storage: rationalize server-side refreshes and fix bugs r=andreimatei a=andreimatei

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>

craig · 2020-02-07T20:09:56Z

Build succeeded

GitHub CI (Cockroach)

PR cockroachdb#44507 has left an unnecessary check in place. The check tolerates requests with the WriteTooOld flag set for leaf txn, even though the same commit makes sure that leaves reset that flag. Release note: None

andreimatei force-pushed the txn.server-side-refresh branch 2 times, most recently from 3a1a5da to 9c96479 Compare January 30, 2020 21:18

andreimatei changed the title ~~wip~~ storage: rationalize server-side refreshes and fix bugs Jan 30, 2020

andreimatei requested a review from nvanbenschoten January 30, 2020 21:57

andreimatei mentioned this pull request Jan 30, 2020

storage: fix replay protection for 1PC txns #44566

Merged

andreimatei commented Jan 30, 2020

andreimatei force-pushed the txn.server-side-refresh branch from 9c96479 to c7c1a7e Compare January 31, 2020 22:30

andreimatei commented Jan 31, 2020

andreimatei force-pushed the txn.server-side-refresh branch from c7c1a7e to cf94349 Compare February 3, 2020 21:09

nvanbenschoten reviewed Feb 3, 2020

andreimatei force-pushed the txn.server-side-refresh branch from 5431fb8 to 8fb46b0 Compare February 5, 2020 00:44

andreimatei commented Feb 5, 2020

andreimatei force-pushed the txn.server-side-refresh branch from 8fb46b0 to c40fb47 Compare February 5, 2020 01:39

nvanbenschoten reviewed Feb 5, 2020

andreimatei commented Feb 5, 2020

nvanbenschoten approved these changes Feb 5, 2020

andreimatei force-pushed the txn.server-side-refresh branch 5 times, most recently from a16fcb1 to 5480d25 Compare February 6, 2020 22:32

andreimatei added 6 commits February 7, 2020 12:12

storage: small refactor

62e5beb

Move blabber about engine.Batch creation to a dedicated function. It was obscuring what reads better as a tight retry loop. Release note: None

roachpb: rename NoRefreshSpans -> CanCommitAtHigherTimestamp

5bc8b77

Release note: None

storage: move maybeStripInFlightWrites

70fc444

This patch creates a dedicated file for functions performing server-side modifications to a batch. The next commit will have a 2nd such function. Release note: None

roachpb: improve comments on the write_too_old field

d5b8886

Explain that it's about the quality of the serializability error that is produced. Release note: None

andreimatei force-pushed the txn.server-side-refresh branch from 5480d25 to d5b8886 Compare February 7, 2020 17:15

andreimatei commented Feb 7, 2020

craig bot merged commit d5b8886 into cockroachdb:master Feb 7, 2020

andreimatei deleted the txn.server-side-refresh branch February 10, 2020 17:36
