Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: rationalize server-side refreshes and fix bugs #44507

Merged
merged 6 commits into from
Feb 7, 2020

Conversation

andreimatei
Copy link
Contributor

@andreimatei andreimatei commented Jan 29, 2020

This is most #42969, which had been reverted. Different other parts of #42969 have been re-merged separately.


Before this patch, we had several issues due to the server erroneously
considering that it's OK to commit a transaction at a bumped timestamp.

One of the issues was a lost update: a CPut could erroneously succeed
even though there's been a more recent write. This was caused by faulty
code in evaluateBatch() that was thinking that, just because an EndTxn
claimed to have been able to commit a transaction, that means that any
WriteTooOldError encountered previously by the batch was safe to
discard. An EndTxn might consider that it can commit even if there had
been previous write too old conditions if the NoRefreshSpans flag is
set. The problems is that a CPut that had returned a WriteTooOldError
also evaluated at the wrong read timestamp, and so its evaluation can't
be relied on.

Another issue is that, when the EndTxn code mentioned above considers
that it's safe to commit at a bumped timestamp, it doesn't take into
considerations that the EndTxn's batch might have performed reads (other
than CPuts) that have been evaluated at a lower timestamp. This can
happen, for example in the following scenario: - a txn sends a Put which
gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The
scan gets evaluated at the original timestamp, but then we commit at a
bumped one because the NoRefreshSpans flag is set.

The patch fixes the bugs by reworking how evaluation takes advantage of
the fact that some requests have flexible timestamps. EndTxn no longer
is in the business of committing at bumped timestamps, and its code is
thus simplified. Instead, the replica's "local retries" loop takes over.
The replica already had code handling non-transactional batches that
evaluated them repeatedly in case of WriteTooOldErrors. This patch
rationalizes and expands this code to deal with transactional batches
too, and with pushes besides WriteTooOldErrors. This reevaluation loop
now handles the cases in which the EndTxn used to bump the commit
timestamp.

The patch also fixes a third bug: the logic evaluateBatch() for
resetting the WriteTooOld state after a successful EndTransaction was
ignoring the STAGING state, meaning that the server would return a
WriteTooOldError even though the transaction was committed. I'm not sure
if this had dramatic consequences or was benign...

Fixes #42849

Release note (bug fix): A bug causing lost update transaction anomalies
was fixed.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@andreimatei andreimatei force-pushed the txn.server-side-refresh branch 2 times, most recently from 3a1a5da to 9c96479 Compare January 30, 2020 21:18
@andreimatei andreimatei changed the title wip storage: rationalize server-side refreshes and fix bugs Jan 30, 2020
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hold off on this review. I've got some more tweaks I want to do.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

@andreimatei andreimatei force-pushed the txn.server-side-refresh branch from 9c96479 to c7c1a7e Compare January 31, 2020 22:30
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL. Compared to the previous time you've seen this, there's now a small preamble refactor commit (the first commit), and in the main commit I've improved some stuff around server-side refreshes of 1PC attempts (we no longer needlessly attempt them when CanCommitAtHigherTimestamp is not set. Also, not deferring WTOE on IsReadAndWrite() is back; the previous iteration had removed it based on the fact that we don't defer anything anyway because the ba.DeferWriteTooOldError flag is never set, but... I've put it back.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

@andreimatei andreimatei force-pushed the txn.server-side-refresh branch from c7c1a7e to cf94349 Compare February 3, 2020 21:09
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 14 of 14 files at r3, 5 of 5 files at r4, 19 of 19 files at r5, 2 of 2 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)


pkg/kv/dist_sender_server_test.go, line 2071 at r5 (raw file):

			},
			retryable: func(ctx context.Context, txn *client.Txn) error {
				if err := txn.Put(ctx, "aother", "another put"); err != nil {

s/aother/another/


pkg/kv/dist_sender_server_test.go, line 2077 at r5 (raw file):
Consider adding:

// We expect the request to succeed after a server-side retry.
txnCoordRetry: false,

pkg/roachpb/batch.go, line 579 at r3 (raw file):

		if et, ok := req.(*EndTxnRequest); ok {
			h := req.Header()
			str = append(str, fmt.Sprintf("%s(commit:%t) [%s] tsflex:%t",

nit: did you consider putting this in the parenthesis? Something like EndTxn(commit:false, tsflex: false)


pkg/roachpb/batch.go, line 99 at r5 (raw file):

}

// IsReadAndWrite returns true if the request both reads and writes

I think you added this back in the wrong file. It should be in api.go.


pkg/storage/replica_evaluate.go, line 356 at r5 (raw file):

	if baHeader.Txn != nil && baHeader.Txn.Status.IsCommittedOrStaging() {
		if writeTooOldState.err != nil {
			log.Fatalf(ctx, "committed txn with writeTooOld err: %s", writeTooOldState.err)

I'm confused. Isn't this exactly what we expect to see when we perform a server-side refresh during an EndTxn? What am I missing?

Also, we might as well restructure these two conditions to if writeTooOldState.err != nil { ... }


pkg/storage/replica_evaluate.go, line 424 at r5 (raw file):

	var pd result.Result

	cArgs := batcheval.CommandArgs{

Why?


pkg/storage/replica_test.go, line 545 at r5 (raw file):

				}

				// Emulate what a server actually does and bump the write timestamp when

Which ones? None of the test cases changed.


pkg/storage/replica_test.go, line 547 at r5 (raw file):

				// Emulate what a server actually does and bump the write timestamp when
				// possible. This makes some batches with diverged read and write
				// timestamps to still pass isOnePhaseCommit().

s/to still//


pkg/storage/replica_write.go, line 380 at r1 (raw file):

}

// newBatchedEngine creates and engine.Batch. Depending on whether rangefeeds

s/and/an/


pkg/storage/replica_write.go, line 310 at r5 (raw file):

			res   result.Result
		}
		synthesizeEndTxnResponse := func() onePCResult {

Could you double-check that this isn't causing anything new to escape to the heap? To be honest, this whole onePCResult stuff seems hard to read to me, so I'd vote to avoid it if possible.


pkg/storage/replica_write.go, line 312 at r5 (raw file):

		synthesizeEndTxnResponse := func() onePCResult {
			if pErr != nil {
				return onePCResult{success: false}

We're not setting pErr here? Could you comment why?


pkg/storage/replica_write.go, line 318 at r5 (raw file):

otherwise we wouldn't have attempted 1PC

Is this true? What if the txn had read before and hit an uncertainty retry?


pkg/storage/replica_write.go, line 475 at r5 (raw file):

}

// canDoServersideRetry looks at the error produced by evaluating ba and decides

nit: I know we talked about this before and I might have been the one to push you towards this, but "can" seems like the wrong word here. We often "can" perform a serverside retry - that doesn't mean that we "should".

@andreimatei andreimatei force-pushed the txn.server-side-refresh branch from 5431fb8 to 8fb46b0 Compare February 5, 2020 00:44
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/dist_sender_server_test.go, line 2071 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/aother/another/

done


pkg/kv/dist_sender_server_test.go, line 2077 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Consider adding:

// We expect the request to succeed after a server-side retry.
txnCoordRetry: false,

done


pkg/roachpb/batch.go, line 579 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: did you consider putting this in the parenthesis? Something like EndTxn(commit:false, tsflex: false)

done


pkg/roachpb/batch.go, line 99 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think you added this back in the wrong file. It should be in api.go.

done


pkg/storage/replica_evaluate.go, line 356 at r5 (raw file):

I'm confused. Isn't this exactly what we expect to see when we perform a server-side refresh during an EndTxn? What am I missing?

No... If there's an EndTxn in the batch, then we can't return the WriteTooOld flag... We never did.
I mean, we could, subject I guess to the same reasoning as returning a WriteTooOld flag when a CPut encounters a write-too-old. No?

Also, we might as well restructure these two conditions to if writeTooOldState.err != nil { ... }

Done


pkg/storage/replica_evaluate.go, line 424 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why?

reverted the change


pkg/storage/replica_test.go, line 545 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Which ones? None of the test cases changed.

well I think preventing the tests from changing was the point here. IsOnePhaseCommit() used to indirectly check CanForwardCommitTimestampWithoutRefresh(). Now it no longer does.


pkg/storage/replica_test.go, line 547 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/to still//

done


pkg/storage/replica_write.go, line 380 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/and/an/

done


pkg/storage/replica_write.go, line 310 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Could you double-check that this isn't causing anything new to escape to the heap? To be honest, this whole onePCResult stuff seems hard to read to me, so I'd vote to avoid it if possible.

I've extracted the 1PC evaluation to a new method; I think it's a lot more readable now. PTAL.

I've stared at the goescape output for a while; things look pretty good.


pkg/storage/replica_write.go, line 312 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We're not setting pErr here? Could you comment why?

done


pkg/storage/replica_write.go, line 318 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
otherwise we wouldn't have attempted 1PC

Is this true? What if the txn had read before and hit an uncertainty retry?

well I was relying on the splitting of reads and writes. But I got rid of this assertion.


pkg/storage/replica_write.go, line 475 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: I know we talked about this before and I might have been the one to push you towards this, but "can" seems like the wrong word here. We often "can" perform a serverside retry - that doesn't mean that we "should".

When shouldn't we?

@andreimatei andreimatei force-pushed the txn.server-side-refresh branch from 8fb46b0 to c40fb47 Compare February 5, 2020 01:39
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 29 of 29 files at r7, 1 of 1 files at r8, 14 of 14 files at r9, 5 of 5 files at r10, 20 of 20 files at r11, 4 of 4 files at r12.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)


pkg/storage/replica_evaluate.go, line 356 at r5 (raw file):

No... If there's an EndTxn in the batch, then we can't return the WriteTooOld flag... We never did.

Sure, I'm not saying we can, I'm just confused how the code is working. We're never setting writeTooOldState.err to nil when we see an EndTxn that performs a server-side refresh, so what happens if it was set by a previous request in the same batch and then refreshed away during the evaluation of EndTxn?

Oh, or did we get rid of server-side refreshes within cmd_end_transaction.go? That must be it. I've looked at too many subtle variations of this change to have a sound mental model for how this works right now 😃


pkg/storage/replica_write.go, line 475 at r5 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

When shouldn't we?

When we succeeded in running the batch. But this is fine.


pkg/storage/replica_write.go, line 265 at r11 (raw file):

	if isOnePhaseCommit(ba) {
		res := r.evaluate1PC(ctx, idKey, ba, spans)
		if res.success == onePCSucceeded {

nit: use a switch statement and exhaustively check all three cases.


pkg/storage/replica_write.go, line 312 at r11 (raw file):

func (r *Replica) evaluate1PC(
	ctx context.Context, idKey storagebase.CmdIDKey, ba *roachpb.BatchRequest, spans *spanset.SpanSet,
) (_res onePCResult) {

What's up with the underscore? I'd name it something else if you don't want it to clash with res. Maybe onePCRes.

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/replica_evaluate.go, line 356 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

No... If there's an EndTxn in the batch, then we can't return the WriteTooOld flag... We never did.

Sure, I'm not saying we can, I'm just confused how the code is working. We're never setting writeTooOldState.err to nil when we see an EndTxn that performs a server-side refresh, so what happens if it was set by a previous request in the same batch and then refreshed away during the evaluation of EndTxn?

Oh, or did we get rid of server-side refreshes within cmd_end_transaction.go? That must be it. I've looked at too many subtle variations of this change to have a sound mental model for how this works right now 😃

Right, this commit gets rid of refreshing in cmd_end_transaction.go.


pkg/storage/replica_write.go, line 475 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

When we succeeded in running the batch. But this is fine.

ack


pkg/storage/replica_write.go, line 265 at r11 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: use a switch statement and exhaustively check all three cases.

done


pkg/storage/replica_write.go, line 312 at r11 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What's up with the underscore? I'd name it something else if you don't want it to clash with res. Maybe onePCRes.

done

But I like the underscores to indicate that the named ret vals are only to be used in defers.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r13.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@andreimatei andreimatei force-pushed the txn.server-side-refresh branch 5 times, most recently from a16fcb1 to 5480d25 Compare February 6, 2020 22:32
Move blabber about engine.Batch creation to a dedicated function. It was
obscuring what reads better as a tight retry loop.

Release note: None
This test was creating batches with different flags set. The combination
of the WriteTooOld flag being set and the write timestamp not being
bumped in relation to the read timestamp does not make sense. All such
tests are removed, and the test spec is improved.
A future commit introduces an assertion that such requests are indeed
not received by servers.

Release note: None
This patch creates a dedicated file for functions performing server-side
modifications to a batch. The next commit will have a 2nd such function.

Release note: None
Before this patch, we had several issues due to the server erroneously
considering that it's OK to commit a transaction at a bumped timestamp.

One of the issues was a lost update: a CPut could erroneously succeed
even though there's been a more recent write. This was caused by faulty
code in evaluateBatch() that was thinking that, just because an EndTxn
claimed to have been able to commit a transaction, that means that any
WriteTooOldError encountered previously by the batch was safe to
discard. An EndTxn might consider that it can commit even if there had
been previous write too old conditions if the NoRefreshSpans flag is
set. The problems is that a CPut that had returned a WriteTooOldError
also evaluated at the wrong read timestamp, and so its evaluation can't
be relied on.

Another issue is that, when the EndTxn code mentioned above considers
that it's safe to commit at a bumped timestamp, it doesn't take into
considerations that the EndTxn's batch might have performed reads (other
than CPuts)  that have been evaluated at a lower timestamp. This can
happen, for example in the following scenario: - a txn sends a Put which
gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The
scan gets evaluated at the original timestamp, but then we commit at a
bumped one because the NoRefreshSpans flag is set.

The patch fixes the bugs by reworking how evaluation takes advantage of
the fact that some requests have flexible timestamps. EndTxn no longer
is in the business of committing at bumped timestamps, and its code is
thus simplified. Instead, the replica's "local retries" loop takes over.
The replica already had code handling non-transactional batches that
evaluated them repeatedly in case of WriteTooOldErrors. This patch
rationalizes and expands this code to deal with transactional batches
too, and with pushes besides WriteTooOldErrors. This reevaluation loop
now handles the cases in which the EndTxn used to bump the commit
timestamp.

The patch also fixes a third bug: the logic evaluateBatch() for
resetting the WriteTooOld state after a successful EndTransaction was
ignoring the STAGING state, meaning that the server would return a
WriteTooOldError even though the transaction was committed. I'm not sure
if this had dramatic consequences or was benign...

Fixes cockroachdb#42849

Release note (bug fix): A bug causing lost update transaction anomalies
was fixed.
Explain that it's about the quality of the serializability error that
is produced.

Release note: None
@andreimatei andreimatei force-pushed the txn.server-side-refresh branch from 5480d25 to d5b8886 Compare February 7, 2020 17:15
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

@craig
Copy link
Contributor

craig bot commented Feb 7, 2020

Canceled (will resume)

craig bot pushed a commit that referenced this pull request Feb 7, 2020
35294: roachprod: add a max-concurrency flag with a default of 32 r=nvanbenschoten a=ajwerner

This PR adds a --max-concurrency flag to roachprod and defaults its value to 32.
In large clusters doing many concurrent SSH operations can lead to unexpected
behavior where the command fails to communicate with the SSH agent and leads to
the user being prompted for their private key passphrase. Adding a limit
prevents this behavior when interacting with a cluster of 256 nodes.

Release note: None

44507: storage: rationalize server-side refreshes and fix bugs r=andreimatei a=andreimatei

This is most #42969, which had been reverted. Different other parts of #42969 have been re-merged separately.

---------

Before this patch, we had several issues due to the server erroneously
considering that it's OK to commit a transaction at a bumped timestamp.

One of the issues was a lost update: a CPut could erroneously succeed
even though there's been a more recent write. This was caused by faulty
code in evaluateBatch() that was thinking that, just because an EndTxn
claimed to have been able to commit a transaction, that means that any
WriteTooOldError encountered previously by the batch was safe to
discard. An EndTxn might consider that it can commit even if there had
been previous write too old conditions if the NoRefreshSpans flag is
set. The problems is that a CPut that had returned a WriteTooOldError
also evaluated at the wrong read timestamp, and so its evaluation can't
be relied on.

Another issue is that, when the EndTxn code mentioned above considers
that it's safe to commit at a bumped timestamp, it doesn't take into
considerations that the EndTxn's batch might have performed reads (other
than CPuts)  that have been evaluated at a lower timestamp. This can
happen, for example in the following scenario: - a txn sends a Put which
gets bumped by the ts cache - the txn then sends a Scan + EndTxn. The
scan gets evaluated at the original timestamp, but then we commit at a
bumped one because the NoRefreshSpans flag is set.

The patch fixes the bugs by reworking how evaluation takes advantage of
the fact that some requests have flexible timestamps. EndTxn no longer
is in the business of committing at bumped timestamps, and its code is
thus simplified. Instead, the replica's "local retries" loop takes over.
The replica already had code handling non-transactional batches that
evaluated them repeatedly in case of WriteTooOldErrors. This patch
rationalizes and expands this code to deal with transactional batches
too, and with pushes besides WriteTooOldErrors. This reevaluation loop
now handles the cases in which the EndTxn used to bump the commit
timestamp.

The patch also fixes a third bug: the logic evaluateBatch() for
resetting the WriteTooOld state after a successful EndTransaction was
ignoring the STAGING state, meaning that the server would return a
WriteTooOldError even though the transaction was committed. I'm not sure
if this had dramatic consequences or was benign...

Fixes #42849

Release note (bug fix): A bug causing lost update transaction anomalies
was fixed.

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Feb 7, 2020

Build succeeded

@craig craig bot merged commit d5b8886 into cockroachdb:master Feb 7, 2020
@andreimatei andreimatei deleted the txn.server-side-refresh branch February 10, 2020 17:36
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Feb 13, 2020
PR cockroachdb#44507 has left an unnecessary check in place. The check tolerates
requests with the WriteTooOld flag set for leaf txn, even though the
same commit makes sure that leaves reset that flag.

Release note: None
andreimatei added a commit to andreimatei/cockroach that referenced this pull request Feb 14, 2020
PR cockroachdb#44507 has left an unnecessary check in place. The check tolerates
requests with the WriteTooOld flag set for leaf txn, even though the
same commit makes sure that leaves reset that flag.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1PC CPuts suffering from lost updates
3 participants