Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: persist minimum transaction timestamps in intents #38782

Merged
merged 9 commits into from
Jul 11, 2019

Conversation

nvanbenschoten
Copy link
Member

Fixes #36089.
Informs #37199.

This commit addresses the second concern from #37199 (comment) by implementing its suggestion #1.

It augments the TxnMeta struct to begin storing the transaction's minimum timestamp. This allows pushers to have perfect accuracy into whether an intent is part of a transaction that can eventually be committed or whether it has been aborted by any other pusher and uncommittable.

This allows us to get rid of the false positive cases where a pusher incorrectly detects that a transaction is still active and begins waiting on it. In this worst case, this could lead to transactions waiting for the entire transaction expiration for a contending txn.

@tbg I'm assigning you because you reviewed most of the lazy transaction record stuff (especially #33523), but let me know if you'd like me to find a different reviewer.

@nvanbenschoten nvanbenschoten requested review from tbg and a team July 9, 2019 22:16
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten
Copy link
Member Author

Here's some confirmation that this addresses the remaining txn stalls in kv/contention/nodes=4:

Screen Shot 2019-07-09 at 11 59 28 PM

before on left, after on right

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/cqErr3 branch 2 times, most recently from 828fd26 to a27bd89 Compare July 10, 2019 15:21
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong: neat.

Reviewed 4 of 4 files at r1, 1 of 1 files at r2, 6 of 6 files at r3, 3 of 3 files at r4, 22 of 22 files at r5, 7 of 7 files at r6, 27 of 27 files at r7, 3 of 3 files at r8, 22 of 22 files at r9, 7 of 7 files at r10, 2 of 2 files at r11, 1 of 1 files at r12.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/replica_tscache.go, line 297 at r5 (raw file):

// CanCreateTxnRecord determines whether a transaction record can be created
// for the provided transaction information. Callers should provide the

s/should/must/


pkg/storage/replica_tscache.go, line 299 at r5 (raw file):

, that is, the epoch zero OrigTimestamp.


pkg/storage/replica_tscache.go, line 305 at r5 (raw file):

// it returns the reason that transaction record was rejected. If the method
// ever determines that a transaction record must be rejected, it will continue
// to reject that transaction going forwards.

While you're here, you could also point out that if this method returns true, we may still fail to create the record because there's no locking here. Maybe this is clear already.


pkg/storage/replica_tscache.go, line 309 at r5 (raw file):

// provisional commit timestamp because it may have moved forward over the
// course of a single epoch and they can't provide their (current epoch's)
// OrigTimestamp because it may have moved forward over a series of prior

Feel like I've asked this before, but wasn't EpochZeroTimestamp the thing folks wanted to pass here? Why not?
I also assume callers that had only seen an intent sometimes need this and could only provide something larger than optimal, but this isn't in the old comment.

Nice that this all simplifies with this PR.


pkg/storage/batcheval/transaction.go, line 165 at r5 (raw file):

		// that we can remove it in v20.1 and replace it with:
		//
		//  synthTxnRecord.Status = roachpb.ABORTED

👍


pkg/storage/rangefeed/processor.go, line 272 at r5 (raw file):

							// harmless, especially because the txnPushAttempt performs a
							// high-priority push.
							MinTimestamp: txn.timestamp,

We can't have the real minTimestamp here?

Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)


pkg/storage/replica_tscache.go, line 297 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

s/should/must/

Done.


pkg/storage/replica_tscache.go, line 299 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

, that is, the epoch zero OrigTimestamp.

The notion of an "epoch zero OrigTimestamp" is part of what I'm trying to get rid of here. The contract we want from this timestamp is that it's the minimum timestamp that a transaction could have ever written. Mixing this up with epochs and the OrigTimestamp (which I'd like to rename to the read timestamp) just confuses all of this.


pkg/storage/replica_tscache.go, line 305 at r5 (raw file):

because there's no locking here

Latching provides this locking. No request that would update the timestamp cache to prevent a txn record from being created could be evaluating concurrently.


pkg/storage/replica_tscache.go, line 309 at r5 (raw file):

but wasn't EpochZeroTimestamp the thing folks wanted to pass here

Yes, but that field is only populated after a restart. We were passing the result of InclusiveTimeBounds here before, which accounted for this.

I also assume callers that had only seen an intent sometimes need this and could only provide something larger than optimal

Yes, that's correct

but this isn't in the old comment.

The old comment discussed the false negatives that could occur if this timestamp wasn't the real min timestamp. The (now removed) comment in cmd_push_txn.go discussed the actual implications of this.


pkg/storage/rangefeed/processor.go, line 272 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

We can't have the real minTimestamp here?

We'd need to ship in through the LogicalOpLog as an MVCCLogicalOp. That doesn't seem worth it and would also cause compatibility concerns.

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 28 of 28 files at r13, 1 of 1 files at r14, 8 of 8 files at r15, 3 of 3 files at r16, 22 of 22 files at r17, 7 of 7 files at r18, 2 of 2 files at r19, 1 of 1 files at r20.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/replica_tscache.go, line 299 at r5 (raw file):
That sounds good, but then why not

, that is, the epoch zero value of the Timestamp field (the epoch zero write timestamp).

Were you also planning to rename Timestamp to WriteTimestamp? That would make sense to me, the role of the two timestamps is confusing enough to those who haven't internalized it over the years.


pkg/storage/replica_tscache.go, line 305 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

because there's no locking here

Latching provides this locking. No request that would update the timestamp cache to prevent a txn record from being created could be evaluating concurrently.

But the low water mark could rise over due to unrelated requests rolling over the page and change the outcome?

Now that I looked up this method I'm seeing that it's really just part of the eval context, and there it makes more sense because that restricts who calls it (command evaluation), so latches are assumed. This reminds me that I was hoping to take off the eval context methods from the main replica, but it was a giant yak shave. I should get back to that at some point though.


pkg/storage/rangefeed/processor.go, line 272 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We'd need to ship in through the LogicalOpLog as an MVCCLogicalOp. That doesn't seem worth it and would also cause compatibility concerns.

That is unfortunate, but on the other hand this line keeps all of the magic in the comments that you removed alive, but in a more subtle way. It directly contradicts the contract of the MinTimestamp field. I feel that this is problematic enough to fix.

Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)


pkg/storage/replica_tscache.go, line 299 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

That sounds good, but then why not

, that is, the epoch zero value of the Timestamp field (the epoch zero write timestamp).

Were you also planning to rename Timestamp to WriteTimestamp? That would make sense to me, the role of the two timestamps is confusing enough to those who haven't internalized it over the years.

Yes, exactly. The goal is to get the long list of transaction timestamps down to read_timestamp, write_timestamp, min_timestamp, and max_timestamp, last_heartbeat.

max_timestamp might also be due for a rename though, as it's not quite clear from the name what it's actual meaning is. Perhaps max_uncertain_timestamp.

Also, hopefully, last_heartbeat can also go away when we stop heartbeating individual transactions.


pkg/storage/replica_tscache.go, line 305 at r5 (raw file):

But the low water mark could rise over due to unrelated requests rolling over the page and change the outcome?

Sure, but that wouldn't prevent the request from creating the transaction record. It's no different than any other write that only checks the timestamp cache once. Timestamp cache pressure could create artificial conflicts, but nothing real, so the "race" is immaterial.


pkg/storage/rangefeed/processor.go, line 272 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

That is unfortunate, but on the other hand this line keeps all of the magic in the comments that you removed alive, but in a more subtle way. It directly contradicts the contract of the MinTimestamp field. I feel that this is problematic enough to fix.

I'm curious what your suggested fix is. It seems to me like it could go one of a few ways:

  1. ship the transaction's min timestamp around through the LogicalOpLog with the rest of the transaction metadata. This would have compatibility issues with v19.1, so at some level we'll still need this, although I guess we can let the missing min timestamp case fall through to SynthesizeTxnFromMeta, which already needs to handle it. Interestingly, this is essentially what we'd get for free if the transaction ID included its minimum timestamp, which is the long term goal for this series of changes.
  2. don't set any min timestamp here and formally support txn pushes without one specified by falling back to the old approach of using the write timestamp as the min timestamp when one is not specified, which essentially makes the current compatibility hack set in stone.
  3. don't set any min timestamp here and formally support txn pushes without one specified by considering missing records to be aborted. I'm not sure if this works in practice, but we could probably make it work.
  4. something else I haven't yet considered.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/cqErr3 branch 2 times, most recently from 9655167 to 8135219 Compare July 10, 2019 23:11
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)


pkg/storage/rangefeed/processor.go, line 272 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm curious what your suggested fix is. It seems to me like it could go one of a few ways:

  1. ship the transaction's min timestamp around through the LogicalOpLog with the rest of the transaction metadata. This would have compatibility issues with v19.1, so at some level we'll still need this, although I guess we can let the missing min timestamp case fall through to SynthesizeTxnFromMeta, which already needs to handle it. Interestingly, this is essentially what we'd get for free if the transaction ID included its minimum timestamp, which is the long term goal for this series of changes.
  2. don't set any min timestamp here and formally support txn pushes without one specified by falling back to the old approach of using the write timestamp as the min timestamp when one is not specified, which essentially makes the current compatibility hack set in stone.
  3. don't set any min timestamp here and formally support txn pushes without one specified by considering missing records to be aborted. I'm not sure if this works in practice, but we could probably make it work.
  4. something else I haven't yet considered.

I decided to go with the first option. The migration story ends up being almost the same as it is for normal PushTxn requests and the fact that this will come for free with a change to the transaction ID format was decently compelling. The new field also parallels the existing TxnKey field, which is nice. PTAL.

…nsactions

This was only needed for compatibility with 2.1 nodes.

Release note: None
This isn't needed by v19.1 or v19.2 nodes. It was only added to avoid
causing v2.1 nodes to panic. See 833a827.

Release note: None
Before this change, zerofields only traversed the top-level struct.
This change adjusts the library to traverse all the way down nested
structs when performing its verification.

Release note: None
Fixes cockroachdb#36089.
Informs cockroachdb#37199.

This commit addresses the second concern from cockroachdb#37199 (comment)
by implementing its suggestion #1.

It augments the TxnMeta struct to begin storing the transaction's
minimum timestamp. This allows pushers to have perfect accuracy
into whether an intent is part of a transaction that can eventually
be committed or whether it has been aborted by any other pusher and
uncommittable.

This allows us to get rid of the false positive cases where a pusher
incorrectly detects that a transaction is still active and begins
waiting on it. In this worst case, this could lead to transactions
waiting for the entire transaction expiration for a contending txn.

Release note: None
This cleans up the handling of minimum timestamp in the rangefeed processor.

The migration story here is relatively straightforward. If an MVCCWriteIntentOp
doesn't have a MinTimestamp then we know that we're in one of the following
three cases:

1. the coordinator for the transaction is a v19.1 node and did
   not assign a MinTimestamp to its transaction
2. an initResolvedTSScan found an intent without a MinTimestamp
3. the leaseholder that created the operation is a v19.1 node and
   did not propagate the transaction's MinTimestamp. The coordinator
   may be a v19.1 or v19.2 node

The first and second case imply that the transaction coordinator will never
interact with a v20.1 node. This means that the migration story discussed in
SynthesizeTxnFromMeta will work well. The third case is more complicated
because the leaseholder who proposed the MVCCWriteIntentOp might not be around
any more. This means that by the time the MVCCWriteIntentOp results in a PushTxn
request, the request may go to a v20.1 node even when the transaction's coordinator
is still alive. In that case, the migration in SynthesizeTxnFromMeta isn't
exactly what we want, but it also won't cause correctness issues. The worst
thing that can happen is that we'll abort the transaction, which we were likely
already going to do anyway.

Release note: None
…tedMinTimestamp

This also adds a comment about the migration plan.
We can't address this until v20.1.

Release note: None
The change also adjusts the COCKROACH_TXN_LIVENESS_HEARTBEAT_MULTIPLIER
and the minQPS to levels that more easily triggered stalls before the
rest of this PR.

Release note: None
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong:

Reviewed 28 of 28 files at r21, 1 of 1 files at r22, 8 of 8 files at r23, 3 of 3 files at r24, 22 of 22 files at r25, 11 of 11 files at r26, 7 of 7 files at r27, 2 of 2 files at r28, 1 of 1 files at r29.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/rangefeed/processor.go, line 272 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I decided to go with the first option. The migration story ends up being almost the same as it is for normal PushTxn requests and the fact that this will come for free with a change to the transaction ID format was decently compelling. The new field also parallels the existing TxnKey field, which is nice. PTAL.

I was leaning towards option 1 as well when I skimmed this comment last night. Thanks for making the change.

@nvanbenschoten
Copy link
Member Author

bors r+

craig bot pushed a commit that referenced this pull request Jul 11, 2019
38782: storage: persist minimum transaction timestamps in intents r=nvanbenschoten a=nvanbenschoten

Fixes #36089.
Informs #37199.

This commit addresses the second concern from #37199 (comment) by implementing its suggestion #1.

It augments the TxnMeta struct to begin storing the transaction's minimum timestamp. This allows pushers to have perfect accuracy into whether an intent is part of a transaction that can eventually be committed or whether it has been aborted by any other pusher and uncommittable.

This allows us to get rid of the false positive cases where a pusher incorrectly detects that a transaction is still active and begins waiting on it. In this worst case, this could lead to transactions waiting for the entire transaction expiration for a contending txn.

@tbg I'm assigning you because you reviewed most of the lazy transaction record stuff (especially #33523), but let me know if you'd like me to find a different reviewer.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig
Copy link
Contributor

craig bot commented Jul 11, 2019

Build succeeded

@craig craig bot merged commit 0d91411 into cockroachdb:master Jul 11, 2019
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/cqErr3 branch July 18, 2019 23:28
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Jul 23, 2019
…mestamp

Fixes cockroachdb#39008.

This was missed in cockroachdb#38782. I must have been thinking about the behavior
we'll want to switch to in v20.1. Luckily, the issue was easily caught
by `tpcc/mixed-headroom/n5cpu16`.

Release note: None
craig bot pushed a commit that referenced this pull request Jul 23, 2019
39055: storage: remain compatible with transactions missing their minimum timestamp r=tbg a=nvanbenschoten

Fixes #39008.

This was missed in #38782. I must have been thinking about the behavior
we'll want to switch to in v20.1. Luckily, the issue was easily caught
by `tpcc/mixed-headroom/n5cpu16`.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
spaskob pushed a commit to spaskob/cockroach that referenced this pull request Aug 1, 2019
…mestamp

Fixes cockroachdb#39008.

This was missed in cockroachdb#38782. I must have been thinking about the behavior
we'll want to switch to in v20.1. Luckily, the issue was easily caught
by `tpcc/mixed-headroom/n5cpu16`.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: kv/contention/nodes=4 failed
3 participants