Fix a corner case issue that causes inconsistent Coordinator states when lazy recovery happens before group commit #2135
@komamitsu Overall, LGTM. Thanks. One question. For future optimization, can we perform the two writes concurrently?
@brfrn169 That's a really good point. Concurrent writes for the two records might result in the following situation:
In other words,
@komamitsu Thanks. I understand we can't perform the two writes concurrently.
@komamitsu Sorry, one more question. If we put the child ID as a clustering key column of the record in the coordinator table, would the logic be simpler than the current one?
@brfrn169 Thanks for the question. I think it depends on whether ScalarDB supports an INSERT operation with multiple conditions. If it's possible, …
My understanding is that it's not possible for now. So, I don't think it's beneficial to use a clustering key for the child ID in this case.
@komamitsu We can do that as follows:
    storage.mutate(
        Arrays.asList(
            Put.newBuilder()
                .namespace("coordinator")
                .table("state")
                .partitionKey(Key.ofText("tx_id", "p99"))
                .clusteringKey(Key.ofText("tx_sub_id", "c1"))
                .intValue("state", TransactionState.ABORTED.get())
                .condition(ConditionBuilder.putIfNotExists())
                .build(),
            Put.newBuilder()
                .namespace("coordinator")
                .table("state")
                .partitionKey(Key.ofText("tx_id", "p99"))
                .clusteringKey(Key.ofText("tx_sub_id", "null"))
                .intValue("state", TransactionState.ABORTED.get())
                .condition(ConditionBuilder.putIfNotExists())
                .build()));
ScalarDB Storage API can atomically mutate multiple records within a partition.
@brfrn169 Oh, I didn't know Storage API supports multiple atomic mutations. I just looked over the implementations and they seem good.
I was thinking of inserting a single record to …
@komamitsu Sorry for the late reply.
Okay, thanks. Let's discuss it offline later.
Thank you for the offline discussion! As discussed, since having tx_child_ids as a clustering key in the coordinator.state table doesn't have significant advantages, I think the current implementation is the best approach for now. Thank you!
@komamitsu @brfrn169 I don't understand why we need to write two records if we use the child ID as a clustering key column.
Logically, the following information is enough to indicate that the group (p99) and the transaction (p99:c1) are committed.
Can you clarify?
@feeblefakie Your understanding is correct from the perspective of normal operation without considering conflicts between original commits and lazy recoveries.
This needs to be issued by lazy recoveries only to conflict with the original commit. Let's see the above 2 cases in the description again.
A. The original commit with …
B. The original commit with …
If we only insert a single record with full-id-like sub ID (…
Does that answer your question?
LGTM, thank you!
@komamitsu Thank you for the explanation! I understood the fix, but I wondered if there could be better/simpler ways to handle it. That is because the …
LGTM! Thank you!
(very sorry for the late reply)
…hen lazy recovery happens before group commit (#2135)
Description
We found the following issue with a Jepsen test. It could happen in the current implementation of the group commit feature:
1. Tx1 has the transaction ID p99:c1, where p99 is a parent ID and c1 is a child ID
2. … p99
3. A lazy recovery finds the PREPARE record (tx_id: p99:c1) created by Tx1
4. The lazy recovery inserts tx_id: p99:c1, tx_child_ids:{}, tx_state: ABORTED to the Coordinator table to roll back the transaction
5. Tx1 inserts tx_id: p99, tx_child_ids:{c1}, tx_state: COMMITTED to the Coordinator table. This record's partition key doesn't conflict with the existing record inserted by the lazy recovery

This PR fixes the issue. For details, see Additional notes below.
Related issues and/or PRs
None
Changes made
- Insert a record tx_id: <parent tx ID>, tx_child_ids:{}, tx_state: ABORTED before inserting the record tx_id: p99:c1, tx_child_ids:{}, tx_state: ABORTED that the existing lazy recovery already inserts, so that it conflicts with a record insertion by the original commit
Checklist
Additional notes (optional)
Lazy recoveries don't know whether the transaction that created the PREPARE record is using a parent ID or a full ID as the tx_id partition key.
Case a) If a transaction becomes "ready for commit" in time, it'll be committed in a group with tx_id: <parent tx ID>.
Case b) If a transaction is delayed, it'll be committed in an isolated group with a full ID as tx_id: <full tx ID>.
If lazy recoveries only insert a record with tx_id: <full tx ID> to abort the transaction, it will not conflict with the group commit using tx_id: <parent tx ID> in case #a. Therefore, lazy recoveries first need to insert a record with tx_id: <parent tx ID> and empty tx_child_ids to the Coordinator table. We'll call this insertion lazy-recovery-abort-with-parent-id. This record is intended to conflict with a potential group commit in case #a, even though it doesn't help in finding the coordinator state since tx_child_ids is empty.
Once the record insertion with tx_id: <parent tx ID> succeeds, the lazy recovery will insert another record with tx_id: <full tx ID>. We'll call this insertion lazy-recovery-abort-with-full-id. This record insertion is needed to conflict with a potential delayed group commit that has tx_id: <full tx ID> in case #b, and it indicates that the transaction is aborted.
Let's walk through all the cases.
A. The original commit with tx_id: <parent tx ID> succeeds in case #a, and then lazy recovery happens
1. The original commit with tx_id: <parent tx ID> succeeds
2. lazy-recovery-abort-with-parent-id fails
3. … tx_child_ids contains the transaction child ID
B. The original commit with tx_id: <parent tx ID> is in-progress in case #a, and lazy recovery happens first
1. lazy-recovery-abort-with-parent-id succeeds
2. The original commit with tx_id: <parent tx ID> fails (… lazy-recovery-abort-with-full-id later)
3. lazy-recovery-abort-with-full-id succeeds
4. … lazy-recovery-abort-with-full-id
C. The original commit with tx_id: <full tx ID> is done in case #b, and then lazy recovery happens
1. The original commit with tx_id: <full tx ID> succeeds
2. lazy-recovery-abort-with-parent-id succeeds
3. lazy-recovery-abort-with-full-id fails
4. … tx_id is the transaction full ID
D. The original commit with tx_id: <full tx ID> is in-progress in case #b, and lazy recovery happens first
1. lazy-recovery-abort-with-parent-id succeeds
2. lazy-recovery-abort-with-full-id succeeds
3. The original commit with tx_id: <full tx ID> fails
4. … lazy-recovery-abort-with-full-id
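The four cases can be replayed with a small in-memory model. The following is only a sketch for illustration, not the code in this PR: LazyRecoverySketch, Coordinator, StateRecord, commit and lazyRecovery are hypothetical names, and the Coordinator table is reduced to a map keyed by tx_id with a put-if-not-exists insert. One behavior is an assumption not covered by the notes above: when lazy-recovery-abort-with-parent-id fails and the existing record's tx_child_ids does not contain the child ID, the sketch proceeds to lazy-recovery-abort-with-full-id.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model; none of these names are ScalarDB APIs.
class LazyRecoverySketch {
  // One Coordinator record: tx_state plus the tx_child_ids covered by a group commit.
  static final class StateRecord {
    final String txState;
    final Set<String> txChildIds;

    StateRecord(String txState, Set<String> txChildIds) {
      this.txState = txState;
      this.txChildIds = txChildIds;
    }
  }

  // Stand-in for the Coordinator table, keyed by the tx_id partition key.
  static final class Coordinator {
    private final Map<String, StateRecord> table = new HashMap<>();

    // Conditional insert: succeeds only if no record with this tx_id exists yet.
    synchronized boolean putIfNotExists(String txId, StateRecord record) {
      return table.putIfAbsent(txId, record) == null;
    }

    synchronized StateRecord get(String txId) {
      return table.get(txId);
    }
  }

  // Original commit. Case a (in time): tx_id is the parent ID and tx_child_ids lists the child.
  // Case b (delayed): tx_id is the full ID. Returns whether the insertion succeeded.
  static boolean commit(Coordinator c, String parentId, String childId, boolean delayed) {
    if (delayed) {
      String fullId = parentId + ":" + childId;
      return c.putIfNotExists(fullId, new StateRecord("COMMITTED", Collections.emptySet()));
    }
    Set<String> children = new HashSet<>(Collections.singleton(childId));
    return c.putIfNotExists(parentId, new StateRecord("COMMITTED", children));
  }

  // Lazy recovery. Returns the state it concludes for the transaction.
  // insertParentAbort=false models the behavior before the fix (full-ID insertion only).
  static String lazyRecovery(
      Coordinator c, String parentId, String childId, boolean insertParentAbort) {
    StateRecord abort = new StateRecord("ABORTED", Collections.emptySet());
    // lazy-recovery-abort-with-parent-id
    if (insertParentAbort && !c.putIfNotExists(parentId, abort)) {
      StateRecord existing = c.get(parentId);
      if (existing.txChildIds.contains(childId)) {
        return existing.txState; // case A: the group commit already covers this child
      }
      // Assumption: the existing record does not cover this child, so fall through.
    }
    // lazy-recovery-abort-with-full-id
    String fullId = parentId + ":" + childId;
    if (c.putIfNotExists(fullId, abort)) {
      return "ABORTED"; // cases B and D
    }
    return c.get(fullId).txState; // case C: the delayed commit was inserted first
  }

  public static void main(String[] args) {
    // Case B without the fix: both insertions succeed, so the two states disagree.
    Coordinator before = new Coordinator();
    System.out.println(lazyRecovery(before, "p99", "c1", false)); // ABORTED
    System.out.println(commit(before, "p99", "c1", false)); // true
    // Case B with the fix: the parent-ID abort record blocks the group commit.
    Coordinator after = new Coordinator();
    System.out.println(lazyRecovery(after, "p99", "c1", true)); // ABORTED
    System.out.println(commit(after, "p99", "c1", false)); // false
  }
}
```

In this model, the only difference between the broken and the fixed run of case B is the extra conditional insert on the parent ID, which is what makes the later group commit fail instead of silently succeeding next to an ABORTED record.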
Release notes
Fixed a corner case issue that causes inconsistent Coordinator states when lazy recovery happens before group commit