61110: kv: implement commit-wait for future time transaction commits r=nvanbenschoten a=nvanbenschoten

Fixes #57687.
Related to #52745.

This PR introduces a "commit-wait" sleep stage after a transaction commits, which is entered if doing so is deemed necessary for consistency.

By default, commit-wait is only necessary for transactions that commit with a future-time timestamp that leads the local HLC clock. This is because CockroachDB's consistency model depends on all transactions waiting until their commit timestamp is below their gateway clock. In doing so, transactions ensure that at the time that they complete, all other clocks in the system (i.e. on all possible gateways) will be no more than the max_offset below the transaction's commit timestamp. This property ensures that all causally dependent transactions will have an uncertainty interval (see GlobalUncertaintyLimit) that exceeds the original transaction's commit timestamp, preventing stale reads. Without the wait, it would be possible for a read-write transaction to write a future-time value and then for a causally dependent transaction to read below that future-time value, violating "read your writes".

The property must also hold for read-only transactions, which may have a commit timestamp in the future due to an uncertainty restart after observing a future-time value in their uncertainty interval. In such cases, the property that the transaction must wait for the local HLC clock to exceed its commit timestamp is not necessary to prevent stale reads, but it is necessary to ensure monotonic reads. Without the wait, it would be possible for a read-only transaction coordinated on a gateway with a fast clock to return a future-time value and then for a causally dependent read-only transaction coordinated on a gateway with a slow clock to read below that future-time value, violating "monotonic reads".
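The default wait rule and the uncertainty argument above can be checked with a few lines of arithmetic. The following is a minimal Go sketch, not CockroachDB's implementation: it assumes a simplified timestamp made of wall-clock nanoseconds only (CockroachDB's HLC timestamps also carry a logical component), models a transaction's uncertainty limit as its read timestamp plus max_offset, and uses hypothetical names (`Timestamp`, `commitWaitDuration`, `uncertaintyLimit`, `observesWrite`) that are not part of any real API.

```go
package main

import (
	"fmt"
	"time"
)

// Timestamp is a hypothetical, simplified HLC timestamp: wall-clock
// nanoseconds only, with no logical component.
type Timestamp int64

// commitWaitDuration returns how long a transaction committed at commitTS
// sleeps before acknowledging the commit, given the gateway's current clock
// reading now. Under the default model it only waits if commitTS leads now.
func commitWaitDuration(commitTS, now Timestamp) time.Duration {
	if commitTS <= now {
		return 0 // commit timestamp does not lead the local clock: no wait
	}
	return time.Duration(commitTS - now)
}

// uncertaintyLimit models the global uncertainty limit of a transaction
// that reads at readTS as readTS + maxOffset (an assumption of this sketch).
func uncertaintyLimit(readTS Timestamp, maxOffset time.Duration) Timestamp {
	return readTS + Timestamp(maxOffset)
}

// observesWrite reports whether a reader at readTS would see a value
// written at writeTS, either directly or as an uncertain value that forces
// a restart above it, instead of silently reading below it.
func observesWrite(readTS, writeTS Timestamp, maxOffset time.Duration) bool {
	return writeTS <= uncertaintyLimit(readTS, maxOffset)
}

func main() {
	const maxOffset = 500 * time.Millisecond
	now := Timestamp(1_000_000_000)                   // gateway clock at commit time
	commitTS := now + Timestamp(200*time.Millisecond) // future-time commit timestamp

	fmt.Println("commit-wait:", commitWaitDuration(commitTS, now))

	// After waiting, the gateway clock has reached commitTS, so every other
	// clock reads at least commitTS - maxOffset. The slowest possible
	// dependent transaction still has commitTS inside its uncertainty interval.
	slowestReadTS := commitTS - Timestamp(maxOffset)
	fmt.Println("dependent txn observes write:", observesWrite(slowestReadTS, commitTS, maxOffset))

	// Without the wait, the client could proceed while the gateway clock
	// still reads now, so a dependent transaction could read at
	// now - maxOffset, whose uncertainty limit (now) is below commitTS.
	earlyReadTS := now - Timestamp(maxOffset)
	fmt.Println("without wait:", observesWrite(earlyReadTS, commitTS, maxOffset))
}
```

Running it prints `commit-wait: 200ms`, then `dependent txn observes write: true`, then `without wait: false`. The last line is the stale read the wait prevents; the same arithmetic applies to a read-only transaction whose commit timestamp was bumped by an uncertainty restart, where the failure would be a monotonic-reads violation instead.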

In practice, most transactions do not need to wait at all, because their commit timestamps were pulled from an HLC clock (i.e. are not synthetic), so, assuming proper HLC time propagation, those timestamps are guaranteed not to lead the local HLC clock. Only transactions whose commit timestamps were pushed into the future need to wait, such as those that wrote to a global_read range and were bumped by its closed timestamp, or those that conflicted (write-read or write-write) with an existing future-time value.

However, CockroachDB also supports a stricter model of consistency through its "linearizable" flag. When in linearizable mode (also known as "strict serializable" mode), all writing transactions (but not read-only transactions) must wait an additional max_offset after committing to ensure that their commit timestamp is below the current HLC clock time of any other node in the system. In doing so, all causally dependent transactions are guaranteed to start with higher timestamps, regardless of the gateway they use. This ensures that all causally dependent transactions commit with higher timestamps, even if their read and write sets do not conflict with the original transaction's. This obviates the need for uncertainty intervals and prevents the "causal reverse" anomaly, which can be observed by a third, concurrent transaction.

For more, see https://www.cockroachlabs.com/blog/consistency-model/ and [docs/RFCS/20200811_non_blocking_txns.md](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20200811_non_blocking_txns.md).

----

The PR also fixes a bug by properly marking read-only transactions as aborted on rollback, which was missed in a85115a. We were assuming that all calls to `commitReadOnlyTxnLocked` were for `EndTxn` requests with the Commit flag set to true, but this is not the case. This was not only confusing, but also caused the `txn.commit` metric to be incremented on rollback of a read-only transaction, instead of the `txn.aborts` metric.
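Two end-of-transaction behaviors from this PR can be sketched together: the extra max_offset wait that the linearizable mode imposes on writing transactions, and the corrected accounting for a read-only transaction whose `EndTxn` has the Commit flag set to false. This is a minimal, hypothetical Go sketch, not the code in `commitReadOnlyTxnLocked`: it assumes wall-clock-only timestamps, and `Timestamp`, `Txn`, `Metrics`, `commitWaitTarget`, and `endReadOnlyTxn` are illustrative names only.

```go
package main

import (
	"fmt"
	"time"
)

// Timestamp is a hypothetical, simplified wall-clock timestamp in nanoseconds.
type Timestamp int64

// Txn is a hypothetical, stripped-down view of a transaction at EndTxn time.
type Txn struct {
	CommitTS Timestamp
	ReadOnly bool // true if the transaction performed no writes
}

// Metrics holds plain counters standing in for the commit and abort
// metrics named in the commit message.
type Metrics struct {
	Commits int
	Aborts  int
}

// commitWaitTarget returns the local clock reading the gateway must reach
// before acknowledging a commit. By default this is the commit timestamp
// itself. In linearizable mode, writing transactions additionally wait
// maxOffset so every other node's clock has passed the commit timestamp;
// read-only transactions do not take the extra wait.
func commitWaitTarget(txn Txn, linearizable bool, maxOffset time.Duration) Timestamp {
	target := txn.CommitTS
	if linearizable && !txn.ReadOnly {
		target += Timestamp(maxOffset)
	}
	return target
}

// endReadOnlyTxn records the outcome of an EndTxn on a read-only
// transaction. Instead of assuming every call is a commit, it consults the
// Commit flag, so a rollback is counted as an abort.
func endReadOnlyTxn(m *Metrics, commit bool) {
	if commit {
		m.Commits++
	} else {
		m.Aborts++
	}
}

func main() {
	const maxOffset = 500 * time.Millisecond
	commitTS := Timestamp(10 * time.Second)

	writer := Txn{CommitTS: commitTS}
	reader := Txn{CommitTS: commitTS, ReadOnly: true}

	// Default mode: the writer only waits for the clock to reach commitTS.
	fmt.Println(time.Duration(commitWaitTarget(writer, false, maxOffset))) // 10s
	// Linearizable mode: only the writing transaction waits the extra maxOffset.
	fmt.Println(time.Duration(commitWaitTarget(writer, true, maxOffset))) // 10.5s
	fmt.Println(time.Duration(commitWaitTarget(reader, true, maxOffset))) // 10s

	var m Metrics
	endReadOnlyTxn(&m, true)  // EndTxn with Commit=true
	endReadOnlyTxn(&m, false) // EndTxn with Commit=false, i.e. a rollback
	fmt.Printf("commits=%d aborts=%d\n", m.Commits, m.Aborts) // commits=1 aborts=1
}
```

Returning a target clock reading rather than a duration keeps the two modes comparable: the caller would sleep until its local clock reaches the target, which under the default mode is zero time for any transaction whose commit timestamp does not lead the clock.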

Release justification: New functionality.

61170: kvserver: remove `kv.atomic_replication_changes.enabled` setting r=aayushshah15 a=aayushshah15

This setting was added in 19.2 to provide a fallback against atomic replication changes. Atomic replication changes have now been a part of CockroachDB for over 3 releases, and they are also a requirement for non-voting replicas.

Release note (backward-incompatible change): Removed the `kv.atomic_replication_changes.enabled` cluster setting. All replication changes on a range now use joint-consensus.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>