kv: eliminate write-too-old deferral mechanism #102751

@nvb

Blind-writes can currently defer the handling of WriteTooOld errors on write-write conflicts until after they have written their intent. This serves as a form of pessimistic locking to avoid starvation in the case of contending blind writes to the same key. The mechanism was introduced in CockroachDB early in its life, unintentionally removed in #38668, and then revived by #44654 (issue: #44653).

This mechanism has been confusing, complex, and error-prone. For example, it requires mvccPutInternal and its callers to perform all side effects (e.g. populating WriteBatches) even when they return a WriteTooOld error, in case evaluateBatch, higher up the stack, defers that error. We've wanted to remove this mechanism since at least 79c711d#diff-c9f3b8fbff25265dd51555fa7099eae566e5904c4e158332afba5551332ca5bcR293, but the time wasn't right.
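To make the shape of the deferral concrete, here is a minimal toy sketch in Go. It is not the actual CockroachDB code: `store`, `put`, and `evalBlindWrites` are hypothetical stand-ins for the MVCC layer, `mvccPutInternal`, and `evaluateBatch`, and timestamps are plain integers. The point it illustrates is that `put` must still record its intent when it returns the error, because the caller may choose to swallow it.

```go
package main

import (
	"errors"
	"fmt"
)

// errWriteTooOld is a stand-in for the KV layer's WriteTooOld error.
var errWriteTooOld = errors.New("write too old")

// store is a toy model: key -> timestamp of the newest committed value.
type store map[string]int

// put is a hypothetical analogue of mvccPutInternal. On a write-write
// conflict it STILL records the intent (at a bumped timestamp) before
// returning the error, because the caller may defer the error.
func (s store) put(key string, ts int, intents map[string]int) (int, error) {
	if existing, ok := s[key]; ok && existing >= ts {
		writeTS := existing + 1
		intents[key] = writeTS // side effect performed despite the error
		return writeTS, errWriteTooOld
	}
	intents[key] = ts
	return ts, nil
}

// evalBlindWrites is a hypothetical analogue of evaluateBatch for a batch
// of blind writes. It swallows WriteTooOld errors, remembers that one
// occurred, and tracks the highest timestamp the batch was pushed to.
func evalBlindWrites(s store, keys []string, ts int) (map[string]int, int, bool) {
	intents := map[string]int{}
	finalTS, deferred := ts, false
	for _, k := range keys {
		writeTS, err := s.put(k, ts, intents)
		if errors.Is(err, errWriteTooOld) {
			deferred = true // handle the conflict later, after the intent exists
			if writeTS > finalTS {
				finalTS = writeTS
			}
		}
	}
	return intents, finalTS, deferred
}

func main() {
	s := store{"a": 12} // "a" already has a committed value at timestamp 12
	intents, ts, deferred := evalBlindWrites(s, []string{"a", "b"}, 10)
	fmt.Println(intents, ts, deferred)
}
```

In this toy model the coupling is visible in `put`: the error path and the success path both have to mutate `intents`, which is the kind of requirement that makes the real mechanism error-prone.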

The mechanism is also problematic for weaker isolation levels. Specifically, it interacts poorly with parallel commits, conflating read-write conflicts with write-write conflicts and permitting the following bug:

1. read on "a" @ 10
2. read on "b" @ 10
3. parallel commit write "a" @ 10, write "b" @ 10; [{Put("a"), EndTxn}, {Put("b")}]
4. write "a" gets timestamp cache pushed to 15, stages because of weak isolation level
5. write "b" gets write-too-old pushed to 14
6. implicit commit @ 15 with write-write conflict. Uh oh!
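The sequence above can be replayed with a small Go model. This is a sketch under stated assumptions, not the real implementation: `inFlightWrite`, `implicitlyCommitted`, and `hasDeferredConflict` are hypothetical names, and the implicit-commit rule is simplified to "every in-flight write succeeded at or below the staging timestamp".

```go
package main

import "fmt"

// inFlightWrite is a hypothetical record of one write issued in parallel
// with the staging EndTxn.
type inFlightWrite struct {
	key         string
	writtenAt   int  // timestamp at which the intent was written
	writeTooOld bool // true if a WriteTooOld error was deferred for this write
}

// implicitlyCommitted models a simplified parallel-commit condition: the
// transaction is staged and every in-flight write succeeded at or below
// the staging timestamp. Note that it only compares timestamps.
func implicitlyCommitted(stagingTS int, writes []inFlightWrite) bool {
	for _, w := range writes {
		if w.writtenAt > stagingTS {
			return false
		}
	}
	return true
}

// hasDeferredConflict reports whether any write hit a write-write
// conflict whose error was deferred instead of returned.
func hasDeferredConflict(writes []inFlightWrite) bool {
	for _, w := range writes {
		if w.writeTooOld {
			return true
		}
	}
	return false
}

func main() {
	// Step 4: write "a" is pushed to 15 by the timestamp cache (a
	// read-write conflict), and the weak isolation level lets it stage.
	stagingTS := 15
	writes := []inFlightWrite{
		{key: "a", writtenAt: 15, writeTooOld: false},
		// Step 5: write "b" is pushed to 14 by a deferred WriteTooOld.
		{key: "b", writtenAt: 14, writeTooOld: true},
	}
	// Step 6: the timestamp-only check passes even though a write-write
	// conflict is still outstanding.
	fmt.Println("implicitly committed:", implicitlyCommitted(stagingTS, writes))
	fmt.Println("unhandled write-write conflict:", hasDeferredConflict(writes))
}
```

Because the push from the timestamp cache raises the staging timestamp above the timestamp of the write that hit the write-write conflict, the timestamp comparison alone cannot tell the two kinds of conflict apart.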

Since the deferral mechanism was re-introduced, CockroachDB's KV layer has evolved in two ways which obviate the need for it:

  • Implicit and explicit SELECT FOR UPDATE. Acquiring locks earlier in the lifecycle of an UPDATE/UPSERT statement avoids the kinds of starvation issues that the deferral mechanism was meant to solve. Transaction contention is resolved ahead of the intent write, so the intent write does not typically hit a WriteTooOld error and does not need to write an intent in the presence of a write-write conflict just to avoid starvation.
  • Server-side refreshes. Many transactions are able to ignore a WriteTooOld error through a less error-prone server-side mechanism called server-side refreshes. This mechanism catches errors that could cause a transaction to retry and adjusts the transaction's timestamp in response, under latches.
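As a rough illustration of the second bullet, the following Go sketch shows the general idea of a server-side refresh: catch the WriteTooOld error during evaluation, forward the transaction's timestamp if that is known to be safe, and re-evaluate. All names here (`txn`, `canForwardReadTS`, `evalPut`, `evalWithServerSideRefresh`) are hypothetical, and the safety condition is reduced to a single boolean; the real conditions are more involved.

```go
package main

import (
	"errors"
	"fmt"
)

// writeTooOldError is a stand-in for the KV layer's WriteTooOld error. It
// carries the lowest timestamp at which the write could succeed.
type writeTooOldError struct{ actualTS int }

func (e *writeTooOldError) Error() string {
	return fmt.Sprintf("write too old; must write at or above %d", e.actualTS)
}

// txn is a toy transaction. canForwardReadTS is an assumed flag meaning
// "nothing read so far would be invalidated by moving the timestamp".
type txn struct {
	readTS, writeTS  int
	canForwardReadTS bool
}

// evalPut returns an error on a write-write conflict and, unlike the
// deferral mechanism, performs no side effects in that case.
func evalPut(committed map[string]int, key string, t *txn) error {
	if existing, ok := committed[key]; ok && existing >= t.writeTS {
		return &writeTooOldError{actualTS: existing + 1}
	}
	return nil
}

// evalWithServerSideRefresh retries evaluation at a forwarded timestamp
// when that is safe. In the real system this would run while latches are
// held, so no new conflicting write can slip in between attempts.
func evalWithServerSideRefresh(committed map[string]int, key string, t *txn) error {
	for {
		err := evalPut(committed, key, t)
		var wto *writeTooOldError
		if !errors.As(err, &wto) {
			return err // success (nil) or an unrelated error
		}
		if !t.canForwardReadTS {
			return err // must be handled by the client instead
		}
		t.readTS, t.writeTS = wto.actualTS, wto.actualTS
	}
}

func main() {
	committed := map[string]int{"a": 12}
	t := &txn{readTS: 10, writeTS: 10, canForwardReadTS: true}
	err := evalWithServerSideRefresh(committed, "a", t)
	fmt.Println(err, t.writeTS)
}
```

The design difference from deferral is that the error path leaves no partial state behind: either the write is retried cleanly at a higher timestamp, or the error is returned untouched.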

With these two mechanisms in place, we are ready to eliminate the write-too-old deferral mechanism.

Jira issue: CRDB-27634

Labels

  • A-kv-transactions: Relating to MVCC and the transactional model.
  • A-read-committed: Related to the introduction of Read Committed.
  • C-cleanup: Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.
  • C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception).
  • T-kv: KV Team
