kv: remove synthetic timestamps #101938
Comments
nvanbenschoten
added
C-cleanup
Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.
A-kv
Anything in KV that doesn't belong in a more specific category.
T-kv
KV Team
labels
Apr 20, 2023
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Jun 25, 2023
Informs cockroachdb#101938. This commit removes logic in mvcc key encoding routines that handle synthetic timestamps. As a result, we no longer write keys with synthetic timestamps, though we retain the ability to decode them. As described in cockroachdb#72121 (comment) and later in cockroachdb@24c56df (see "Future improvements"), the introduction of the mvcc value header and the optional, per-version local timestamp paved the way for the removal of synthetic timestamps. MVCC keys no longer need to carry the synthetic bit in order for reads from GLOBAL tables to behave properly. As a result, we no longer need to write it. Release note: None
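As a rough illustration of what "no longer write keys with synthetic timestamps" could look like at the byte level, the Go sketch below encodes an MVCC timestamp suffix and never emits a synthetic byte. The layout (8-byte wall time, optional 4-byte logical component, trailing length byte) and the names `Timestamp` and `encodeTimestampSuffix` are assumptions made for this example; they are not taken from the commit itself.

```go
package main

import "encoding/binary"

// Timestamp is a hypothetical stand-in for an HLC timestamp. It has no
// Synthetic field, mirroring the end state this issue is working toward.
type Timestamp struct {
	WallTime int64
	Logical  int32
}

const (
	wallLen    = 8 // assumed width of the wall-time component
	logicalLen = 4 // assumed width of the logical component
	lengthLen  = 1 // assumed trailing length byte
)

// encodeTimestampSuffix returns the version suffix for ts. An empty timestamp
// encodes to nothing. The encoder has no branch that appends a synthetic byte.
func encodeTimestampSuffix(ts Timestamp) []byte {
	if ts.WallTime == 0 && ts.Logical == 0 {
		return nil
	}
	n := wallLen
	if ts.Logical != 0 {
		n += logicalLen
	}
	buf := make([]byte, n+lengthLen)
	binary.BigEndian.PutUint64(buf[0:wallLen], uint64(ts.WallTime))
	if ts.Logical != 0 {
		binary.BigEndian.PutUint32(buf[wallLen:wallLen+logicalLen], uint32(ts.Logical))
	}
	// The length byte counts itself in this sketch.
	buf[n] = byte(n + lengthLen)
	return buf
}
```

Under these assumptions a wall-time-only timestamp produces a 9-byte suffix and a timestamp with a logical component produces a 13-byte suffix; no 14-byte (synthetic) form is ever written.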
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Jun 25, 2023
Informs cockroachdb#101938. This commit removes logic in mvcc key decoding routines to decode synthetic timestamps. We retain the ability to decode keys with the synthetic timestamp bit set, but we simply ignore its presence. As discussed in the previous commit, the role of these synthetic timestamp markers was eliminated in cockroachdb#80706 by the local_timestamp field in the mvcc value header, which was first present in v22.2. v23.2 does not require compatibility with v22.2, so it can rely on the fact that any txn that has a synthetic timestamp (because it writes in the future) will also write local timestamps into each of its values. Release note: None
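The decoding side can be sketched in the same spirit: a decoder that still accepts the longer legacy suffix but discards the flag byte. The suffix lengths (8, 12, or 13 bytes, excluding any trailing length byte) and the function name `decodeTimestampSuffix` are assumptions for illustration only.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Timestamp is a hypothetical stand-in for an HLC timestamp without a
// Synthetic field.
type Timestamp struct {
	WallTime int64
	Logical  int32
}

// decodeTimestampSuffix decodes a version suffix that has already been
// stripped of its trailing length byte. A 13-byte input is assumed to be a
// legacy encoding whose final byte is the synthetic flag; the flag is read
// past and ignored rather than rejected.
func decodeTimestampSuffix(b []byte) (Timestamp, error) {
	var ts Timestamp
	switch len(b) {
	case 0:
		// No version: empty timestamp.
	case 8:
		ts.WallTime = int64(binary.BigEndian.Uint64(b[0:8]))
	case 12, 13:
		ts.WallTime = int64(binary.BigEndian.Uint64(b[0:8]))
		ts.Logical = int32(binary.BigEndian.Uint32(b[8:12]))
		// b[12], if present, is the legacy synthetic flag. Ignored.
	default:
		return Timestamp{}, fmt.Errorf("invalid timestamp suffix length %d", len(b))
	}
	return ts, nil
}
```

With this shape, a legacy suffix and its non-synthetic counterpart decode to the same `Timestamp` value, which is the "decode but ignore" behavior the commit message describes.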
craig bot
pushed a commit
that referenced
this issue
Jul 10, 2023
105523: kv: stop encoding or decoding synthetic timestamp bit in/from mvcc keys r=sumeerbhola a=nvanbenschoten Informs #101938. This first commit removes logic in mvcc key encoding routines that handle synthetic timestamps. As a result, we no longer write keys with synthetic timestamps, though we retain the ability to decode them. The second commit removes logic in mvcc key decoding routines to decode synthetic timestamps. We retain the ability to decode keys with the synthetic timestamp bit set, but we simply ignore its presence. As described in #72121 (comment) and later in 24c56df (see "Future improvements"), the introduction of the mvcc value header and the optional, per-version local timestamp paved the way for the removal of synthetic timestamps. MVCC keys no longer need to carry the synthetic bit in order for reads from GLOBAL tables to behave properly. As a result, we no longer need to write it. Release note: None Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Jul 19, 2023
…mestamp Fixes cockroachdb#106569. In 3773994, we stopped decoding the synthetic timestamp bit from MVCC keys. When doing so, we removed logic from `MVCCValue.GetLocalTimestamp` that handles synthetic timestamps. In doing so, we missed the fact that the timestamp provided to `MVCCValue.GetLocalTimestamp` can also come from `MVCCMetadata.Timestamp`, which has not yet been stripped of synthetic timestamps. Eventually we will get to that as part of cockroachdb#101938, but for now, we restore this handling. Release note: None
craig bot
pushed a commit
that referenced
this issue
Jul 20, 2023
107195: storage: restore synthetic timestamp handling in MVCCValue.GetLocalTimestamp r=sumeerbhola a=nvanbenschoten Fixes #106569. In 3773994, we stopped decoding the synthetic timestamp bit from MVCC keys. When doing so, we removed logic from `MVCCValue.GetLocalTimestamp` that handles synthetic timestamps. In doing so, we missed the fact that the timestamp provided to `MVCCValue.GetLocalTimestamp` can also come from `MVCCMetadata.Timestamp`, which has not yet been stripped of synthetic timestamps. Eventually, we will get to that as part of #101938, but for now, we restore this handling. Release note: None Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
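A minimal sketch of the restored handling, assuming a simplified model in which the version timestamp still carries a `Synthetic` flag (as `MVCCMetadata.Timestamp` could at the time). The types below and the `minClockTimestamp` sentinel are hypothetical stand-ins, not the actual `storage` or `hlc` definitions.

```go
package main

// Timestamp is a hypothetical stand-in for the timestamp type as it existed
// before the Synthetic field was removed.
type Timestamp struct {
	WallTime  int64
	Logical   int32
	Synthetic bool
}

// IsEmpty reports whether the timestamp carries no time component.
func (t Timestamp) IsEmpty() bool { return t.WallTime == 0 && t.Logical == 0 }

// MVCCValue models only the optional local timestamp from the value header.
type MVCCValue struct {
	LocalTimestamp Timestamp
}

// minClockTimestamp is an assumed sentinel meaning "no usable clock reading".
var minClockTimestamp = Timestamp{Logical: 1}

// GetLocalTimestamp returns the value's local timestamp. If the header holds
// none, it falls back to the version timestamp, unless that timestamp is
// synthetic, in which case it cannot stand in for a real clock reading and
// the minimum sentinel is returned instead.
func (v MVCCValue) GetLocalTimestamp(versionTS Timestamp) Timestamp {
	if !v.LocalTimestamp.IsEmpty() {
		return v.LocalTimestamp
	}
	if versionTS.Synthetic {
		return minClockTimestamp
	}
	return versionTS
}
```

The synthetic branch is the piece the earlier change had dropped; it matters whenever the caller passes a timestamp that was not decoded from an MVCC key and so may still have the flag set.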
nvanbenschoten
added a commit
to nvanbenschoten/cockroach
that referenced
this issue
Jan 9, 2024
Informs cockroachdb#101938. Release note: None
craig bot
pushed a commit
that referenced
this issue
Jan 10, 2024
115888: sql: update pausable portal todo r=rharding6373 a=rharding6373

Epic: None
Informs: #115887
Release note: None

116830: hlc: remove the Synthetic field from Timestamp and LegacyTimestamp r=erikgrinaker a=nvanbenschoten

Closes #101938. This PR completes the work to remove the `Synthetic` field from `Timestamp` and `LegacyTimestamp`. It removes the remaining uses, removes the fields from the proto definitions, and removes all access to the fields in methods.

Release note: None

117429: revertccl: ALTER VIRTUAL CLUSTER RESET DATA r=dt a=dt

This enables resetting a virtual cluster's data to a prior timestamp. This is possible if the prior timestamp is still retained in the mvcc history of the virtual cluster, the virtual cluster has stopped service, and is run by a user with the MANAGEVIRTUALCLUSTER (or admin) privilege in the system tenant. Revisions of data in the system tenant newer than the target time to which it is being reset are destroyed, reverting the tenant to the state it was in as of the time reverted to. Destroyed revisions are not recoverable; once a tenant has been reset to a timestamp, it cannot be 'un-reset' back to a higher timestamp.

Release note (cluster virtualization): Added a new 'flashback' command to revert a virtual cluster to an earlier state using ALTER VIRTUAL CLUSTER RESET DATA.
Epic: CRDB-34233.

117541: storage: fix a series of intent resolution bugs with ignored seq nums r=nvanbenschoten a=miraradeva

Previously, the logic in mvccResolveWriteIntent was structured in such a way that if an intent contained both ignored and non-ignored seq nums in its intent history, the intent may end up being updated instead of aborted or unmodified (see examples in 9f00f2a5505). This commit fixes the bugs by ensuring that the intent history is modified only when an intent resolution update is not aborted, and the update and the actual intent have the same epoch.

Fixes: #117553
Release note: None

117563: distsql: improve columnar operator test harness for decimals r=yuzefovich a=yuzefovich

We recently merged a change to add decimals with different numbers of trailing zeroes in the "interesting datums" set, and it made some existing tests fail because they used direct string comparison for equality. This commit adjusts the test harness to be smarter for decimals.

Fixes: #117543.
Release note: None

Co-authored-by: rharding6373 <rharding6373@users.noreply.github.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Co-authored-by: Mira Radeva <mira@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
msbutler
pushed a commit
to msbutler/cockroach
that referenced
this issue
Jan 10, 2024
Informs cockroachdb#101938. This commit removes the handling of synthetic timestamps from the lock-table waiter. The lock-table waiter used to handle the synthetic timestamp bit in two ways:
1. If set, it would propagate it on txn pushes.
2. If set, it would push txns above the local HLC clock, because observed timestamps from the clock would not be usable to avoid uncertainty with intents written at (or pushed to) synthetic timestamps.
Neither of these behaviors is necessary anymore. We don't need to propagate the flag, because it has been deprecated since v22.2 and is no longer consulted in uncertainty interval checks or by transaction commit-wait. We also don't need to push intents above the local HLC, because observed timestamps can now be used to avoid uncertainty with intents up to the intent's local timestamp, which will be set to the local HLC from before the push (see ClockWhilePending). Release note: None
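To make the last point concrete, here is a hedged sketch of an uncertainty check that consults a value's local timestamp rather than a synthetic flag. The `Interval` type, its `GlobalLimit` and `LocalLimit` fields, and the `IsUncertain` method are simplified assumptions for this example (the caller is assumed to have already established that the value lies above the read timestamp); they are not presented as the actual KV code.

```go
package main

// Timestamp is a hypothetical HLC timestamp without a Synthetic field.
type Timestamp struct {
	WallTime int64
	Logical  int32
}

// IsEmpty reports whether the timestamp carries no time component.
func (t Timestamp) IsEmpty() bool { return t.WallTime == 0 && t.Logical == 0 }

// Less reports whether t orders strictly before o.
func (t Timestamp) Less(o Timestamp) bool {
	return t.WallTime < o.WallTime || (t.WallTime == o.WallTime && t.Logical < o.Logical)
}

// LessEq reports whether t orders before or equal to o.
func (t Timestamp) LessEq(o Timestamp) bool { return !o.Less(t) }

// Interval is an assumed model of a reader's uncertainty interval.
// GlobalLimit is the upper bound derived from the maximum clock offset.
// LocalLimit is an optional observed timestamp from the node holding the data.
type Interval struct {
	GlobalLimit Timestamp
	LocalLimit  Timestamp
}

// IsUncertain reports whether a value above the read timestamp must be
// treated as uncertain. valueTS is the version (possibly pushed into the
// future); localTS is the clock reading captured when the value was written
// or before it was pushed.
func (in Interval) IsUncertain(valueTS, localTS Timestamp) bool {
	// If the reader observed the node's clock before the value's local
	// timestamp, the value cannot have preceded the reader in real time.
	if !in.LocalLimit.IsEmpty() && in.LocalLimit.Less(localTS) {
		return false
	}
	return valueTS.LessEq(in.GlobalLimit)
}
```

For example, an intent pushed to wall time 100 with local timestamp 40 is not uncertain to a reader whose observed timestamp is 30, even though 100 falls below a global limit of 120. The decision needs no synthetic bit, only the local timestamp.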
mgartner
pushed a commit
to mgartner/cockroach
that referenced
this issue
Jan 12, 2024
…y checks Informs cockroachdb#101938. This commit bumps the ReplicaChecksumVersion to disable replica consistency checks between v23.2 and v24.1 nodes when in a mixed-version cluster. This avoids the backwards incompatibility discussed in cockroachdb#117302. While here and permitted to change the replica consistency check logic, we unset the Synthetic flag from RangeAppliedState.RaftClosedTimestamp during stats-only consistency checks. This form of consistency check is rarely used, but this prevents it from causing trouble with cockroachdb#101938. While here and allowed to change the consistency check hash computation, we also switch from using the LegacyTimestamp encoding to the Timestamp encoding for the hash contribution of MVCCKeys. Release note: None
jbowens
added a commit
to jbowens/cockroach
that referenced
this issue
Sep 12, 2024
In CockroachDB's key encoding some keys have multiple logically equivalent but physically distinct encodings. Most notably, in CockroachDB versions 23.2 and earlier keys written to global tables encoded MVCC timestamps with a 'synthetic bit.' In cockroachdb#101938 CockroachDB stopped encoding and decoding this synthetic bit, transparently ignoring it. In cockroachdb#129592 we observed the existence of a bug in the CockroachDB comparator when comparing two MVCC timestamp suffixes, specifically outside the context of a full MVCC key. The comparator failed to consider a timestamp with the synthetic bit and a timestamp without the synthetic bit as logically equivalent. There are limited instances where Pebble uses the comparator to compare "bare suffixes," and all instances are constrained to the implementation of range keys. In cockroachdb#129592 it was observed that the comparator bug could prevent the garbage collection of MVCC delete range tombstones (the single use of range keys within CockroachDB). A cluster running 23.2 or earlier may write a MVCC delete range tombstone with a timestamp encoding the synthetic bit. If the cluster subsequently upgraded to 24.1 or later, the code path to clear range keys stopped understanding synthetic bits and wrote range key unset tombstones without the synthetic bit set. Due to the comparator bug, Pebble did not consider these timestamp suffixes equal and the unset was ineffective. We initially attempted to fix this issue by fixing the comparator, but inadvertently introduced the possibility of replica divergence cockroachdb#130533 by changing the semantics of LSM state below raft. This commit works around this comparator bug by adapting ClearMVCCRangeKey to write range key unsets using the verbatim suffix that was read from the storage engine. 
To avoid reverting cockroachdb#101938 and re-introducing knowledge of the synthetic bit, the MVCCRangeKey data structures are adapted to retain a copy of the encoded timestamp suffix when reading range keys from storage engine iterators. If later an attempt is made to clear the range key through ClearMVCCRangeKey, this encoded timestamp suffix is used instead of re-encoding the timestamp. Through avoiding the decoding/encoding roundtrip, ClearMVCCRangeKey ensures that the suffixes it writes are identical to the range keys that exist on disk, even if they encode a synthetic bit. Release note (bug fix): Fixes a bug that could result in the inability to garbage collect a MVCC range tombstone within a global table. Epic: none Informs cockroachdb#129592.
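The workaround described above can be sketched as follows. This is an illustrative model only: the `MVCCRangeKey` fields, the helper names, and the byte layout of the legacy 14-byte suffix (wall time, logical, synthetic flag, length byte) are assumptions made for the example, not the actual storage code.

```go
package main

import (
	"bytes"
	"encoding/binary"
)

// Timestamp is a hypothetical HLC timestamp without a Synthetic field.
type Timestamp struct {
	WallTime int64
	Logical  int32
}

// MVCCRangeKey is a simplified model of a range key read from the engine.
// EncodedTimestampSuffix holds the suffix bytes exactly as found on disk.
type MVCCRangeKey struct {
	StartKey, EndKey       []byte
	Timestamp              Timestamp
	EncodedTimestampSuffix []byte
}

// encodeTimestampSuffix is the modern encoder: it never writes a synthetic byte.
func encodeTimestampSuffix(ts Timestamp) []byte {
	n := 8
	if ts.Logical != 0 {
		n += 4
	}
	buf := make([]byte, n+1)
	binary.BigEndian.PutUint64(buf, uint64(ts.WallTime))
	if ts.Logical != 0 {
		binary.BigEndian.PutUint32(buf[8:], uint32(ts.Logical))
	}
	buf[n] = byte(n + 1)
	return buf
}

// legacySyntheticSuffix builds what an older binary is assumed to have
// written for a synthetic timestamp. It exists only to drive the example.
func legacySyntheticSuffix(ts Timestamp) []byte {
	buf := make([]byte, 14)
	binary.BigEndian.PutUint64(buf, uint64(ts.WallTime))
	binary.BigEndian.PutUint32(buf[8:], uint32(ts.Logical))
	buf[12] = 1  // synthetic flag
	buf[13] = 14 // length byte
	return buf
}

// unsetSuffix picks the suffix to use when clearing rk. It prefers the
// verbatim on-disk bytes and only re-encodes when none were retained.
func unsetSuffix(rk MVCCRangeKey) []byte {
	if len(rk.EncodedTimestampSuffix) > 0 {
		return rk.EncodedTimestampSuffix
	}
	return encodeTimestampSuffix(rk.Timestamp)
}

// sameSuffix mimics a byte-wise suffix comparison that does not treat the
// synthetic and non-synthetic encodings as equivalent.
func sameSuffix(a, b []byte) bool { return bytes.Equal(a, b) }
```

In this model, re-encoding the decoded timestamp yields a 13-byte suffix that a byte-wise comparison does not match against the 14-byte suffix on disk, so the unset would miss. Reusing the retained suffix avoids the round trip and matches exactly, with no need for the clearing code to know what the extra byte means.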
jbowens
added a commit
to jbowens/cockroach
that referenced
this issue
Sep 12, 2024
jbowens
added a commit
to jbowens/cockroach
that referenced
this issue
Sep 13, 2024
jbowens
added a commit
to jbowens/cockroach
that referenced
this issue
Sep 17, 2024
craig bot
pushed a commit
that referenced
this issue
Sep 18, 2024
130453: logictest: revert incorrect test assertion update r=rafiss a=michae2

(Deja vu: this is #121556 all over again.) 103bd54 incorrectly updated the test expectations, likely because the `--rewrite` flag was used on an assertion that has the retry directive. This commit undoes that change.

Fixes: #130405
Release note: None

130572: storage: GC range keys by unsetting identical suffixes r=jbowens a=jbowens

In CockroachDB's key encoding some keys have multiple logically equivalent but physically distinct encodings. Most notably, in CockroachDB versions 23.2 and earlier keys written to global tables encoded MVCC timestamps with a 'synthetic bit.' In #101938 CockroachDB stopped encoding and decoding this synthetic bit, transparently ignoring it.

In #129592 we observed the existence of a bug in the CockroachDB comparator when comparing two MVCC timestamp suffixes, specifically outside the context of a full MVCC key. The comparator failed to consider a timestamp with the synthetic bit and a timestamp without the synthetic bit as logically equivalent. There are limited instances where Pebble uses the comparator to compare "bare suffixes," and all instances are constrained to the implementation of range keys.

In #129592 it was observed that the comparator bug could prevent the garbage collection of MVCC delete range tombstones (the single use of range keys within CockroachDB). A cluster running 23.2 or earlier may write a MVCC delete range tombstone with a timestamp encoding the synthetic bit. If the cluster subsequently upgraded to 24.1 or later, the code path to clear range keys stopped understanding synthetic bits and wrote range key unset tombstones without the synthetic bit set. Due to the comparator bug, Pebble did not consider these timestamp suffixes equal and the unset was ineffective.

We initially attempted to fix this issue by fixing the comparator, but inadvertently introduced the possibility of replica divergence #130533 by changing the semantics of LSM state below raft.

This commit works around this comparator bug by adapting ClearMVCCRangeKey to write range key unsets using the verbatim suffix that was read from the storage engine. To avoid reverting #101938 and re-introducing knowledge of the synthetic bit, the MVCCRangeKey data structures are adapted to retain a copy of the encoded timestamp suffix when reading range keys from storage engine iterators. If later an attempt is made to clear the range key through ClearMVCCRangeKey, this encoded timestamp suffix is used instead of re-encoding the timestamp. Through avoiding the decoding/encoding roundtrip, ClearMVCCRangeKey ensures that the suffixes it writes are identical to the range keys that exist on disk, even if they encode a synthetic bit.

Release note (bug fix): Fixes a bug that could result in the inability to garbage collect a MVCC range tombstone within a global table.
Epic: none
Informs #129592.

130906: sql: deflake TestValidationWithProtectedTS r=rafiss a=rafiss

This test does not work if ranges get split, so we disable the split queue.

fixes #130715
Release note: None

Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
blathers-crl bot
pushed a commit
that referenced
this issue
Sep 18, 2024
blathers-crl bot
pushed a commit
that referenced
this issue
Sep 18, 2024
blathers-crl bot
pushed a commit
that referenced
this issue
Sep 18, 2024
jbowens
added a commit
that referenced
this issue
Sep 18, 2024
Synthetic timestamps have been deprecated since v22.2 and the introduction of ClockTimestamps. We can now remove them.

Jira issue: CRDB-27190