Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: remove synthetic timestamps #101938

Closed
nvanbenschoten opened this issue Apr 20, 2023 · 0 comments · Fixed by #116830
Closed

kv: remove synthetic timestamps #101938

nvanbenschoten opened this issue Apr 20, 2023 · 0 comments · Fixed by #116830
Assignees
Labels
A-kv Anything in KV that doesn't belong in a more specific category. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. T-kv KV Team

Comments

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Apr 20, 2023

Synthetic timestamps have been deprecated since v22.2 and the introduction of ClockTimestamps. We can now remove them.

Jira issue: CRDB-27190

@nvanbenschoten nvanbenschoten added C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. A-kv Anything in KV that doesn't belong in a more specific category. T-kv KV Team labels Apr 20, 2023
@nvanbenschoten nvanbenschoten self-assigned this Apr 20, 2023
craig bot pushed a commit that referenced this issue Apr 26, 2023
102155: kv: delete `(hlc.Timestamp).DeprecatedTryToClockTimestamp` r=nvanbenschoten a=nvanbenschoten

Informs #101938.

This function has been possible to delete since v23.1.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 25, 2023
Informs cockroachdb#101938.

This commit removes logic in mvcc key encoding routines that handle
synthetic timestamps. As a result, we no longer write keys with
synthetic timestamps, though we retain the ability to decode them.

As described in cockroachdb#72121 (comment)
and later in cockroachdb@24c56df
(see "Future improvements"), the introduction of the mvcc value header
and the optional, per-version local timestamp paved the way for the
removal of synthetic timestamps. MVCC keys no longer need to carry the
synthetic bit in order for reads from GLOBAL tables to behave properly.
As a result, we no longer need to write it.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 25, 2023
Informs cockroachdb#101938.

This commit removes logic in mvcc key decoding routines to decode
synthetic timestamps. We retain the ability to decode keys with the
synthetic timestamp bit set, but we simply ignore its presence.

As discussed in the previous commit, the role of these synthetic
timestamp markers was eliminated in cockroachdb#80706 by the local_timestamp field
in the mvcc value header, which was first present in v22.2. v23.2 does
not require compatibility with v22.2, so it can rely on the fact that
any txn that has a synthetic timestamp (because it writes in the future)
will also write local timestamps into each of its values.

Release note: None
craig bot pushed a commit that referenced this issue Jul 10, 2023
105523: kv: stop encoding or decoding synthetic timestamp bit in/from mvcc keys r=sumeerbhola a=nvanbenschoten

Informs #101938.

This first commit removes logic in mvcc key encoding routines that handle synthetic timestamps. As a result, we no longer write keys with synthetic timestamps, though we retain the ability to decode them.

The second commit removes logic in mvcc key decoding routines to decode synthetic timestamps. We retain the ability to decode keys with the synthetic timestamp bit set, but we simply ignore its presence.

As described in #72121 (comment) and later in 24c56df (see "Future improvements"), the introduction of the mvcc value header and the optional, per-version local timestamp paved the way for the removal of synthetic timestamps. MVCC keys no longer need to carry the synthetic bit in order for reads from GLOBAL tables to behave properly. As a result, we no longer need to write it.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jul 19, 2023
…mestamp

Fixes cockroachdb#106569.

In 3773994, we stopped decoding the synthetic timestamp bit from MVCC
keys. When doing so, we removed logic from `MVCCValue.GetLocalTimestamp`
that handles synthetic timestamps. In doing so, we missed the fact that
the timestamp provided to `MVCCValue.GetLocalTimestamp` can also come
from `MVCCMetadata.Timestamp`, which has not yet been stripped of
synthetic timestamps. Eventually we will get to that as part of cockroachdb#101938,
but for now, we restore this handling.

Release note: None
craig bot pushed a commit that referenced this issue Jul 20, 2023
107195: storage: restore synthetic timestamp handling in MVCCValue.GetLocalTimestamp r=sumeerbhola a=nvanbenschoten

Fixes #106569.

In 3773994, we stopped decoding the synthetic timestamp bit from MVCC keys. When doing so, we removed logic from `MVCCValue.GetLocalTimestamp` that handles synthetic timestamps. In doing so, we missed the fact that the timestamp provided to `MVCCValue.GetLocalTimestamp` can also come from `MVCCMetadata.Timestamp`, which has not yet been stripped of synthetic timestamps. Eventually, we will get to that as part of #101938, but for now, we restore this handling.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jan 9, 2024
craig bot pushed a commit that referenced this issue Jan 10, 2024
115888: sql: update pausable portal todo r=rharding6373 a=rharding6373

Epic: None
Informs: #115887

Release note: None

116830: hlc: remove the Synthetic field from Timestamp and LegacyTimestamp r=erikgrinaker a=nvanbenschoten

Closes #101938.

This PR completes the work to remove the `Synthetic` field from `Timestamp` and `LegacyTimestamp`. It removes the remaining uses, removes the fields from the proto definitions, and removes all access to the fields in methods.

Release note: None



117429: revertccl: ALTER VIRTUAL CLUSTER RESET DATA r=dt a=dt

This enables resetting a virtual cluster's data to a prior timestamp.

This is possible if the prior timestamp is still retained in the mvcc
history of the virtual cluster, the virtual cluster has stopped service,
and is run by a user with the MANAGEVIRTUALCLUSTER (or admin) privilege
in the system tenant.

Revisions of data in the system tenant newer than the target time to
which it is being reset are destroyed, reverting the tenant to the state
it was in as of the time reverted to. Destroyed revisions are not
recoverable; once a tenant has been reset to a timestamp, it cannot be
'un-reset' back to a higher timestamp.

Release note (cluster virtualization): Added a new 'flashback' command
to revert a virtual cluster to an earlier state using ALTER VIRTUAL
CLUSTER RESET DATA.


Epic: CRDB-34233.

117541: storage: fix a series of intent resolution bugs with ignored seq nums r=nvanbenschoten a=miraradeva

Previously, the logic in mvccResolveWriteIntent was structured in such a
way that if an intent contained both ignored and non-ignored seq nums
in its intent history, the intent may end up being updated instead of
aborted or unmodified (see examples in 9f00f2a5505).

This commit fixes the bugs by ensuring that the intent history is
modified only when an intent resolution update is not aborted, and the
update and the actual intent have the same epoch.

Fixes: #117553

Release note: None

117563: distsql: improve columnar operator test harness for decimals r=yuzefovich a=yuzefovich

We recently merged a change to add decimals with different numbers of trailing zeroes in the "interesting datums" set, and it made some existing tests fail because they used direct string comparison for equality. This commit adjusts the test harness to be smarter for decimals.

Fixes: #117543.

Release note: None

Co-authored-by: rharding6373 <rharding6373@users.noreply.github.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Co-authored-by: Mira Radeva <mira@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@craig craig bot closed this as completed in 62ebe56 Jan 10, 2024
msbutler pushed a commit to msbutler/cockroach that referenced this issue Jan 10, 2024
Informs cockroachdb#101938.

This commit removes the handling of synthetic timestamps from the the
lock-table waiter. The lock-table waiter used to handle the synthetic
timestamp bit in two ways:
1. if set, it would propagate it on txn pushes
2. if set, it would would push txns above the local HLC clock, because
   observed timestamps from the clock would not be usable to avoid
   uncertainty with intents written at (or pushed to) synthetic timestamps.

Neither of these behaviors are necessary anymore. We don't need to
propagate the flag, because it has been deprecated since v22.2 and is no
longer consulted in uncertainty interval checks or by transaction
commit-wait. We also don't need to push intents above the local HLC,
because observed timestamps can now be used to avoid uncertainty with
intents up to the intent's local timestamp, which will be set to the
local HLC from before the push (see ClockWhilePending).

Release note: None
mgartner pushed a commit to mgartner/cockroach that referenced this issue Jan 12, 2024
…y checks

Informs cockroachdb#101938.

This commit bumps the ReplicaChecksumVersion to disable replica consistency
checks between v23.2 and v24.1 nodes when in a mixed-version cluster. This
avoids the backwards incompatibility discussed in cockroachdb#117302.

While here and permitted to change the replica consistency check logic, we
unset the Synthetic flag from RangeAppliedState.RaftClosedTimestamp during
stats-only consistency checks. This form of consistency check is rarely used,
but this prevents it from causing trouble with cockroachdb#101938.

While here and allowed to change the consistency check hash computation, we
also switch from using the LegacyTimestamp encoding to the Timestamp encoding
for the hash contribution of MVCCKeys.

Release note: None
jbowens added a commit to jbowens/cockroach that referenced this issue Sep 12, 2024
In CockroachDB's key encoding some keys have multiple logically equivalent but
physically distinct encodings. Most notably, in CockroachDB versions 23.2 and
earlier keys written to global tables encoded MVCC timestamps with a
'synthetic bit.' In cockroachdb#101938 CockroachDB stopped encoding and decoding this
synthetic bit, transparently ignoring it.

In cockroachdb#129592 we observed the existence of a bug in the CockroachDB comparator
when comparing two MVCC timestamp suffixes, specifically outside the context of
a full MVCC key. The comparator failed to consider a timestamp with the
synthetic bit and a timestamp without the synthetic bit as logically
equivalent. There are limited instances where Pebble uses the comparator to
compare "bare suffixes," and all instances are constrained to the
implementation of range keys.

In cockroachdb#129592 it was observed that the comparator bug could prevent the garbage
collection of MVCC delete range tombstones (the single use of range keys within
CockroachDB). A cluster running 23.2 or earlier may write a MVCC delete range
tombstone with a timestamp encoding the synthetic bit. If the cluster
subsequently upgraded to 24.1 or later, the code path to clear range keys
stopped understanding synthetic bits and wrote range key unset tombstones
without the synthetic bit set. Due to the comparator bug, Pebble did not
consider these timestamp suffixes equal and the unset was ineffective.

We initially attempted to fix this issue by fixing the comparator, but
inadvertently introduced the possibility of replica divergence cockroachdb#130533 by
changing the semantics of LSM state below raft.

This commit works around this comparator bug by adapting ClearMVCCRangeKey to
write range key unsets using the verbatim suffix that was read from the storage
engine. To avoid reverting cockroachdb#101938 and re-introducing knowledge of the
synthetic bit, the MVCCRangeKey data structures are adapted to retain a copy of
the encoded timestamp suffix when reading range keys from storage engine
iterators. If later an attempt is made to clear the range key through
ClearMVCCRangeKey, this encoded timestamp suffix is used instead of re-encoding
the timestamp. Through avoiding the decoding/encoding roundtrip,
ClearMVCCRangeKey ensures that the suffixes it writes are identical to the
range keys that exist on disk, even if they encode a synthetic bit.

Release note (bug fix): Fixes a bug that could result in the inability to
garbage collect a MVCC range tombstone within a global table.
Epic: none
Informs cockroachdb#129592.
jbowens added a commit to jbowens/cockroach that referenced this issue Sep 12, 2024
In CockroachDB's key encoding some keys have multiple logically equivalent but
physically distinct encodings. Most notably, in CockroachDB versions 23.2 and
earlier keys written to global tables encoded MVCC timestamps with a
'synthetic bit.' In cockroachdb#101938 CockroachDB stopped encoding and decoding this
synthetic bit, transparently ignoring it.

In cockroachdb#129592 we observed the existence of a bug in the CockroachDB comparator
when comparing two MVCC timestamp suffixes, specifically outside the context of
a full MVCC key. The comparator failed to consider a timestamp with the
synthetic bit and a timestamp without the synthetic bit as logically
equivalent. There are limited instances where Pebble uses the comparator to
compare "bare suffixes," and all instances are constrained to the
implementation of range keys.

In cockroachdb#129592 it was observed that the comparator bug could prevent the garbage
collection of MVCC delete range tombstones (the single use of range keys within
CockroachDB). A cluster running 23.2 or earlier may write a MVCC delete range
tombstone with a timestamp encoding the synthetic bit. If the cluster
subsequently upgraded to 24.1 or later, the code path to clear range keys
stopped understanding synthetic bits and wrote range key unset tombstones
without the synthetic bit set. Due to the comparator bug, Pebble did not
consider these timestamp suffixes equal and the unset was ineffective.

We initially attempted to fix this issue by fixing the comparator, but
inadvertently introduced the possibility of replica divergence cockroachdb#130533 by
changing the semantics of LSM state below raft.

This commit works around this comparator bug by adapting ClearMVCCRangeKey to
write range key unsets using the verbatim suffix that was read from the storage
engine. To avoid reverting cockroachdb#101938 and re-introducing knowledge of the
synthetic bit, the MVCCRangeKey data structures are adapted to retain a copy of
the encoded timestamp suffix when reading range keys from storage engine
iterators. If later an attempt is made to clear the range key through
ClearMVCCRangeKey, this encoded timestamp suffix is used instead of re-encoding
the timestamp. Through avoiding the decoding/encoding roundtrip,
ClearMVCCRangeKey ensures that the suffixes it writes are identical to the
range keys that exist on disk, even if they encode a synthetic bit.

Release note (bug fix): Fixes a bug that could result in the inability to
garbage collect a MVCC range tombstone within a global table.
Epic: none
Informs cockroachdb#129592.
jbowens added a commit to jbowens/cockroach that referenced this issue Sep 13, 2024
In CockroachDB's key encoding some keys have multiple logically equivalent but
physically distinct encodings. Most notably, in CockroachDB versions 23.2 and
earlier keys written to global tables encoded MVCC timestamps with a
'synthetic bit.' In cockroachdb#101938 CockroachDB stopped encoding and decoding this
synthetic bit, transparently ignoring it.

In cockroachdb#129592 we observed the existence of a bug in the CockroachDB comparator
when comparing two MVCC timestamp suffixes, specifically outside the context of
a full MVCC key. The comparator failed to consider a timestamp with the
synthetic bit and a timestamp without the synthetic bit as logically
equivalent. There are limited instances where Pebble uses the comparator to
compare "bare suffixes," and all instances are constrained to the
implementation of range keys.

In cockroachdb#129592 it was observed that the comparator bug could prevent the garbage
collection of MVCC delete range tombstones (the single use of range keys within
CockroachDB). A cluster running 23.2 or earlier may write a MVCC delete range
tombstone with a timestamp encoding the synthetic bit. If the cluster
subsequently upgraded to 24.1 or later, the code path to clear range keys
stopped understanding synthetic bits and wrote range key unset tombstones
without the synthetic bit set. Due to the comparator bug, Pebble did not
consider these timestamp suffixes equal and the unset was ineffective.

We initially attempted to fix this issue by fixing the comparator, but
inadvertently introduced the possibility of replica divergence cockroachdb#130533 by
changing the semantics of LSM state below raft.

This commit works around this comparator bug by adapting ClearMVCCRangeKey to
write range key unsets using the verbatim suffix that was read from the storage
engine. To avoid reverting cockroachdb#101938 and re-introducing knowledge of the
synthetic bit, the MVCCRangeKey data structures are adapted to retain a copy of
the encoded timestamp suffix when reading range keys from storage engine
iterators. If later an attempt is made to clear the range key through
ClearMVCCRangeKey, this encoded timestamp suffix is used instead of re-encoding
the timestamp. Through avoiding the decoding/encoding roundtrip,
ClearMVCCRangeKey ensures that the suffixes it writes are identical to the
range keys that exist on disk, even if they encode a synthetic bit.

Release note (bug fix): Fixes a bug that could result in the inability to
garbage collect a MVCC range tombstone within a global table.
Epic: none
Informs cockroachdb#129592.
jbowens added a commit to jbowens/cockroach that referenced this issue Sep 17, 2024
In CockroachDB's key encoding some keys have multiple logically equivalent but
physically distinct encodings. Most notably, in CockroachDB versions 23.2 and
earlier keys written to global tables encoded MVCC timestamps with a
'synthetic bit.' In cockroachdb#101938 CockroachDB stopped encoding and decoding this
synthetic bit, transparently ignoring it.

In cockroachdb#129592 we observed the existence of a bug in the CockroachDB comparator
when comparing two MVCC timestamp suffixes, specifically outside the context of
a full MVCC key. The comparator failed to consider a timestamp with the
synthetic bit and a timestamp without the synthetic bit as logically
equivalent. There are limited instances where Pebble uses the comparator to
compare "bare suffixes," and all instances are constrained to the
implementation of range keys.

In cockroachdb#129592 it was observed that the comparator bug could prevent the garbage
collection of MVCC delete range tombstones (the single use of range keys within
CockroachDB). A cluster running 23.2 or earlier may write a MVCC delete range
tombstone with a timestamp encoding the synthetic bit. If the cluster
subsequently upgraded to 24.1 or later, the code path to clear range keys
stopped understanding synthetic bits and wrote range key unset tombstones
without the synthetic bit set. Due to the comparator bug, Pebble did not
consider these timestamp suffixes equal and the unset was ineffective.

We initially attempted to fix this issue by fixing the comparator, but
inadvertently introduced the possibility of replica divergence cockroachdb#130533 by
changing the semantics of LSM state below raft.

This commit works around this comparator bug by adapting ClearMVCCRangeKey to
write range key unsets using the verbatim suffix that was read from the storage
engine. To avoid reverting cockroachdb#101938 and re-introducing knowledge of the
synthetic bit, the MVCCRangeKey data structures are adapted to retain a copy of
the encoded timestamp suffix when reading range keys from storage engine
iterators. If later an attempt is made to clear the range key through
ClearMVCCRangeKey, this encoded timestamp suffix is used instead of re-encoding
the timestamp. Through avoiding the decoding/encoding roundtrip,
ClearMVCCRangeKey ensures that the suffixes it writes are identical to the
range keys that exist on disk, even if they encode a synthetic bit.

Release note (bug fix): Fixes a bug that could result in the inability to
garbage collect a MVCC range tombstone within a global table.
Epic: none
Informs cockroachdb#129592.
craig bot pushed a commit that referenced this issue Sep 18, 2024
130453: logictest: revert incorrect test assertion update r=rafiss a=michae2

(Deja vu: this is #121556 all over again.)

103bd54 incorrectly updated the test expectations, likely because the `--rewrite` flag was used on an assertion that has the retry directive.

This commit undoes that change.

Fixes: #130405

Release note: None

130572: storage: GC range keys by unsetting identical suffixes r=jbowens a=jbowens

In CockroachDB's key encoding some keys have multiple logically equivalent but physically distinct encodings. Most notably, in CockroachDB versions 23.2 and earlier keys written to global tables encoded MVCC timestamps with a 'synthetic bit.' In #101938 CockroachDB stopped encoding and decoding this synthetic bit, transparently ignoring it.

In #129592 we observed the existence of a bug in the CockroachDB comparator when comparing two MVCC timestamp suffixes, specifically outside the context of a full MVCC key. The comparator failed to consider a timestamp with the synthetic bit and a timestamp without the synthetic bit as logically equivalent. There are limited instances where Pebble uses the comparator to compare "bare suffixes," and all instances are constrained to the implementation of range keys.

In #129592 it was observed that the comparator bug could prevent the garbage collection of MVCC delete range tombstones (the single use of range keys within CockroachDB). A cluster running 23.2 or earlier may write a MVCC delete range tombstone with a timestamp encoding the synthetic bit. If the cluster subsequently upgraded to 24.1 or later, the code path to clear range keys stopped understanding synthetic bits and wrote range key unset tombstones without the synthetic bit set. Due to the comparator bug, Pebble did not consider these timestamp suffixes equal and the unset was ineffective.

We initially attempted to fix this issue by fixing the comparator, but inadvertently introduced the possibility of replica divergence #130533 by changing the semantics of LSM state below raft.

This commit works around this comparator bug by adapting ClearMVCCRangeKey to write range key unsets using the verbatim suffix that was read from the storage engine. To avoid reverting #101938 and re-introducing knowledge of the synthetic bit, the MVCCRangeKey data structures are adapted to retain a copy of the encoded timestamp suffix when reading range keys from storage engine iterators. If later an attempt is made to clear the range key through ClearMVCCRangeKey, this encoded timestamp suffix is used instead of re-encoding the timestamp. Through avoiding the decoding/encoding roundtrip, ClearMVCCRangeKey ensures that the suffixes it writes are identical to the range keys that exist on disk, even if they encode a synthetic bit.

Release note (bug fix): Fixes a bug that could result in the inability to garbage collect a MVCC range tombstone within a global table.
Epic: none
Informs #129592.

130906: sql: deflake TestValidationWithProtectedTS r=rafiss a=rafiss

This test does not work if ranges get split, so we disable the split queue.

fixes #130715
Release note: None

Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
blathers-crl bot pushed a commit that referenced this issue Sep 18, 2024
In CockroachDB's key encoding some keys have multiple logically equivalent but
physically distinct encodings. Most notably, in CockroachDB versions 23.2 and
earlier keys written to global tables encoded MVCC timestamps with a
'synthetic bit.' In #101938 CockroachDB stopped encoding and decoding this
synthetic bit, transparently ignoring it.

In #129592 we observed the existence of a bug in the CockroachDB comparator
when comparing two MVCC timestamp suffixes, specifically outside the context of
a full MVCC key. The comparator failed to consider a timestamp with the
synthetic bit and a timestamp without the synthetic bit as logically
equivalent. There are limited instances where Pebble uses the comparator to
compare "bare suffixes," and all instances are constrained to the
implementation of range keys.

In #129592 it was observed that the comparator bug could prevent the garbage
collection of MVCC delete range tombstones (the single use of range keys within
CockroachDB). A cluster running 23.2 or earlier may write a MVCC delete range
tombstone with a timestamp encoding the synthetic bit. If the cluster
subsequently upgraded to 24.1 or later, the code path to clear range keys
stopped understanding synthetic bits and wrote range key unset tombstones
without the synthetic bit set. Due to the comparator bug, Pebble did not
consider these timestamp suffixes equal and the unset was ineffective.

We initially attempted to fix this issue by fixing the comparator, but
inadvertently introduced the possibility of replica divergence #130533 by
changing the semantics of LSM state below raft.

This commit works around this comparator bug by adapting ClearMVCCRangeKey to
write range key unsets using the verbatim suffix that was read from the storage
engine. To avoid reverting #101938 and re-introducing knowledge of the
synthetic bit, the MVCCRangeKey data structures are adapted to retain a copy of
the encoded timestamp suffix when reading range keys from storage engine
iterators. If later an attempt is made to clear the range key through
ClearMVCCRangeKey, this encoded timestamp suffix is used instead of re-encoding
the timestamp. Through avoiding the decoding/encoding roundtrip,
ClearMVCCRangeKey ensures that the suffixes it writes are identical to the
range keys that exist on disk, even if they encode a synthetic bit.

Release note (bug fix): Fixes a bug that could result in the inability to
garbage collect a MVCC range tombstone within a global table.
Epic: none
Informs #129592.
blathers-crl bot pushed a commit that referenced this issue Sep 18, 2024
In CockroachDB's key encoding some keys have multiple logically equivalent but
physically distinct encodings. Most notably, in CockroachDB versions 23.2 and
earlier keys written to global tables encoded MVCC timestamps with a
'synthetic bit.' In #101938 CockroachDB stopped encoding and decoding this
synthetic bit, transparently ignoring it.

In #129592 we observed the existence of a bug in the CockroachDB comparator
when comparing two MVCC timestamp suffixes, specifically outside the context of
a full MVCC key. The comparator failed to consider a timestamp with the
synthetic bit and a timestamp without the synthetic bit as logically
equivalent. There are limited instances where Pebble uses the comparator to
compare "bare suffixes," and all instances are constrained to the
implementation of range keys.

In #129592 it was observed that the comparator bug could prevent the garbage
collection of MVCC delete range tombstones (the single use of range keys within
CockroachDB). A cluster running 23.2 or earlier may write a MVCC delete range
tombstone with a timestamp encoding the synthetic bit. If the cluster
subsequently upgraded to 24.1 or later, the code path to clear range keys
stopped understanding synthetic bits and wrote range key unset tombstones
without the synthetic bit set. Due to the comparator bug, Pebble did not
consider these timestamp suffixes equal and the unset was ineffective.

We initially attempted to fix this issue by fixing the comparator, but
inadvertently introduced the possibility of replica divergence #130533 by
changing the semantics of LSM state below raft.

This commit works around this comparator bug by adapting ClearMVCCRangeKey to
write range key unsets using the verbatim suffix that was read from the storage
engine. To avoid reverting #101938 and re-introducing knowledge of the
synthetic bit, the MVCCRangeKey data structures are adapted to retain a copy of
the encoded timestamp suffix when reading range keys from storage engine
iterators. If later an attempt is made to clear the range key through
ClearMVCCRangeKey, this encoded timestamp suffix is used instead of re-encoding
the timestamp. Through avoiding the decoding/encoding roundtrip,
ClearMVCCRangeKey ensures that the suffixes it writes are identical to the
range keys that exist on disk, even if they encode a synthetic bit.

Release note (bug fix): Fixes a bug that could result in the inability to
garbage collect a MVCC range tombstone within a global table.
Epic: none
Informs #129592.
blathers-crl bot pushed a commit that referenced this issue Sep 18, 2024
In CockroachDB's key encoding some keys have multiple logically equivalent but
physically distinct encodings. Most notably, in CockroachDB versions 23.2 and
earlier keys written to global tables encoded MVCC timestamps with a
'synthetic bit.' In #101938 CockroachDB stopped encoding and decoding this
synthetic bit, transparently ignoring it.

In #129592 we observed the existence of a bug in the CockroachDB comparator
when comparing two MVCC timestamp suffixes, specifically outside the context of
a full MVCC key. The comparator failed to consider a timestamp with the
synthetic bit and a timestamp without the synthetic bit as logically
equivalent. There are limited instances where Pebble uses the comparator to
compare "bare suffixes," and all instances are constrained to the
implementation of range keys.

In #129592 it was observed that the comparator bug could prevent the garbage
collection of MVCC delete range tombstones (the single use of range keys within
CockroachDB). A cluster running 23.2 or earlier may write a MVCC delete range
tombstone with a timestamp encoding the synthetic bit. If the cluster
subsequently upgraded to 24.1 or later, the code path to clear range keys
stopped understanding synthetic bits and wrote range key unset tombstones
without the synthetic bit set. Due to the comparator bug, Pebble did not
consider these timestamp suffixes equal and the unset was ineffective.

We initially attempted to fix this issue by fixing the comparator, but
inadvertently introduced the possibility of replica divergence #130533 by
changing the semantics of LSM state below raft.

This commit works around this comparator bug by adapting ClearMVCCRangeKey to
write range key unsets using the verbatim suffix that was read from the storage
engine. To avoid reverting #101938 and re-introducing knowledge of the
synthetic bit, the MVCCRangeKey data structures are adapted to retain a copy of
the encoded timestamp suffix when reading range keys from storage engine
iterators. If later an attempt is made to clear the range key through
ClearMVCCRangeKey, this encoded timestamp suffix is used instead of re-encoding
the timestamp. Through avoiding the decoding/encoding roundtrip,
ClearMVCCRangeKey ensures that the suffixes it writes are identical to the
range keys that exist on disk, even if they encode a synthetic bit.

Release note (bug fix): Fixes a bug that could result in the inability to
garbage collect a MVCC range tombstone within a global table.
Epic: none
Informs #129592.
jbowens added a commit that referenced this issue Sep 18, 2024
In CockroachDB's key encoding some keys have multiple logically equivalent but
physically distinct encodings. Most notably, in CockroachDB versions 23.2 and
earlier keys written to global tables encoded MVCC timestamps with a
'synthetic bit.' In #101938 CockroachDB stopped encoding and decoding this
synthetic bit, transparently ignoring it.

In #129592 we observed the existence of a bug in the CockroachDB comparator
when comparing two MVCC timestamp suffixes, specifically outside the context of
a full MVCC key. The comparator failed to consider a timestamp with the
synthetic bit and a timestamp without the synthetic bit as logically
equivalent. There are limited instances where Pebble uses the comparator to
compare "bare suffixes," and all instances are constrained to the
implementation of range keys.

In #129592 it was observed that the comparator bug could prevent the garbage
collection of MVCC delete range tombstones (the single use of range keys within
CockroachDB). A cluster running 23.2 or earlier may write a MVCC delete range
tombstone with a timestamp encoding the synthetic bit. If the cluster
subsequently upgraded to 24.1 or later, the code path to clear range keys
stopped understanding synthetic bits and wrote range key unset tombstones
without the synthetic bit set. Due to the comparator bug, Pebble did not
consider these timestamp suffixes equal and the unset was ineffective.

We initially attempted to fix this issue by fixing the comparator, but
inadvertently introduced the possibility of replica divergence #130533 by
changing the semantics of LSM state below raft.

This commit works around this comparator bug by adapting ClearMVCCRangeKey to
write range key unsets using the verbatim suffix that was read from the storage
engine. To avoid reverting #101938 and re-introducing knowledge of the
synthetic bit, the MVCCRangeKey data structures are adapted to retain a copy of
the encoded timestamp suffix when reading range keys from storage engine
iterators. If later an attempt is made to clear the range key through
ClearMVCCRangeKey, this encoded timestamp suffix is used instead of re-encoding
the timestamp. Through avoiding the decoding/encoding roundtrip,
ClearMVCCRangeKey ensures that the suffixes it writes are identical to the
range keys that exist on disk, even if they encode a synthetic bit.

Release note (bug fix): Fixes a bug that could result in the inability to
garbage collect a MVCC range tombstone within a global table.
Epic: none
Informs #129592.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv Anything in KV that doesn't belong in a more specific category. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. T-kv KV Team
Projects
No open projects
Status: Closed
Development

Successfully merging a pull request may close this issue.

1 participant