storage: replica inconsistency after upgrade to v24.2.1 #130533
Comments
Hi @RaduBerinde, please add branch-* labels to identify which branch(es) this C-bug affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Note the reopened issue #129592 will be used to track an alternate fix for that issue. This issue will track undoing the fix on master and working out any guidance we need to provide for clusters already upgraded to v24.2.1.
The condition for a synthetic timestamp to exist in a cluster is whether a global table has been used. Any write to a global table (including the rangedel to clear the table when dropped) will get a synthetic bit in versions <= 23.2. (thanks @nvanbenschoten)
In CockroachDB's key encoding some keys have multiple logically equivalent but physically distinct encodings. Most notably, in CockroachDB versions 23.2 and earlier keys written to global tables encoded MVCC timestamps with a 'synthetic bit.' In cockroachdb#101938 CockroachDB stopped encoding and decoding this synthetic bit, transparently ignoring it.

In cockroachdb#129592 we observed the existence of a bug in the CockroachDB comparator when comparing two MVCC timestamp suffixes, specifically outside the context of a full MVCC key. The comparator failed to consider a timestamp with the synthetic bit and a timestamp without the synthetic bit as logically equivalent. There are limited instances where Pebble uses the comparator to compare "bare suffixes," and all instances are constrained to the implementation of range keys.

In cockroachdb#129592 it was observed that the comparator bug could prevent the garbage collection of MVCC delete range tombstones (the single use of range keys within CockroachDB). A cluster running 23.2 or earlier may write a MVCC delete range tombstone with a timestamp encoding the synthetic bit. If the cluster subsequently upgraded to 24.1 or later, the code path to clear range keys stopped understanding synthetic bits and wrote range key unset tombstones without the synthetic bit set. Due to the comparator bug, Pebble did not consider these timestamp suffixes equal and the unset was ineffective.

We initially attempted to fix this issue by fixing the comparator, but inadvertently introduced the possibility of replica divergence cockroachdb#130533 by changing the semantics of LSM state below raft. This commit works around this comparator bug by adapting ClearMVCCRangeKey to write range key unsets using the verbatim suffix that was read from the storage engine.

To avoid reverting cockroachdb#101938 and re-introducing knowledge of the synthetic bit, the MVCCRangeKey data structures are adapted to retain a copy of the encoded timestamp suffix when reading range keys from storage engine iterators. If later an attempt is made to clear the range key through ClearMVCCRangeKey, this encoded timestamp suffix is used instead of re-encoding the timestamp. Through avoiding the decoding/encoding roundtrip, ClearMVCCRangeKey ensures that the suffixes it writes are identical to the range keys that exist on disk, even if they encode a synthetic bit.

Release note (bug fix): Fixes a bug that could result in the inability to garbage collect a MVCC range tombstone within a global table.
Epic: none
Informs cockroachdb#129592.
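The workaround described in this commit message can be pictured with a small toy model. This is only a sketch in Python, not CockroachDB's Go code: the 12-byte suffix layout, the `RangeKey` and `Engine` classes, and the method names are all hypothetical stand-ins for `MVCCRangeKey`, the storage engine, and `ClearMVCCRangeKey`. It assumes the engine only removes a range key when the unset suffix is byte-identical to the stored one.

```python
import struct
from dataclasses import dataclass
from typing import Optional


def encode_ts(wall, logical):
    # Toy layout (an assumption): 8-byte wall time + 4-byte logical counter.
    # Like post-#101938 code, this encoder never writes a synthetic byte.
    return struct.pack(">QI", wall, logical)


def decode_ts(suffix):
    # Only the first 12 bytes are read; a trailing synthetic byte is ignored.
    return struct.unpack(">QI", suffix[:12])


@dataclass
class RangeKey:
    wall: int
    logical: int
    # Verbatim bytes as read from the engine, if this key came from a read.
    encoded_suffix: Optional[bytes] = None


class Engine:
    def __init__(self):
        self.suffixes = []  # raw suffixes of range keys "on disk"

    def read_range_keys(self):
        # Retain the raw suffix next to the decoded timestamp.
        keys = []
        for raw in self.suffixes:
            wall, logical = decode_ts(raw)
            keys.append(RangeKey(wall, logical, encoded_suffix=raw))
        return keys

    def clear_range_key(self, rk):
        # Prefer the verbatim suffix; re-encode only if none was retained.
        suffix = rk.encoded_suffix
        if suffix is None:
            suffix = encode_ts(rk.wall, rk.logical)
        # The unset only takes effect on a byte-identical suffix.
        self.suffixes = [s for s in self.suffixes if s != suffix]


engine = Engine()
# A legacy range key whose suffix carries a trailing synthetic byte.
engine.suffixes.append(encode_ts(100, 0) + b"\x01")

# Clearing with a freshly re-encoded timestamp misses the legacy key.
engine.clear_range_key(RangeKey(100, 0))
remaining_after_reencode = len(engine.suffixes)

# Clearing with keys read back from the engine uses the verbatim suffix.
for rk in engine.read_range_keys():
    engine.clear_range_key(rk)
remaining_after_verbatim = len(engine.suffixes)
```

In this model the re-encoded clear leaves the legacy key in place, while the clear that reuses the retained suffix removes it, without the encoder ever needing to know about the synthetic bit.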
This issue also tracks backing out the range key timestamp comparison behavior from master, which is trickier than a simple revert. I am working on that but will be out next week; I will have a PR shortly after that.
130453: logictest: revert incorrect test assertion update r=rafiss a=michae2

(Deja vu: this is #121556 all over again.)

103bd54 incorrectly updated the test expectations, likely because the `--rewrite` flag was used on an assertion that has the retry directive. This commit undoes that change.

Fixes: #130405
Release note: None

130572: storage: GC range keys by unsetting identical suffixes r=jbowens a=jbowens

In CockroachDB's key encoding some keys have multiple logically equivalent but physically distinct encodings. Most notably, in CockroachDB versions 23.2 and earlier keys written to global tables encoded MVCC timestamps with a 'synthetic bit.' In #101938 CockroachDB stopped encoding and decoding this synthetic bit, transparently ignoring it.

In #129592 we observed the existence of a bug in the CockroachDB comparator when comparing two MVCC timestamp suffixes, specifically outside the context of a full MVCC key. The comparator failed to consider a timestamp with the synthetic bit and a timestamp without the synthetic bit as logically equivalent. There are limited instances where Pebble uses the comparator to compare "bare suffixes," and all instances are constrained to the implementation of range keys.

In #129592 it was observed that the comparator bug could prevent the garbage collection of MVCC delete range tombstones (the single use of range keys within CockroachDB). A cluster running 23.2 or earlier may write a MVCC delete range tombstone with a timestamp encoding the synthetic bit. If the cluster subsequently upgraded to 24.1 or later, the code path to clear range keys stopped understanding synthetic bits and wrote range key unset tombstones without the synthetic bit set. Due to the comparator bug, Pebble did not consider these timestamp suffixes equal and the unset was ineffective.

We initially attempted to fix this issue by fixing the comparator, but inadvertently introduced the possibility of replica divergence #130533 by changing the semantics of LSM state below raft. This commit works around this comparator bug by adapting ClearMVCCRangeKey to write range key unsets using the verbatim suffix that was read from the storage engine.

To avoid reverting #101938 and re-introducing knowledge of the synthetic bit, the MVCCRangeKey data structures are adapted to retain a copy of the encoded timestamp suffix when reading range keys from storage engine iterators. If later an attempt is made to clear the range key through ClearMVCCRangeKey, this encoded timestamp suffix is used instead of re-encoding the timestamp. Through avoiding the decoding/encoding roundtrip, ClearMVCCRangeKey ensures that the suffixes it writes are identical to the range keys that exist on disk, even if they encode a synthetic bit.

Release note (bug fix): Fixes a bug that could result in the inability to garbage collect a MVCC range tombstone within a global table.
Epic: none
Informs #129592.

130906: sql: deflake TestValidationWithProtectedTS r=rafiss a=rafiss

This test does not work if ranges get split, so we disable the split queue.

fixes #130715
Release note: None

Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
This change allows `CompareSuffixes` to be stricter than `Compare` (when the prefixes are equal). This will allow reverting the CRDB comparer behavior to be consistent with previous releases (avoiding replica inconsistency). Informs cockroachdb/cockroach#130533
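To make "stricter" concrete, here is a toy model in Python of a comparer pair where the full-key comparison treats two suffixes as equal but the bare-suffix comparison still tells them apart. It is a sketch under assumptions, not Pebble's Go `Comparer`: the `(prefix, suffix)` tuple representation, the 12-byte suffix layout, and the raw-byte tie-break in `compare_suffixes` are all hypothetical choices made for illustration.

```python
import struct


def encode_ts(wall, logical, synthetic=False):
    # Toy layout (an assumption): 8-byte wall time + 4-byte logical counter,
    # optionally followed by a single synthetic indicator byte.
    body = struct.pack(">QI", wall, logical)
    return body + (b"\x01" if synthetic else b"")


def decode_ts(suffix):
    # The synthetic byte, if present, is ignored when decoding.
    return struct.unpack(">QI", suffix[:12])


def three_way(a, b):
    # Returns -1, 0, or 1, like a Go-style comparison function.
    return (a > b) - (a < b)


def compare(key_a, key_b):
    # Full-key comparison: prefix first, then the decoded timestamp with
    # newer timestamps sorting first. The synthetic byte never matters here.
    (prefix_a, suffix_a), (prefix_b, suffix_b) = key_a, key_b
    c = three_way(prefix_a, prefix_b)
    if c != 0:
        return c
    return three_way(decode_ts(suffix_b), decode_ts(suffix_a))


def compare_suffixes(suffix_a, suffix_b):
    # Bare-suffix comparison: agrees with compare() on timestamp order, but
    # breaks ties on the raw bytes (hypothetical rule), so logically equal
    # suffixes with different encodings are still distinguished.
    c = three_way(decode_ts(suffix_b), decode_ts(suffix_a))
    if c != 0:
        return c
    return three_way(suffix_a, suffix_b)


with_bit = encode_ts(100, 0, synthetic=True)
without_bit = encode_ts(100, 0)

full_key_result = compare((b"k", with_bit), (b"k", without_bit))
bare_suffix_result = compare_suffixes(with_bit, without_bit)
```

Here `full_key_result` is zero while `bare_suffix_result` is non-zero, which is the relationship the change permits: the two functions agree on ordering wherever `compare` distinguishes keys, and `compare_suffixes` may additionally distinguish encodings that `compare` treats as equal.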
The comparer changes effectively revert cockroachdb#128043, which can cause replica inconsistency during/after upgrades.

Changes:

* [`01dcf575`](cockroachdb/pebble@01dcf575) base: make comparer tolerate empty keys
* [`d73ab80f`](cockroachdb/pebble@d73ab80f) db: allow excises to unconditionally be flushable ingests
* [`b34a3937`](cockroachdb/pebble@b34a3937) base: allow CompareSuffixes to be stricter than Compare
* [`90356021`](cockroachdb/pebble@90356021) db: refactor replayWAL to use flushes to make versionEdits

Informs: cockroachdb#130533
Release note: none.
Epic: none.
The comparer changes effectively revert cockroachdb#128043, which can cause replica inconsistency during/after upgrades.

Changes:

* [`0f785fec`](cockroachdb/pebble@0f785fec) metamorphic: abridge failure output
* [`be56747f`](cockroachdb/pebble@be56747f) db: fix overlap check for flushable ingest excises
* [`2569414a`](cockroachdb/pebble@2569414a) db: remove race in TestCrashOpenCrashAfterWALCreation
* [`575f7a04`](cockroachdb/pebble@575f7a04) sstable: support columnar blocks in Layout.Describe
* [`c88c7471`](cockroachdb/pebble@c88c7471) github: fix code cover publish workflow
* [`c0fa4a9c`](cockroachdb/pebble@c0fa4a9c) sstable: populate CompareSuffixes on test4bSuffixComparer
* [`3a76074f`](cockroachdb/pebble@3a76074f) sstable: set IndexPartitions property in columnar sstable writer
* [`0595c1fb`](cockroachdb/pebble@0595c1fb) colblk: define behavior of KeyWriter.ComparePrev with no previous
* [`01dcf575`](cockroachdb/pebble@01dcf575) base: make comparer tolerate empty keys
* [`d73ab80f`](cockroachdb/pebble@d73ab80f) db: allow excises to unconditionally be flushable ingests
* [`b34a3937`](cockroachdb/pebble@b34a3937) base: allow CompareSuffixes to be stricter than Compare
* [`90356021`](cockroachdb/pebble@90356021) db: refactor replayWAL to use flushes to make versionEdits

Informs: cockroachdb#130533
Release note: none.
Epic: none.
131366: go.mod: revert comparer change and bump Pebble to 0f785fec58c0 r=RaduBerinde a=RaduBerinde

The comparer changes effectively revert #128043, which can cause replica inconsistency during/after upgrades.

Changes:

* [`0f785fec`](cockroachdb/pebble@0f785fec) metamorphic: abridge failure output
* [`be56747f`](cockroachdb/pebble@be56747f) db: fix overlap check for flushable ingest excises
* [`2569414a`](cockroachdb/pebble@2569414a) db: remove race in TestCrashOpenCrashAfterWALCreation
* [`575f7a04`](cockroachdb/pebble@575f7a04) sstable: support columnar blocks in Layout.Describe
* [`c88c7471`](cockroachdb/pebble@c88c7471) github: fix code cover publish workflow
* [`c0fa4a9c`](cockroachdb/pebble@c0fa4a9c) sstable: populate CompareSuffixes on test4bSuffixComparer
* [`3a76074f`](cockroachdb/pebble@3a76074f) sstable: set IndexPartitions property in columnar sstable writer
* [`0595c1fb`](cockroachdb/pebble@0595c1fb) colblk: define behavior of KeyWriter.ComparePrev with no previous
* [`01dcf575`](cockroachdb/pebble@01dcf575) base: make comparer tolerate empty keys
* [`d73ab80f`](cockroachdb/pebble@d73ab80f) db: allow excises to unconditionally be flushable ingests
* [`b34a3937`](cockroachdb/pebble@b34a3937) base: allow CompareSuffixes to be stricter than Compare
* [`90356021`](cockroachdb/pebble@90356021) db: refactor replayWAL to use flushes to make versionEdits

Informs: #130533
Release note: none.
Epic: none.

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Background
In older releases, we used to (in some cases) append an extra synthetic indicator byte to timestamps. Synthetic timestamps were deprecated in v24.1 (#101938), but they can still persist in existing KVs.
Since a difference in this indicator bit isn't supposed to cause two timestamps to not be equal, the engine key comparer (passed to Pebble) takes it into account when comparing timestamps. We recently found a bug in the engine key comparer implementation (#127914): while the implementation correctly ignores the synthetic bit when comparing two keys with timestamps, it does not ignore it when only timestamps themselves are compared. The latter happens only in the context of range keys. In particular, when unsetting a range key, the unset is only effective if the timestamps match. If the range key was set with the synthetic bit, an Unset issued by a recent version against it would be ineffective. Not long after discovering this, we had a production cluster hit this issue (#129592).
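The ineffective Unset can be reproduced in a toy model. The following Python sketch is not the engine key comparer or Pebble's range key code: the 12-byte suffix layout, the `RangeKeyStore` class, and both equality functions are hypothetical and exist only to show the two behaviors side by side.

```python
import struct


def encode_ts(wall, logical, synthetic=False):
    # Toy layout (an assumption): 8-byte wall time + 4-byte logical counter,
    # optionally followed by a single synthetic indicator byte.
    body = struct.pack(">QI", wall, logical)
    return body + (b"\x01" if synthetic else b"")


def decode_ts(suffix):
    # The synthetic byte, if present, is ignored when decoding.
    return struct.unpack(">QI", suffix[:12])


def bytewise_suffix_equal(a, b):
    # Models the bug: bare suffixes are compared on their raw bytes.
    return a == b


def logical_suffix_equal(a, b):
    # Models the intended semantics: the synthetic byte is ignored.
    return decode_ts(a) == decode_ts(b)


class RangeKeyStore:
    """Holds the suffixes of range keys set over a single span."""

    def __init__(self, suffix_equal):
        self.suffix_equal = suffix_equal
        self.suffixes = []

    def set(self, suffix):
        self.suffixes.append(suffix)

    def unset(self, suffix):
        # An unset only removes range keys whose suffix compares equal.
        self.suffixes = [
            s for s in self.suffixes if not self.suffix_equal(s, suffix)
        ]


legacy_suffix = encode_ts(100, 0, synthetic=True)  # written by an old release
recent_suffix = encode_ts(100, 0)                  # written by a recent release

buggy = RangeKeyStore(bytewise_suffix_equal)
buggy.set(legacy_suffix)
buggy.unset(recent_suffix)  # ineffective: the range key survives

intended = RangeKeyStore(logical_suffix_equal)
intended.set(legacy_suffix)
intended.unset(recent_suffix)  # effective: the range key is removed
```

With the byte-wise comparison the store still holds the legacy range key after the unset, which is the situation that leaves an MVCC range tombstone impossible to garbage collect.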
Fix that went into v24.2.1
In v24.2.1, to address #129592, we merged a fix to the comparer (#129605). Unfortunately, we have since found that this fix can cause replica inconsistency (which causes nodes to crash) once any nodes are upgraded.
A detailed sequence (aptly described by @jbowens):
The comparer fix was backed out from all branches and #129592 was reopened. v24.2.1 is the only released version with that change.
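As a rough illustration of the general mechanism only (this is not the detailed sequence referenced above), the toy model below applies one and the same command log to two replicas whose binaries disagree on suffix equality. It is a Python sketch under assumptions: the suffix layout, the `apply_log` function, and the two equality rules are hypothetical, and real replicas apply raft commands through the storage engine rather than a list.

```python
import struct


def encode_ts(wall, logical, synthetic=False):
    # Toy layout (an assumption): 8-byte wall time + 4-byte logical counter,
    # optionally followed by a single synthetic indicator byte.
    body = struct.pack(">QI", wall, logical)
    return body + (b"\x01" if synthetic else b"")


def decode_ts(suffix):
    return struct.unpack(">QI", suffix[:12])


def apply_log(log, suffix_equal):
    # Applies the same replicated commands; only the equality rule differs.
    state = []
    for op, suffix in log:
        if op == "set":
            state.append(suffix)
        elif op == "unset":
            state = [s for s in state if not suffix_equal(s, suffix)]
    return state


# One replicated log: a legacy range key is set, then unset by a newer
# writer that no longer encodes the synthetic byte.
log = [
    ("set", encode_ts(100, 0, synthetic=True)),
    ("unset", encode_ts(100, 0)),
]

# Replica still running a binary with the original byte-wise comparison.
replica_old = apply_log(log, lambda a, b: a == b)

# Replica upgraded to a binary whose comparer ignores the synthetic byte.
replica_new = apply_log(log, lambda a, b: decode_ts(a) == decode_ts(b))

diverged = replica_old != replica_new
```

In this model the old replica keeps the range key and the upgraded replica drops it, so identical commands produce different on-disk state, which is the kind of difference a consistency check would flag.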
CC @miraradeva @nvanbenschoten @nicktrav
Jira issue: CRDB-42109