-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: corrupt ReverseScan response after DeleteRangeUsingTombstone #90642
Comments
Hi @tbg, please add branch-* labels to identify which branch(es) this release-blocker affects. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
cc @cockroachdb/replication |
We've identified the problem. Excerpt from internal Slack thread: Basically, dynamically enabling point synthesis in the reverse direction may omit a (valid) synthetic point tombstone when it's triggered on a point key colocated with the range tombstone's start key. We emit synthetic points for all range key timestamps at their start key, but in the reverse direction, the parent iterator may only discover the range key when landing on a point key that's above a range key timestamp. Thus, in the reverse direction, the synthetic point was never emitted for the lower range key, so we'll need to step to it. This is a bit problematic in that we may not know which synthetic point to position on, as it depends on what the previous positioning operation was (it may e.g. have been a versioned SeekLT which would mandate omitting certain synthetic points). Will need some thought. |
Here's a descriptive test case: # Regression test for https://github.com/cockroachdb/cockroach/issues/90642.
#
# REAL DATASET SYNTHETIC DATASET
# 2 a2 [b2] 2 a2 [b2]
# 1 [---) 1 x
# a b a b
#
# Recall that pebbleMVCCScanner only enables pointSynthesizingIter when it
# encounters a range key. In the case above, during a reverse scan, the [a-b)@1
# range key will first become visible to pebbleMVCCScanner when it lands on a@2,
# so it enabled point synthesis positioned at the a@2 point key. Notice how the
# iterator has now skipped over the synthetic point tombstone a@1.
#
# This is particularly problematic when combined with pebbleMVCCScanner peeking,
# which assumes that following a iterPeekPrev() call, an iterNext() call can
# step the parent iterator forward once to get back to the original position.
# With the above bug, that is no longer true, as it instead lands on the
# synthetic point tombstone which was skipped during reverse iteration. During
# intent processing for b@2, such an iterNext() call is expected to land on the
# intent's provisional value at b@2, but it instead lands on the intent itself
# at b@0. This in turn caused a value checksum or decoding failure, where it was
# expecting the current key to be b@2, but the actual key was b@0.
run ok
del_range_ts k=a end=b ts=1
put k=a ts=2 v=a2
with t=A
txn_begin k=b ts=2
put k=b v=b2
----
>> at end:
txn: "A" meta={id=00000000 key="b" pri=0.00000000 epo=0 ts=2.000000000,0 min=0,0 seq=0} lock=true stat=PENDING rts=2.000000000,0 wto=false gul=0,0
rangekey: {a-b}/[1.000000000,0=/<empty>]
data: "a"/2.000000000,0 -> /BYTES/a2
meta: "b"/0,0 -> txn={id=00000000 key="b" pri=0.00000000 epo=0 ts=2.000000000,0 min=0,0 seq=0} ts=2.000000000,0 del=false klen=12 vlen=7 mergeTs=<nil> txnDidNotUpdateMeta=true
data: "b"/2.000000000,0 -> /BYTES/b2
run ok
scan t=A k=a end=z reverse
----
scan: "b" -> /<err: unknown tag: UNKNOWN> @0,0
scan: "a" -> /BYTES/a2 @2.000000000,0 |
90702: storage: clone `pointSynthesizingIter` seek keys r=erikgrinaker a=erikgrinaker The seek keys passed to `pointSynthesizingIter` seek methods may have originated from `UnsafeKey()`. Repositioning the parent iterator may therefore invalidate the seek key, but the code keeps using it afterwards. This patch clones the seek key in a reusable buffer. The overhead of this should be negligible, considering the overall cost of a seek. `MVCCGet` benchmarks don't show a significant difference: ``` name old time/op new time/op delta MVCCGet_Pebble/batch=false/versions=1/valueSize=8/numRangeKeys=0-24 5.11µs ± 1% 5.13µs ± 1% ~ (p=0.143 n=10+10) MVCCGet_Pebble/batch=false/versions=1/valueSize=8/numRangeKeys=1-24 9.32µs ± 1% 9.41µs ± 1% +0.96% (p=0.001 n=10+10) MVCCGet_Pebble/batch=false/versions=10/valueSize=8/numRangeKeys=0-24 19.1µs ± 2% 19.2µs ± 3% ~ (p=0.075 n=10+10) MVCCGet_Pebble/batch=false/versions=10/valueSize=8/numRangeKeys=1-24 25.7µs ± 2% 25.7µs ± 2% ~ (p=0.853 n=10+10) MVCCGet_Pebble/batch=true/versions=1/valueSize=8/numRangeKeys=0-24 3.10µs ± 2% 3.09µs ± 1% ~ (p=0.985 n=10+10) MVCCGet_Pebble/batch=true/versions=1/valueSize=8/numRangeKeys=1-24 5.05µs ± 0% 5.02µs ± 0% -0.46% (p=0.000 n=10+10) MVCCGet_Pebble/batch=true/versions=10/valueSize=8/numRangeKeys=0-24 16.5µs ± 3% 16.4µs ± 2% ~ (p=0.579 n=10+10) MVCCGet_Pebble/batch=true/versions=10/valueSize=8/numRangeKeys=1-24 20.6µs ± 2% 20.7µs ± 1% +0.60% (p=0.022 n=9+10) ``` Touches #90642. Epic: CRDB-2624. Release note: None Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Describe the problem
Check out bba9fd0 from #89477
To Reproduce
See above. Adding this to
cmd_scan_reverse.go
triggers it closer to the source, too:Note that
(*ReverseScanResponse).Verify
doesn't use any data from the request, it looks exclusively atreply.Rows
. So this is likely a real bug, not some memory handling problem in kvnemesis.Unfortunately, I can't reproduce under
--race
.Expected behavior
Just works
Additional data / screenshots
If DeleteRangeUsingTombstone is disabled in kvnemesis' generator, this problem does not readily reproduce.
Environment:
master, see PR
Additional context
Jira issue: CRDB-20867
Epic CRDB-20465
The text was updated successfully, but these errors were encountered: