-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: Use batches for direct RocksDB mutations #55708
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presuming this fixes the bug, let's disable the DBImpl::{Put,Merge,Delete,SingleDelete,DeleteRange}
code paths, or at least reverting the use of sync = true
there.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @jbowens, @petermattis, and @tbg)
pkg/storage/rocksdb.go, line 516 at r1 (raw file):
return err } return b.Commit(true)
Let's annotate these trues
with Commit(true /* sync */)
.
935c1cd
to
e0a0e7a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Removed the use of sync = true in those code paths. Can't remove them entirely as some tests depend on them
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @jbowens, @petermattis, and @tbg)
pkg/storage/rocksdb.go, line 516 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Let's annotate these
trues
withCommit(true /* sync */)
.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: bugfix, I've run engine/switch/nodes=3
~50 times with no repro, and engine/switch/encrypted
~20 times. Given the latter was failing approx. 50% of the time before this change, I'm pretty confident this fixes it.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @jbowens, @petermattis, and @tbg)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were you able to reproduce the engine/switch/nodes=3
failure without this PR?
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @itsbilal, @jbowens, and @tbg)
pkg/storage/rocksdb.go, line 561 at r2 (raw file):
// It is safe to modify the contents of the arguments after ApplyBatchRepr // returns. func (r *RocksDB) ApplyBatchRepr(repr []byte, sync bool) error {
Can we get rid of the sync
argument here? I don't think it is every used. Or perhaps we should just ignore this as we're remove RocksDB shortly anyways.
LGTM |
Currently, doing direct mutations on a RocksDB instance bypasses custom batching / syncing logic that we've built on top of it. This, or something internal to RocksDB, started leading to some bugs when all direct mutations started passing in WriteOptions.sync = true (see cockroachdb#55240 for when this change went in). In this change, direct mutations still commit the batch with sync=true to guarantee WAL syncing, but they go through the batch commit pipeline too, just like the vast majority of operations already do. Fixes cockroachdb#55362. Release note: None.
e0a0e7a
to
8978797
Compare
Yes, I was able to repro it after ~50 runs. It's a lot less frequent than the |
TFTRs! bors r+ |
We'll want to backport this PR to 20.2, 20.1, and 19.2. |
Build succeeded: |
Currently, doing direct mutations on a RocksDB instance bypasses
custom batching / syncing logic that we've built on top of it.
This, or something internal to RocksDB, started leading to some bugs
when all direct mutations started passing in WriteOptions.sync = true
(see #55240 for when this change went in).
In this change, direct mutations still commit the batch with sync=true
to guarantee WAL syncing, but they go through the batch commit pipeline
too, just like the vast majority of operations already do.
Fixes #55362.
Release note: None.