-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC #43862
storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC #43862
Conversation
The last commit needs more testing and probably a benchmark but I wanted to put it up early because I suspect it's going to be at least a little bit controversial. On the whole I feel that it makes the code easier to read but I'd like some confirmation of that feeling. |
a716961
to
8a78bbe
Compare
Hey @tbg I hear you're coming back soon. This PR is certainly not ready to merge but I'd appreciate if you could give the 5th commit a look to judge the basic approach. I want to make sure the structure is reasonable before I invest in the testing and benchmarking to get this over the finish line. |
b2b137a
to
736a8a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to have you in this area of the code! The changes look very good. I'm not too worried that things will have slowed down due to the reverse iteration, but others will have more informed opinions on that.
The comments I left are mostly smaller suggestions to clarify the code. I have held off on reviewing some core pieces of logic pending your next round of changes.
Reviewed 4 of 4 files at r1, 7 of 7 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 4 of 4 files at r5, 1 of 1 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/storage/batch_spanset_test.go, line 193 at r1 (raw file):
t.Fatalf("expected valid iterator, err=%v", err) } // SeekLT the upper bound is invalid.
Lower bound, right? Span set has [c,g)
, and insideKey==c
.
pkg/storage/gc_iterator.go, line 40 at r5 (raw file):
// this case, next is the value of an intent. func (it *gcIterator) state() ( cur, next *engine.MVCCKeyValue,
This is dangerously close to warranting a struct that is being returned. You're often returning mostly zero values, so it would actually help readability quite a bit if you introduced
type gcIterState struct {
cur, next *engine.MVCCKeyValue
isNewest, isIntent, ok bool
err error
It would also give you a more convenient place for comments.
Is ok
really needed? A comment explaining what it does would help.
I'll assume this method does the right thing for now, will revisit in post your next round of changes.
pkg/storage/engine/gc.go, line 36 at r5 (raw file):
} // IsGarbage makes a determination whether a key (cur) is garbage. Next, if
Write 'next'
here to make this easier to parse.
pkg/storage/engine/gc.go, line 37 at r5 (raw file):
non-nil,
s/next/newer/
pkg/storage/engine/gc.go, line 40 at r5 (raw file):
// metadata KV if cur is an intent). If isNewest is false, next must be non-nil. // isNewest implies that this is the highest timestamp committed version for // this key. If isNewest is true and next is non-nil, it is an intent.
Explain that there being an intent doesn't matter in the impl below. Conservatively we have to assume that the intent will get aborted, so we will be able to GC just the values that we could remove if there weren't an intent. Maybe this comment doesn't have to live in the method comment, but it would be helpful somewhere.
pkg/storage/engine/gc_test.go, line 13 at r5 (raw file):
package engine import (
Do these tests need to be replaced?
pkg/storage/spanset/spanset.go, line 185 at r1 (raw file):
Can you add a comment for spanKeyExclusive
? Something like
If spanKeyExclusive is set, the spans in
access
will be made open, i.e.[a,b)
will be considered(a,b)
.
pkg/storage/spanset/spanset.go, line 194 at r1 (raw file):
for _, cur := range s.spans[ac][scope] { if cur.Contains(span) && (!spanKeyExclusive || cur.EndKey != nil && !cur.Key.Equal(span.Key)) ||
Is this EndKey check in the right place? I think this should be
!spanKeyExclusive || !cur.Key.Equal(span.Key) || (cur.EndKey != nil && cur.Key.Equal(span.EndKey))
pkg/storage/spanset/spanset.go, line 195 at r1 (raw file):
if cur.Contains(span) && (!spanKeyExclusive || cur.EndKey != nil && !cur.Key.Equal(span.Key)) || spanKeyExclusive && cur.EndKey.Equal(span.Key) {
Isn't spanKeyExclusive
always true if this expr is evaluated?
Having read the tests I think what's happening here is that cur = [k1, k2)
is treated like (k1, k2]
. I find the bool is not named intuitively for that behavior. Could "reverse" (along with a comment) be a better name?
pkg/storage/spanset/spanset.go, line 225 at r1 (raw file):
for _, cur := range s.spans[ac][scope] { if (cur.Contains(span) && (!spanKeyExclusive || (cur.EndKey != nil && !cur.Key.Equal(span.Key)))) ||
Similarly confused here
pkg/storage/spanset/spanset_test.go, line 224 at r1 (raw file):
defer leaktest.AfterTest(t)() var ss SpanSet
random idea: call this bdGkq
to make it self-describing and to make these tests easier to read (this test is small anyway, but maybe this works in general).
pkg/storage/spanset/spanset_test.go, line 235 at r1 (raw file):
} for _, span := range allowed { if err := ss.checkAllowed(SpanReadOnly, span, true /* spanKeyExclusive */); err != nil {
Ok this explains why I'm confused. spanKeyExclusive
makes the end key inclusive? I see how that's the right thing for the SeekLT
usage but the naming is confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to have you in this area of the code! The changes look very good. I'm not too worried that things will have slowed down due to the reverse iteration, but others will have more informed opinions on that.
Reverse iteration is definitely slower than forward iteration (25-50% depending on what/where you're measuring). But this can be well worth it if we are scanning less data due to using reverse iteration. Is that the case here? I haven't grok'd what is going on in this PR.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I think we're scanning the exact same data and it's basically "all of the data". Where previously we'd iterate through all of the kv pairs in a range in forward order, we're now doing the reverse. There is an opportunity to "optimize" by seeking to the previous user key the moment current one's versions are no longer GC'able. Seek internally steps the iter a few times before resorting to the actual seek (as you well know). In this PR as written, we will manually step the iterator through every kv pair (the old code did that too, but not in reverse). Whenever we have lots of keys with >>10 non-gcable versions the actual Seek should be the better choice (and should be equivalent otherwise, due to the internal stepping optimization).
My intuition is that the slowdown of reverse iteration will not be felt much because GC is not a foreground operation and because it hasn't been performance optimized, so there will be other bottlenecks. But it seems like we should check.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
Thanks for giving it a look. I’ll move forward with writing benchmarks and testing. |
736a8a4
to
74c414a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I wrote some benchmarks and some tests. I also took this chance to carve off a new package.
The test coverage is no worse than it used to be because I left all of the old tests in place. I also wrote a test that generates a random dataset and compares the output of the old implementation to the new one. Perhaps a few more unit tests are in order and perhaps more coverage of the unchanged code related to abort spans and the local key range.
After some minor memory optimizations the benchmarks show a modest win and a dramatic reduction in memory usage.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @tbg)
pkg/storage/batch_spanset_test.go, line 193 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Lower bound, right? Span set has
[c,g)
, andinsideKey==c
.
Right. Fixed.
pkg/storage/gc_iterator.go, line 40 at r5 (raw file):
Previously, tbg (Tobias Grieger) wrote…
This is dangerously close to warranting a struct that is being returned. You're often returning mostly zero values, so it would actually help readability quite a bit if you introduced
type gcIterState struct { cur, next *engine.MVCCKeyValue isNewest, isIntent, ok bool err errorIt would also give you a more convenient place for comments.
Isok
really needed? A comment explaining what it does would help.I'll assume this method does the right thing for now, will revisit in post your next round of changes.
I did indeed create a struct. I also simplified the logic.
pkg/storage/spanset/spanset.go, line 185 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Can you add a comment for
spanKeyExclusive
? Something likeIf spanKeyExclusive is set, the spans in
access
will be made open, i.e.[a,b)
will be considered(a,b)
.
spans in access will be open at the start and closed at the end, i.e. [a,b)
will be considered (a,b]
pkg/storage/spanset/spanset.go, line 195 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Isn't
spanKeyExclusive
always true if this expr is evaluated?Having read the tests I think what's happening here is that
cur = [k1, k2)
is treated like(k1, k2]
. I find the bool is not named intuitively for that behavior. Could "reverse" (along with a comment) be a better name?
I've renamed the variable and added commentary.
pkg/storage/spanset/spanset_test.go, line 224 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
random idea: call this
bdGkq
to make it self-describing and to make these tests easier to read (this test is small anyway, but maybe this works in general).
Done.
pkg/storage/spanset/spanset_test.go, line 235 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Ok this explains why I'm confused.
spanKeyExclusive
makes the end key inclusive? I see how that's the right thing for theSeekLT
usage but the naming is confusing.
Hopefully it's less confusing now.
pkg/storage/engine/gc.go, line 36 at r5 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Write
'next'
here to make this easier to parse.
Done.
pkg/storage/engine/gc.go, line 37 at r5 (raw file):
Previously, tbg (Tobias Grieger) wrote…
non-nil,
s/next/newer/
Done.
pkg/storage/engine/gc.go, line 40 at r5 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Explain that there being an intent doesn't matter in the impl below. Conservatively we have to assume that the intent will get aborted, so we will be able to GC just the values that we could remove if there weren't an intent. Maybe this comment doesn't have to live in the method comment, but it would be helpful somewhere.
Done.
pkg/storage/engine/gc_test.go, line 13 at r5 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Do these tests need to be replaced?
definitely. I retained them in the testing file which now contains all of this old code and exists to support the old implementation.
bff3170
to
658b2f7
Compare
b0059e0
to
55b4936
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbg friendly ping
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @tbg)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! and thanks for all of the cleanup work. I might've missed some of the changes since my last review (Reviewable and/or I got a little confused with all the revisions), so if there's something in particular that I should give a closer look please let me know.
Reviewed 13 of 13 files at r7, 7 of 7 files at r8, 1 of 1 files at r9, 1 of 1 files at r10, 7 of 7 files at r11, 12 of 12 files at r12.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/storage/gc/gc_iterator_test.go, line 27 at r12 (raw file):
) func TestGCIterator(t *testing.T) {
Bit of commentary as this test proceeds would be nice.
pkg/storage/gc/gc_old_test.go, line 32 at r12 (raw file):
) // RunGCOld is an older implementation of Run. It is used for benchmarking and
Not that it matters much, but does it need to be exported?
pkg/storage/gc/gc.go, line 477 at r12 (raw file):
// multiple of the heartbeat timeout used by the coordinator. // // TODO(tschottdorf): this could be done in Replica.GC itself, but it's
This TODO can be removed (not worth a CI run though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and sorry for letting this sit - I've changed my GH workflow a bit and am still working out the kinks).
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)
SeekLT is exclusive on the key being seeked. It should be allowed to SeekLT to the end of a span. This commit makes that possible. Release note: None
Prior to this commit, the ReplicaDataIterator only permitted forward iteration. This PR exposes the ability to iterate in reverse by adding a `Prev()` method as well as a constructor option to seek the iterator to the end of the data range. Release note: None
This commit is just code movement. It moves the helper functions of `RunGC()`, `processLocalKeyRange()` and `processAbortSpan()` from above to below. Release note: None
There was no concurrency or need for synchronization. Release note: None
This commit moves the logic of RunGC as well as the engine.GC struct into a separate subpackage. As of this commit that subpackage contains no testing other than what existed for the engine.GC code. The RunGC function is a reasonably well specified interface to separate the logic of scanning a range and collecting the garbage from the gcQueue. I was getting overwhelmed by testing boundaries and unit testing in general. Release note: None
This commit reworks the processing of replicated state underneath the gcQueue for the purpose of determining and sending GC requests. The primary intention of this commit is to remove the need to buffer all of the versions of a key in memory. As we learned in cockroachdb#42531, this bufferring can be extremely unfortunate when using sequence data types which are written to frequently. Prior to this commit, the code forward iterates through the range's data and eagerly reads all versions of the a key into memory. It then binary searches those versions to find the latest timestamp for the key which can be GC'd. It then reverse iterates through all of those versions to determine the latest version of the key which would put the current batch over its limit. This last step works to paginate the process of actually deleting the data for many versions of the same key. I suppose this pagination was added to ensure that write batches due to GC requests don't get too large. Unfortunately this logic was unable to paginate the loading of versions from the storage engine. In this new commit, the entire process of computing data to GC now uses reverse iteration; for each key we examine versions from oldest to newest. The commit adds a `gcIterator` which wraps this reverse iteration with some useful lookahead. During this GC process, at most two additional versions need to examined to determine whether a given version is garbage. While this approach relies on reverse iteration which is known to be less efficient than forward iteration, it offers the opportunity to avoid allocating memory for versions of a key which are not going to end up as a part of a GC request. This reduction in memory usage shows up in benchmarks (see below). The change retains the old implementation as a testing strategy and as a basis for the benchmarks. ``` name old time/op new time/op delta Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8 924ns ± 8% 590ns ± 1% -36.13% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8 976ns ± 2% 578ns ± 1% -40.75% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8 944ns ± 0% 570ns ± 9% -39.63% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8 903ns ± 0% 612ns ± 3% -32.29% (p=0.016 n=4+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8 994ns ± 9% 592ns ± 9% -40.47% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8 669ns ± 4% 526ns ± 1% -21.34% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8 624ns ± 0% 529ns ± 2% -15.16% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8 636ns ± 4% 534ns ± 2% -16.04% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8 612ns ± 1% 532ns ± 3% -13.08% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8 638ns ± 2% 534ns ±10% -16.35% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8 603ns ± 6% 527ns ± 8% -12.51% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8 613ns ± 5% 517ns ± 6% -15.78% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8 619ns ± 6% 534ns ± 4% -13.61% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8 607ns ± 7% 520ns ± 2% -14.39% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8 599ns ± 4% 501ns ± 7% -16.36% (p=0.008 n=5+5) name old speed new speed delta Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8 23.9MB/s ± 8% 37.3MB/s ± 1% +56.23% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8 22.6MB/s ± 2% 38.1MB/s ± 1% +68.81% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8 23.3MB/s ± 0% 38.7MB/s ± 9% +66.06% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8 24.4MB/s ± 0% 36.0MB/s ± 3% +47.73% (p=0.016 n=4+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8 22.2MB/s ± 8% 37.3MB/s ± 9% +68.09% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8 34.4MB/s ± 4% 43.7MB/s ± 1% +27.08% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8 36.9MB/s ± 0% 43.4MB/s ± 2% +17.84% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8 36.2MB/s ± 4% 43.1MB/s ± 2% +19.02% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8 37.6MB/s ± 1% 43.3MB/s ± 3% +15.02% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8 36.0MB/s ± 2% 43.2MB/s ±10% +19.87% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8 36.5MB/s ± 5% 41.8MB/s ± 9% +14.39% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8 35.9MB/s ± 5% 42.7MB/s ± 6% +18.83% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8 35.6MB/s ± 6% 41.2MB/s ± 4% +15.66% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8 36.3MB/s ± 6% 42.3MB/s ± 2% +16.69% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8 36.7MB/s ± 4% 44.0MB/s ± 7% +19.69% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8 325B ± 0% 76B ± 0% -76.62% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8 358B ± 0% 49B ± 0% ~ (p=0.079 n=4+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8 340B ± 0% 29B ± 0% -91.47% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8 328B ± 0% 18B ± 0% -94.51% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8 325B ± 0% 14B ± 0% -95.69% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8 226B ± 0% 2B ± 0% ~ (p=0.079 n=4+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8 228B ± 0% 3B ± 0% -98.69% (p=0.000 n=5+4) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8 228B ± 0% 2B ± 0% -99.12% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8 228B ± 0% 2B ± 0% -99.12% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8 226B ± 0% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8 388B ± 2% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8 391B ± 2% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8 389B ± 1% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8 391B ± 2% 0B -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8 390B ± 1% 0B -100.00% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8 4.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8 0.00 0.00 ~ (all equal) Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8 0.00 0.00 ~ (all equal) ``` Release note (bug fix): The GC process was improved to paginate the key versions of a key to fix OOM crashes which can occur when there are extremely large numbers of versions for a given key.
55b4936
to
5512030
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r=tbg
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)
pkg/storage/gc/gc.go, line 477 at r12 (raw file):
Previously, tbg (Tobias Grieger) wrote…
This TODO can be removed (not worth a CI run though)
Done.
pkg/storage/gc/gc_iterator_test.go, line 27 at r12 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Bit of commentary as this test proceeds would be nice.
Done.
pkg/storage/gc/gc_old_test.go, line 32 at r12 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Not that it matters much, but does it need to be exported?
When I first wrote it I made the benchmarks a package gc_test
file, I ended up reverting that because symbol sharing with other testing infrastructure got annoying. Unexported.
43862: storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC r=tbg a=ajwerner This PR comes in 6 commits. See the commit messages for more details. The meat of the change is in: `storage: rework RunGC so it no longer buffers keys and values in memory`. 1) Fix the spanset Iterator such that SeekLT can seek to the end of the range. 2) Expose reverse iteration in rditer.ReplicaDataIterator. 3) Rearrange some code in gc_queue.go to put the RunGC() helpers below that function. 4) Remove lockableGCInfo because there was no need for it. 5) Move RunGC into a `gc` subpackage. 5) Rework RunGC to paginate versions of keys with reverse iteration. Fixes #42531. 44054: storage/concurrency/lock: define lock Strength and Durability modes r=nvanbenschoten a=nvanbenschoten This commit introduces a new `concurrency/lock` package that provides type definitions for locking-related concepts used by concurrency control in the key-value layer. The package initially provides a lock `Strength` enumeration consisting of `Shared`, `Upgrade`, and `Exclusive` and a lock `Durability` enumeration consisting of `Unreplicated` and `Replicated`. These definitions are currently unused but will be referenced soon by #43740 and #43775. The changes around SELECT FOR UPDATE in the optimizer (#44015) are also approaching the point where they'll soon need the type definitions to interface with the KV API. Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Build succeeded |
This PR comes in 6 commits. See the commit messages for more details. The meat of the change is in:
storage: rework RunGC so it no longer buffers keys and values in memory
.gc
subpackage.Fixes #42531.