Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC #43862

Merged
merged 6 commits into from
Jan 21, 2020

Conversation

ajwerner
Copy link
Contributor

@ajwerner ajwerner commented Jan 10, 2020

This PR comes in 6 commits. See the commit messages for more details. The meat of the change is in: storage: rework RunGC so it no longer buffers keys and values in memory.

  1. Fix the spanset Iterator such that SeekLT can seek to the end of the range.
  2. Expose reverse iteration in rditer.ReplicaDataIterator.
  3. Rearrange some code in gc_queue.go to put the RunGC() helpers below that function.
  4. Remove lockableGCInfo because there was no need for it.
  5. Move RunGC into a gc subpackage.
  6. Rework RunGC to paginate versions of keys with reverse iteration.

Fixes #42531.

@ajwerner ajwerner requested a review from a team as a code owner January 10, 2020 00:56
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ajwerner
Copy link
Contributor Author

The last commit needs more testing and probably a benchmark but I wanted to put it up early because I suspect it's going to be at least a little bit controversial. On the whole I feel that it makes the code easier to read but I'd like some confirmation of that feeling.

@ajwerner ajwerner force-pushed the ajwerner/paginate-gc-of-a-row branch 4 times, most recently from a716961 to 8a78bbe Compare January 13, 2020 02:27
@ajwerner
Copy link
Contributor Author

Hey @tbg I hear you're coming back soon. This PR is certainly not ready to merge but I'd appreciate if you could give the 5th commit a look to judge the basic approach. I want to make sure the structure is reasonable before I invest in the testing and benchmarking to get this over the finish line.

@ajwerner ajwerner force-pushed the ajwerner/paginate-gc-of-a-row branch 2 times, most recently from b2b137a to 736a8a4 Compare January 13, 2020 05:11
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to have you in this area of the code! The changes look very good. I'm not too worried that things will have slowed down due to the reverse iteration, but others will have more informed opinions on that.
The comments I left are mostly smaller suggestions to clarify the code. I have held off on reviewing some core pieces of logic pending your next round of changes.

Reviewed 4 of 4 files at r1, 7 of 7 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 4 of 4 files at r5, 1 of 1 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/storage/batch_spanset_test.go, line 193 at r1 (raw file):

		t.Fatalf("expected valid iterator, err=%v", err)
	}
	// SeekLT the upper bound is invalid.

Lower bound, right? Span set has [c,g), and insideKey==c.


pkg/storage/gc_iterator.go, line 40 at r5 (raw file):

// this case, next is the value of an intent.
func (it *gcIterator) state() (
	cur, next *engine.MVCCKeyValue,

This is dangerously close to warranting a struct that is being returned. You're often returning mostly zero values, so it would actually help readability quite a bit if you introduced

type gcIterState struct {
  cur, next *engine.MVCCKeyValue
  isNewest, isIntent, ok bool
  err error

It would also give you a more convenient place for comments.
Is ok really needed? A comment explaining what it does would help.

I'll assume this method does the right thing for now, will revisit in post your next round of changes.


pkg/storage/engine/gc.go, line 36 at r5 (raw file):

}

// IsGarbage makes a determination whether a key (cur) is garbage. Next, if

Write 'next' here to make this easier to parse.


pkg/storage/engine/gc.go, line 37 at r5 (raw file):

non-nil,

s/next/newer/


pkg/storage/engine/gc.go, line 40 at r5 (raw file):

// metadata KV if cur is an intent). If isNewest is false, next must be non-nil.
// isNewest implies that this is the highest timestamp committed version for
// this key. If isNewest is true and next is non-nil, it is an intent.

Explain that there being an intent doesn't matter in the impl below. Conservatively we have to assume that the intent will get aborted, so we will be able to GC just the values that we could remove if there weren't an intent. Maybe this comment doesn't have to live in the method comment, but it would be helpful somewhere.


pkg/storage/engine/gc_test.go, line 13 at r5 (raw file):

package engine

import (

Do these tests need to be replaced?


pkg/storage/spanset/spanset.go, line 185 at r1 (raw file):
Can you add a comment for spanKeyExclusive? Something like

If spanKeyExclusive is set, the spans in access will be made open, i.e. [a,b) will be considered (a,b).


pkg/storage/spanset/spanset.go, line 194 at r1 (raw file):

		for _, cur := range s.spans[ac][scope] {
			if cur.Contains(span) &&
				(!spanKeyExclusive || cur.EndKey != nil && !cur.Key.Equal(span.Key)) ||

Is this EndKey check in the right place? I think this should be

!spanKeyExclusive || !cur.Key.Equal(span.Key) || (cur.EndKey != nil && cur.Key.Equal(span.EndKey))


pkg/storage/spanset/spanset.go, line 195 at r1 (raw file):

			if cur.Contains(span) &&
				(!spanKeyExclusive || cur.EndKey != nil && !cur.Key.Equal(span.Key)) ||
				spanKeyExclusive && cur.EndKey.Equal(span.Key) {

Isn't spanKeyExclusive always true if this expr is evaluated?

Having read the tests I think what's happening here is that cur = [k1, k2) is treated like (k1, k2]. I find the bool is not named intuitively for that behavior. Could "reverse" (along with a comment) be a better name?


pkg/storage/spanset/spanset.go, line 225 at r1 (raw file):

		for _, cur := range s.spans[ac][scope] {
			if (cur.Contains(span) &&
				(!spanKeyExclusive || (cur.EndKey != nil && !cur.Key.Equal(span.Key)))) ||

Similarly confused here


pkg/storage/spanset/spanset_test.go, line 224 at r1 (raw file):

	defer leaktest.AfterTest(t)()

	var ss SpanSet

random idea: call this bdGkq to make it self-describing and to make these tests easier to read (this test is small anyway, but maybe this works in general).


pkg/storage/spanset/spanset_test.go, line 235 at r1 (raw file):

	}
	for _, span := range allowed {
		if err := ss.checkAllowed(SpanReadOnly, span, true /* spanKeyExclusive */); err != nil {

Ok this explains why I'm confused. spanKeyExclusive makes the end key inclusive? I see how that's the right thing for the SeekLT usage but the naming is confusing.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to have you in this area of the code! The changes look very good. I'm not too worried that things will have slowed down due to the reverse iteration, but others will have more informed opinions on that.

Reverse iteration is definitely slower than forward iteration (25-50% depending on what/where you're measuring). But this can be well worth it if we are scanning less data due to using reverse iteration. Is that the case here? I haven't grok'd what is going on in this PR.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I think we're scanning the exact same data and it's basically "all of the data". Where previously we'd iterate through all of the kv pairs in a range in forward order, we're now doing the reverse. There is an opportunity to "optimize" by seeking to the previous user key the moment current one's versions are no longer GC'able. Seek internally steps the iter a few times before resorting to the actual seek (as you well know). In this PR as written, we will manually step the iterator through every kv pair (the old code did that too, but not in reverse). Whenever we have lots of keys with >>10 non-gcable versions the actual Seek should be the better choice (and should be equivalent otherwise, due to the internal stepping optimization).
My intuition is that the slowdown of reverse iteration will not be felt much because GC is not a foreground operation and because it hasn't been performance optimized, so there will be other bottlenecks. But it seems like we should check.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)

@ajwerner
Copy link
Contributor Author

Thanks for giving it a look. I’ll move forward with writing benchmarks and testing.

@ajwerner ajwerner force-pushed the ajwerner/paginate-gc-of-a-row branch from 736a8a4 to 74c414a Compare January 16, 2020 06:59
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I wrote some benchmarks and some tests. I also took this chance to carve off a new package.

The test coverage is no worse than it used to be because I left all of the old tests in place. I also wrote a test that generates a random dataset and compares the output of the old implementation to the new one. Perhaps a few more unit tests are in order and perhaps more coverage of the unchanged code related to abort spans and the local key range.

After some minor memory optimizations the benchmarks show a modest win and a dramatic reduction in memory usage.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @tbg)


pkg/storage/batch_spanset_test.go, line 193 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Lower bound, right? Span set has [c,g), and insideKey==c.

Right. Fixed.


pkg/storage/gc_iterator.go, line 40 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This is dangerously close to warranting a struct that is being returned. You're often returning mostly zero values, so it would actually help readability quite a bit if you introduced

type gcIterState struct {
  cur, next *engine.MVCCKeyValue
  isNewest, isIntent, ok bool
  err error

It would also give you a more convenient place for comments.
Is ok really needed? A comment explaining what it does would help.

I'll assume this method does the right thing for now, will revisit in post your next round of changes.

I did indeed create a struct. I also simplified the logic.


pkg/storage/spanset/spanset.go, line 185 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Can you add a comment for spanKeyExclusive? Something like

If spanKeyExclusive is set, the spans in access will be made open, i.e. [a,b) will be considered (a,b).

spans in access will be open at the start and closed at the end, i.e. [a,b) will be considered (a,b]


pkg/storage/spanset/spanset.go, line 195 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Isn't spanKeyExclusive always true if this expr is evaluated?

Having read the tests I think what's happening here is that cur = [k1, k2) is treated like (k1, k2]. I find the bool is not named intuitively for that behavior. Could "reverse" (along with a comment) be a better name?

I've renamed the variable and added commentary.


pkg/storage/spanset/spanset_test.go, line 224 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

random idea: call this bdGkq to make it self-describing and to make these tests easier to read (this test is small anyway, but maybe this works in general).

Done.


pkg/storage/spanset/spanset_test.go, line 235 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Ok this explains why I'm confused. spanKeyExclusive makes the end key inclusive? I see how that's the right thing for the SeekLT usage but the naming is confusing.

Hopefully it's less confusing now.


pkg/storage/engine/gc.go, line 36 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Write 'next' here to make this easier to parse.

Done.


pkg/storage/engine/gc.go, line 37 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

non-nil,

s/next/newer/

Done.


pkg/storage/engine/gc.go, line 40 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Explain that there being an intent doesn't matter in the impl below. Conservatively we have to assume that the intent will get aborted, so we will be able to GC just the values that we could remove if there weren't an intent. Maybe this comment doesn't have to live in the method comment, but it would be helpful somewhere.

Done.


pkg/storage/engine/gc_test.go, line 13 at r5 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Do these tests need to be replaced?

definitely. I retained them in the testing file which now contains all of this old code and exists to support the old implementation.

@ajwerner ajwerner force-pushed the ajwerner/paginate-gc-of-a-row branch 2 times, most recently from bff3170 to 658b2f7 Compare January 16, 2020 07:12
@ajwerner ajwerner changed the title [WIP] storage: enable reverse iteration in rditer, paginate versions during GC storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC Jan 16, 2020
@ajwerner ajwerner force-pushed the ajwerner/paginate-gc-of-a-row branch 3 times, most recently from b0059e0 to 55b4936 Compare January 16, 2020 16:14
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tbg friendly ping

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @tbg)

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! :lgtm: and thanks for all of the cleanup work. I might've missed some of the changes since my last review (Reviewable and/or I got a little confused with all the revisions), so if there's something in particular that I should give a closer look please let me know.

Reviewed 13 of 13 files at r7, 7 of 7 files at r8, 1 of 1 files at r9, 1 of 1 files at r10, 7 of 7 files at r11, 12 of 12 files at r12.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/storage/gc/gc_iterator_test.go, line 27 at r12 (raw file):

)

func TestGCIterator(t *testing.T) {

Bit of commentary as this test proceeds would be nice.


pkg/storage/gc/gc_old_test.go, line 32 at r12 (raw file):

)

// RunGCOld is an older implementation of Run. It is used for benchmarking and

Not that it matters much, but does it need to be exported?


pkg/storage/gc/gc.go, line 477 at r12 (raw file):

// multiple of the heartbeat timeout used by the coordinator.
//
// TODO(tschottdorf): this could be done in Replica.GC itself, but it's

This TODO can be removed (not worth a CI run though)

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and sorry for letting this sit - I've changed my GH workflow a bit and am still working out the kinks).

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)

SeekLT is exclusive on the key being seeked. It should be allowed to SeekLT
to the end of a span. This commit makes that possible.

Release note: None
Prior to this commit, the ReplicaDataIterator only permitted forward iteration.
This PR exposes the ability to iterate in reverse by adding a `Prev()` method
as well as a constructor option to seek the iterator to the end of the data
range.

Release note: None
This commit is just code movement. It moves the helper functions of `RunGC()`,
`processLocalKeyRange()` and `processAbortSpan()` from above to below.

Release note: None
There was no concurrency or need for synchronization.

Release note: None
This commit moves the logic of RunGC as well as the engine.GC struct into a
separate subpackage. As of this commit that subpackage contains no testing
other than what existed for the engine.GC code.

The RunGC function is a reasonably well specified interface to separate
the logic of scanning a range and collecting the garbage from the gcQueue.

I was getting overwhelmed by testing boundaries and unit testing in general.

Release note: None
This commit reworks the processing of replicated state underneath the gcQueue
for the purpose of determining and sending GC requests. The primary intention
of this commit is to remove the need to buffer all of the versions of a key
in memory. As we learned in cockroachdb#42531, this bufferring can be extremely
unfortunate when using sequence data types which are written to frequently.

Prior to this commit, the code forward iterates through the range's data and
eagerly reads all versions of the a key into memory. It then binary searches
those versions to find the latest timestamp for the key which can be GC'd.
It then reverse iterates through all of those versions to determine the latest
version of the key which would put the current batch over its limit. This last
step works to paginate the process of actually deleting the data for many
versions of the same key. I suppose this pagination was added to ensure that
write batches due to GC requests don't get too large. Unfortunately this logic
was unable to paginate the loading of versions from the storage engine.

In this new commit, the entire process of computing data to GC now uses reverse
iteration; for each key we examine versions from oldest to newest. The commit
adds a `gcIterator` which wraps this reverse iteration with some useful
lookahead. During this GC process, at most two additional versions need to
examined to determine whether a given version is garbage.

While this approach relies on reverse iteration which is known to be less
efficient than forward iteration, it offers the opportunity to avoid allocating
memory for versions of a key which are not going to end up as a part of a GC
request. This reduction in memory usage shows up in benchmarks (see below).
The change retains the old implementation as a testing strategy and as a basis
for the benchmarks.

```
name                                                                                                                      old time/op    new time/op    delta
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8                  924ns ± 8%     590ns ± 1%   -36.13%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8               976ns ± 2%     578ns ± 1%   -40.75%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8               944ns ± 0%     570ns ± 9%   -39.63%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8               903ns ± 0%     612ns ± 3%   -32.29%  (p=0.016 n=4+5)
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8               994ns ± 9%     592ns ± 9%   -40.47%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8               669ns ± 4%     526ns ± 1%   -21.34%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8            624ns ± 0%     529ns ± 2%   -15.16%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8            636ns ± 4%     534ns ± 2%   -16.04%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8            612ns ± 1%     532ns ± 3%   -13.08%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8            638ns ± 2%     534ns ±10%   -16.35%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8        603ns ± 6%     527ns ± 8%   -12.51%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8     613ns ± 5%     517ns ± 6%   -15.78%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8     619ns ± 6%     534ns ± 4%   -13.61%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8     607ns ± 7%     520ns ± 2%   -14.39%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8     599ns ± 4%     501ns ± 7%   -16.36%  (p=0.008 n=5+5)

name                                                                                                                      old speed      new speed      delta
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8               23.9MB/s ± 8%  37.3MB/s ± 1%   +56.23%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8            22.6MB/s ± 2%  38.1MB/s ± 1%   +68.81%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8            23.3MB/s ± 0%  38.7MB/s ± 9%   +66.06%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8            24.4MB/s ± 0%  36.0MB/s ± 3%   +47.73%  (p=0.016 n=4+5)
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8            22.2MB/s ± 8%  37.3MB/s ± 9%   +68.09%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8            34.4MB/s ± 4%  43.7MB/s ± 1%   +27.08%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8         36.9MB/s ± 0%  43.4MB/s ± 2%   +17.84%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8         36.2MB/s ± 4%  43.1MB/s ± 2%   +19.02%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8         37.6MB/s ± 1%  43.3MB/s ± 3%   +15.02%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8         36.0MB/s ± 2%  43.2MB/s ±10%   +19.87%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8     36.5MB/s ± 5%  41.8MB/s ± 9%   +14.39%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8  35.9MB/s ± 5%  42.7MB/s ± 6%   +18.83%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8  35.6MB/s ± 6%  41.2MB/s ± 4%   +15.66%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8  36.3MB/s ± 6%  42.3MB/s ± 2%   +16.69%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8  36.7MB/s ± 4%  44.0MB/s ± 7%   +19.69%  (p=0.008 n=5+5)

name                                                                                                                      old alloc/op   new alloc/op   delta
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8                   325B ± 0%       76B ± 0%   -76.62%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8                358B ± 0%       49B ± 0%      ~     (p=0.079 n=4+5)
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8                340B ± 0%       29B ± 0%   -91.47%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8                328B ± 0%       18B ± 0%   -94.51%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8                325B ± 0%       14B ± 0%   -95.69%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8                226B ± 0%        2B ± 0%      ~     (p=0.079 n=4+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8             228B ± 0%        3B ± 0%   -98.69%  (p=0.000 n=5+4)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8             228B ± 0%        2B ± 0%   -99.12%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8             228B ± 0%        2B ± 0%   -99.12%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8             226B ± 0%        0B       -100.00%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8         388B ± 2%        0B       -100.00%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8      391B ± 2%        0B       -100.00%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8      389B ± 1%        0B       -100.00%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8      391B ± 2%        0B       -100.00%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8      390B ± 1%        0B       -100.00%  (p=0.008 n=5+5)

name                                                                                                                      old allocs/op  new allocs/op  delta
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000-8                   4.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#01-8                4.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#02-8                4.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#03-8                4.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[2,3],valueLen=[1,1],keysPerValue=[1,2],deleteFrac=0.000000,intentFrac=0.100000#04-8                4.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000-8                0.00           0.00           ~     (all equal)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#01-8             0.00           0.00           ~     (all equal)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#02-8             0.00           0.00           ~     (all equal)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#03-8             0.00           0.00           ~     (all equal)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1,100],deleteFrac=0.100000,intentFrac=0.100000#04-8             0.00           0.00           ~     (all equal)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000-8         0.00           0.00           ~     (all equal)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#01-8      0.00           0.00           ~     (all equal)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#02-8      0.00           0.00           ~     (all equal)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#03-8      0.00           0.00           ~     (all equal)
Run/ts=[0,100],keySuffix=[8,8],valueLen=[8,16],keysPerValue=[1000,1000000],deleteFrac=0.100000,intentFrac=0.100000#04-8      0.00           0.00           ~     (all equal)
```

Release note (bug fix): The GC process was improved to paginate the key
versions of a key to fix OOM crashes which can occur when there are
extremely large numbers of versions for a given key.
@ajwerner ajwerner force-pushed the ajwerner/paginate-gc-of-a-row branch from 55b4936 to 5512030 Compare January 21, 2020 15:51
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r=tbg

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)


pkg/storage/gc/gc.go, line 477 at r12 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This TODO can be removed (not worth a CI run though)

Done.


pkg/storage/gc/gc_iterator_test.go, line 27 at r12 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Bit of commentary as this test proceeds would be nice.

Done.


pkg/storage/gc/gc_old_test.go, line 32 at r12 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Not that it matters much, but does it need to be exported?

When I first wrote it I made the benchmarks a package gc_test file, I ended up reverting that because symbol sharing with other testing infrastructure got annoying. Unexported.

craig bot pushed a commit that referenced this pull request Jan 21, 2020
43862: storage/gc: create gc pkg, reverse iteration in rditer, paginate versions during GC r=tbg a=ajwerner

This PR comes in 6 commits. See the commit messages for more details. The meat of the change is in: `storage: rework RunGC so it no longer buffers keys and values in memory`.

1) Fix the spanset Iterator such that SeekLT can seek to the end of the range.
2) Expose reverse iteration in rditer.ReplicaDataIterator.
3) Rearrange some code in gc_queue.go to put the RunGC() helpers below that function.
4) Remove lockableGCInfo because there was no need for it.
5) Move RunGC into a `gc` subpackage.
5) Rework RunGC to paginate versions of keys with reverse iteration.


Fixes #42531.

44054: storage/concurrency/lock: define lock Strength and Durability modes r=nvanbenschoten a=nvanbenschoten

This commit introduces a new `concurrency/lock` package that provides type definitions for locking-related concepts used by concurrency control in the key-value layer. The package initially provides a lock `Strength` enumeration consisting of `Shared`, `Upgrade`, and `Exclusive` and a lock `Durability` enumeration consisting of `Unreplicated` and `Replicated`.

These definitions are currently unused but will be referenced soon by #43740 and #43775. The changes around SELECT FOR UPDATE in the optimizer (#44015) are also approaching the point where they'll soon need the type definitions to interface with the KV API.

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig
Copy link
Contributor

craig bot commented Jan 21, 2020

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OOM occurred while running replica GC old version
4 participants