-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
util/interval: generate type-safe specializations of interval B-tree #43850
util/interval: generate type-safe specializations of interval B-tree #43850
Conversation
7b47bdf
to
fc64462
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fairly neat. Not very intrusive and the generated code is perfectly readable. I have no objections to this approach if you can define the generic type appropriately for the lock table use case.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/util/interval/generic/internal/contract.go, line 14 at r1 (raw file):
// T is a Template type. The methods in the interface make up its contract. type T = interface {
What is this about? I didn't see mention of it in the go_generics readme. That documentation is admittedly sparse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/util/interval/generic/internal/contract.go, line 14 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
What is this about? I didn't see mention of it in the go_generics readme. That documentation is admittedly sparse.
Oh, I see that this is just the type being replaced. What through me is the type T =
. I see other examples of go_generics
which use type T interface
. Are both styles supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @petermattis)
pkg/util/interval/generic/internal/contract.go, line 14 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Oh, I see that this is just the type being replaced. What through me is the
type T =
. I see other examples ofgo_generics
which usetype T interface
. Are both styles supported?
Yes, the type T
is replaced. I'm not sure what the difference between the type alias and the type definition is. https://github.com/scylladb/go-set uses the former and https://github.com/mmatczuk/go_generics uses the latter in its tests. Based on the code in the generator itself, it doesn't look like there's a difference in how they're handled - either way, they're removed from the resulting AST.
The interface serves two purposes: to specify a contract for the parameterized type and allow the templated go files to compile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Thanks.
Reviewed 14 of 15 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)
pkg/util/interval/generic/doc.go, line 27 at r1 (raw file):
3. Include a go generate declaration that invokes the gen.sh script with the type name as the first argument and the package name as the second argument. 4. Invoke go gnerate.
nit: generate
pkg/util/interval/generic/internal/contract.go, line 21 at r1 (raw file):
SetID(uint64) SetKey([]byte) SetEndKey([]byte)
why does the btree implementation need the Set*
methods? And I don't see the code in example_interval_btree.go
calling the Set*
methods.
Ah, I now see it is needed by the test. Can you add a comment here?
pkg/util/interval/generic/internal/interval_btree.tmpl.go, line 166 at r1 (raw file):
} // incRef acquires a reference to the node.
I have a hard time understanding the ref counting happening in this file and reasoning about its correctness.
A snapshot of the tree only increments the ref count on the root node. And the snapshot happens before insertion of the latches. So presumably the root has a refcount of at least 2 (if the baseline is a refcount of 1). Then the insertion needs a mutable root node which must mean a copy is made of the root, and this copy increments the refs of its children, but not a recursive increment. But mut()
also does a recursive decrement -- why (when were these recursive refs acquired)? I am assuming that mut()
acquiring refs on its children is what ensures that whatever path the insert takes will see a node that does not have a ref count of 1 and so have to make a copy, while the paths not taken don't need a recursive refcount increment -- so a refcount of > 1 at a node also protects its subtree.
Could you add a high-level comment somewhere in this file?
fc64462
to
4c6b73b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look. I still need to make a few tweaks here to get it passing all lint checks. I'll ping the PR when that's done.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @petermattis and @sumeerbhola)
pkg/util/interval/generic/doc.go, line 27 at r1 (raw file):
Previously, sumeerbhola wrote…
nit: generate
Done.
pkg/util/interval/generic/internal/contract.go, line 21 at r1 (raw file):
Previously, sumeerbhola wrote…
why does the btree implementation need the
Set*
methods? And I don't see the code inexample_interval_btree.go
calling theSet*
methods.Ah, I now see it is needed by the test. Can you add a comment here?
Done.
pkg/util/interval/generic/internal/interval_btree.tmpl.go, line 166 at r1 (raw file):
Your understanding sounds almost exactly correct. The one thing that it sounds like you're still unsure about is why mut()
does a recursive decrement on the cloned node's refcount. On the path that you're referencing, mut()
is performing a clone so that it can write. Because mut()
is cloning a node, it can release it's reference to the old node. If this is the final reference to the old node, which is only possible if we're racing with another call to decRef
(I added a comment), then that should recursively release references on the children it points to and be recycled.
Could you add a high-level comment somewhere in this file?
I added the following comment to btree.Clone
:
Incrementing the reference count on the root node is sufficient to ensure that
no node in the cloned tree can be mutated by an actor holding a reference to the
original tree and vice versa. This property is upheld because the root node in
the receiver btree and the returned btree will both necessarily have a reference
count of at least 2 when this method returns. All tree mutations recursively
acquire mutable node references (see mut) as they traverse down the tree. The
act of acquiring a mutable node reference performs a clone if a node's reference
count is greater than one. Cloning a node (see clone) increases the reference
count on each of its children, ensuring that they have a reference count of at
least 2. This, in turn, ensures that any of the child nodes that are modified
will also be copied-on-write, recursively ensuring the immutability property
over the entire tree.```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 15 files at r1, 2 of 2 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @petermattis, and @sumeerbhola)
pkg/util/interval/generic/internal/interval_btree.tmpl.go, line 166 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Your understanding sounds almost exactly correct. The one thing that it sounds like you're still unsure about is why
mut()
does a recursive decrement on the cloned node's refcount. On the path that you're referencing,mut()
is performing a clone so that it can write. Becausemut()
is cloning a node, it can release it's reference to the old node. If this is the final reference to the old node, which is only possible if we're racing with another call todecRef
(I added a comment), then that should recursively release references on the children it points to and be recycled.Could you add a high-level comment somewhere in this file?
I added the following comment to
btree.Clone
:Incrementing the reference count on the root node is sufficient to ensure that no node in the cloned tree can be mutated by an actor holding a reference to the original tree and vice versa. This property is upheld because the root node in the receiver btree and the returned btree will both necessarily have a reference count of at least 2 when this method returns. All tree mutations recursively acquire mutable node references (see mut) as they traverse down the tree. The act of acquiring a mutable node reference performs a clone if a node's reference count is greater than one. Cloning a node (see clone) increases the reference count on each of its children, ensuring that they have a reference count of at least 2. This, in turn, ensures that any of the child nodes that are modified will also be copied-on-write, recursively ensuring the immutability property over the entire tree.``` </blockquote></details> This comment is helpful. I was still a bit lost, so I reread the code. The key thing that I missed earlier (a code comment would help) is that a ref represents how many parents a node has. The one special case is the root ref is how many btree structs point to it. Having one parent does not mean the node can be mutated since that parent may be part of two trees. - when mutating, when see a node N with more than one parent, eagerly clone it. This propagates information of what is shared (that is known higher up in the tree), recursively downwards by ensuring anything along the mutation path (worst case log N) below this parent is also cloned. - This cloning ensures the children of N have multiple parents -- the parents are the cloned node N' and any earlier parent nodes (which include N). Any of these children that will need to be modified will also need to be cloned. - Clone decrements the ref count of N. If the ref count of N is becoming 0 it will be deleted and no longer a parent of its children, so by the definition of what a ref count means it should decrement the ref count of its children. <!-- Sent from Reviewable.io -->
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the //go:generate ../../util/interval/generic/gen.sh
lines, that's a nice way to do this
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @petermattis, and @sumeerbhola)
pkg/util/interval/generic/gen.sh, line 10 at r3 (raw file):
fi which go_generics > /dev/null || (echo "Could not find go_generics, to install run: \"go get -u github.com/mmatczuk/go_generics/cmd/go_generics\""; exit 1)
shouldn't the makefile install them into ./bin
and then this use the ones from there? otherwise we have hermetic build issues, no?
4c6b73b
to
816bc02
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent a little too long trying to figure out why some comments were being stripped from the generated files. First I thought they were being stripped somewhere in go_generics
. I then realized they were being stripped earlier, in go_merge
when we merged the contract.go
file with the template files. go_merge
passes ast.FilterUnassociatedComments
to the stdlib's ast.MergePackageFiles
here. Removing that didn't do the trick though, because comments were then positioned incorrectly after the file merge, completely mangling the code. It turns out that ast.MergePackageFiles
doesn't make any attempt to reposition comments as it merges other AST nodes. I gave fixing this a shot for a little too long before eventually finding out that this is a "major design flaw" in how Go handles the relation between free-floating comments and AST nodes: golang/go#20744. In fact, according to Rob Griesemer, this is the single biggest issue when manipulating the AST.
So long story short, we're dropping the dependency on go_merge
and doing the simpler thing of a small grep and then a file concatenation. That works and avoids the issues around missing comments. With that, all of the lint warnings are resolved and this is ready to go in.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @danhhz, @petermattis, and @sumeerbhola)
pkg/util/interval/generic/gen.sh, line 10 at r3 (raw file):
Previously, danhhz (Daniel Harrison) wrote…
shouldn't the makefile install them into
./bin
and then this use the ones from there? otherwise we have hermetic build issues, no?
Yes, the Makefile
change ensures that these are installed into ./bin
, this language was just off. Fixed (by removing).
816bc02
to
045ed93
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modulo a small comment on the the *_tmpl.go
files. I want to find a reason to use go_generics
myself now.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @danhhz, @nvanbenschoten, and @sumeerbhola)
pkg/util/interval/generic/internal/unused.go, line 15 at r5 (raw file):
// Silence unused warnings. The linter doesn't notice that these are test // functions because their file is not named *_test.go. We don't want it // to be because then it would be run by `go test`.
Show the *_tmpl.go
files contain a build tag preventing them from being built by default? Not sure if that would work. Alternately, doesn't Go ignore files that start with _
?
045ed93
to
e967bc8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @danhhz, @petermattis, and @sumeerbhola)
pkg/util/interval/generic/internal/unused.go, line 15 at r5 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Show the
*_tmpl.go
files contain a build tag preventing them from being built by default? Not sure if that would work. Alternately, doesn't Go ignore files that start with_
?
That's a good question. Changing the files to start with _
didn't appease the linter without this file. Neither did adding a // +build ignore
build tag.
However, switching the test file from interval_btree_test_tmpl.go
to interval_btree_tmpl_test.go
allowed it to get picked up as a test by the linter, so we could then get rid of this file. That then required the use of a // +build ignore
build tag, which I added and now strip out in gen.sh
.
That seems like a nicer solution to me. Thanks for the suggestion.
e967bc8
to
1dc5772
Compare
CI is trying to pick up and stress this new bors r=petermattis |
Build failed (retrying...) |
^ Everything passed except |
Build failed (retrying...) |
Build failed (retrying...) |
Build failed |
See cockroachdb#43740. This PR pulls the existing immutable interval B-Tree structure out of the spanlatch package and into a utility library. In order to avoid a loss of both type safety and performance, it opts for code generation by using https://github.com/mmatczuk/go_generics. The new util library exposes a script that is meant to be invoked by go generate to create specializations of the ordered tree structure for individual use cases. This ensures that each use can achieve the performance of being hand-written without needing to continue the pattern of copying the btree code and manually modifying it each time we want to use it.
This commit adds a comment to `btree.Clone` explaining why it is safe. Release note: None
1dc5772
to
f16ed52
Compare
We're no longer avoiding the compilation of bors r=petermattis |
43850: util/interval: generate type-safe specializations of interval B-tree r=petermattis a=nvanbenschoten See #43740. This PR pulls the existing immutable interval B-Tree structure out of the spanlatch package and into a utility library. In order to avoid a loss of both type safety and performance, it opts for code generation by using https://github.com/mmatczuk/go_generics. The new util library exposes a script that is meant to be invoked by go generate to create specializations of the ordered tree structure for individual use cases. This ensures that each use can achieve the performance of being hand-written without needing to continue the pattern of copying the btree code and manually modifying it each time we want to use it. cc. @petermattis @sumeerbhola I'm curious to get your thoughts on this experiment. I'm hoping this also unblocks @sumeerbhola on #43740. Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Build succeeded |
44617: util/interval/generic: avoid new allocations in btree benchmarks r=nvanbenschoten a=nvanbenschoten The generic template for interval btree makes an attempt to work with both value and pointer parameterized types. To facilitate this option in tests, it used reflection to create new instances of the item type. This reflection was preventing allocations of the item type from being inlined and avoided by escape analysis, which was skewing benchmarks. This commit fixes this by avoiding the reflection and instead adding a constructor to the contract of the parameterized type. This is cleaner and allows the allocations to be avoided in these benchmarks. Here's the new `benchdiff` output of the latch manager's btree from before #43850 to after this change: [benchdiff sheet](https://docs.google.com/spreadsheets/d/1ZxtPPSWV0z76msCiwIyqL0xnUQF530ZwnbnfrvEzNAE/edit#gid=4). Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
See #43740.
This PR pulls the existing immutable interval B-Tree structure out of the
spanlatch package and into a utility library.
In order to avoid a loss of both type safety and performance, it opts for
code generation by using https://github.com/mmatczuk/go_generics. The new
util library exposes a script that is meant to be invoked by go generate
to create specializations of the ordered tree structure for individual
use cases. This ensures that each use can achieve the performance of being
hand-written without needing to continue the pattern of copying the btree
code and manually modifying it each time we want to use it.
cc. @petermattis @sumeerbhola I'm curious to get your thoughts on this experiment.
I'm hoping this also unblocks @sumeerbhola on #43740.