Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util/interval: generate type-safe specializations of interval B-tree #43850

Merged
merged 2 commits into from
Jan 31, 2020

Conversation

nvanbenschoten
Copy link
Member

@nvanbenschoten nvanbenschoten commented Jan 9, 2020

See #43740.

This PR pulls the existing immutable interval B-Tree structure out of the
spanlatch package and into a utility library.

In order to avoid a loss of both type safety and performance, it opts for
code generation by using https://github.com/mmatczuk/go_generics. The new
util library exposes a script that is meant to be invoked by go generate
to create specializations of the ordered tree structure for individual
use cases. This ensures that each use can achieve the performance of being
hand-written without needing to continue the pattern of copying the btree
code and manually modifying it each time we want to use it.

cc. @petermattis @sumeerbhola I'm curious to get your thoughts on this experiment.
I'm hoping this also unblocks @sumeerbhola on #43740.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/generics branch 4 times, most recently from 7b47bdf to fc64462 Compare January 9, 2020 23:03
Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fairly neat. Not very intrusive and the generated code is perfectly readable. I have no objections to this approach if you can define the generic type appropriately for the lock table use case.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/util/interval/generic/internal/contract.go, line 14 at r1 (raw file):

// T is a Template type. The methods in the interface make up its contract.
type T = interface {

What is this about? I didn't see mention of it in the go_generics readme. That documentation is admittedly sparse.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/util/interval/generic/internal/contract.go, line 14 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

What is this about? I didn't see mention of it in the go_generics readme. That documentation is admittedly sparse.

Oh, I see that this is just the type being replaced. What through me is the type T =. I see other examples of go_generics which use type T interface. Are both styles supported?

Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @petermattis)


pkg/util/interval/generic/internal/contract.go, line 14 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Oh, I see that this is just the type being replaced. What through me is the type T =. I see other examples of go_generics which use type T interface. Are both styles supported?

Yes, the type T is replaced. I'm not sure what the difference between the type alias and the type definition is. https://github.com/scylladb/go-set uses the former and https://github.com/mmatczuk/go_generics uses the latter in its tests. Based on the code in the generator itself, it doesn't look like there's a difference in how they're handled - either way, they're removed from the resulting AST.

The interface serves two purposes: to specify a contract for the parameterized type and allow the templated go files to compile.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks.

Reviewed 14 of 15 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)


pkg/util/interval/generic/doc.go, line 27 at r1 (raw file):

  3. Include a go generate declaration that invokes the gen.sh script with the
     type name as the first argument and the package name as the second argument.
  4. Invoke go gnerate.

nit: generate


pkg/util/interval/generic/internal/contract.go, line 21 at r1 (raw file):

	SetID(uint64)
	SetKey([]byte)
	SetEndKey([]byte)

why does the btree implementation need the Set* methods? And I don't see the code in example_interval_btree.go calling the Set* methods.

Ah, I now see it is needed by the test. Can you add a comment here?


pkg/util/interval/generic/internal/interval_btree.tmpl.go, line 166 at r1 (raw file):

}

// incRef acquires a reference to the node.

I have a hard time understanding the ref counting happening in this file and reasoning about its correctness.
A snapshot of the tree only increments the ref count on the root node. And the snapshot happens before insertion of the latches. So presumably the root has a refcount of at least 2 (if the baseline is a refcount of 1). Then the insertion needs a mutable root node which must mean a copy is made of the root, and this copy increments the refs of its children, but not a recursive increment. But mut() also does a recursive decrement -- why (when were these recursive refs acquired)? I am assuming that mut() acquiring refs on its children is what ensures that whatever path the insert takes will see a node that does not have a ref count of 1 and so have to make a copy, while the paths not taken don't need a recursive refcount increment -- so a refcount of > 1 at a node also protects its subtree.

Could you add a high-level comment somewhere in this file?

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/generics branch from fc64462 to 4c6b73b Compare January 14, 2020 20:48
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look. I still need to make a few tweaks here to get it passing all lint checks. I'll ping the PR when that's done.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @petermattis and @sumeerbhola)


pkg/util/interval/generic/doc.go, line 27 at r1 (raw file):

Previously, sumeerbhola wrote…

nit: generate

Done.


pkg/util/interval/generic/internal/contract.go, line 21 at r1 (raw file):

Previously, sumeerbhola wrote…

why does the btree implementation need the Set* methods? And I don't see the code in example_interval_btree.go calling the Set* methods.

Ah, I now see it is needed by the test. Can you add a comment here?

Done.


pkg/util/interval/generic/internal/interval_btree.tmpl.go, line 166 at r1 (raw file):
Your understanding sounds almost exactly correct. The one thing that it sounds like you're still unsure about is why mut() does a recursive decrement on the cloned node's refcount. On the path that you're referencing, mut() is performing a clone so that it can write. Because mut() is cloning a node, it can release it's reference to the old node. If this is the final reference to the old node, which is only possible if we're racing with another call to decRef (I added a comment), then that should recursively release references on the children it points to and be recycled.

Could you add a high-level comment somewhere in this file?

I added the following comment to btree.Clone:

Incrementing the reference count on the root node is sufficient to ensure that
no node in the cloned tree can be mutated by an actor holding a reference to the
original tree and vice versa. This property is upheld because the root node in
the receiver btree and the returned btree will both necessarily have a reference
count of at least 2 when this method returns. All tree mutations recursively
acquire mutable node references (see mut) as they traverse down the tree. The
act of acquiring a mutable node reference performs a clone if a node's reference
count is greater than one. Cloning a node (see clone) increases the reference
count on each of its children, ensuring that they have a reference count of at
least 2. This, in turn, ensures that any of the child nodes that are modified
will also be copied-on-write, recursively ensuring the immutability property
over the entire tree.```

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 15 files at r1, 2 of 2 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @petermattis, and @sumeerbhola)


pkg/util/interval/generic/internal/interval_btree.tmpl.go, line 166 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Your understanding sounds almost exactly correct. The one thing that it sounds like you're still unsure about is why mut() does a recursive decrement on the cloned node's refcount. On the path that you're referencing, mut() is performing a clone so that it can write. Because mut() is cloning a node, it can release it's reference to the old node. If this is the final reference to the old node, which is only possible if we're racing with another call to decRef (I added a comment), then that should recursively release references on the children it points to and be recycled.

Could you add a high-level comment somewhere in this file?

I added the following comment to btree.Clone:

Incrementing the reference count on the root node is sufficient to ensure that
no node in the cloned tree can be mutated by an actor holding a reference to the
original tree and vice versa. This property is upheld because the root node in
the receiver btree and the returned btree will both necessarily have a reference
count of at least 2 when this method returns. All tree mutations recursively
acquire mutable node references (see mut) as they traverse down the tree. The
act of acquiring a mutable node reference performs a clone if a node's reference
count is greater than one. Cloning a node (see clone) increases the reference
count on each of its children, ensuring that they have a reference count of at
least 2. This, in turn, ensures that any of the child nodes that are modified
will also be copied-on-write, recursively ensuring the immutability property
over the entire tree.```
</blockquote></details>

This comment is helpful. I was still a bit lost, so I reread the code.

The key thing that I missed earlier (a code comment would help) is that a ref represents how many parents a node has. The one special case is the root ref is how many btree structs point to it. Having one parent does not mean the node can be mutated since that parent may be part of two trees.
- when mutating, when see a node N with more than one parent, eagerly clone it. This propagates information of what is shared (that is known higher up in the tree), recursively downwards by ensuring anything along the mutation path (worst case log N) below this parent is also cloned.
  - This cloning ensures the children of N have multiple parents -- the parents are the cloned node N' and any earlier parent nodes (which include N). Any of these children that will need to be modified will also need to be cloned.
- Clone decrements the ref count of N. If the ref count of N is becoming 0 it will be deleted and no longer a parent of its children, so by the definition of what a ref count means it should decrement the ref count of its children.


<!-- Sent from Reviewable.io -->

Copy link
Contributor

@danhhz danhhz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the //go:generate ../../util/interval/generic/gen.sh lines, that's a nice way to do this

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @petermattis, and @sumeerbhola)


pkg/util/interval/generic/gen.sh, line 10 at r3 (raw file):

fi

which go_generics > /dev/null || (echo "Could not find go_generics, to install run: \"go get -u github.com/mmatczuk/go_generics/cmd/go_generics\""; exit 1)

shouldn't the makefile install them into ./bin and then this use the ones from there? otherwise we have hermetic build issues, no?

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/generics branch from 4c6b73b to 816bc02 Compare January 30, 2020 01:24
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent a little too long trying to figure out why some comments were being stripped from the generated files. First I thought they were being stripped somewhere in go_generics. I then realized they were being stripped earlier, in go_merge when we merged the contract.go file with the template files. go_merge passes ast.FilterUnassociatedComments to the stdlib's ast.MergePackageFiles here. Removing that didn't do the trick though, because comments were then positioned incorrectly after the file merge, completely mangling the code. It turns out that ast.MergePackageFiles doesn't make any attempt to reposition comments as it merges other AST nodes. I gave fixing this a shot for a little too long before eventually finding out that this is a "major design flaw" in how Go handles the relation between free-floating comments and AST nodes: golang/go#20744. In fact, according to Rob Griesemer, this is the single biggest issue when manipulating the AST.

So long story short, we're dropping the dependency on go_merge and doing the simpler thing of a small grep and then a file concatenation. That works and avoids the issues around missing comments. With that, all of the lint warnings are resolved and this is ready to go in.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @danhhz, @petermattis, and @sumeerbhola)


pkg/util/interval/generic/gen.sh, line 10 at r3 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

shouldn't the makefile install them into ./bin and then this use the ones from there? otherwise we have hermetic build issues, no?

Yes, the Makefile change ensures that these are installed into ./bin, this language was just off. Fixed (by removing).

@nvanbenschoten nvanbenschoten changed the title [DNM] util/interval: generate type-safe specializations of interval B-tree util/interval: generate type-safe specializations of interval B-tree Jan 30, 2020
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/generics branch from 816bc02 to 045ed93 Compare January 30, 2020 01:56
Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: modulo a small comment on the the *_tmpl.go files. I want to find a reason to use go_generics myself now.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @danhhz, @nvanbenschoten, and @sumeerbhola)


pkg/util/interval/generic/internal/unused.go, line 15 at r5 (raw file):

// Silence unused warnings. The linter doesn't notice that these are test
// functions because their file is not named *_test.go. We don't want it
// to be because then it would be run by `go test`.

Show the *_tmpl.go files contain a build tag preventing them from being built by default? Not sure if that would work. Alternately, doesn't Go ignore files that start with _?

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/generics branch from 045ed93 to e967bc8 Compare January 30, 2020 18:30
Copy link
Member Author

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @danhhz, @petermattis, and @sumeerbhola)


pkg/util/interval/generic/internal/unused.go, line 15 at r5 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Show the *_tmpl.go files contain a build tag preventing them from being built by default? Not sure if that would work. Alternately, doesn't Go ignore files that start with _?

That's a good question. Changing the files to start with _ didn't appease the linter without this file. Neither did adding a // +build ignore build tag.

However, switching the test file from interval_btree_test_tmpl.go to interval_btree_tmpl_test.go allowed it to get picked up as a test by the linter, so we could then get rid of this file. That then required the use of a // +build ignore build tag, which I added and now strip out in gen.sh.

That seems like a nicer solution to me. Thanks for the suggestion.

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/generics branch from e967bc8 to 1dc5772 Compare January 30, 2020 19:03
@nvanbenschoten
Copy link
Member Author

CI is trying to pick up and stress this new pkg/util/interval/generic/internal package, which isn't going to work now that it has no buildable go files. This should be a one-time issue due to our strategy of stressing new packages, so I'm going to go ahead and merge. Thanks for the reviews!

bors r=petermattis

@craig
Copy link
Contributor

craig bot commented Jan 30, 2020

Build failed (retrying...)

@nvanbenschoten
Copy link
Member Author

^ Everything passed except go bench, which failed due to No space left on device.

@craig
Copy link
Contributor

craig bot commented Jan 30, 2020

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Jan 31, 2020

Build failed (retrying...)

@craig
Copy link
Contributor

craig bot commented Jan 31, 2020

Build failed

See cockroachdb#43740.

This PR pulls the existing immutable interval B-Tree structure out of the
spanlatch package and into a utility library.

In order to avoid a loss of both type safety and performance, it opts for
code generation by using https://github.com/mmatczuk/go_generics. The new
util library exposes a script that is meant to be invoked by go generate
to create specializations of the ordered tree structure for individual
use cases. This ensures that each use can achieve the performance of being
hand-written without needing to continue the pattern of copying the btree
code and manually modifying it each time we want to use it.
This commit adds a comment to `btree.Clone` explaining why it is safe.

Release note: None
@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/generics branch from 1dc5772 to f16ed52 Compare January 31, 2020 02:45
@nvanbenschoten
Copy link
Member Author

We're no longer avoiding the compilation of contract.go so there's at least one .go buildable go source file in pkg/util/interval/generic/internal and we are now manually avoiding the unused warning on T. This all appears to be working now.

bors r=petermattis

craig bot pushed a commit that referenced this pull request Jan 31, 2020
43850: util/interval: generate type-safe specializations of interval B-tree r=petermattis a=nvanbenschoten

See #43740.

This PR pulls the existing immutable interval B-Tree structure out of the
spanlatch package and into a utility library.

In order to avoid a loss of both type safety and performance, it opts for
code generation by using https://github.com/mmatczuk/go_generics. The new
util library exposes a script that is meant to be invoked by go generate
to create specializations of the ordered tree structure for individual
use cases. This ensures that each use can achieve the performance of being
hand-written without needing to continue the pattern of copying the btree
code and manually modifying it each time we want to use it.

cc. @petermattis @sumeerbhola I'm curious to get your thoughts on this experiment.
I'm hoping this also unblocks @sumeerbhola on #43740.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig
Copy link
Contributor

craig bot commented Jan 31, 2020

Build succeeded

@craig craig bot merged commit f16ed52 into cockroachdb:master Jan 31, 2020
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/generics branch January 31, 2020 17:08
craig bot pushed a commit that referenced this pull request Feb 3, 2020
44617: util/interval/generic: avoid new allocations in btree benchmarks r=nvanbenschoten a=nvanbenschoten

The generic template for interval btree makes an attempt to work with
both value and pointer parameterized types. To facilitate this option
in tests, it used reflection to create new instances of the item type.
This reflection was preventing allocations of the item type from being
inlined and avoided by escape analysis, which was skewing benchmarks.

This commit fixes this by avoiding the reflection and instead adding
a constructor to the contract of the parameterized type. This is cleaner
and allows the allocations to be avoided in these benchmarks.

Here's the new `benchdiff` output of the latch manager's btree from
before #43850 to after this change: [benchdiff sheet](https://docs.google.com/spreadsheets/d/1ZxtPPSWV0z76msCiwIyqL0xnUQF530ZwnbnfrvEzNAE/edit#gid=4).

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants