-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
execgen: add generic inliner #49728
execgen: add generic inliner #49728
Conversation
This was making it hard to run the same benchmarks in a consistent way. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty slick -- do do you have a snippet of what the generated callsites look like now?
Alright, redid a lot of this - added plentiful comments, moved files into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the comments, it helped alot! A few questions:
- This is a decent amount of code, and it helps us inline only this simple case as you mentioned in slack -- what is missing here, and how much code do you forsee needing to add for the more complicated cases?
- Will introduction of trying to template stuff affect this, or will it just be another pass (probably scheduled before this one)
- I'm worried about execgen becoming more of a "arcane" piece of code than it is now, with the introduction of the
dst
package -- it's like a mini DSL for go ast manipulation. Maybe that is an unfounded worry though.
Thanks for the review!
Yeah, I'm definitely cognizant of how much effort this is. I guess I'm still kind of in an exploratory mode with this work. I'm not completely sure what the end goal is yet. The next step I'd like to do is templating simple boolean arguments, which seems to be a common thing throughout the exec codebase. For example:
Currently, this is done with a mix of text/template and regex which although probably less code, is pretty confusing, hacky, and hard to get started with. Once that's done, the next step would be permitting "mid-stack inlining", so that you can pass template boolean arguments through more than one call site - this is also a common ask. This part seems annoying - you have to keep a graph of functions that are calling each other. It's definitely dawning on me that this is why you have IRs, but I don't think I want to get into messing with any non-AST IRs - if it comes to that I think this project will have jumped the shark. Once that's done, the next step would be permitting non-boolean template arguments: specifically, The downside of the code that I'm writing is that inevitably it'll have some subtleties and restrictions - like I'm not planning to try to deal with having conditional expressions that mix template and runtime parameters, the engine will probably just ignore those - so that might be confusing too. I just think it'll be less confusing on the whole, and easier to get started with.
It'll be a concurrent pass. As we're collecting the functions, we'll also note their template arguments down. As we're inlining the functions, we'll evaluate the template-time conditionals to only include the matched code in the inlined output.
It's definitely a founded worry. But try to take a look at some of the other code we already have. It's very, very arcane, and you have to be arcane in subtle different ways for every operator you write, right now. I'd much prefer to have a single arcane execgen than 100 arcane templates. Also, I hear your point about this becoming arcane, but hopefully the datadriven test styles I've added will make it pretty easy to test and understand. |
This one guy has some prior art: https://github.com/cosmos72/gomacro/blob/master/doc/generics-c++.md |
Yeah, I'm unsure how far you can get without fancier IR's, but you can also just mock out some IR's with maps of AST nodes to semantic info.
Are you sure? It feels to me like inlining and templates are different topics. The templating pass would create the templated versions of these functions, and mark them as inlineable, and then the inline pass goes and inlines them. Im imagining generating two
I feel like this will be simpler if inlining and templating are considered separately, but I don't have concrete explanation why.
Heavily agree, I find the templates difficult to navigate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pushing forward with this work!
How is the speed of generating code using this new approach in comparison to the old one?
you have IRs
What is "IRs"?
Reviewed 1 of 1 files at r1, 5 of 7 files at r2, 6 of 8 files at r3, 1 of 1 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @rohany)
pkg/sql/colexec/main_test.go, line 71 at r1 (raw file):
flag.Parse() if f := flag.Lookup("test.bench"); f == nil || f.Value.String() == "" {
I was wondering about how one would do this at some point but ended up setting an environment variable to false
on my machine. Turns out it was this simple, very nice change, thanks!
pkg/sql/colexec/execgen/inline.go, line 118 at r4 (raw file):
} // Replace the return with the new assignments. return &dst.BlockStmt{List: returnAssignmentSpecs}
I wonder why it is a block stmt, does it have to be so? Just curious. Or it doesn't matter because we're using only body.List
below?
pkg/sql/colexec/execgen/inline.go, line 228 at r4 (raw file):
// extractReturnValues generates return value variables. It will produce one // statement per return return value of the input FuncDecl. For example, for
nit: s/return return/return/
.
pkg/sql/colexec/execgen/inline.go, line 256 at r4 (raw file):
} } retVal := &dst.DeclStmt{
nit: s/retVal/retValDeclStmt/
.
pkg/sql/colexec/execgen/inline.go, line 290 at r4 (raw file):
reassignmentSpecs := make([]dst.Spec, 0, len(formalParams)) for i, formalParam := range formalParams { if inputIdent, ok := callExpr.Args[i].(*dst.Ident); ok && inputIdent.Name == formalParam.Names[0].Name {
Just curious why we have Names[0]
, can an AST node have multiple names somehow?
pkg/sql/colexec/execgen/inline_test.go, line 72 at r4 (raw file):
decl: "func foo(a int) (int, string) {}", expectedRetDecls: `var ( __retval_0 int
super nit: I'd indent these expectedRetDecls
a little bit to the right, if that's ok.
pkg/sql/colexec/execgen/cmd/execgen/main.go, line 134 at r4 (raw file):
return err }
nit: seems like unnecessary newline.
pkg/sql/colexec/execgen/testdata/inline, line 1 at r3 (raw file):
Previously, rohany (Rohan Yadav) wrote…
cool test!
+1
pkg/sql/colexec/execgen/testdata/inline, line 21 at r4 (raw file):
func a() { {
nit: the indentation here seems to be done with tabs, can we adjust that? I don't have a personal opinion on tabs vs spaces, but I think in our codebase we use spaces.
The .eg.go files had unnecessary imports of execgen, which is not required in the runtime generated code, and in fact is suspicious. Release note: None
Intermediate representation - the AST is the first representation of a program, but compilers typically transform the AST into several internal representations (like CockroachDB's Expr, opt, distsql trees) for different reasons. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @rohany, and @yuzefovich)
pkg/sql/colexec/main_test.go, line 71 at r1 (raw file):
Previously, yuzefovich wrote…
I was wondering about how one would do this at some point but ended up setting an environment variable to
false
on my machine. Turns out it was this simple, very nice change, thanks!
🎉
pkg/sql/colexec/execgen/inline.go, line 118 at r4 (raw file):
Previously, yuzefovich wrote…
I wonder why it is a block stmt, does it have to be so? Just curious. Or it doesn't matter because we're using only
body.List
below?
It's not strictly necessary, but I find it aesthetic - it groups the related statements together. I'm okay deleting it if you want.
pkg/sql/colexec/execgen/inline.go, line 290 at r4 (raw file):
Previously, yuzefovich wrote…
Just curious why we have
Names[0]
, can an AST node have multiple names somehow?
formalParam
is a Field
, which is used in a few different places in the AST. One of the places it's used is a parameter list. In Go, you can have the syntax func a (a, b int)
, where both a
and b
are ints but int is just repeated once. That is the situation in which Names
could have more than 1 element. I'm ignoring that case for now.
Btw, I've used https://astexplorer.net a lot to help with this stuff. It's like a Godbolt for text to ASTs.
pkg/sql/colexec/execgen/inline_test.go, line 72 at r4 (raw file):
Previously, yuzefovich wrote…
super nit: I'd indent these
expectedRetDecls
a little bit to the right, if that's ok.
I think they need to be just like this, because the tests respect spacing, unfortunately. I don't have it in me to fix that right now, it's so fiddly.
pkg/sql/colexec/execgen/testdata/inline, line 1 at r3 (raw file):
Previously, yuzefovich wrote…
+1
Done.
pkg/sql/colexec/execgen/testdata/inline, line 21 at r4 (raw file):
Previously, yuzefovich wrote…
nit: the indentation here seems to be done with tabs, can we adjust that? I don't have a personal opinion on tabs vs spaces, but I think in our codebase we use spaces.
Can't adjust this unfortunately, the gofmt used by the pretty printer is hard coded to tabs. I guess I could adjust if I tried really hard but it doesn't seem worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added another commit that gets rid of some unnecessary (and now invalid) imports of execgen
from colexec
generated files.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @rohany, and @yuzefovich)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modulo the comment about the linter.
The .eg.go files had unnecessary imports of execgen, which is not
required in the runtime generated code, and in fact is suspicious.
I think this became unnecessary when we recently made a change to how imports are handled. Previously, it was necessary to either surround import of execgen
with template comments (i.e. // {{/*
and // */}}
) or to use it somehow (to remove unused warning).
Reviewed 18 of 30 files at r5, 12 of 12 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @rohany)
pkg/sql/colexec/select_in_tmpl.go, line 37 at r6 (raw file):
// Remove unused warnings. var (
super nit: could collapse it into a single line declaration.
pkg/sql/colexec/execgen/inline.go, line 118 at r4 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
It's not strictly necessary, but I find it aesthetic - it groups the related statements together. I'm okay deleting it if you want.
No strong preference from me.
pkg/sql/colexec/execgen/inline.go, line 290 at r4 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
formalParam
is aField
, which is used in a few different places in the AST. One of the places it's used is a parameter list. In Go, you can have the syntaxfunc a (a, b int)
, where botha
andb
are ints but int is just repeated once. That is the situation in whichNames
could have more than 1 element. I'm ignoring that case for now.Btw, I've used https://astexplorer.net a lot to help with this stuff. It's like a Godbolt for text to ASTs.
Got it, thanks.
pkg/sql/colexec/execgen/inline_test.go, line 72 at r4 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
I think they need to be just like this, because the tests respect spacing, unfortunately. I don't have it in me to fix that right now, it's so fiddly.
SGTM
pkg/sql/colexec/execgen/testdata/inline, line 21 at r4 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Can't adjust this unfortunately, the gofmt used by the pretty printer is hard coded to tabs. I guess I could adjust if I tried really hard but it doesn't seem worth it.
I agree, not worth it then.
pkg/testutils/lint/lint_test.go, line 1487 at r6 (raw file):
// This exception is because execgen itself uses panics during code // generation - not at execution time. ":!sql/colexec/execgen/*.go",
Hm, I think we need to be more careful here because in colexec/execgen/cmd/execgen
folder we have overloads with code snippets that will be inlined. We need to make sure that those use colexecerror
package for panics.
Maybe something like this would work :!sql/colexec/execgen/(^cmd)/*.go
? Alternatively, you could do s/panic/colexecerror.InternalError/g
in the new execgen
package and remove this exception.
This commit adds a new directive to execgen: // execgen:inline This directive causes a function that's annotated with it to be inlined into all of its callers via AST transformation. This kind of inlining is smarter than before - it permits arguments with names different from the names of the function's formal parameters, for example. This commit also changes the functions in distinct_tmpl to use this directive instead of the manual templating method. I think this is the only easy function to change. The rest have, at the simple level of difficulty, static template-time parameters, and at a harder level of difficulty, type parameters. Improving these cases holistically should come next. Release note: None
I fixed the linter (and learned something new about the syntax git uses to specify files). |
I'll give @asubiotto a chance to take a look before merging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, 30 of 30 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis, @rohany, and @yuzefovich)
TFTRs! bors r+ |
Merge conflict (retrying...) |
Build succeeded |
49939: execgen: introduce // execgen:template directive for bools r=jordanlewis a=jordanlewis Based on #49728. This PR adds the // execgen:template<arg1, arg2> syntax for concrete boolean template variables. For example: ``` // execgen:inline // execgen:template<templateParam> func a (runtimeParam int, templateParam bool) int { if templateParam { return runtimeParam + 1 } else { return runtimeParam - 1 } } func main() { x := a(0, true) y := a(0, false) fmt.Println(x, y) } ``` This code will print `1, -1` when eventually executed, but the templateParam boolean will be evaluated at code generation time and not appear in the generated code at all. Two variants of `a()` will be inlined into their callsites - one with templateParam true, and one with templateParam false. Release note: None Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
This commit adds a new directive to execgen:
// execgen:inline
This directive causes a function that's annotated with it to be inlined
into all of its callers via AST transformation. This kind of inlining is
smarter than before - it permits arguments with names different from the
names of the function's formal parameters, for example.
This commit also changes the functions in distinct_tmpl to use this
directive instead of the manual templating method.
I think this is the only easy function to change. The rest have, at the
simple level of difficulty, static template-time parameters, and at a
harder level of difficulty, type parameters. Improving these cases
holistically should come next.
Release note: None