Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

execgen: add generic inliner #49728

Merged
merged 3 commits into from
Jun 8, 2020
Merged

execgen: add generic inliner #49728

merged 3 commits into from
Jun 8, 2020

Conversation

jordanlewis
Copy link
Member

@jordanlewis jordanlewis commented May 30, 2020

This commit adds a new directive to execgen:

// execgen:inline

This directive causes a function that's annotated with it to be inlined
into all of its callers via AST transformation. This kind of inlining is
smarter than before - it permits arguments with names different from the
names of the function's formal parameters, for example.

This commit also changes the functions in distinct_tmpl to use this
directive instead of the manual templating method.

I think this is the only easy function to change. The rest have, at the
simple level of difficulty, static template-time parameters, and at a
harder level of difficulty, type parameters. Improving these cases
holistically should come next.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

This was making it hard to run the same benchmarks in a consistent way.

Release note: None
@jordanlewis jordanlewis changed the title WIP, DNM: example of ast manipulation for execgen execgen: add generic inliner Jun 5, 2020
@jordanlewis jordanlewis requested a review from a team June 5, 2020 20:27
Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty slick -- do do you have a snippet of what the generated callsites look like now?

pkg/sql/colexec/execgen/cmd/execgen/inline.go Outdated Show resolved Hide resolved
pkg/sql/colexec/execgen/cmd/execgen/inline.go Outdated Show resolved Hide resolved
pkg/sql/colexec/execgen/cmd/execgen/inline.go Outdated Show resolved Hide resolved
pkg/sql/colexec/execgen/cmd/execgen/inline.go Outdated Show resolved Hide resolved
pkg/sql/colexec/execgen/cmd/execgen/inline.go Outdated Show resolved Hide resolved
@jordanlewis
Copy link
Member Author

https://gist.github.com/jordanlewis/36b2cf775198886c6a65ac91de1911d0

@jordanlewis
Copy link
Member Author

Alright, redid a lot of this - added plentiful comments, moved files into pkg/sql/colexec/execgen instead of cmd/execgen, added unit tests for a couple of functions, and added a datadriven test that works on Go files with execgen annotations.

Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test

Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the comments, it helped alot! A few questions:

  • This is a decent amount of code, and it helps us inline only this simple case as you mentioned in slack -- what is missing here, and how much code do you forsee needing to add for the more complicated cases?
  • Will introduction of trying to template stuff affect this, or will it just be another pass (probably scheduled before this one)
  • I'm worried about execgen becoming more of a "arcane" piece of code than it is now, with the introduction of the dst package -- it's like a mini DSL for go ast manipulation. Maybe that is an unfounded worry though.

pkg/sql/colexec/execgen/inline.go Outdated Show resolved Hide resolved
pkg/sql/colexec/execgen/inline.go Outdated Show resolved Hide resolved
pkg/sql/colexec/execgen/inline.go Show resolved Hide resolved
pkg/sql/colexec/execgen/inline.go Show resolved Hide resolved
pkg/sql/colexec/execgen/inline_test.go Show resolved Hide resolved
pkg/sql/colexec/execgen/testdata/inline Show resolved Hide resolved
@jordanlewis
Copy link
Member Author

Thanks for the review!

This is a decent amount of code, and it helps us inline only this simple case as you mentioned in slack -- what is missing here, and how much code do you forsee needing to add for the more complicated cases?

Yeah, I'm definitely cognizant of how much effort this is. I guess I'm still kind of in an exploratory mode with this work. I'm not completely sure what the end goal is yet.

The next step I'd like to do is templating simple boolean arguments, which seems to be a common thing throughout the exec codebase. For example:

func a(i int) {
   if col.MaybeHasNulls() {
     x = stuff(i, true)
   } else {
     x = stuff(i, false)
   }
}

// execgen:inline
// execgen:template<hasNulls>
func stuff(runtimeArg int, hasNulls bool) int {
   if hasNulls {
      return 3
   } else {
      return 4
   }
}

Currently, this is done with a mix of text/template and regex which although probably less code, is pretty confusing, hacky, and hard to get started with.

Once that's done, the next step would be permitting "mid-stack inlining", so that you can pass template boolean arguments through more than one call site - this is also a common ask. This part seems annoying - you have to keep a graph of functions that are calling each other. It's definitely dawning on me that this is why you have IRs, but I don't think I want to get into messing with any non-AST IRs - if it comes to that I think this project will have jumped the shark.

Once that's done, the next step would be permitting non-boolean template arguments: specifically, types.T template arguments. If that part can be figured out, we'll be able to delete 90% of the custom, per-file template code.

The downside of the code that I'm writing is that inevitably it'll have some subtleties and restrictions - like I'm not planning to try to deal with having conditional expressions that mix template and runtime parameters, the engine will probably just ignore those - so that might be confusing too. I just think it'll be less confusing on the whole, and easier to get started with.

Will introduction of trying to template stuff affect this, or will it just be another pass (probably scheduled before this one)

It'll be a concurrent pass. As we're collecting the functions, we'll also note their template arguments down. As we're inlining the functions, we'll evaluate the template-time conditionals to only include the matched code in the inlined output.

I'm worried about execgen becoming more of a "arcane" piece of code than it is now, with the introduction of the dst package -- it's like a mini DSL for go ast manipulation. Maybe that is an unfounded worry though.

It's definitely a founded worry. But try to take a look at some of the other code we already have. It's very, very arcane, and you have to be arcane in subtle different ways for every operator you write, right now. I'd much prefer to have a single arcane execgen than 100 arcane templates. Also, I hear your point about this becoming arcane, but hopefully the datadriven test styles I've added will make it pretty easy to test and understand.

@jordanlewis
Copy link
Member Author

@rohany
Copy link
Contributor

rohany commented Jun 6, 2020

It's definitely dawning on me that this is why you have IRs, but I don't think I want to get into messing with any non-AST IRs - if it comes to that I think this project will have jumped the shark.

Yeah, I'm unsure how far you can get without fancier IR's, but you can also just mock out some IR's with maps of AST nodes to semantic info.

It'll be a concurrent pass. As we're collecting the functions, we'll also note their template arguments down. As we're inlining the functions, we'll evaluate the template-time conditionals to only include the matched code in the inlined output.

Are you sure? It feels to me like inlining and templates are different topics. The templating pass would create the templated versions of these functions, and mark them as inlineable, and then the inline pass goes and inlines them. Im imagining generating two stuff implementations (from above) with inline annotations, replacing each call with the correct one.

Once that's done, the next step would be permitting "mid-stack inlining", so that you can pass template boolean arguments through more than one call site - this is also a common ask. This part seems annoying - you have to keep a graph of functions that are calling each other.

I feel like this will be simpler if inlining and templating are considered separately, but I don't have concrete explanation why.

I'd much prefer to have a single arcane execgen than 100 arcane templates.

Heavily agree, I find the templates difficult to navigate.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing forward with this work!

How is the speed of generating code using this new approach in comparison to the old one?

you have IRs

What is "IRs"?

Reviewed 1 of 1 files at r1, 5 of 7 files at r2, 6 of 8 files at r3, 1 of 1 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @rohany)


pkg/sql/colexec/main_test.go, line 71 at r1 (raw file):

		flag.Parse()
		if f := flag.Lookup("test.bench"); f == nil || f.Value.String() == "" {

I was wondering about how one would do this at some point but ended up setting an environment variable to false on my machine. Turns out it was this simple, very nice change, thanks!


pkg/sql/colexec/execgen/inline.go, line 118 at r4 (raw file):

				}
				// Replace the return with the new assignments.
				return &dst.BlockStmt{List: returnAssignmentSpecs}

I wonder why it is a block stmt, does it have to be so? Just curious. Or it doesn't matter because we're using only body.List below?


pkg/sql/colexec/execgen/inline.go, line 228 at r4 (raw file):

// extractReturnValues generates return value variables. It will produce one
// statement per return return value of the input FuncDecl. For example, for

nit: s/return return/return/.


pkg/sql/colexec/execgen/inline.go, line 256 at r4 (raw file):

		}
	}
	retVal := &dst.DeclStmt{

nit: s/retVal/retValDeclStmt/.


pkg/sql/colexec/execgen/inline.go, line 290 at r4 (raw file):

	reassignmentSpecs := make([]dst.Spec, 0, len(formalParams))
	for i, formalParam := range formalParams {
		if inputIdent, ok := callExpr.Args[i].(*dst.Ident); ok && inputIdent.Name == formalParam.Names[0].Name {

Just curious why we have Names[0], can an AST node have multiple names somehow?


pkg/sql/colexec/execgen/inline_test.go, line 72 at r4 (raw file):

			decl: "func foo(a int) (int, string) {}",
			expectedRetDecls: `var (
		__retval_0 int

super nit: I'd indent these expectedRetDecls a little bit to the right, if that's ok.


pkg/sql/colexec/execgen/cmd/execgen/main.go, line 134 at r4 (raw file):

			return err
		}

nit: seems like unnecessary newline.


pkg/sql/colexec/execgen/testdata/inline, line 1 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

cool test!

+1


pkg/sql/colexec/execgen/testdata/inline, line 21 at r4 (raw file):


func a() {
	{

nit: the indentation here seems to be done with tabs, can we adjust that? I don't have a personal opinion on tabs vs spaces, but I think in our codebase we use spaces.

The .eg.go files had unnecessary imports of execgen, which is not
required in the runtime generated code, and in fact is suspicious.

Release note: None
@jordanlewis
Copy link
Member Author

What is "IRs"?

Intermediate representation - the AST is the first representation of a program, but compilers typically transform the AST into several internal representations (like CockroachDB's Expr, opt, distsql trees) for different reasons.

Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @rohany, and @yuzefovich)


pkg/sql/colexec/main_test.go, line 71 at r1 (raw file):

Previously, yuzefovich wrote…

I was wondering about how one would do this at some point but ended up setting an environment variable to false on my machine. Turns out it was this simple, very nice change, thanks!

🎉


pkg/sql/colexec/execgen/inline.go, line 118 at r4 (raw file):

Previously, yuzefovich wrote…

I wonder why it is a block stmt, does it have to be so? Just curious. Or it doesn't matter because we're using only body.List below?

It's not strictly necessary, but I find it aesthetic - it groups the related statements together. I'm okay deleting it if you want.


pkg/sql/colexec/execgen/inline.go, line 290 at r4 (raw file):

Previously, yuzefovich wrote…

Just curious why we have Names[0], can an AST node have multiple names somehow?

formalParam is a Field, which is used in a few different places in the AST. One of the places it's used is a parameter list. In Go, you can have the syntax func a (a, b int), where both a and b are ints but int is just repeated once. That is the situation in which Names could have more than 1 element. I'm ignoring that case for now.

Btw, I've used https://astexplorer.net a lot to help with this stuff. It's like a Godbolt for text to ASTs.


pkg/sql/colexec/execgen/inline_test.go, line 72 at r4 (raw file):

Previously, yuzefovich wrote…

super nit: I'd indent these expectedRetDecls a little bit to the right, if that's ok.

I think they need to be just like this, because the tests respect spacing, unfortunately. I don't have it in me to fix that right now, it's so fiddly.


pkg/sql/colexec/execgen/testdata/inline, line 1 at r3 (raw file):

Previously, yuzefovich wrote…

+1

Done.


pkg/sql/colexec/execgen/testdata/inline, line 21 at r4 (raw file):

Previously, yuzefovich wrote…

nit: the indentation here seems to be done with tabs, can we adjust that? I don't have a personal opinion on tabs vs spaces, but I think in our codebase we use spaces.

Can't adjust this unfortunately, the gofmt used by the pretty printer is hard coded to tabs. I guess I could adjust if I tried really hard but it doesn't seem worth it.

Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added another commit that gets rid of some unnecessary (and now invalid) imports of execgen from colexec generated files.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @rohany, and @yuzefovich)

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: modulo the comment about the linter.

The .eg.go files had unnecessary imports of execgen, which is not
required in the runtime generated code, and in fact is suspicious.

I think this became unnecessary when we recently made a change to how imports are handled. Previously, it was necessary to either surround import of execgen with template comments (i.e. // {{/* and // */}}) or to use it somehow (to remove unused warning).

Reviewed 18 of 30 files at r5, 12 of 12 files at r6.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @rohany)


pkg/sql/colexec/select_in_tmpl.go, line 37 at r6 (raw file):

// Remove unused warnings.
var (

super nit: could collapse it into a single line declaration.


pkg/sql/colexec/execgen/inline.go, line 118 at r4 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

It's not strictly necessary, but I find it aesthetic - it groups the related statements together. I'm okay deleting it if you want.

No strong preference from me.


pkg/sql/colexec/execgen/inline.go, line 290 at r4 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

formalParam is a Field, which is used in a few different places in the AST. One of the places it's used is a parameter list. In Go, you can have the syntax func a (a, b int), where both a and b are ints but int is just repeated once. That is the situation in which Names could have more than 1 element. I'm ignoring that case for now.

Btw, I've used https://astexplorer.net a lot to help with this stuff. It's like a Godbolt for text to ASTs.

Got it, thanks.


pkg/sql/colexec/execgen/inline_test.go, line 72 at r4 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

I think they need to be just like this, because the tests respect spacing, unfortunately. I don't have it in me to fix that right now, it's so fiddly.

SGTM


pkg/sql/colexec/execgen/testdata/inline, line 21 at r4 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Can't adjust this unfortunately, the gofmt used by the pretty printer is hard coded to tabs. I guess I could adjust if I tried really hard but it doesn't seem worth it.

I agree, not worth it then.


pkg/testutils/lint/lint_test.go, line 1487 at r6 (raw file):

			// This exception is because execgen itself uses panics during code
			// generation - not at execution time.
			":!sql/colexec/execgen/*.go",

Hm, I think we need to be more careful here because in colexec/execgen/cmd/execgen folder we have overloads with code snippets that will be inlined. We need to make sure that those use colexecerror package for panics.

Maybe something like this would work :!sql/colexec/execgen/(^cmd)/*.go? Alternatively, you could do s/panic/colexecerror.InternalError/g in the new execgen package and remove this exception.

This commit adds a new directive to execgen:

// execgen:inline

This directive causes a function that's annotated with it to be inlined
into all of its callers via AST transformation. This kind of inlining is
smarter than before - it permits arguments with names different from the
names of the function's formal parameters, for example.

This commit also changes the functions in distinct_tmpl to use this
directive instead of the manual templating method.

I think this is the only easy function to change. The rest have, at the
simple level of difficulty, static template-time parameters, and at a
harder level of difficulty, type parameters. Improving these cases
holistically should come next.

Release note: None
@jordanlewis
Copy link
Member Author

I fixed the linter (and learned something new about the syntax git uses to specify files).

@jordanlewis
Copy link
Member Author

I'll give @asubiotto a chance to take a look before merging.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong: this looks great!

Reviewed 1 of 1 files at r1, 30 of 30 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis, @rohany, and @yuzefovich)

@jordanlewis
Copy link
Member Author

TFTRs!

bors r+

@craig
Copy link
Contributor

craig bot commented Jun 8, 2020

Merge conflict (retrying...)

@craig
Copy link
Contributor

craig bot commented Jun 8, 2020

Build succeeded

@craig craig bot merged commit a966fd4 into cockroachdb:master Jun 8, 2020
craig bot pushed a commit that referenced this pull request Jun 10, 2020
49939: execgen: introduce // execgen:template directive for bools r=jordanlewis a=jordanlewis

Based on #49728.

This PR adds the // execgen:template<arg1, arg2> syntax for
concrete boolean template variables.

For example:

```
// execgen:inline
// execgen:template<templateParam>
func a (runtimeParam int, templateParam bool) int {
    if templateParam {
        return runtimeParam + 1
    } else {
        return runtimeParam - 1
    }
}

func main() {
    x := a(0, true)
    y := a(0, false)
    fmt.Println(x, y)
}
```

This code will print `1, -1` when eventually executed, but the
templateParam boolean will be evaluated at code generation time and not
appear in the generated code at all. Two variants of `a()` will be
inlined into their callsites - one with templateParam true, and one with
templateParam false.

Release note: None

Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
@jordanlewis jordanlewis deleted the execgen branch June 11, 2020 02:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants