Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

making fast AST rewriter faster #7726

Merged
merged 6 commits into from
Mar 23, 2021
Merged

Conversation

harshit-gangal
Copy link
Member

@harshit-gangal harshit-gangal commented Mar 22, 2021

The work started with #7716 to make AST Rewriter faster. That achieved 20% improvement.

While doing the first part of the optimisation, we removed the use of panic/recover, and made the rewriter methods return an explicit error instead.

In this PR, we changed our minds, and now use panic without a recover to signal a bug in the rewriter framework. This is still safe to do, since Vitess recovers higher up in the stack anyway.

By this time, @vmg pointed us to the fact that we had a a couple of variables that were escaping the methods they lived on and were being moved to the heap. After fixing that, we got these splendid numbers.

The following optimisations resulted in improvement of 17%+ in CPU time and 15% on memory allocation.

  1. Removed closure of using looping index in function call to passing it explicitly.
  2. Removed closure of err in function to using panic. (Panic recovery to be handled at higher level)
  3. Changed all rewrite functions to return bool than error.
name                        old time/op    new time/op    delta
RewriteLargeExpression/0-8     169ns ±14%     157ns ±12%     ~     (p=0.310 n=5+5)
RewriteLargeExpression/1-8     216ns ± 5%     175ns ± 3%  -18.72%  (p=0.008 n=5+5)
RewriteLargeExpression/2-8     210ns ± 1%     178ns ± 2%  -15.32%  (p=0.008 n=5+5)
RewriteLargeExpression/3-8    4.06µs ± 1%    2.94µs ± 1%  -27.65%  (p=0.008 n=5+5)
RewriteLargeExpression/4-8    1.57µs ± 2%    1.15µs ± 1%  -26.69%  (p=0.008 n=5+5)
RewriteLargeExpression/5-8     181ns ± 2%     159ns ±10%  -12.46%  (p=0.008 n=5+5)
RewriteLargeExpression/6-8    53.3µs ± 0%    45.9µs ± 1%  -14.00%  (p=0.008 n=5+5)
RewriteLargeExpression/7-8    67.5µs ± 1%    59.9µs ± 2%  -11.31%  (p=0.008 n=5+5)
RewriteLargeExpression/8-8     106µs ± 1%      96µs ± 6%   -9.35%  (p=0.008 n=5+5)
RewriteLargeExpression/9-8     207ns ± 1%     208ns ± 1%     ~     (p=0.722 n=5+5)
name                        old alloc/op   new alloc/op   delta
RewriteLargeExpression/0-8      128B ± 0%      128B ± 0%     ~     (all equal)
RewriteLargeExpression/1-8      128B ± 0%      128B ± 0%     ~     (all equal)
RewriteLargeExpression/2-8      128B ± 0%      128B ± 0%     ~     (all equal)
RewriteLargeExpression/3-8    1.34kB ± 0%    1.26kB ± 0%   -5.39%  (p=0.008 n=5+5)
RewriteLargeExpression/4-8      544B ± 0%      512B ± 0%   -5.88%  (p=0.008 n=5+5)
RewriteLargeExpression/5-8      128B ± 0%      128B ± 0%     ~     (all equal)
RewriteLargeExpression/6-8    13.0kB ± 0%    12.0kB ± 0%   -7.23%  (p=0.008 n=5+5)
RewriteLargeExpression/7-8    17.2kB ± 0%    16.1kB ± 0%   -6.34%  (p=0.008 n=5+5)
RewriteLargeExpression/8-8    24.0kB ± 0%    22.4kB ± 0%   -6.93%  (p=0.000 n=5+4)
RewriteLargeExpression/9-8      128B ± 0%      128B ± 0%     ~     (all equal)
name                        old allocs/op  new allocs/op  delta
RewriteLargeExpression/0-8      5.00 ± 0%      5.00 ± 0%     ~     (all equal)
RewriteLargeExpression/1-8      5.00 ± 0%      5.00 ± 0%     ~     (all equal)
RewriteLargeExpression/2-8      5.00 ± 0%      5.00 ± 0%     ~     (all equal)
RewriteLargeExpression/3-8      68.0 ± 0%      59.0 ± 0%  -13.24%  (p=0.008 n=5+5)
RewriteLargeExpression/4-8      27.0 ± 0%      23.0 ± 0%  -14.81%  (p=0.008 n=5+5)
RewriteLargeExpression/5-8      5.00 ± 0%      5.00 ± 0%     ~     (all equal)
RewriteLargeExpression/6-8       724 ± 0%       607 ± 0%  -16.16%  (p=0.008 n=5+5)
RewriteLargeExpression/7-8       914 ± 0%       778 ± 0%  -14.88%  (p=0.008 n=5+5)
RewriteLargeExpression/8-8     1.33k ± 0%     1.12k ± 0%  -15.60%  (p=0.008 n=5+5)
RewriteLargeExpression/9-8      5.00 ± 0%      5.00 ± 0%     ~     (all equal)

For future references:

We can look at the compile output to spot problems like these. In the go/tools/asthelpergen/integration directory, you can run:

go tool compile -m ast_helper.go types.go test_helpers.go > compile.profile

We can then search for these types of variables by doing:

grep "moved to heap" compile.profile

This will show that a bunch of out variables are moved to the heap, these are expected to be moved to the heap. But we also see a few lines like these:

ast_helper.go:914:6: moved to heap: i
ast_helper.go:885:6: moved to heap: i
ast_helper.go:1047:6: moved to heap: i
ast_helper.go:1054:6: moved to heap: i
ast_helper.go:1179:6: moved to heap: err
ast_helper.go:1212:6: moved to heap: err
ast_helper.go:1142:6: moved to heap: i
ast_helper.go:1149:6: moved to heap: i

These are the variables that were escaping their blocks and needed to live on the heap. err was easy to fix - we are no longer returning errors, so we don't need this variable any more.

The i however was a little trickier to see. The old code was doing this:

for i, el := range node {
	if errF := a.rewriteAST(node, el, func(newNode, parent AST) {
		parent.(InterfaceSlice)[i] = newNode.(AST)
	}); errF != nil {
		return errF
	}
}

To fix it, we changed the code to:

for i, el := range node {
	if !a.rewriteAST(node, el, func(idx int) func(newNode, parent AST) {
		return func(newNode, parent AST) {
			parent.(InterfaceSlice)[idx] = newNode.(AST)
		}
	}(i)) {
		return false
	}
}

The difference is that before, i was being sent over as a closure, and that is costly. By sending the index through a function argument, nothing is being closed over and we save on memory allocations.

Like always with performance optimisations, it's not intuitive that one is faster than the other. The second code block here looks like it will need to create a function object for every call to rewriteAST, which sounds slow. Instead we get a massive performance boost. ¯_(ツ)_/¯

Signed-off-by: Harshit Gangal <harshit@planetscale.com>
Signed-off-by: Harshit Gangal <harshit@planetscale.com>
Signed-off-by: Harshit Gangal <harshit@planetscale.com>
…eters

Signed-off-by: Harshit Gangal <harshit@planetscale.com>
Signed-off-by: Harshit Gangal <harshit@planetscale.com>
@shlomi-noach
Copy link
Contributor

2fast2furious

Signed-off-by: Harshit Gangal <harshit@planetscale.com>
Copy link
Collaborator

@vmg vmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Reviewed during pairing this morning. 👍

Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I request a detailed PR description?
What parts are slow and why and how you improved them?
Along the lines of #7710 will be nice. That way everyone can understand what was done and we can convert this to a blog post in the performance series we are planning.

@systay systay merged commit 3428d57 into vitessio:master Mar 23, 2021
@systay systay deleted the rewriter-closure branch March 23, 2021 06:19
@deepthi
Copy link
Member

deepthi commented Mar 23, 2021

Very nice description. Thank you @harshit-gangal and @systay!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants