refactor($parse): new and faster $parse #10592
Conversation
Change the way parse works from the old mechanism to multiple stages of parsing and code generation. The new parse has four stages:

* Lexer
* AST building
* AST processing
* Caching, one-time binding and `$watch` optimizations

The Lexer phase remains unchanged.

The AST building phase follows the Mozilla Parser API [1] and generates an AST that is compatible with it. The only exception was needed for filters: as JavaScript does not support filters, a filter is transformed into a `CallExpression` that has an extra property named `filter` with the value `true`. This phase is heavily based on the previous implementation of `$parse`.

The AST processing phase transforms the AST into a function that can be executed to evaluate the expression. The logic for expressions remains unchanged. This phase works in two different ways depending on whether CSP is enabled or disabled. If CSP is enabled, the processing phase returns pre-generated functions that interpret specific parts of the AST. When CSP is disabled, the entire expression is compiled into a single function that is created using `Function`. In both cases, the returned function has the properties `constant`, `literal` and `inputs`, as in the previous implementation. These are used in the next phase to perform different optimizations.

The caching, one-time binding and `$watch` optimizations phase remains mostly unchanged.

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Parser_API
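As an illustration of the filter exception described above (a hypothetical sketch, not taken from the PR), an expression such as `name | uppercase` could be represented in the Mozilla-style AST roughly like this, with the non-standard `filter` flag marking the node:

```js
// Hypothetical AST shape for `name | uppercase`. Node types follow the Mozilla
// Parser API; `filter: true` is the AngularJS-specific extension mentioned above.
var ast = {
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'CallExpression',
      filter: true,                                     // marks this call as a filter
      callee: { type: 'Identifier', name: 'uppercase' },
      arguments: [{ type: 'Identifier', name: 'name' }]
    }
  }]
};
```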
I think adding the AST step is great and will really help with testing (such as preventing that perf regression I caused not too long ago...). I think going directly from tokens => result caused a lot of issues. Using the Mozilla Parser API is also awesome. Even if nothing else changed, I think this part (and all the AST tests) would be great.

The generating of functions instead of closures I'm not too sure about; I'll have to play around with it more. But it seems to be much more complicated (from a quick glance and looking at the LOC in parse.js). This also makes the csp/non-csp cases completely different implementations, unlike previously where only the getters were different.

What are the details of the performance changes? From running some random largetable/parsed-exp benchmarks it seems many benchmarks are much slower (some 2x) while only a few are faster (most were ~20%). Memory usage, GC time and GC counts also seem to have gone up.

Can you explain more about the generated functions? What is the advantage over the previous method of generating closures? What is the "clean" state in those functions? Why is the factory instead of the result function cached in ...?

The way expression "inputs" are computed has changed when there are duplicate inputs. For example ...
I'll keep playing around with it, but that's enough for tonight!
Another random thing I've thought of using a traversable AST for... instead of the inputs/constant flags being scattered throughout parse.js, I keep coming back to the thought of AST nodes declaring only their own properties (...), for example:

```js
function isConstant(ast) {
  var c = true;
  traverse(ast, function(e) { c = c && !(e.stateful || e.mutator); });
  return c;
}
```

or determining the inputs:

```js
function collectInputs(ast) {
  var inputs = [];
  traverse(ast, function(e) {
    if (e.stateful || (e.branching && hasMutator(e))) {
      if (inputs.indexOf(e) === -1) inputs.push(e);
      return false; /* = don't traverse children */
    }
  });
  return inputs;
}
```

This PR isolates the constant/inputs within a function so it is no longer scattered throughout the file, but a more generic ...
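Both snippets above assume a generic `traverse` helper that is not shown in the comment. A minimal sketch of what such a helper could look like (a hypothetical illustration, not code from this PR or from jbedard's patch):

```js
// Minimal depth-first walker over a Mozilla-style AST (illustrative only).
// The visitor may return false to skip a node's children, matching the
// convention used by collectInputs above.
function traverse(node, visitor) {
  if (!node || typeof node.type !== 'string') return;
  if (visitor(node) === false) return;
  Object.keys(node).forEach(function(key) {
    var child = node[key];
    if (Array.isArray(child)) {
      child.forEach(function(c) { traverse(c, visitor); });
    } else if (child && typeof child.type === 'string') {
      traverse(child, visitor);
    }
  });
}
```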
Hi,

I think that it simplifies some parts, but does so at the cost of bytes. Anyhow, it was a conscious call, as there are some edge cases that were difficult to solve with the old approach, e.g. ...

It is more complex, but that is why we have unit tests that check that both implementations behave equally. If they do not, then it is a bug that should be solved.

Did a sample with real-world applications; the performance improvements vary a lot depending on the browser (and how aggressively the browser JITs code generated with `Function`).

The reason why the current tests show that it is sometimes slower is that we are using ...

The old mechanism for ...

The new approach does the following: ...

To keep the ... The variable ...

Memory usage on a single expression is lower with the new approach, but there is an extra cost when the same expression is active multiple times in the same page. The memory usage is equal when the same expression is active ~5 times. The tests use the same expression hundreds of times; this is why you are seeing higher memory usage. There are some improvements on the drawing board, but given that this is a complex patch, it is important that it is out early for people to review it extensively.

In most real-world applications, this had close to no effect.

This is the way it used to work: the result of multiple expressions depends only on the last expression and not on the prior ones.

The first approach used a visitor pattern and the code was a lot bigger and more difficult to follow. Part of the reason was that different node types keep their children in different formats.
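For context on the two evaluation paths discussed above, here is a rough sketch of how the same expression can either be compiled into one function via `Function` (CSP disabled) or evaluated by pre-written interpreter functions (CSP enabled). The names and structure are illustrative assumptions, not code from parse.js:

```js
// Compile path (CSP disabled): build one source string for the whole expression
// and turn it into a function with the Function constructor.
function compileExpression(body) {
  // e.g. body === 'return s.user == null ? undefined : s.user.name;' for `user.name`
  return new Function('s', 'l', body);
}

// Interpreter path (CSP enabled): each AST node maps to a small pre-written
// function, and evaluation composes those functions instead of generated code.
function interpretMember(objFn, name) {
  return function(scope, locals) {
    var obj = objFn(scope, locals);
    return obj == null ? undefined : obj[name];
  };
}

// Usage sketch for the expression `user.name`:
var userFn = function(scope) { return scope.user; };
var userNameFn = interpretMember(userFn, 'name');
userNameFn({user: {name: 'example'}});   // => 'example'
```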
Remove the second closure for `inputs` and use `inputsWatchDelegate` to know when the computation is partial
The concerns from @jbedard about GC pressure and the use of a second closure are now resolved in the follow-up commit.
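For background on the `inputsWatchDelegate` mentioned in the commit above, this is a simplified conceptual sketch, not the actual delegate from parse.js: an inputs-based watch tracks the last value of each input sub-expression and only re-evaluates the full expression when one of those inputs changes.

```js
// Simplified idea behind an inputs-based watch (illustrative; the real
// inputsWatchDelegate also handles NaN, object identity, one-time bindings, etc.).
function watchWithInputs(scope, inputs, expressionFn, listener) {
  var lastInputValues = new Array(inputs.length);
  var lastResult;
  return scope.$watch(function inputsWatch(scope) {
    var changed = false;
    for (var i = 0; i < inputs.length; i++) {
      var value = inputs[i](scope);
      if (value !== lastInputValues[i]) {   // only recompute when an input changed
        lastInputValues[i] = value;
        changed = true;
      }
    }
    if (changed) lastResult = expressionFn(scope);
    return lastResult;
  }, listener);
}
```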
I always wanted to do that! But all my attempts at it were too ugly or had too much duplication. I always tried having two methods: one to execute the expression with the computed input values, and one to execute with the scope/locals (which would normally just delegate to the other after computing the inputs). Clearly I need to look into this more though, because I didn't even notice it... Why is ...

Previously ...

I thought a generic traverse would do the opposite, such that the different children formats would only have to be handled once (in ...

There are many situations where ...
```js
'};' +
extra +
this.watchFns() +
'fn.literal=literal;fn.constant=constant;' +
```
Is there a reason to do this within the generated function instead of just putting this line (without the quotes) at the end of the function?
Now that the second closure is gone, there is no reason; will change it.
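To make the suggestion concrete, here is a hedged sketch of the two options being discussed; the variable names are made up and this is not the actual parse.js code. Instead of emitting the `fn.literal=literal;fn.constant=constant;` assignments inside the generated source string, the same properties can be set on the compiled function afterwards:

```js
// Sketch only: `fnBody`, `isLiteral` and `isConstant` are illustrative stand-ins.
var fnBody = 'return s.user;';
var isLiteral = false;
var isConstant = false;

// (a) set the flags inside the generated source, as the quoted code does:
var makeFn = new Function('literal', 'constant',
  'var fn = function(s, l) { ' + fnBody + ' };' +
  'fn.literal = literal; fn.constant = constant;' +
  'return fn;');
var fnA = makeFn(isLiteral, isConstant);

// (b) compile first, then set plain properties outside the generated code:
var fnB = new Function('s', 'l', fnBody);
fnB.literal = isLiteral;
fnB.constant = isConstant;
```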
Here's an example of using a visitor for the inputs/constants: jbedard@4f75c0d. It's a little more complicated than I was hoping, but it is less code. I think I prefer it over the ...

There are a few other unrelated things in there that I do think are more worthwhile though (moving the constant/inputs assignment out of the generated function and adding a ...
Thanks for jbedard/angular.js@4f75c0d. There are two parts in the patch: the visitor approach and the other cleanups. As it is in the patch, the visitor approach makes the code a lot harder to read and follow, e.g.

```js
for (var a = ast.arguments || ast.body || ast.elements || ast.properties, i = 0, ii = a.length; i < ii; i++) {
  traverse(a[i].value || a[i].expression || a[i], visitor);
  ...
}
```

It is trying to be too smart in handling many different cases in two lines, so I think it is highly debatable whether this is better overall. Most of the cleanups are nice; will look into doing some in the next few days. Thanks
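One way to avoid the `||` chains criticized above, sketched purely to illustrate the alternative being debated (this is not code from either patch), is to declare per node type which properties hold children and have the traversal consult that map:

```js
// Illustrative alternative: an explicit children map per AST node type, so the
// traversal never has to guess between .arguments/.body/.elements/.properties.
var CHILD_KEYS = {
  Program: ['body'],
  ExpressionStatement: ['expression'],
  CallExpression: ['callee', 'arguments'],
  ArrayExpression: ['elements'],
  ObjectExpression: ['properties'],
  Property: ['key', 'value'],
  BinaryExpression: ['left', 'right'],
  MemberExpression: ['object', 'property']
};

function traverseWithMap(node, visitor) {
  if (!node || visitor(node) === false) return;
  (CHILD_KEYS[node.type] || []).forEach(function(key) {
    var child = node[key];
    if (Array.isArray(child)) {
      child.forEach(function(c) { traverseWithMap(c, visitor); });
    } else if (child) {
      traverseWithMap(child, visitor);
    }
  });
}
```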
Yes, that specific example is ugly and can probably be nicer, but I still found the other helpers that use ...

Some other random ideas while trying to improve the benchmarks (the first two actually make a difference)...
Hi, jbedard/angular.js@9c9c4a7 and jbedard/angular.js@b53ef9b are interesting as they make the assumption that, when computing the inputs, there will be a scope and no locals, and, I think, merge them.
Clean up several odd parts of the code
Conflicts: src/ng/parse.js
Remove the use of `expressionFactory` as the second context was removed
We found a Contributor License Agreement for you (the sender of this pull request) and all commit authors, but as best as we can tell these commits were authored by someone else. If that's the case, please add them to this pull request and have them confirm that they're okay with these commits being contributed to Google. If we're mistaken and you did author these commits, just reply here to confirm.
Delegate the function building to the recursion function
@lgalfaso so what are the current perf improvements? Is this something that is going to be merged into 1.3.x or 1.4.x?
@ilanbiala the performance improvements vary a lot depending on the browser (and how aggressively the browser JITs code generated with `Function`). This will be part of 1.4.
@lgalfaso So no change from before. I just wanted to double-check to see if any later commits made a performance change. Sounds good for the timeline by the way!
@ilanbiala the later commits made a small change during ...
Implements resumed evaluation of expressions when CSP is enabled
Remove reference to the ast on the generated function when CSP is enabled
Here's an issue and fix with how inputs/valueOf works: jbedard@f9fd993. That commit just modifies a test to reproduce it, but it might be nice to have better tests around that.
Do we want the tests from 8690081 since it was only merged to 1.3?
@jbedard will cherry-pick these tests later today
There is something wrong with 8690081, as we should only throw with `scope.$eval("c.a = 1", {c: Function.prototype.constructor})` if expensive checks are enabled.
Then we need more tests and a clear definition of what "expensive checks" means.
The way I understood it, "expensive checks" means each and every object in a getterFn gets checked, whereas non-expensive only checks "suspicious looking" parts of a getterFn (".constructor" is the only suspicious thing today). The ...
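To make the distinction in the previous comment concrete, here is a rough sketch using made-up helper names; it only illustrates the idea and is not the actual generated getter code or the real safety helpers from parse.js:

```js
// Illustrative stand-in for the kind of safety check parse.js performs.
function ensureSafe(obj) {
  if (obj === Function || (obj && obj.constructor === obj)) {
    throw new Error('Referencing Function in Angular expressions is disallowed!');
  }
  return obj;
}

// Non-expensive: only the suspicious-looking ".constructor" access is wrapped.
function getterCheap(s) {
  var v = s.a;
  return v == null ? undefined : ensureSafe(v.constructor);
}

// Expensive checks: every object the getter touches is wrapped.
function getterExpensive(s) {
  var v = ensureSafe(s.a);
  return v == null ? undefined : ensureSafe(v.constructor);
}
```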
@jbedard I know how it works, but the fact that it works differently for getters and setters is odd (the fact that ...
Add the expression to the error in two cases where it was missing. Added a few tests for this, and more tests that define the behavior of expensive checks and assignments.
landed as 0d42426