refactor($parse): new and faster $parse #10592
Conversation
Change the way parse works from the old mechanism to multiple stages of parsing and code generation. The new parse has four stages:

* Lexer
* AST building
* AST processing
* Caching, one-time binding and `$watch` optimizations

The Lexer phase remains unchanged.

The AST building phase follows the Mozilla Parser API [1] and generates an AST that is compatible with it. The only exception was needed for filters: as JavaScript does not support filters, a filter is transformed into a `CallExpression` that has an extra property named `filter` with the value `true`. This phase is heavily based on the previous implementation of `$parse`.

The AST processing phase transforms the AST into a function that can be executed to evaluate the expression. The logic for expressions remains unchanged. This phase works in two different ways depending on whether CSP is enabled or disabled. If CSP is enabled, the processing phase returns pre-generated functions that interpret specific parts of the AST. When CSP is disabled, the entire expression is compiled into a single function that is created using `Function`. In both cases, the returned function has the properties `constant`, `literal` and `inputs`, as in the previous implementation. These are used in the next phase to perform different optimizations.

The caching, one-time binding and `$watch` optimizations phase remains mostly unchanged.

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Parser_API
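As an illustration of the filter exception described above (a hypothetical sketch, not taken from the PR), an expression such as `name | uppercase` could be represented in the Mozilla-style AST roughly like this, with the non-standard `filter` flag marking the node:

```js
// Hypothetical AST shape for `name | uppercase`. Node types follow the Mozilla
// Parser API; `filter: true` is the AngularJS-specific extension mentioned above.
var ast = {
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'CallExpression',
      filter: true,                                     // marks this call as a filter
      callee: { type: 'Identifier', name: 'uppercase' },
      arguments: [{ type: 'Identifier', name: 'name' }]
    }
  }]
};
```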
I think adding the AST step is great and will really help with testing (such as preventing that perf regression I caused not too long ago...). I think going directly from tokens => result caused a lot of issues. Using the Mozilla Parser API is also awesome. Even if nothing else changed, I think this part (and all the AST tests) would be great.

The generating of functions instead of closures I'm not too sure about; I'll have to play around with it more. But it seems to be much more complicated (from a quick glance and looking at the LOC in parse.js). This also makes the csp/non-csp cases completely different implementations, unlike previously where only the getters were different.

What are the details of the performance changes? From running some random largetable/parsed-exp benchmarks it seems many benchmarks are much slower (some 2x) while only a few are faster (most were ~20%). Memory usage, GC time and GC counts also seem to have gone up.

Can you explain more about the generated functions? What is the advantage over the previous method of generating closures? What is the "clean" state in those functions? Why is the factory instead of the result function cached in ...?

The way expression "inputs" are computed has changed when there are duplicate inputs. For example ...
I'll keep playing around with it, but that's enough for tonight!
Another random thing I've thought of using a traversable AST for... instead of the inputs/constant flags being scattered throughout parse.js, I keep coming back to the thought of AST nodes declaring only their own properties (...), for example:

```js
function isConstant(ast) {
  var c = true;
  traverse(ast, function(e) { c = c && !(e.stateful || e.mutator); });
  return c;
}
```

or determining the inputs:

```js
function collectInputs(ast) {
  var inputs = [];
  traverse(ast, function(e) {
    if (e.stateful || (e.branching && hasMutator(e))) {
      if (inputs.indexOf(e) === -1) inputs.push(e);
      return false; /* = don't traverse children */
    }
  });
  return inputs;
}
```

This PR isolates the constant/inputs within a function so it is no longer scattered throughout the file, but a more generic ...
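Both snippets above assume a generic `traverse` helper that is not shown in the comment. A minimal sketch of what such a helper could look like (a hypothetical illustration, not code from this PR or from jbedard's patch):

```js
// Minimal depth-first walker over a Mozilla-style AST (illustrative only).
// The visitor may return false to skip a node's children, matching the
// convention used by collectInputs above.
function traverse(node, visitor) {
  if (!node || typeof node.type !== 'string') return;
  if (visitor(node) === false) return;
  Object.keys(node).forEach(function(key) {
    var child = node[key];
    if (Array.isArray(child)) {
      child.forEach(function(c) { traverse(c, visitor); });
    } else if (child && typeof child.type === 'string') {
      traverse(child, visitor);
    }
  });
}
```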
Hi,

I think that it simplifies some parts, but does so at the cost of bytes. Anyhow, it was a conscious call, as there are some edge cases that were difficult to solve with the old approach, e.g. ...

It is more complex, but that is why we have unit tests that check that both implementations behave equally. If they do not, then it is a bug that should be solved.

Did a sample with real-world applications; the performance improvements vary a lot depending on the browser (and how aggressively the browser JITs code generated with `Function`).

The reason why the current tests show that it is sometimes slower is that we are using ...

The old mechanism for ...

The new approach does the following: ...

To keep the ... The variable ...

Memory usage on a single expression is lower with the new approach, but there is an extra cost when the same expression is active multiple times in the same page. The memory usage is equal when the same expression is active ~5 times. The tests use the same expression hundreds of times; this is why you are seeing higher memory usage. There are some improvements on the drawing board, but given that this is a complex patch, it is important that it is out early for people to review it extensively.

In most real-world applications, this had close to no effect.

This is the way it used to work: the result of multiple expressions depends only on the last expression and not on the prior ones.

The first approach used a visitor pattern and the code was a lot bigger and more difficult to follow. Part of the reason was that different node types keep their children in different formats.
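For context on the two evaluation paths discussed above, here is a rough sketch of how the same expression can either be compiled into one function via `Function` (CSP disabled) or evaluated by pre-written interpreter functions (CSP enabled). The names and structure are illustrative assumptions, not code from parse.js:

```js
// Compile path (CSP disabled): build one source string for the whole expression
// and turn it into a function with the Function constructor.
function compileExpression(body) {
  // e.g. body === 'return s.user == null ? undefined : s.user.name;' for `user.name`
  return new Function('s', 'l', body);
}

// Interpreter path (CSP enabled): each AST node maps to a small pre-written
// function, and evaluation composes those functions instead of generated code.
function interpretMember(objFn, name) {
  return function(scope, locals) {
    var obj = objFn(scope, locals);
    return obj == null ? undefined : obj[name];
  };
}

// Usage sketch for the expression `user.name`:
var userFn = function(scope) { return scope.user; };
var userNameFn = interpretMember(userFn, 'name');
userNameFn({user: {name: 'example'}});   // => 'example'
```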
Remove the second closure for `inputs` and use `inputsWatchDelegate` to know when the computation is partial
The concerns from @jbedard about GC pressure and the use of a second closure are now resolved in the follow-up commit.
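For background on the `inputsWatchDelegate` mentioned in the commit above, this is a simplified conceptual sketch, not the actual delegate from parse.js: an inputs-based watch tracks the last value of each input sub-expression and only re-evaluates the full expression when one of those inputs changes.

```js
// Simplified idea behind an inputs-based watch (illustrative; the real
// inputsWatchDelegate also handles NaN, object identity, one-time bindings, etc.).
function watchWithInputs(scope, inputs, expressionFn, listener) {
  var lastInputValues = new Array(inputs.length);
  var lastResult;
  return scope.$watch(function inputsWatch(scope) {
    var changed = false;
    for (var i = 0; i < inputs.length; i++) {
      var value = inputs[i](scope);
      if (value !== lastInputValues[i]) {   // only recompute when an input changed
        lastInputValues[i] = value;
        changed = true;
      }
    }
    if (changed) lastResult = expressionFn(scope);
    return lastResult;
  }, listener);
}
```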
I always wanted to do that! But all my attempts at it were too ugly or had too much duplication. I always tried having two methods: one to execute the expression with the computed input values, and one to execute with the scope/locals (which would normally just delegate to the other after computing the inputs). Clearly I need to look into this more though, because I didn't even notice it... Why is ...

Previously ...

I thought a generic traverse would do the opposite, such that the different children formats would only have to be handled once (in ...

There are many situations where ...
```js
'};' +
extra +
this.watchFns() +
'fn.literal=literal;fn.constant=constant;' +
```
Is there a reason to do this within the generated function instead of just putting this line (without the quotes) at the end of the function?
Now that the second closure is gone, there is no reason; will change it.
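To make the suggestion concrete, here is a hedged sketch of the two options being discussed; the variable names are made up and this is not the actual parse.js code. Instead of emitting the `fn.literal=literal;fn.constant=constant;` assignments inside the generated source string, the same properties can be set on the compiled function afterwards:

```js
// Sketch only: `fnBody`, `isLiteral` and `isConstant` are illustrative stand-ins.
var fnBody = 'return s.user;';
var isLiteral = false;
var isConstant = false;

// (a) set the flags inside the generated source, as the quoted code does:
var makeFn = new Function('literal', 'constant',
  'var fn = function(s, l) { ' + fnBody + ' };' +
  'fn.literal = literal; fn.constant = constant;' +
  'return fn;');
var fnA = makeFn(isLiteral, isConstant);

// (b) compile first, then set plain properties outside the generated code:
var fnB = new Function('s', 'l', fnBody);
fnB.literal = isLiteral;
fnB.constant = isConstant;
```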
Here's an example of using a visitor for the inputs/constants: jbedard@4f75c0d. It's a little more complicated than I was hoping, but it is less code. I think I prefer it over the ...

There are a few other unrelated things in there that I do think are more worthwhile though (moving the constant/inputs assignment out of the generated function and adding a ...
Thanks for jbedard/angular.js@4f75c0d. There are two parts in the patch: the visitor approach and the other cleanups. As it is in the patch, the visitor approach makes the code a lot harder to read and follow, e.g.

```js
for (var a = ast.arguments || ast.body || ast.elements || ast.properties, i = 0, ii = a.length; i < ii; i++) {
  traverse(a[i].value || a[i].expression || a[i], visitor);
  ...
}
```

It is trying to be too smart in handling many different cases in two lines, so I think it is highly debatable whether this is better overall. Most of the cleanups are nice; will look into doing some in the next few days. Thanks
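One way to avoid the `||` chains criticized above, sketched purely to illustrate the alternative being debated (this is not code from either patch), is to declare per node type which properties hold children and have the traversal consult that map:

```js
// Illustrative alternative: an explicit children map per AST node type, so the
// traversal never has to guess between .arguments/.body/.elements/.properties.
var CHILD_KEYS = {
  Program: ['body'],
  ExpressionStatement: ['expression'],
  CallExpression: ['callee', 'arguments'],
  ArrayExpression: ['elements'],
  ObjectExpression: ['properties'],
  Property: ['key', 'value'],
  BinaryExpression: ['left', 'right'],
  MemberExpression: ['object', 'property']
};

function traverseWithMap(node, visitor) {
  if (!node || visitor(node) === false) return;
  (CHILD_KEYS[node.type] || []).forEach(function(key) {
    var child = node[key];
    if (Array.isArray(child)) {
      child.forEach(function(c) { traverseWithMap(c, visitor); });
    } else if (child) {
      traverseWithMap(child, visitor);
    }
  });
}
```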
Yes, that specific example is ugly and can probably be nicer, but I still found the other helpers that use ...

Some other random ideas while trying to improve the benchmarks (the first two actually make a difference)...
Hi, jbedard/angular.js@9c9c4a7 and jbedard/angular.js@b53ef9b are interesting as they make the assumption that, when computing the inputs, there will be a scope and no locals, and, I think, merge them.
Clean up several odd parts of the code
Conflicts: src/ng/parse.js
Remove the use of `expressionFactory` as the second context was removed
We found a Contributor License Agreement for you (the sender of this pull request) and all commit authors, but as best as we can tell these commits were authored by someone else. If that's the case, please add them to this pull request and have them confirm that they're okay with these commits being contributed to Google. If we're mistaken and you did author these commits, just reply here to confirm.
Delegate the function building to the recursion function
@lgalfaso so what are the current perf improvements? Is this something that is going to be merged into 1.3.x or 1.4.x?
@ilanbiala the performance improvements vary a lot depending on the browser (and how aggressively the browser JITs code generated with `Function`). This will be part of 1.4.
@lgalfaso So no change from before. I just wanted to double-check to see if any later commits made a performance change. Sounds good for the timeline by the way!
@ilanbiala the later commits made a small change during ...
Implements resumed evaluation of expressions when CSP is enabled
Remove reference to the ast on the generated function when CSP is enabled
Here's an issue and fix with how inputs/valueOf works: jbedard@f9fd993. That commit just modifies a test to reproduce it, but it might be nice to have better tests around that.
Do we want the tests from 8690081 since it was only merged to 1.3?
@jbedard will cherry-pick these tests later today
There is something wrong with 8690081, as we should only throw with `scope.$eval("c.a = 1", {c: Function.prototype.constructor})` if expensive checks are enabled.
Then we need more tests and a clear definition of what "expensive checks" means.
The way I understood it, "expensive checks" means each and every object in a getterFn gets checked, whereas non-expensive only checks "suspicious looking" parts of a getterFn (".constructor" is the only suspicious thing today). The ...
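To make the distinction in the previous comment concrete, here is a rough sketch using made-up helper names; it only illustrates the idea and is not the actual generated getter code or the real safety helpers from parse.js:

```js
// Illustrative stand-in for the kind of safety check parse.js performs.
function ensureSafe(obj) {
  if (obj === Function || (obj && obj.constructor === obj)) {
    throw new Error('Referencing Function in Angular expressions is disallowed!');
  }
  return obj;
}

// Non-expensive: only the suspicious-looking ".constructor" access is wrapped.
function getterCheap(s) {
  var v = s.a;
  return v == null ? undefined : ensureSafe(v.constructor);
}

// Expensive checks: every object the getter touches is wrapped.
function getterExpensive(s) {
  var v = ensureSafe(s.a);
  return v == null ? undefined : ensureSafe(v.constructor);
}
```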
@jbedard I know how it works, but the fact that it works differently for getters and setters is odd (the fact that ...
Add the expression to the error in two cases where it was missing. Added a few tests for this, and more tests that define the behavior of expensive checks and assignments.
landed as 0d42426