Parser.isValidMethodTypeArguments is slow #31536
Comments
adding "area-front-end" since the parser we use is the same as the FE's |
This code is a direct consequence of adding generic method arguments that are ambiguous without being willing to make reasonable compromises on how to resolve the ambiguity. This is what happens when you have a committee of people designing the language not being willing to listen to feedback from people with actual experience. This is what happens when the language team "helps" out by making naive/below-par implementations despite objections from people with actual experience. Fortunately, the language team has recently changed course and is now willing to listen. This method would be much less problematic if it was:
I suggested this originally, but the language team insisted on parsing all possible type arguments twice. |
Well, the code in Of course, the code was updated several times since then, e.g., to make functions like
only became relevant for Next, if it is indeed a serious part of this problem that some functions should be inlined but aren't, it should be (1) an issue on compilers, and (2) possible to perform the inlining manually in this particular piece of code, with an unfolding of as many levels as needed for best performance.
The remaining issue here is whether the parser should rely on a "bracket only" check to make the decision, as in Peter's implementation:
bool isValidMethodTypeArguments(Token token) {
  if (optional("<", token)) {
    BeginToken open = token;
    Token close = open.endGroup;
    return close == null ? false /* 2 */ : optional("(", close.next);
  } else {
    return false; // 1
  }
}
or it should check for the required syntactic forms in a more complete manner.
It is not a trivial trade-off: Obviously, the "bracket only" implementation is more concise than the existing implementation, and, trusting Peter's very careful approach to these things, surely it's also faster. However, the two implementations are very similar in the cases where the "bracket only" implementation returns false: They do the same things (e.g., with no recursive function calls). I'm sure Peter's approach brings down the cost of doing this (which is a constant cost in both cases), but nothing prevents us from optimizing the existing implementation by replacing, say Concretely, corresponding to
if (endToken == null || !identical(endToken.next.kind, OPEN_PAREN_TOKEN)) {
  return null;
}
So Peter's function obviously arrives at the same But the actual difference arises in the case where In this case the "bracket only" implementation immediately returns true. The existing implementation proceeds to check (now that we know that the overall bracket structure fits the expectations for a type argument list) that we are actually looking at a type argument list. So the existing implementation will return false in some situations where the "bracket only" has already returned true, for a good reason. The difference shows up in cases like the following:
int a = 1, b = 2, c = 3;
foo(bool b1, bool b2) {}
main() => foo(a < b, 2 > (c));
where the "bracket only" approach will decide that the argument list must be parsed as one generic function invocation, and it will then encounter a parse error at
With the current implementation the invocation will parse just fine, as an invocation where two boolean arguments are passed to So we could of course decide that "people should just stop writing programs like that", and then reject the programs where something "looks like" a generic function invocation based on brackets only, but isn't. Then we tell developers to put more parentheses into their programs until they can be parsed. I don't think it's obviously a good idea to reduce the set of valid programs just because the parser can be faster, but I do obviously think that the parser should be as fast as it can be, and hence all the improvements that we can find should be applied to functions like So is there a middle ground here, somewhere? Here's a tiny experiment which would be interesting (it simply puts the "brackets only" function in front of the existing implementation, which might make sure that the existing implementation is only called very rarely). If that helps then it might make sense to clean up that approach, and get both the improved performance in the typical case, and the full check in the rare situation where it's needed.
|
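A minimal sketch of the layered idea from the comment above: run the cheap "bracket only" test first and fall through to a fuller check only when it passes. Everything here is an assumption made for illustration. The `Token` class, the `link` helper, and `looksLikeTypeArgumentList` are hypothetical stand-ins, not the front end's real scanner or parser API, and the "full" check only accepts comma-separated identifiers, which is far narrower than real type syntax.

```dart
// Hypothetical token model: just a lexeme, a next pointer, and the matching
// close bracket for '<' (null when no match was found).
class Token {
  final String lexeme;
  Token? next;
  Token? endGroup;
  Token(this.lexeme);
}

// Toy linker: chains tokens and pairs each '<' with the nearest later '>'.
List<Token> link(List<String> lexemes) {
  final tokens = [for (final s in lexemes) Token(s)];
  final open = <Token>[];
  for (var i = 0; i < tokens.length; i++) {
    final t = tokens[i];
    if (i + 1 < tokens.length) t.next = tokens[i + 1];
    if (t.lexeme == '<') open.add(t);
    if (t.lexeme == '>' && open.isNotEmpty) open.removeLast().endGroup = t;
  }
  return tokens;
}

// Stage 1: '<' ... '>' immediately followed by '('. No allocation, no recursion.
bool bracketOnlyCheck(Token token) {
  if (token.lexeme != '<') return false;
  final close = token.endGroup;
  return close != null && close.next?.lexeme == '(';
}

final _identifier = RegExp(r'^[A-Za-z_]\w*$');

// Stage 2 (simplified): between '<' and '>' require name (',' name)*.
bool looksLikeTypeArgumentList(Token open) {
  final close = open.endGroup;
  if (close == null) return false;
  var expectName = true;
  var t = open.next;
  while (t != null && !identical(t, close)) {
    final ok = expectName ? _identifier.hasMatch(t.lexeme) : t.lexeme == ',';
    if (!ok) return false;
    expectName = !expectName;
    t = t.next;
  }
  return !expectName; // the list must end right after a name
}

// Stage 2 runs only in the rare case where stage 1 says "maybe".
bool isValidMethodTypeArguments(Token token) =>
    bracketOnlyCheck(token) && looksLikeTypeArgumentList(token);

void main() {
  final ambiguous = link(['a', '<', 'b', ',', '2', '>', '(', 'c', ')']);
  print(bracketOnlyCheck(ambiguous[1])); // true
  print(isValidMethodTypeArguments(ambiguous[1])); // false

  final generic = link(['f', '<', 'int', ',', 'String', '>', '(', 'x', ')']);
  print(isValidMethodTypeArguments(generic[1])); // true
  print(isValidMethodTypeArguments(generic[0])); // false: not a '<' token
}
```

On the `foo(a < b, 2 > (c))` token sequence the bracket test alone says "type arguments", while the combined check rejects it because `2` is not a type name, so the expression can still be parsed as two comparisons.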
@eernstg "So is there a middle ground here, somewhere?" I opened the issue because I saw something was slow. That can be fixed in a number of ways.
if (!identical(token.kind, LT_TOKEN)) return false;
Any of these modifications would effectively remove `isValidMethodTypeArguments` from the profile. |
Ah, sorry, I didn't see that. Those local functions used to be instance methods, and I guess they might have been changed to local functions because a local function never needs OO dispatch, but with private methods it might also be possible to avoid the dispatch (because the compiler can know that they are never overridden). Otherwise, I think we'd need to fold the code a bit (because some conditions are statically known to be always-true or always-false when we start running Here is an attempt to clean up the code and use a style which is closer to Peter's. I've added you as a reviewer, Stephen. |
PS: I've dropped the above-mentioned CL in order to avoid interfering with ongoing work. |
Noting that the |
About 3% of parse time on a 57MB input is spent in isValidMethodTypeArguments. This is a function that 99.x% of the time checks that the next token is not '<' and returns false.
I discovered this looking at the cpu_profile in Observatory.
A lot of new-gen GCs happen when allocating _Closures.
In parsing, a third of GCs triggered by _Closures come from this function.
About 1% of total parse time is GC originating in this function.
There is a lot here that makes me sad:
The language failed us.
The VM implementation failed us:
dart2js is pretty terrible too.
(See Duration.toString.sixDigits for an example of needless closure creation.)
Suggestion
I suggest in the short term that these local functions are moved to be private methods on the class.
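A rough sketch of what that refactoring could look like, assuming a toy `SketchParser` class over a list of strings. All names here are hypothetical and the logic is reduced to the bracket test. Whether the local-function version actually allocates closure objects on every call depends on the compiler, so treat the comments as describing the behavior reported in this issue, not a guarantee.

```dart
// Hypothetical stand-in for the parser; tokens are plain strings here.
class SketchParser {
  final List<String> tokens;
  SketchParser(this.tokens);

  // Local-function style: the helpers are closures declared at the top of the
  // method, so closure objects may be created before the cheap "is it '<'?"
  // test has had a chance to bail out.
  bool checkWithLocalFunctions(int index) {
    bool isOpen(int i) => i < tokens.length && tokens[i] == '<';
    int findClose(int i) => tokens.indexOf('>', i);
    bool parenFollows(int i) => i + 1 < tokens.length && tokens[i + 1] == '(';

    if (!isOpen(index)) return false;
    final close = findClose(index);
    return close != -1 && parenFollows(close);
  }

  // Private-method style: same logic, but nothing is created per call.
  bool checkWithPrivateMethods(int index) {
    if (!_isOpen(index)) return false;
    final close = _findClose(index);
    return close != -1 && _parenFollows(close);
  }

  bool _isOpen(int i) => i < tokens.length && tokens[i] == '<';
  int _findClose(int i) => tokens.indexOf('>', i);
  bool _parenFollows(int i) => i + 1 < tokens.length && tokens[i + 1] == '(';
}

void main() {
  final p = SketchParser(['f', '<', 'T', '>', '(', 'x', ')']);
  print(p.checkWithLocalFunctions(0)); // false
  print(p.checkWithPrivateMethods(0)); // false
  print(p.checkWithLocalFunctions(1)); // true
  print(p.checkWithPrivateMethods(1)); // true
}
```

Both variants return the same results; the difference is only in where the helper code lives and what has to exist at run time before the early return.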