Parser.isValidMethodTypeArguments is slow #31536

rakudrama · 2017-12-05T01:32:39Z

About 3% of parse time on a 57MB input is spent in isValidMethodTypeArguments.
This is a function that 99.x% of the time checks that the next token is not '<' and returns false.

I discovered this looking at the cpu_profile in Observatory.
A lot of new-gen GCs happen when allocating _Closures.
In parsing, a third of GCs triggered by _Closures come from this function.
About 1% of total parse time is GC originating in this function.

There is a lot here that makes me sad:

The language failed us.

It is too hard to write mutually recursive local functions.
The unnatural code we have to write instead is harder for compilers to understand too.
This has been an open issue in the bug tracker for five years.

The VM implementation failed us:

The profile data hints that four closures are allocated per call.
Ideally, there would be no allocations - the four closures do not escape and are called directly.
I would expect tryParseMethodTypeArguments to be inlined - it has just one call site, with no recursion. The 99.x% taken path would then return immediately.

dart2js is pretty terrible too.

Due it the JIT-like legacy of compiling functions one-at-a-time and weak inlining ability, it too cannot do anything sensible. (See Duration.toString.sixDigits for an example of needless closure creation.)

Suggestion

I suggest in the short term that these local functions are moved to be private methods on the class.

The text was updated successfully, but these errors were encountered:

sigmundch · 2017-12-05T03:36:24Z

adding "area-front-end" since the parser we use is the same as the FE's

peter-ahe-google · 2017-12-05T08:15:19Z

This code is a direct consequence of adding generic method arguments that are ambiguous without being willing to make reasonable compromises on how to resolve the ambiguity. This is what happens when you have a committee of people designing the language not being willing to listen to feedback from people with actual experience. This is what happens when the language team "helps" out by making naive/below-par implementations despite objections from people with actual experience.

Fortunately, the language team has recently changed course and is now willing to listen.

This method would be much less problematic if it was:

bool  isValidMethodTypeArguments(Token token) {
  if (optional("<", token)) {
    BeginToken open = token;
    Token close = open.endGroup;
    return close == null ? false : optional("(", close.next);
  } else {
    return false;
  }
}

I suggested this originally, but the language team insisting on parsing all possible type arguments twice.

eernstg · 2017-12-05T12:37:53Z

Well, the code in isValidMethodTypeArguments is the direct consequence of me pushing some keys on my fancy keyboard. ;-)

Of course, the code was updated several times since then, e.g., to make functions like tryParseMethodTypeArguments and tryParseQualified local, and using a local variable Token Function(Token token) tryParseType rather than a local function (because it can't otherwise have the required mutual recursion with the other functions). So problems like

[it is] too hard to write mutually recursive local functions .. [and the work-around runs too slowly]

only became relevant for isValidMethodTypeArguments after the last time I touched the code, which of course doesn't change the fact that mutually recursive local functions aren't supported directly. But that feature is requested here, and the discussions about having or not having that feature is a separate issue.

Next, if it is indeed a serious part of this problem that some functions should be inlined but aren't, it should be (1) an issue on compilers, and (2) possible to perform the inlining manually in this particular piece of code, with an unfolding of as many levels as needed for best performance.

The remaining issue here is whether the parser should rely on a "bracket only" check to make the decision, as in Peter's implementation:

bool isValidMethodTypeArguments(Token token) {
  if (optional("<", token)) {
    BeginToken open = token;
    Token close = open.endGroup;
    return close == null ? false /* 2 */: optional("(", close.next);
  } else {
    return false; // 1
  }
}

or it should check for the required syntactic forms in a more complete manner. It is not a trivial trade-off:

Obviously, the "bracket only" implementation is more concise than the existing implementation, and, trusting Peter's very careful approach to these things, surely it's also faster.

However, the two implementations are very similar in the cases where the "bracket only" implementation returns false: They do the same things (e.g., with no recursive function calls). I'm sure Peter's approach brings down the cost of doing this (which is a constant cost in both cases), but nothing prevents us from optimizing the existing implementation by replacing, say !identical(endToken.next.kind, OPEN_PAREN_TOKEN) by !optional("(", close.next).

Concretely, corresponding to // 1, the existing implementation calls a local function whose first statement is if (!identical(token.kind, LT_TOKEN)) return null;; corresponding to /* 2 */ it runs the following:

if (endToken == null || !identical(endToken.next.kind, OPEN_PAREN_TOKEN)) {
  return null;
}

So Peter's function obviously arrives at the same false result more directly, and the existing implementation ought to be optimized similarly. However, in all these cases they are both running in constant time, ignoring the contents of the type argument list.

But the actual difference arises in the case where optional("(", close.next) evaluates to true.

In this case the "bracket only" implementation immediately returns true. The existing implementation proceeds to check (now that we know that the overall bracket structure fits the expectations for a type argument list) that we are actually looking at a type argument list. So the existing implementation will return false in some situations where the "bracket only" has already returned true, for a good reason.

The difference shows up in cases like the following:

int a = 1, b = 2, c = 3;
foo(bool b1, bool b2) {}
main() => foo(a < b, 2 > (c));

where the "bracket only" approach will decide that the argument list must be parsed as one generic function invocation, and it will then encounter a parse error at 2, which can't be a type argument.

DART_CONFIGURATION=ReleaseX64 sdk/bin/dart2js_developer --generate-code-with-compile-time-errors --test-mode --allow-mock-compilation --categories=all --packages=/usr/local/google/home/eernst/devel/dart/work/sdk/.packages /usr/local/google/home/eernst/devel/dart/work/sdk/tests/language/scratch_test.dart --out=/usr/local/google/home/eernst/devel/dart/work/sdk/out/ReleaseX64/generated_compilations/dart2js/tests_language_scratch_test/out.js

stdout:
tests/language/scratch_test.dart:10:14:
Error: Expected a type, but got '2'.
  foo(a < b, 2 > (c));
             ^
Error: Compilation failed.

With the current implementation the invocation will parse just fine, as an invocation where two boolean arguments are passed to foo.

So we could of course decide that "people should just stop writing programs like that", and then reject the programs where something "looks like" a generic function invocation based on brackets only, but isn't. Then we tell developers to put more parentheses into their programs until they can be parsed.

I don't think it's obviously a good idea to reduce the set of valid programs just because the parser can be faster, but I do obviously think that the parser should be as fast as it can be, and hence all the improvements that we can find should be applied to functions like isValidMethodTypeArguments.

So is there a middle ground here, somewhere?

Here's a tiny experiment which would be interesting (it simply puts the "brackets only" function in front of the existing implementation, which might make sure that the existing implementation is only called very rarely). If that helps then it might make sense to clean up that approach, and get both the improved performance in the typical case, and the full check in the rare situation where it's needed.

diff --git a/pkg/front_end/lib/src/fasta/parser/parser.dart b/pkg/front_end/lib/src/fasta/parser/parser.dart
index 2f766c8180..ef4a95490b 100644
--- a/pkg/front_end/lib/src/fasta/parser/parser.dart
+++ b/pkg/front_end/lib/src/fasta/parser/parser.dart
@@ -1171,13 +1171,13 @@ class Parser {
   /// construct `typeArguments`, but it is required here such that type
   /// arguments in generic method invocations can be recognized, and as few as
   /// possible other constructs will pass (e.g., 'a < C, D > 3').
-  bool isValidMethodTypeArguments(Token token) {
+  bool oldIsValidMethodTypeArguments(Token token) {
     Token Function(Token token) tryParseType;

     /// Returns token after match if [token] matches '<' type (',' type)* '>'
     /// '(', and otherwise returns null. Does not produce listener events. With
     /// respect to the final '(', please see the description of
-    /// [isValidMethodTypeArguments].
+    /// [oldIsValidMethodTypeArguments].
     Token tryParseMethodTypeArguments(Token token) {
       if (!identical(token.kind, LT_TOKEN)) return null;
       Token endToken = closeBraceTokenFor(token);
@@ -1244,6 +1244,18 @@ class Parser {
     return tryParseMethodTypeArguments(token) != null;
   }

+  bool isValidMethodTypeArguments(Token token) {
+    if (optional("<", token)) {
+      BeginToken open = token;
+      Token close = open.endGroup;
+      return close == null ? false : optional("(", close.next)
+          ? oldIsValidMethodTypeArguments(token)
+          : false;
+    } else {
+      return false;
+    }
+  }
+
   /// ```
   /// qualified:
   ///   identifier qualifiedRest*

rakudrama · 2017-12-05T22:16:14Z

@eernstg "So is there a middle ground here, somewhere?"

I opened the issue because I saw something was slow. That can be fixed in a number of ways.
I suggested moving these local functions to be private methods. Your suggested fix is fine too.
Another option would be an early exit at the top of isValidMethodTypeArguments:

if (!identical(token.kind, LT_TOKEN)) return false;

Any of these modifications would effectively remove ``isValidMethodTypeArguments` from the profile.

eernstg · 2017-12-06T15:25:32Z

I suggested moving these local functions to be private methods.

Ah, sorry, I didn't see that. Those local functions used to be instance methods, and I guess they might have been changed to local functions because a local function never needs OO dispatch, but with private methods it might also be possible to avoid the dispatch (because the compiler can know that they are never overridden).

Otherwise, I think we'd need to fold the code a bit (because some conditions are statically known to be always-true or always-false when we start running oldIsValidMethodTypeArguments).

Here is an attempt to clean up the code and use a style which is closer to Peter's. I've added you as a reviewer, Stephen.

eernstg · 2017-12-06T16:11:17Z

PS: I've dropped the above-mentioned CL in order to avoid interfering with ongoing work.

eernstg · 2017-12-07T09:00:19Z

Noting that the tryParseType function hasn't been updated to handle Function types, I submitted #31572.

rakudrama added web-dart2js area-language Dart language related items (some items might be better tracked at github.com/dart-lang/language). area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. labels Dec 5, 2017

sigmundch added the legacy-area-front-end Legacy: Use area-dart-model instead. label Dec 5, 2017

jensjoha added the P3 A lower priority bug or feature request label Jan 5, 2018

sigmundch added customer-dart2js customer-vm and removed web-dart2js area-language Dart language related items (some items might be better tracked at github.com/dart-lang/language). area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. labels Jul 2, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parser.isValidMethodTypeArguments is slow #31536

Parser.isValidMethodTypeArguments is slow #31536

rakudrama commented Dec 5, 2017

sigmundch commented Dec 5, 2017 •

edited

Loading

peter-ahe-google commented Dec 5, 2017

eernstg commented Dec 5, 2017 •

edited

Loading

rakudrama commented Dec 5, 2017

eernstg commented Dec 6, 2017

eernstg commented Dec 6, 2017

eernstg commented Dec 7, 2017

Parser.isValidMethodTypeArguments is slow #31536

Parser.isValidMethodTypeArguments is slow #31536

Comments

rakudrama commented Dec 5, 2017

The language failed us.

The VM implementation failed us:

dart2js is pretty terrible too.

Suggestion

sigmundch commented Dec 5, 2017 • edited Loading

peter-ahe-google commented Dec 5, 2017

eernstg commented Dec 5, 2017 • edited Loading

rakudrama commented Dec 5, 2017

eernstg commented Dec 6, 2017

eernstg commented Dec 6, 2017

eernstg commented Dec 7, 2017

sigmundch commented Dec 5, 2017 •

edited

Loading

eernstg commented Dec 5, 2017 •

edited

Loading