Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fine tune method inlining of static calls in AOT #32167

Closed
alexmarkov opened this issue Feb 14, 2018 · 8 comments
Closed

Fine tune method inlining of static calls in AOT #32167

alexmarkov opened this issue Feb 14, 2018 · 8 comments
Assignees
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends.

Comments

@alexmarkov
Copy link
Contributor

A lot of instance calls are converted to static calls by CHA-based devirtualization / global type flow analysis. However, they are rarely inlined.

For example, consider the following code (extracted from velocity_tracker Flutter microbenchmark):

import 'dart:typed_data';

class _Vector {
  _Vector(int size) : _offset = 0, _length = size, _elements = new Float64List(size);

  _Vector.fromVOL(List<double> values, int offset, int length)
      : _offset = offset, _length = length, _elements = values;

  final int _offset;

  final int _length;

  final List<double> _elements;

  double operator [](int i) => _elements[i + _offset];
  void operator []=(int i, double value) {
    _elements[i + _offset] = value;
  }

  double operator *(_Vector a) {
    double result = 0.0;
    for (int i = 0; i < _length; i += 1)
      result += this[i] * a[i];
    return result;
  }
}

_Vector v = new _Vector(10);
double x = 0.0;

main(List<String> args) {

  Stopwatch timer = new Stopwatch()..start();

  for (int i = 0; i < 100000000; i++) {
    x = x + v * v;
  }

  timer.stop();
  print ("Elapsed ${timer.elapsedMilliseconds}ms, result $x");
}

In this code, _Vector.[] is not inlined into _Vector.*.

--trace-inlining shows the following:

Inlining calls in _Vector.*
  Depth 1 ----------
  Polymorphic Instance Calls (0)
  Static Calls (2)
  => _Vector.[] (deopt count 0)
     Bailout: cold 0.000000
  => _Vector.[] (deopt count 0)
     Bailout: cold 0.000000

The heuristic which bails out is based on StaticCallInfo.ratio which is calculated using call counters (which don't make much sense in AOT).

Also, this is very different from instance calls: InstanceCallInfo.ratio is calculated but it is not taken into account, and there is no similar heuristic for instance calls.

To summarize, the inlining heuristics should be fine tuned for static calls to allow inlining of small methods.

/cc @mraleph

@kmillikin kmillikin added area-front-end Use area-front-end for front end / CFE / kernel format related issues. front-end-kernel and removed area-front-end Use area-front-end for front end / CFE / kernel format related issues. area-kernel labels Sep 19, 2018
@alexmarkov alexmarkov added area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. and removed area-front-end Use area-front-end for front end / CFE / kernel format related issues. front-end-kernel labels Oct 23, 2018
@aartbik
Copy link
Contributor

aartbik commented Oct 24, 2018

Adding feedback from Alex here so we remember.

"The problem is that if we have compiled a function already, we have recorded number of instructions after optimizations. If we haven't compiled a function being inlined, then inliner builds graph and does a very limited set of optimizations before calculating number of instructions. This makes inliner heuristics weaker if we haven't compiled function yet. Maybe we can try compiling a function before inlining it to calculate inline weights for always-inline heuristic.
To see the difference, it is enough to comment out PrecompileConstructors call in precompiler.cc and run it on Golem. There are several benchmarks which regress considerably because constructors are not inlined, as always-inline heuristic doesn't work well on not-fully-optimized functions."

dart-bot pushed a commit that referenced this issue Oct 24, 2018
Rationale:
Without proper execution counters, the inline AOT inliner
marks every call site "cold", effectively disabling inlining
altogether. This change introduces loop-based static heuristic
that assumes statements nested inside loops are executed more
frequently. This results in more inlining.

Note:
Conservative version is used for now which yields
more performance without increasing code size too much.
There is still a lot of performance left at the table
which we could exploit if we fine tune heuristics
regarding code size.

Bug:
#34473
#32167


Change-Id: I86ba60f93bdab363cd22ab6bdbcf6688f2042fea
Reviewed-on: https://dart-review.googlesource.com/c/81187
Commit-Queue: Aart Bik <ajcbik@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
dart-bot pushed a commit that referenced this issue Oct 24, 2018
This reverts commit daae20d.

Reason for revert: 

The kernel-precomp bots are seeing this error:

../../runtime/vm/object.h: 3134: error: Handle check failed: saw 2249186640 expected Function

Not sure what that is yet, but reverting to get bots green again while I investigate.


Original change's description:
> [vm/compiler] Use loop framework for AOT inline heuristics
> 
> Rationale:
> Without proper execution counters, the inline AOT inliner
> marks every call site "cold", effectively disabling inlining
> altogether. This change introduces loop-based static heuristic
> that assumes statements nested inside loops are executed more
> frequently. This results in more inlining.
> 
> Note:
> Conservative version is used for now which yields
> more performance without increasing code size too much.
> There is still a lot of performance left at the table
> which we could exploit if we fine tune heuristics
> regarding code size.
> 
> Bug:
> #34473
> #32167
> 
> 
> Change-Id: I86ba60f93bdab363cd22ab6bdbcf6688f2042fea
> Reviewed-on: https://dart-review.googlesource.com/c/81187
> Commit-Queue: Aart Bik <ajcbik@google.com>
> Reviewed-by: Alexander Markov <alexmarkov@google.com>

TBR=vegorov@google.com,alexmarkov@google.com,ajcbik@google.com

Change-Id: If5ca82966966ebef4ec0b4e921515d23f6bd492b
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://dart-review.googlesource.com/c/81335
Reviewed-by: Aart Bik <ajcbik@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
@aartbik
Copy link
Contributor

aartbik commented Oct 24, 2018

The new heuristic exposed an issue with more inlining.
The kernel-precomp bots were seeing this error:

../../runtime/vm/object.h: 3134: error: Handle check failed: saw 2249186640 expected Function

This needs to be addressed before we can tune the heuristic properly.

@aartbik
Copy link
Contributor

aartbik commented Oct 25, 2018

I did a bisection search on this failure, and found the offending inlining:

Static StaticCall:12( _Link@17069316.fromRawPath<0> v150, v148) has ratio 0.000000

 => new Link.fromRawPath (deopt count 0)
     Success
       with reason AlwaysInline, code size 5, call sites: 1

Which is interesting, since it is not enabled by the new heuristic (ratio is 0.0). So another inlining must somehow interact badly with this one. Looking further.

@mraleph

@aartbik
Copy link
Contributor

aartbik commented Oct 25, 2018

With a bisection search on just the inlining based on (the new) nonzero ratio, we arrive at the same culprit, but the cause-and-effect is more clear. We set the ratio of the following call to 1.0:

Static v75 <- StaticCall:134( Link.fromRawPath<0> v0, v73 T{Uint8List?}) T{_Link} has ratio 1.000000

which in turn allows the inlining of this call site in into

dart:io__AsyncDirectoryLister@17069316_next_<anonymous closure>

This in turn allows for inlining of the

StaticCall:12( _Link@17069316.fromRawPath<0> v150, v148)

call I already found above.

Callee graph for inlining dart:io_Link_Link.fromRawPath (unoptimized)
==== dart:io_Link_Link.fromRawPath

--> doms(#1)=B49, preds= <--
B0[graph]:0 {
      v142 <- Constant(#null)
      v143 <- Constant(#<optimized out>)
}

--> dom=B0 doms(#0)= preds=B0, <--
B49[function entry]:2 {
      v144 <- Parameter(0)
      v145 <- Parameter(1)
      v146 <- Constant(#null)
      v148 <- Parameter(1) T{Uint8List?}
}
    DebugStepCheck:10()
    v150 <- AllocateObject(_Link)
    PushArgument(v150)
    PushArgument(v148)
    StaticCall:12( _Link@17069316.fromRawPath<0> v150, v148)
    DebugStepCheck:14()
    Return:16(v150)

@aartbik
Copy link
Contributor

aartbik commented Oct 25, 2018

In this method, we have three call sites that are inlined in a very similar way.
The original source code is:

    for (int i = 0; i < result.length; i++) {
          assert(i % 2 == 0);
          switch (result[i++]) {
            case listFile:
              controller.add(new File.fromRawPath(result[i]));
              break;
            case listDirectory:
              controller.add(new Directory.fromRawPath(result[i]));
              break;
            case listLink:
              controller.add(new Link.fromRawPath(result[i]));
              break;

For example, the File call site

   PushArgument(v95 T{Uint8List?})
   v97 <- StaticCall:100( File.fromRawPath<0> v0, v95 T{Uint8List?}) T{_File}
   PushArgument(v97)

becomes

    v170 <- AllocateObject(_File) T{_File}
    PushArgument(v170)
    PushArgument(v95 T{Uint8List?})
    StaticCall:12( _File@17069316.fromRawPath<0> v170, v95 T{Uint8List?})
    PushArgument(v170 T{_File})

But in the offending inlining of Link, the call site

    PushArgument(v73 T{Uint8List?})
    v75 <- StaticCall:134( Link.fromRawPath<0> v0, v73 T{Uint8List?}) T{_Link}
    PushArgument(v75)

becomes (note the _Link@17069316 instead of _File@17069316):

    v150 <- AllocateObject(_Link) T{_Link}
    PushArgument(v150)
    PushArgument(v73 T{Uint8List?})
    StaticCall:12( _Link@17069316.fromRawPath<0> v150, v73 T{Uint8List?})
    PushArgument(v150 T{_Link})

The only difference I noticed is that File and Directory have a @pragma("vm:entry-point") on the fromRawPath method, and Link does not. Adding this pragma to Link actually makes the test pass.

  @pragma("vm:entry-point")
  factory Link.fromRawPath(Uint8List rawPath) {
    // TODO(bkonyi): handle overrides
    return new _Link.fromRawPath(rawPath);
  }

@mraleph does that make sense?

@mraleph
Copy link
Member

mraleph commented Oct 25, 2018

@aartbik I think it makes sense if the test is testing directory listing or something like that.

If we inline all invocations of fromRawPath then we would tree shake the factory away. But in reality there are external invocations of this factory from other places.

@sjindel-google actually discovered a lot of missing entry-point annotations in his obfuscation fixing CL (see https://dart-review.googlesource.com/c/sdk/+/81009/17/sdk/lib/io/link.dart ). And he also looked into strengthening checks around entry-points.

@aartbik
Copy link
Contributor

aartbik commented Oct 25, 2018

Cool. I verified that all failures with the bots yesterday go away with this one line change. So I will send out a separate CL for that, since it makes File/Directory/Link equivalent again, and unblocks my inlining heuristic CL. I assume @sjindel-google will follow up with other cases.

dart-bot pushed a commit that referenced this issue Oct 25, 2018
Rationale:
This makes File/Directory/Link's fromRawPath
equivalent again, and unblocks the pending
loop-based inlining heuristic.

Reverted CL because of this bug:
https://dart-review.googlesource.com/c/sdk/+/81335

#34473
#32167

Change-Id: I769a8365ea32d6f6830d0dd8ef97fc6c2467fb90
Reviewed-on: https://dart-review.googlesource.com/c/81460
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
dart-bot pushed a commit that referenced this issue Oct 25, 2018
This reverts commit 0170b8d.

Reason for revert: 
The underlying problem has been fixed.


Original change's description:
> Revert "[vm/compiler] Use loop framework for AOT inline heuristics"
> 
> This reverts commit daae20d.
> 
> Reason for revert: 
> 
> The kernel-precomp bots are seeing this error:
> 
> ../../runtime/vm/object.h: 3134: error: Handle check failed: saw 2249186640 expected Function
> 
> Not sure what that is yet, but reverting to get bots green again while I investigate.
> 
> 
> Original change's description:
> > [vm/compiler] Use loop framework for AOT inline heuristics
> > 
> > Rationale:
> > Without proper execution counters, the inline AOT inliner
> > marks every call site "cold", effectively disabling inlining
> > altogether. This change introduces loop-based static heuristic
> > that assumes statements nested inside loops are executed more
> > frequently. This results in more inlining.
> > 
> > Note:
> > Conservative version is used for now which yields
> > more performance without increasing code size too much.
> > There is still a lot of performance left at the table
> > which we could exploit if we fine tune heuristics
> > regarding code size.
> > 
> > Bug:
> > #34473
> > #32167
> > 
> > 
> > Change-Id: I86ba60f93bdab363cd22ab6bdbcf6688f2042fea
> > Reviewed-on: https://dart-review.googlesource.com/c/81187
> > Commit-Queue: Aart Bik <ajcbik@google.com>
> > Reviewed-by: Alexander Markov <alexmarkov@google.com>
> 
> TBR=vegorov@google.com,alexmarkov@google.com,ajcbik@google.com
> 
> Change-Id: If5ca82966966ebef4ec0b4e921515d23f6bd492b
> No-Presubmit: true
> No-Tree-Checks: true
> No-Try: true
> Reviewed-on: https://dart-review.googlesource.com/c/81335
> Reviewed-by: Aart Bik <ajcbik@google.com>
> Commit-Queue: Aart Bik <ajcbik@google.com>

TBR=vegorov@google.com,alexmarkov@google.com,ajcbik@google.com

Change-Id: I8c160f59f8e91782f235cbcea607aef4227963f4
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://dart-review.googlesource.com/c/81500
Reviewed-by: Aart Bik <ajcbik@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
dart-bot pushed a commit that referenced this issue Nov 8, 2018
Rationale:
Last experiment left some performance at the table
(but for an unacceptable 200% code size increase).
This CL regains that performance for a much better
trade-off of just 8%. Performance gains vary from
10% to 100%, with e.g. 27% for DeltaBlue, 30% for
MD5 and 100% for SHA256). This is obtained by
always inlining byteswaps (to avoid calling into
the runtime when getInts remain) and a bit more
specific tuning of loop based heuristics. This
also fixes a bug in a list lookup.

#34473
#34969
#32167

Change-Id: Id1a64c3930c503546ae2d7f31ca3e597741022bb
Reviewed-on: https://dart-review.googlesource.com/c/82942
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
dart-bot pushed a commit that referenced this issue Nov 8, 2018
This reverts commit 74d2c6a.

Reason for revert: 

../../runtime/vm/compiler/backend/il.cc: 778: error: unreachable code

Original change's description:
> [vm/compiler] Tune AOT inline heuristics
> 
> Rationale:
> Last experiment left some performance at the table
> (but for an unacceptable 200% code size increase).
> This CL regains that performance for a much better
> trade-off of just 8%. Performance gains vary from
> 10% to 100%, with e.g. 27% for DeltaBlue, 30% for
> MD5 and 100% for SHA256). This is obtained by
> always inlining byteswaps (to avoid calling into
> the runtime when getInts remain) and a bit more
> specific tuning of loop based heuristics. This
> also fixes a bug in a list lookup.
> 
> #34473
> #34969
> #32167
> 
> Change-Id: Id1a64c3930c503546ae2d7f31ca3e597741022bb
> Reviewed-on: https://dart-review.googlesource.com/c/82942
> Reviewed-by: Ryan Macnak <rmacnak@google.com>
> Commit-Queue: Aart Bik <ajcbik@google.com>

TBR=vegorov@google.com,kustermann@google.com,rmacnak@google.com,alexmarkov@google.com,ajcbik@google.com

Change-Id: Id0f30f4458a3548af2e573a737e441ca11ae3b11
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://dart-review.googlesource.com/c/83643
Reviewed-by: Aart Bik <ajcbik@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
dart-bot pushed a commit that referenced this issue Nov 8, 2018
Rationale:
Although not semantically disastrous, list inclusion
was not properly implemented, which yielded some very
hard to debug recompilations (or lack thereof) due
to speculation.

#34473
#34969
#32167

Change-Id: I54613127bd3ab93820d4a467c75e41f3df16ee7a
Reviewed-on: https://dart-review.googlesource.com/c/83883
Reviewed-by: Siva Annamalai <asiva@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
dart-bot pushed a commit that referenced this issue Nov 8, 2018
Rationale:
Always inlining byteswaps avoids calling into
the runtime when getInts remain.

#34473
#34969
#32167

Change-Id: I95c0348fa960e08f973a12a2c5304785e76289b3
Reviewed-on: https://dart-review.googlesource.com/c/83884
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
dart-bot pushed a commit that referenced this issue Jan 15, 2019
Rationale:
Yields substantial improvements on various benchmarks
(1.8x on HMAC stand-alone, around 5x on TypedData settters and getters),
with only moderate increase in code size (3.2% on Flutter gallery).

#34473
#32167

Change-Id: I0909efd7afc72229524ff8edb7322ce025a14af4
Reviewed-on: https://dart-review.googlesource.com/c/89162
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
@aartbik
Copy link
Contributor

aartbik commented Jan 15, 2019

Closing this issue, since we went from never inlining static calls to using a reasonably effective heuristics for inlining static calls under AOT (although fine-tuning these is expected to continue of course).

@aartbik aartbik closed this as completed Jan 15, 2019
dart-bot pushed a commit that referenced this issue Jan 15, 2019
This reverts commit 2908e61.

Reason for revert: regress_32322_test crashes

Original change's description:
> [vm/compiler] AOT inline heuristics improvements
> 
> Rationale:
> Yields substantial improvements on various benchmarks
> (1.8x on HMAC stand-alone, around 5x on TypedData settters and getters),
> with only moderate increase in code size (3.2% on Flutter gallery).
> 
> #34473
> #32167
> 
> Change-Id: I0909efd7afc72229524ff8edb7322ce025a14af4
> Reviewed-on: https://dart-review.googlesource.com/c/89162
> Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
> Reviewed-by: Alexander Markov <alexmarkov@google.com>
> Commit-Queue: Aart Bik <ajcbik@google.com>

TBR=vegorov@google.com,alexmarkov@google.com,ajcbik@google.com

Change-Id: I9c7dadb18935ad32f4d4cd72872838e8ac9cc288
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://dart-review.googlesource.com/c/89740
Reviewed-by: Aart Bik <ajcbik@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
dart-bot pushed a commit that referenced this issue Jan 17, 2019
Rationale:
Previously, we avoided introducing redefinitions that
introduced the empty non-nullable null type. This situation
arises when we do a null check on an actual null value
(making all subsequent uses effectively dead code). This
is too simple, however, since it still allows hoisting the
uses before the check. This CL gives a better solution
by introducing redefinitions without a constraining type
(which are not removed and avoid the type). In the long
run perhaps the best solution would be to simply remove
all subsequent uses as dead.

#32167
#34473
#35335

Change-Id: Ib5dd072a9e546f6b91faa52ea08e8c0f6350d7e0
Reviewed-on: https://dart-review.googlesource.com/c/89922
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
dart-bot pushed a commit that referenced this issue Jan 18, 2019
This reverts commit 43a96d4.

Reason for revert: <INSERT REASONING HERE>

Original change's description:
> Revert "[vm/compiler] AOT inline heuristics improvements"
> 
> This reverts commit 2908e61.
> 
> Reason for revert: regress_32322_test crashes
> 
> Original change's description:
> > [vm/compiler] AOT inline heuristics improvements
> > 
> > Rationale:
> > Yields substantial improvements on various benchmarks
> > (1.8x on HMAC stand-alone, around 5x on TypedData settters and getters),
> > with only moderate increase in code size (3.2% on Flutter gallery).
> > 
> > #34473
> > #32167
> > 
> > Change-Id: I0909efd7afc72229524ff8edb7322ce025a14af4
> > Reviewed-on: https://dart-review.googlesource.com/c/89162
> > Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
> > Reviewed-by: Alexander Markov <alexmarkov@google.com>
> > Commit-Queue: Aart Bik <ajcbik@google.com>
> 
> TBR=vegorov@google.com,alexmarkov@google.com,ajcbik@google.com
> 
> Change-Id: I9c7dadb18935ad32f4d4cd72872838e8ac9cc288
> No-Presubmit: true
> No-Tree-Checks: true
> No-Try: true
> Reviewed-on: https://dart-review.googlesource.com/c/89740
> Reviewed-by: Aart Bik <ajcbik@google.com>
> Commit-Queue: Aart Bik <ajcbik@google.com>

TBR=vegorov@google.com,alexmarkov@google.com,ajcbik@google.com

# Not skipping CQ checks because original CL landed > 1 day ago.

Change-Id: Iace9857654b63af2fbcd2808d19802fb60305973
Reviewed-on: https://dart-review.googlesource.com/c/90141
Reviewed-by: Aart Bik <ajcbik@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends.
Projects
None yet
Development

No branches or pull requests

5 participants