-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
package:crypto Hmac code is around 5-7x slower in AOT #34473
Comments
Runtimes on my desktop
The profiles below strongly support mraleph's observations above. JIT hot method profile:
AOT hot method profile:
|
I wrote some internal microbenchmarks (we already have a few related, but none with Hmac).
|
With some quick and dirty prototype around inlining and speculation, the performance of AOT improves as follows. Still not JIT, but in some cases already about 2x better than where it was....
|
Bit more prototyping brings this down more, over 3x. Of course we will need to balance the heuristics on what work wells overall.
|
Rationale: (1) reset loop_info of block entries, since recomputing on a modified graph may encounter blocks that are no longer headers (2) replaced O(n) linear scan over loop headers just to test membership of the current block with an O(1) test (3) added some comments #34473 Change-Id: I9e24a76bac9cf0225d8f1c996adfa6d82c175504 Reviewed-on: https://dart-review.googlesource.com/c/78703 Reviewed-by: Vyacheslav Egorov <vegorov@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
Rationale: Started a more structured implementation of detecting loops and the loop hierarchy in the Dart flow graph. This new framework can replace some of the more ad-hoc approach taken now, as already started in this CL, and also forms a foundation for more optimizations later. #34473 Change-Id: I75130cb6863324910e3d068e4f2628c25fdf880c Reviewed-on: https://dart-review.googlesource.com/c/79147 Reviewed-by: Vyacheslav Egorov <vegorov@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
Rationale: Rather than relying on a separate loop detector, rely on the new loop framework, which avoids code duplication and ensures any improvement in loop detection/handling will benefit this phase too. Note, most of the time, the same loops are discovered with a few exceptions (which is okay, since this is "just" heuristic usage). This CL also simplifies loop detection a bit. #34473 Change-Id: I1a1b19b99a698c74822473d2a1fe370287c1ade4 Reviewed-on: https://dart-review.googlesource.com/c/80523 Commit-Queue: Aart Bik <ajcbik@google.com> Reviewed-by: Vyacheslav Egorov <vegorov@google.com> Reviewed-by: Alexander Markov <alexmarkov@google.com>
A loop-based heuristic gives an immediate 2.5x speedup, without resorting to the code size hungry "inline always" heuristic (although we still need to double check that code size increase is acceptable). JIT --> 2119 Some preliminary experiments on our internal benchmark also look extremely promising! |
Rationale: Without proper execution counters, the inline AOT inliner marks every call site "cold", effectively disabling inlining altogether. This change introduces loop-based static heuristic that assumes statements nested inside loops are executed more frequently. This results in more inlining. Note: Conservative version is used for now which yields more performance without increasing code size too much. There is still a lot of performance left at the table which we could exploit if we fine tune heuristics regarding code size. Bug: #34473 #32167 Change-Id: I86ba60f93bdab363cd22ab6bdbcf6688f2042fea Reviewed-on: https://dart-review.googlesource.com/c/81187 Commit-Queue: Aart Bik <ajcbik@google.com> Reviewed-by: Alexander Markov <alexmarkov@google.com>
This reverts commit daae20d. Reason for revert: The kernel-precomp bots are seeing this error: ../../runtime/vm/object.h: 3134: error: Handle check failed: saw 2249186640 expected Function Not sure what that is yet, but reverting to get bots green again while I investigate. Original change's description: > [vm/compiler] Use loop framework for AOT inline heuristics > > Rationale: > Without proper execution counters, the inline AOT inliner > marks every call site "cold", effectively disabling inlining > altogether. This change introduces loop-based static heuristic > that assumes statements nested inside loops are executed more > frequently. This results in more inlining. > > Note: > Conservative version is used for now which yields > more performance without increasing code size too much. > There is still a lot of performance left at the table > which we could exploit if we fine tune heuristics > regarding code size. > > Bug: > #34473 > #32167 > > > Change-Id: I86ba60f93bdab363cd22ab6bdbcf6688f2042fea > Reviewed-on: https://dart-review.googlesource.com/c/81187 > Commit-Queue: Aart Bik <ajcbik@google.com> > Reviewed-by: Alexander Markov <alexmarkov@google.com> TBR=vegorov@google.com,alexmarkov@google.com,ajcbik@google.com Change-Id: If5ca82966966ebef4ec0b4e921515d23f6bd492b No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://dart-review.googlesource.com/c/81335 Reviewed-by: Aart Bik <ajcbik@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
Rationale: This makes File/Directory/Link's fromRawPath equivalent again, and unblocks the pending loop-based inlining heuristic. Reverted CL because of this bug: https://dart-review.googlesource.com/c/sdk/+/81335 #34473 #32167 Change-Id: I769a8365ea32d6f6830d0dd8ef97fc6c2467fb90 Reviewed-on: https://dart-review.googlesource.com/c/81460 Reviewed-by: Vyacheslav Egorov <vegorov@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
This reverts commit 0170b8d. Reason for revert: The underlying problem has been fixed. Original change's description: > Revert "[vm/compiler] Use loop framework for AOT inline heuristics" > > This reverts commit daae20d. > > Reason for revert: > > The kernel-precomp bots are seeing this error: > > ../../runtime/vm/object.h: 3134: error: Handle check failed: saw 2249186640 expected Function > > Not sure what that is yet, but reverting to get bots green again while I investigate. > > > Original change's description: > > [vm/compiler] Use loop framework for AOT inline heuristics > > > > Rationale: > > Without proper execution counters, the inline AOT inliner > > marks every call site "cold", effectively disabling inlining > > altogether. This change introduces loop-based static heuristic > > that assumes statements nested inside loops are executed more > > frequently. This results in more inlining. > > > > Note: > > Conservative version is used for now which yields > > more performance without increasing code size too much. > > There is still a lot of performance left at the table > > which we could exploit if we fine tune heuristics > > regarding code size. > > > > Bug: > > #34473 > > #32167 > > > > > > Change-Id: I86ba60f93bdab363cd22ab6bdbcf6688f2042fea > > Reviewed-on: https://dart-review.googlesource.com/c/81187 > > Commit-Queue: Aart Bik <ajcbik@google.com> > > Reviewed-by: Alexander Markov <alexmarkov@google.com> > > TBR=vegorov@google.com,alexmarkov@google.com,ajcbik@google.com > > Change-Id: If5ca82966966ebef4ec0b4e921515d23f6bd492b > No-Presubmit: true > No-Tree-Checks: true > No-Try: true > Reviewed-on: https://dart-review.googlesource.com/c/81335 > Reviewed-by: Aart Bik <ajcbik@google.com> > Commit-Queue: Aart Bik <ajcbik@google.com> TBR=vegorov@google.com,alexmarkov@google.com,ajcbik@google.com Change-Id: I8c160f59f8e91782f235cbcea607aef4227963f4 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://dart-review.googlesource.com/c/81500 Reviewed-by: Aart Bik <ajcbik@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
Rationale: Last experiment left some performance at the table (but for an unacceptable 200% code size increase). This CL regains that performance for a much better trade-off of just 8%. Performance gains vary from 10% to 100%, with e.g. 27% for DeltaBlue, 30% for MD5 and 100% for SHA256). This is obtained by always inlining byteswaps (to avoid calling into the runtime when getInts remain) and a bit more specific tuning of loop based heuristics. This also fixes a bug in a list lookup. #34473 #34969 #32167 Change-Id: Id1a64c3930c503546ae2d7f31ca3e597741022bb Reviewed-on: https://dart-review.googlesource.com/c/82942 Reviewed-by: Ryan Macnak <rmacnak@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
This reverts commit 74d2c6a. Reason for revert: ../../runtime/vm/compiler/backend/il.cc: 778: error: unreachable code Original change's description: > [vm/compiler] Tune AOT inline heuristics > > Rationale: > Last experiment left some performance at the table > (but for an unacceptable 200% code size increase). > This CL regains that performance for a much better > trade-off of just 8%. Performance gains vary from > 10% to 100%, with e.g. 27% for DeltaBlue, 30% for > MD5 and 100% for SHA256). This is obtained by > always inlining byteswaps (to avoid calling into > the runtime when getInts remain) and a bit more > specific tuning of loop based heuristics. This > also fixes a bug in a list lookup. > > #34473 > #34969 > #32167 > > Change-Id: Id1a64c3930c503546ae2d7f31ca3e597741022bb > Reviewed-on: https://dart-review.googlesource.com/c/82942 > Reviewed-by: Ryan Macnak <rmacnak@google.com> > Commit-Queue: Aart Bik <ajcbik@google.com> TBR=vegorov@google.com,kustermann@google.com,rmacnak@google.com,alexmarkov@google.com,ajcbik@google.com Change-Id: Id0f30f4458a3548af2e573a737e441ca11ae3b11 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://dart-review.googlesource.com/c/83643 Reviewed-by: Aart Bik <ajcbik@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
Rationale: Although not semantically disastrous, list inclusion was not properly implemented, which yielded some very hard to debug recompilations (or lack thereof) due to speculation. #34473 #34969 #32167 Change-Id: I54613127bd3ab93820d4a467c75e41f3df16ee7a Reviewed-on: https://dart-review.googlesource.com/c/83883 Reviewed-by: Siva Annamalai <asiva@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
Rationale: Always inlining byteswaps avoids calling into the runtime when getInts remain. #34473 #34969 #32167 Change-Id: I95c0348fa960e08f973a12a2c5304785e76289b3 Reviewed-on: https://dart-review.googlesource.com/c/83884 Reviewed-by: Alexander Markov <alexmarkov@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
@aartbik you have made significant improvements to the Hmac code, are we still considerably slower? Can the issue be closed? |
@a-siva let's keep this open; there is still much more we can do (better inlining without codesize increase for one, but even after that better instruction selection); this bug is a nice way to track progress in that area. |
This is the current status, with some plans on how to go forward: JIT |
Pending CL yields the following results (more than an other 1.8x), which only 3.4% codesize increase on e.g. FlutterGallery. Many other benchmarks speedup substantially as well. JIT |
Rationale: Yields substantial improvements on various benchmarks (1.8x on HMAC stand-alone, around 5x on TypedData settters and getters), with only moderate increase in code size (3.2% on Flutter gallery). #34473 #32167 Change-Id: I0909efd7afc72229524ff8edb7322ce025a14af4 Reviewed-on: https://dart-review.googlesource.com/c/89162 Reviewed-by: Vyacheslav Egorov <vegorov@google.com> Reviewed-by: Alexander Markov <alexmarkov@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
Closing this issue since the huge gap has been bridged (fine tuning the arithmetic operations is expected to continue of course). |
This reverts commit 2908e61. Reason for revert: regress_32322_test crashes Original change's description: > [vm/compiler] AOT inline heuristics improvements > > Rationale: > Yields substantial improvements on various benchmarks > (1.8x on HMAC stand-alone, around 5x on TypedData settters and getters), > with only moderate increase in code size (3.2% on Flutter gallery). > > #34473 > #32167 > > Change-Id: I0909efd7afc72229524ff8edb7322ce025a14af4 > Reviewed-on: https://dart-review.googlesource.com/c/89162 > Reviewed-by: Vyacheslav Egorov <vegorov@google.com> > Reviewed-by: Alexander Markov <alexmarkov@google.com> > Commit-Queue: Aart Bik <ajcbik@google.com> TBR=vegorov@google.com,alexmarkov@google.com,ajcbik@google.com Change-Id: I9c7dadb18935ad32f4d4cd72872838e8ac9cc288 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://dart-review.googlesource.com/c/89740 Reviewed-by: Aart Bik <ajcbik@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
Rationale: Previously, we avoided introducing redefinitions that introduced the empty non-nullable null type. This situation arises when we do a null check on an actual null value (making all subsequent uses effectively dead code). This is too simple, however, since it still allows hoisting the uses before the check. This CL gives a better solution by introducing redefinitions without a constraining type (which are not removed and avoid the type). In the long run perhaps the best solution would be to simply remove all subsequent uses as dead. #32167 #34473 #35335 Change-Id: Ib5dd072a9e546f6b91faa52ea08e8c0f6350d7e0 Reviewed-on: https://dart-review.googlesource.com/c/89922 Reviewed-by: Alexander Markov <alexmarkov@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
This reverts commit 43a96d4. Reason for revert: <INSERT REASONING HERE> Original change's description: > Revert "[vm/compiler] AOT inline heuristics improvements" > > This reverts commit 2908e61. > > Reason for revert: regress_32322_test crashes > > Original change's description: > > [vm/compiler] AOT inline heuristics improvements > > > > Rationale: > > Yields substantial improvements on various benchmarks > > (1.8x on HMAC stand-alone, around 5x on TypedData settters and getters), > > with only moderate increase in code size (3.2% on Flutter gallery). > > > > #34473 > > #32167 > > > > Change-Id: I0909efd7afc72229524ff8edb7322ce025a14af4 > > Reviewed-on: https://dart-review.googlesource.com/c/89162 > > Reviewed-by: Vyacheslav Egorov <vegorov@google.com> > > Reviewed-by: Alexander Markov <alexmarkov@google.com> > > Commit-Queue: Aart Bik <ajcbik@google.com> > > TBR=vegorov@google.com,alexmarkov@google.com,ajcbik@google.com > > Change-Id: I9c7dadb18935ad32f4d4cd72872838e8ac9cc288 > No-Presubmit: true > No-Tree-Checks: true > No-Try: true > Reviewed-on: https://dart-review.googlesource.com/c/89740 > Reviewed-by: Aart Bik <ajcbik@google.com> > Commit-Queue: Aart Bik <ajcbik@google.com> TBR=vegorov@google.com,alexmarkov@google.com,ajcbik@google.com # Not skipping CQ checks because original CL landed > 1 day ago. Change-Id: Iace9857654b63af2fbcd2808d19802fb60305973 Reviewed-on: https://dart-review.googlesource.com/c/90141 Reviewed-by: Aart Bik <ajcbik@google.com> Commit-Queue: Aart Bik <ajcbik@google.com>
I looked at the code and I think there are two main reasons for this:
_rotr32(int n, int x) => (x >> n) | ((x << (32 - n)) & mask32);
is not inlined) - our AOT compiler is much less eager to inline static methods (basically almost never inlines them for the code size reasons)._rotr32
produces this graph:The code below provided by @MariaMelnik
Similar number when running on my Nexus 5X:
/cc @aartbik might be a good example for the "better integer math" project.
The text was updated successfully, but these errors were encountered: