Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issues with TypedData._getX/_setX operations #33205

Closed
2 tasks done
mraleph opened this issue May 23, 2018 · 4 comments
Closed
2 tasks done

Performance issues with TypedData._getX/_setX operations #33205

mraleph opened this issue May 23, 2018 · 4 comments
Assignees
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends.

Comments

@mraleph
Copy link
Member

mraleph commented May 23, 2018

These functions only have a full native implementation in runtime and inliner only supports inlining them if receiver type is known.

The main reason for only inlining them if receiver type is known is because they are polymorphic with respect to receiver type:

  • Their implementations do bounds checking even though bounds should already be checked by callers. The bounds checking depends on lengthInBytes, which makes these operations statically polymorphic - as lengthInBytes is not stores in TypedData and instead needs to be computed based on the receiver class.
  • Implementation is also polymorphic with respect to external vs non-external TypedData.

Inability to inline these methods is most visible in AOT code that works with typed data views:
where we end up always calling runtime:

import 'dart:typed_data';

const K = 100;
const N = 1000000;

void doIt(Uint8List list) {
  for (int i = 0; i < list.length; i++) {
    list[i] = 0;
  }
}

void main(List<String> args) {
  if (args.contains('--use-view')) {
    for (var i = 0; i < K; i++)
      doIt(new Uint8List.view(new Uint8List(N).buffer));
  } else {
    for (var i = 0; i < K; i++)
      doIt(new Uint8List(N));
  }
}
$ pkg/vm/tool/precompiler2 /tmp/b.dart /tmp/b.aot 
$ time pkg/vm/tool/dart_precompiled_runtime2 /tmp/b.aot --use-view
pkg/vm/tool/dart_precompiled_runtime2 /tmp/b.aot --use-view  10.63s user 0.04s system 100% cpu 10.606 total
$ time pkg/vm/tool/dart_precompiled_runtime2 /tmp/b.aot           
pkg/vm/tool/dart_precompiled_runtime2 /tmp/b.aot  0.94s user 0.06s system 105% cpu 0.955 total

The list of things we could improve:

  • These methods should not check bounds.
  • We should provide graph intrinsic and an inline graph fro these methods that can handle both internal and external typed_data similar to how runtime method handles it.

Longer term we should investigate if we could normalize representation of views in such a way that view into external typed data is not different from the view into non-external typed_data.

@mraleph mraleph added the area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. label May 23, 2018
@aartbik aartbik self-assigned this May 30, 2018
dart-bot pushed a commit that referenced this issue May 31, 2018
Rationale:
Some native typed setters/getters in the typed_data library
are only called from clients that perform explicit bounds checks
first. Under strongly typed Dart2, there is no need to repeat
these tests. Avoids overhead, and reduces the polymorphicness
of these calls (preparing more inlining later).

#33205

Tests:
./tools/test.py --mode release --arch x64 -r vm
tools/test.py -m release -r vm -c dartk --strong standalone_2/typed_data_test
Change-Id: I81771a4f4b41a21e344f2d50745bbc30480b2a6c
Reviewed-on: https://dart-review.googlesource.com/57460
Commit-Queue: Aart Bik <ajcbik@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
dart-bot pushed a commit that referenced this issue Jun 5, 2018
Rationale:
Always inline int convertors that don't do
much more than testing and anding.
For example, force inline
  v26 <- StaticCall:66( _toUint8@6027147<0> v4 T{Type: class 'int'?}) T{Type: class 'int'?}
to
  v47 <- Constant(#255)
  ..
  CheckSmi:10(v4 T{Type: class 'int'?})
  v45 <- BinarySmiOp:10(&, v4 T{_Smi}, v47 T{_Smi}) T{_Smi}

#33205

Change-Id: I595d9a64365e16ae244480b5e27f8be23c43d164
Reviewed-on: https://dart-review.googlesource.com/58061
Commit-Queue: Aart Bik <ajcbik@google.com>
Reviewed-by: Aart Bik <ajcbik@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
dart-bot pushed a commit that referenced this issue Jun 12, 2018
Rationale:
Handles remaining polymorphic reason for typed_data
setters and getters (internal vs. external) during inlining.
Also introduces high level flow graph utilities that
can be reused throughout the compiler to reduce
future code duplication. Disables type speculation
for 64-bit AOT Dart2 to make all work.

Performance:
About 4x speedup on micro benchmarks (AOT64).

#33205

Change-Id: I678426719e49cd8aa1e5051523da12178120b3ba
Reviewed-on: https://dart-review.googlesource.com/59000
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
@aartbik
Copy link
Contributor

aartbik commented Jun 12, 2018

Timings of view cases on AOT 64-bit have substantially improved. On my desktop:

user 0m0.980s
user 0m1.596s --use-view

Still a few i's to dot before we can close this.

@mraleph
Copy link
Member Author

mraleph commented Jun 12, 2018

We should check if ARM32 saw any improvement at all and if not - see what is missing to get better code (e.g. be better with choosing non-speculative path on 32-bit platforms?).

dart-bot pushed a commit that referenced this issue Jun 13, 2018
Rationale:
Rather than forced inlining of clamped convertors
(the "saturated" method _toClampedUint8() from
dart:typed_data), exposing the fact that it always
returns smi values (since they check fail on null inputs)
yields much better Uint8ClampedListView performance
(it avoids re-compilation due to a speculative CheckSmi).

Note:
In the long run, we may still want them inlined
and improve range analysis to deal with clamping.

Performance:
About 2.8x faster than previous optimized version,
about 4.5x faster than original.

#33205

Change-Id: I86a06525d2f2ea0476effd3c3d856ff8d9ab1d87
Reviewed-on: https://dart-review.googlesource.com/60201
Commit-Queue: Aart Bik <ajcbik@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
@aartbik
Copy link
Contributor

aartbik commented Jun 14, 2018

The i's to dot:
(1) intrinsify/inline set indexed for 64-bit
(2) avoid retry-compilation for "view" on 64-bit int and f32/f64
(3) finalize story on other platforms, 32-bit

dart-bot pushed a commit that referenced this issue Jun 15, 2018
Rationale:
With limited integers, signed/unsigned 64-bit int typed
data is just a bit pattern, no need to sign/zero extend
these into bigger ints on load. This enables more
intrinsification of indexed stores/loads.

Note:
Still TBD, inline these indexed operations too.

Performance:
About 10x improvement on micro benchmarks.

#33205


Change-Id: I640c324a7d91e57fb4edc025e0dd456ad34fe906
Reviewed-on: https://dart-review.googlesource.com/60403
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
dart-bot pushed a commit that referenced this issue Jul 2, 2018
Rationale:
This CL finalizes recent improvements by allowing
inlining 64-bit and double getter/setter operations.
The runtime over all data types (with and without
view) now no longer has outliers.

Note:
64-bit targets only, 32-bit targets still tbd.

Performance:
About 4x improvement on micro benchmarks.

#33205

Change-Id: Ic82fa24167a68e3c196edf4843f0829c7fbcf9e1
Reviewed-on: https://dart-review.googlesource.com/60451
Commit-Queue: Aart Bik <ajcbik@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
@aartbik
Copy link
Contributor

aartbik commented Jul 2, 2018

Internal benchmark shows "flat" behavior now for all typed data setters and getters on 64-bit AOT. Last item to look at is doing a bit more on 32-bit as well (also AOT vs JIT differences are tracked elsewhere).

dart-bot pushed a commit that referenced this issue Jul 6, 2018
Rationale:
All recent improvements, now for 32-bit too.

Performance:
Many large improvements on micro benchmarks.
Meteor down as expected.


#33205

Change-Id: Ie9ebcfdfe9c5e265595c95d5e943ae35c5700a97
Reviewed-on: https://dart-review.googlesource.com/63685
Commit-Queue: Aart Bik <ajcbik@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
@aartbik aartbik closed this as completed Jul 6, 2018
tekknolagi pushed a commit to tekknolagi/dart-assembler that referenced this issue Nov 3, 2020
Rationale:
With limited integers, signed/unsigned 64-bit int typed
data is just a bit pattern, no need to sign/zero extend
these into bigger ints on load. This enables more
intrinsification of indexed stores/loads.

Note:
Still TBD, inline these indexed operations too.

Performance:
About 10x improvement on micro benchmarks.

dart-lang#33205


Change-Id: I640c324a7d91e57fb4edc025e0dd456ad34fe906
Reviewed-on: https://dart-review.googlesource.com/60403
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Aart Bik <ajcbik@google.com>
tekknolagi pushed a commit to tekknolagi/dart-assembler that referenced this issue Nov 3, 2020
Rationale:
All recent improvements, now for 32-bit too.

Performance:
Many large improvements on micro benchmarks.
Meteor down as expected.


dart-lang#33205

Change-Id: Ie9ebcfdfe9c5e265595c95d5e943ae35c5700a97
Reviewed-on: https://dart-review.googlesource.com/63685
Commit-Queue: Aart Bik <ajcbik@google.com>
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends.
Projects
None yet
Development

No branches or pull requests

2 participants