Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MemoryCopy IL instruction and take advantage of it in various places #42072

Open
3 of 5 tasks
mkustermann opened this issue May 27, 2020 · 2 comments
Open
3 of 5 tasks
Assignees
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends.

Comments

@mkustermann
Copy link
Member

mkustermann commented May 27, 2020

We should add a generic MemoryCopy IL instruction, capable of copying ranges of bytes where source and destination can be _OneByteString / _TwoByteString as well as various typed data as long as their element sizes agree.

This can be used for

  • Faster UTF-8 decoding
  • Faster UTF-8 encoding
  • Faster List.setRange() implementation for typed data
  • Faster String.fromCharCodes() implementation for Uint8List input
  • Faster String.codeUnits implementation
@mkustermann mkustermann added the area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. label May 27, 2020
dart-bot pushed a commit that referenced this issue Jun 10, 2020
Used for copying the bytes from the Uint8List to the _OneByteString in
String.fromCharCodes and the pure-ASCII case of UTF-8 decoding.

Issue #42072
Closes #41703

Change-Id: I1ae300222877d1c6e64e32c2f40b8fb187a584c0
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/149500
Commit-Queue: Aske Simon Christensen <askesc@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
dart-bot pushed a commit that referenced this issue Jun 10, 2020
…d strings."

This reverts commit 6ecd8a1.

Reason for revert: Breaks ABI

Original change's description:
> [vm] MemoryCopy instruction for copying between typed data and strings.
> 
> Used for copying the bytes from the Uint8List to the _OneByteString in
> String.fromCharCodes and the pure-ASCII case of UTF-8 decoding.
> 
> Issue #42072
> Closes #41703
> 
> Change-Id: I1ae300222877d1c6e64e32c2f40b8fb187a584c0
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/149500
> Commit-Queue: Aske Simon Christensen <askesc@google.com>
> Reviewed-by: Martin Kustermann <kustermann@google.com>

TBR=kustermann@google.com,askesc@google.com

Change-Id: I5fc0b58da80dca23c91b57ec2833492e9d0885a8
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/150802
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
dart-bot pushed a commit that referenced this issue Jun 11, 2020
…d strings."

This is a reland of 6ecd8a1

Original change's description:
> [vm] MemoryCopy instruction for copying between typed data and strings.
> 
> Used for copying the bytes from the Uint8List to the _OneByteString in
> String.fromCharCodes and the pure-ASCII case of UTF-8 decoding.
> 
> Issue #42072
> Closes #41703
> 
> Change-Id: I1ae300222877d1c6e64e32c2f40b8fb187a584c0
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/149500
> Commit-Queue: Aske Simon Christensen <askesc@google.com>
> Reviewed-by: Martin Kustermann <kustermann@google.com>

Change-Id: Ia231c521e5f2db168cfc6094dfc7322327dedc6d
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/150925
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Aske Simon Christensen <askesc@google.com>
tekknolagi pushed a commit to tekknolagi/dart-assembler that referenced this issue Nov 3, 2020
Used for copying the bytes from the Uint8List to the _OneByteString in
String.fromCharCodes and the pure-ASCII case of UTF-8 decoding.

Issue dart-lang#42072
Closes dart-lang#41703

Change-Id: I1ae300222877d1c6e64e32c2f40b8fb187a584c0
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/149500
Commit-Queue: Aske Simon Christensen <askesc@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
tekknolagi pushed a commit to tekknolagi/dart-assembler that referenced this issue Nov 3, 2020
…d strings."

This reverts commit 6ecd8a1.

Reason for revert: Breaks ABI

Original change's description:
> [vm] MemoryCopy instruction for copying between typed data and strings.
> 
> Used for copying the bytes from the Uint8List to the _OneByteString in
> String.fromCharCodes and the pure-ASCII case of UTF-8 decoding.
> 
> Issue dart-lang#42072
> Closes dart-lang#41703
> 
> Change-Id: I1ae300222877d1c6e64e32c2f40b8fb187a584c0
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/149500
> Commit-Queue: Aske Simon Christensen <askesc@google.com>
> Reviewed-by: Martin Kustermann <kustermann@google.com>

TBR=kustermann@google.com,askesc@google.com

Change-Id: I5fc0b58da80dca23c91b57ec2833492e9d0885a8
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/150802
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Commit-Queue: Ryan Macnak <rmacnak@google.com>
tekknolagi pushed a commit to tekknolagi/dart-assembler that referenced this issue Nov 3, 2020
…d strings."

This is a reland of 6ecd8a1

Original change's description:
> [vm] MemoryCopy instruction for copying between typed data and strings.
> 
> Used for copying the bytes from the Uint8List to the _OneByteString in
> String.fromCharCodes and the pure-ASCII case of UTF-8 decoding.
> 
> Issue dart-lang#42072
> Closes dart-lang#41703
> 
> Change-Id: I1ae300222877d1c6e64e32c2f40b8fb187a584c0
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/149500
> Commit-Queue: Aske Simon Christensen <askesc@google.com>
> Reviewed-by: Martin Kustermann <kustermann@google.com>

Change-Id: Ia231c521e5f2db168cfc6094dfc7322327dedc6d
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/150925
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Aske Simon Christensen <askesc@google.com>
@mkustermann
Copy link
Member Author

@sstrickl is going to look into making TypedData.setRange() faster.

copybara-service bot pushed a commit that referenced this issue Aug 9, 2023
The MemoryCopy benchmark suite measures the overhead of copying
data between compatible TypedData or Pointer values.

Change-Id: Iaf5ea27b7f9177f4800880da36234afd2b908db2
Bug: #42072
Bug: b/294114694
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/318661
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Tess Strickland <sstrickl@google.com>
@sstrickl
Copy link
Contributor

sstrickl commented Sep 1, 2023

Copying comments from Alex's last review of https://dart-review.googlesource.com/c/sdk/+/319521 here so I don't forget to address them later in followup work:

  • "Can we do something like CCallInstr to call memmove directly, without going to Dart VM runtime?"
  • "We can consider detecting actual cut-off size [of when to switch from using the MemoryCopy instruction to memmove] at run time, depending on the cache sizes."

Also should look at when time allows:

  • Generating static call to _checkSetRangeArguments in IL faster than inlining it into setRange prior to calling _fastSetRange or _slowSetRange and passing the calculated count to those methods (either in addition to or instead of end, depending on if they need end).
  • Performance regressions on MemoryCopy.{512,4096}.setRange.*.Uint8 on ARM7/ARM8.

copybara-service bot pushed a commit that referenced this issue Sep 4, 2023
When setRange is called on a TypedData receiver and the source is also
a TypedData object with the same element size and clamping is not
required, the VM implementation now calls _boundsCheckAndMemcpyN for
element size N. The generated IL for these methods performs the copy
using the MemoryCopy instruction (mostly, see the note below).

Since the two TypedData objects might have the same underlying
buffer, the CL adds a can_overlap flag to the MemoryCopy instruction
which checks for overlapping regions. If can_overlap is set, then
the copy is performed backwards instead of forwards when needed
to ensure that elements of the source region are read before
they are overwritten.

The existing uses of the MemoryCopy instruction are adjusted as
follows:
* The IL generated for copyRangeFromUint8ListToOneByteString
  passes false for can_overlap, as all uses currently ensure that
  the OneByteString is non-external and thus cannot overlap.
* The IL generated for _memCopy, used by the FFI library, passes
  true for can_overlap, as there is no guarantee that the regions
  pointed at by the Pointer objects do not overlap.

The MemoryCopy instruction has also been adjusted so that all numeric
inputs (the two start offsets and the length) are either boxed or
unboxed instead of just the length. This exposed an issue
in the inliner, where unboxed constants in the callee graph were
replaced with boxed constants when inlining into the caller graph,
since withList calls setRange with constant starting offsets of 0.
Now the representation of constants in the callee graph are preserved
when inlining the callee graph into the caller graph.

Fixes #51237 by using TMP
and TMP2 for the LDP/STP calls in the 16-byte element size case, so no
temporaries need to be allocated for the instruction.

On ARM when not unrolling the memory copy loop, uses TMP and a single
additional temporary for LDM/STM calls in the 8-byte and 16-byte
element cases, with the latter just using two LDM/STM calls within
the loop, a different approach than the one described in
#51229 .

Note: Once the number of elements being copied reaches a certain
threshold (1048576 on X86, 256 otherwise), _boundsCheckAndMemcpyN
instead calls _nativeSetRange, which is a native call that uses memmove
from the standard C library for non-clamped inputs. It does this
because the code currently emitted for MemoryCopy performs poorly
compared to the more optimized memmove implementation when copying
larger regions of memory.

Notable benchmark changes for dart-aot:
* X64
  * TypedDataDuplicate.*.fromList improvement from ~13%-~250%
  * Uf8Encode.*.10 improvement from ~50%-~75%
  * MapCopy.Map.*.of.Map.* improvement from ~13%-~65%
  * MemoryCopy.*.setRange.* improvement from ~13%-~500%
* ARM7
  * Uf8Encode.*.10 improvement from ~35%-~70%
  * MapCopy.Map.*.of.Map.* improvement from ~6%-~75%
  * MemoryCopy.*.setRange.{8,64} improvement from ~22%-~500%
    * Improvement of ~100%-~200% for MemoryCopy.512.setRange.*.Double
    * Regression of ~40% for MemoryCopy.512.setRange.*.Uint8
    * Regression of ~85% for MemoryCopy.4096.setRange.*.Uint8
* ARM8
  * Uf8Encode.*.10 improvement from ~35%-~70%
  * MapCopy.Map.*.of.Map.* improvement from ~7%-~75%
  * MemoryCopy.*.setRange.{8,64} improvement from ~22%-~500%
    * Improvement of ~75%-~160% for MemoryCopy.512.setRange.*.Double
    * Regression of ~40% for MemoryCopy.512.setRange.*.Uint8
    * Regression of ~85% for MemoryCopy.4096.setRange.*.Uint8

TEST=vm/cc/IRTest_Memory, co19{,_2}/LibTest/typed_data,
     lib{,_2}/typed_data, corelib{,_2}/list_test

Issue: #42072
Issue: b/294114694
Issue: b/259315681

Change-Id: Ic75521c5fe10b952b5b9ce5f2020c7e3f03672a9
Cq-Include-Trybots: luci.dart.try:vm-aot-linux-debug-simarm_x64-try,vm-aot-linux-debug-simriscv64-try,vm-aot-linux-debug-x64-try,vm-aot-linux-debug-x64c-try,vm-kernel-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-linux-debug-ia32-try,vm-linux-debug-simriscv64-try,vm-linux-debug-x64-try,vm-linux-debug-x64c-try,vm-mac-debug-arm64-try,vm-mac-debug-x64-try,vm-aot-linux-release-simarm64-try,vm-aot-linux-release-simarm_x64-try,vm-aot-linux-release-x64-try,vm-aot-mac-release-arm64-try,vm-aot-mac-release-x64-try,vm-ffi-qemu-linux-release-riscv64-try,vm-ffi-qemu-linux-release-arm-try,vm-aot-msan-linux-release-x64-try,vm-msan-linux-release-x64-try,vm-aot-tsan-linux-release-x64-try,vm-tsan-linux-release-x64-try,vm-linux-release-ia32-try,vm-linux-release-simarm-try,vm-linux-release-simarm64-try,vm-linux-release-x64-try,vm-mac-release-arm64-try,vm-mac-release-x64-try,vm-kernel-precomp-linux-release-x64-try,vm-aot-android-release-arm64c-try,vm-ffi-android-debug-arm64c-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/319521
Reviewed-by: Daco Harkes <dacoharkes@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Tess Strickland <sstrickl@google.com>
copybara-service bot pushed a commit that referenced this issue Sep 5, 2023
The bounds checking was implemented in Dart previously, but this
removes _checkSetRangeArguments, inlining it into
_TypedListBase.setRange, renames _checkBoundsAndMemcpyN to _memMoveN
since it no longer performs bounds checking, and also removes the now
unneeded bounds checking from the native function TypedData_setRange.

TEST=co19{,_2}/LibTest/typed_data lib{,_2}/typed_data
     corelib{,_2}/list_test

Issue: #42072
Cq-Include-Trybots: luci.dart.try:vm-aot-linux-debug-simarm_x64-try,vm-aot-linux-debug-x64-try,vm-aot-linux-debug-x64c-try,vm-kernel-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-linux-debug-x64-try,vm-linux-debug-x64c-try,vm-mac-debug-arm64-try,vm-aot-linux-release-simarm_x64-try,vm-aot-linux-release-x64-try,vm-aot-mac-release-arm64-try,vm-linux-release-x64-try,vm-mac-release-arm64-try,vm-kernel-precomp-linux-release-x64-try
Change-Id: I85ec751708f603f68729f4109d7339dd8407ae77
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/324102
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Tess Strickland <sstrickl@google.com>
copybara-service bot pushed a commit that referenced this issue Sep 7, 2023
When copying elements that are smaller than words on non-X86
architectures, only copy element by element until the remaining elements
can be copied in word-sized chunks, and then do so.

TEST=vm/cc/IRTest_Memory, co19{,_2}/LibTest/typed_data,
     lib{,_2}/typed_data, corelib{,_2}/list_test

Issue: #42072

Cq-Include-Trybots: luci.dart.try:vm-aot-linux-debug-simarm_x64-try,vm-aot-linux-debug-simriscv64-try,vm-linux-debug-simriscv64-try,vm-mac-debug-arm64-try,vm-aot-linux-release-simarm64-try,vm-aot-linux-release-simarm_x64-try,vm-aot-mac-release-arm64-try,vm-ffi-qemu-linux-release-riscv64-try,vm-ffi-qemu-linux-release-arm-try,vm-linux-release-simarm-try,vm-linux-release-simarm64-try,vm-mac-release-arm64-try,vm-aot-android-release-arm64c-try,vm-ffi-android-debug-arm64c-try
Change-Id: I61eab310b92a6bc5ebd88fa63d562103d887cb74
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/324280
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Tess Strickland <sstrickl@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
copybara-service bot pushed a commit that referenced this issue Sep 11, 2023
TEST=vm/cc/IRTest_Memory, co19{,_2}/LibTest/typed_data,
     lib{,_2}/typed_data, corelib{,_2}/list_test

Issue: #42072

Cq-Include-Trybots: luci.dart.try:vm-mac-debug-arm64-try,vm-aot-linux-release-simarm64-try,vm-aot-mac-release-arm64-try,vm-linux-release-simarm64-try,vm-mac-release-arm64-try,vm-aot-android-release-arm64c-try,vm-ffi-android-debug-arm64c-try,vm-ffi-qemu-linux-release-arm-try,vm-ffi-qemu-linux-release-riscv64-try,vm-linux-debug-simriscv64-try,vm-linux-release-simarm-try
Change-Id: Ife645a1d09be862d74e198162b124e657878280a
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/324683
Commit-Queue: Tess Strickland <sstrickl@google.com>
Reviewed-by: Alexander Markov <alexmarkov@google.com>
copybara-service bot pushed a commit that referenced this issue Sep 11, 2023
Instead of making a StaticCall to _TypedListBase.nativeSetRange
inside _memMoveN, make a CCall to the memmove leaf runtime entry.

Rename _TypedListBase._nativeSetRange to _setClampedRange, since
it's now only used when per-element clamping is necessary.

Fix the load optimizer so that loads of unboxed fields from freshly
allocated objects do not have the tagged null value forwarded
as their initial post-allocation value.

TEST=co19{,_2}/LibTest/typed_data lib{,_2}/typed_data
     corelib{,_2}/list_test
     vm/cc/LoadOptimizer_LoadDataFieldOfNewTypedData

Issue: #42072
Change-Id: Ib82e24a5b3287fa53099fffd3b563a27d777507e
Cq-Include-Trybots: luci.dart.try:vm-aot-linux-debug-simarm_x64-try,vm-aot-linux-debug-x64-try,vm-aot-linux-debug-x64c-try,vm-kernel-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-linux-debug-x64-try,vm-linux-debug-x64c-try,vm-mac-debug-arm64-try,vm-aot-linux-release-simarm_x64-try,vm-aot-linux-release-x64-try,vm-aot-mac-release-arm64-try,vm-aot-msan-linux-release-x64-try,vm-msan-linux-release-x64-try,vm-aot-tsan-linux-release-x64-try,vm-tsan-linux-release-x64-try,vm-linux-release-x64-try,vm-mac-release-arm64-try,vm-kernel-precomp-linux-release-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/324080
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Tess Strickland <sstrickl@google.com>
copybara-service bot pushed a commit that referenced this issue Oct 13, 2023
This CL adds the ability to pass the payload address of the source
and destination directly to the MemoryCopy instruction as an untagged
value.

The new translation of the _TypedListBase._memMoveN methods use the new
MemoryCopy constructor, retrieving the untagged value of the data field
of both the source and destination. This way, if inlining exposes the
allocation of the object from which the data field is being retrieved,
then allocation sinking can remove the intermediate allocation if there
are no escaping uses of the object.

Since Pointer.asTypedList allocates such ExternalTypedData objects,
this CL makes that method inlined if at all possible, which removes
the intermediate allocation if the only use of the TypedData object
is to call setRange for memory copying purposes.

This CL also separates unboxed native slots into two groups: those
that contain untagged addresses and those that do not. The former
group now have the kUntagged representation, which mimics the old
use of LoadUntagged for the PointerBase data field and also ensures
that any arithmetic operations on untagged addresses must first be
explicitly converted to an unboxed integer and then explicitly converted
back to untagged before being stored in a slot that contains untagged
addresses.

When a unboxed native slot that contains untagged addresses is defined,
the definition also includes a boolean which represents whether
addresses that may be moved by the GC can be stored in this slot or not.
The redundancy eliminator uses this to decide whether it is safe to
eliminate a duplicate load, replace a load with the value originally
stored in the slot, or lift a load out of a loop.

In particular, the PointerBase data field may contain GC-moveable
addresses, but only for internal TypedData objects and views, not
for external TypedData objects or Pointers. To allow load optimizations
involving the latter, the LoadField and StoreField instructions now
take boolean flags for whether loads or stores from the slot are
guaranteed to not be GC-moveable, to override the information from
the slot argument.

Notable benchmark changes on x64 (similar for other archs unless noted):

JIT:
* FfiMemory.PointerPointer: 250.7%
* FfiStructCopy.Copy1Bytes: -26.73% (only x64)
* FfiStructCopy.Copy32Bytes: -25.18% (only x64)
* MemoryCopy.64.setRange.Pointer.Uint8: 19.36%
* MemoryCopy.64.setRange.Pointer.Double: 18.96%
* MemoryCopy.8.setRange.Pointer.Double: 17.59%
* MemoryCopy.8.setRange.Pointer.Uint8: 19.46%

AOT:
* FfiMemory.PointerPointer: 323.5%
* FfiStruct.FieldLoadStore: 483.3%
* FileIO_readwrite_64kb: 15.39%
* FileIO_readwrite_512kb (Intel Xeon): 46.22%
* MemoryCopy.512.setRange.Pointer.Uint8: 35.20%
* MemoryCopy.64.setRange.Pointer.Uint8: 55.40%
* MemoryCopy.512.setRange.Pointer.Double: 29.45%
* MemoryCopy.64.setRange.Pointer.Double: 60.37%
* MemoryCopy.8.setRange.Pointer.Double: 59.54%
* MemoryCopy.8.setRange.Pointer.Uint8: 55.40%
* FfiStructCopy.Copy32Bytes: 398.3%
* FfiStructCopy.Copy1Bytes: 1233%

TEST=vm/dart/address_local_pointer, vm/dart/pointer_as_typed_list

Issue: #42072
Fixes: #53124

Cq-Include-Trybots: luci.dart.try:vm-ffi-qemu-linux-release-arm-try,vm-eager-optimization-linux-release-x64-try,vm-linux-release-x64-try,vm-linux-debug-x64-try,vm-aot-linux-release-x64-try,vm-aot-linux-debug-x64-try
Change-Id: I563e0bfac5b1ac6cf1111649934067c12891b631
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/324820
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Tess Strickland <sstrickl@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
copybara-service bot pushed a commit that referenced this issue Oct 13, 2023
…es."

This reverts commit 06d7a23.

Reason for revert: everything crashes on vm-aot-linux-debug-simarm_x64

Original change's description:
> [vm/compiler] Change MemoryCopy to also take untagged addresses.
>
> This CL adds the ability to pass the payload address of the source
> and destination directly to the MemoryCopy instruction as an untagged
> value.
>
> The new translation of the _TypedListBase._memMoveN methods use the new
> MemoryCopy constructor, retrieving the untagged value of the data field
> of both the source and destination. This way, if inlining exposes the
> allocation of the object from which the data field is being retrieved,
> then allocation sinking can remove the intermediate allocation if there
> are no escaping uses of the object.
>
> Since Pointer.asTypedList allocates such ExternalTypedData objects,
> this CL makes that method inlined if at all possible, which removes
> the intermediate allocation if the only use of the TypedData object
> is to call setRange for memory copying purposes.
>
> This CL also separates unboxed native slots into two groups: those
> that contain untagged addresses and those that do not. The former
> group now have the kUntagged representation, which mimics the old
> use of LoadUntagged for the PointerBase data field and also ensures
> that any arithmetic operations on untagged addresses must first be
> explicitly converted to an unboxed integer and then explicitly converted
> back to untagged before being stored in a slot that contains untagged
> addresses.
>
> When a unboxed native slot that contains untagged addresses is defined,
> the definition also includes a boolean which represents whether
> addresses that may be moved by the GC can be stored in this slot or not.
> The redundancy eliminator uses this to decide whether it is safe to
> eliminate a duplicate load, replace a load with the value originally
> stored in the slot, or lift a load out of a loop.
>
> In particular, the PointerBase data field may contain GC-moveable
> addresses, but only for internal TypedData objects and views, not
> for external TypedData objects or Pointers. To allow load optimizations
> involving the latter, the LoadField and StoreField instructions now
> take boolean flags for whether loads or stores from the slot are
> guaranteed to not be GC-moveable, to override the information from
> the slot argument.
>
> Notable benchmark changes on x64 (similar for other archs unless noted):
>
> JIT:
> * FfiMemory.PointerPointer: 250.7%
> * FfiStructCopy.Copy1Bytes: -26.73% (only x64)
> * FfiStructCopy.Copy32Bytes: -25.18% (only x64)
> * MemoryCopy.64.setRange.Pointer.Uint8: 19.36%
> * MemoryCopy.64.setRange.Pointer.Double: 18.96%
> * MemoryCopy.8.setRange.Pointer.Double: 17.59%
> * MemoryCopy.8.setRange.Pointer.Uint8: 19.46%
>
> AOT:
> * FfiMemory.PointerPointer: 323.5%
> * FfiStruct.FieldLoadStore: 483.3%
> * FileIO_readwrite_64kb: 15.39%
> * FileIO_readwrite_512kb (Intel Xeon): 46.22%
> * MemoryCopy.512.setRange.Pointer.Uint8: 35.20%
> * MemoryCopy.64.setRange.Pointer.Uint8: 55.40%
> * MemoryCopy.512.setRange.Pointer.Double: 29.45%
> * MemoryCopy.64.setRange.Pointer.Double: 60.37%
> * MemoryCopy.8.setRange.Pointer.Double: 59.54%
> * MemoryCopy.8.setRange.Pointer.Uint8: 55.40%
> * FfiStructCopy.Copy32Bytes: 398.3%
> * FfiStructCopy.Copy1Bytes: 1233%
>
> TEST=vm/dart/address_local_pointer, vm/dart/pointer_as_typed_list
>
> Issue: #42072
> Fixes: #53124
>
> Cq-Include-Trybots: luci.dart.try:vm-ffi-qemu-linux-release-arm-try,vm-eager-optimization-linux-release-x64-try,vm-linux-release-x64-try,vm-linux-debug-x64-try,vm-aot-linux-release-x64-try,vm-aot-linux-debug-x64-try
> Change-Id: I563e0bfac5b1ac6cf1111649934067c12891b631
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/324820
> Reviewed-by: Alexander Markov <alexmarkov@google.com>
> Commit-Queue: Tess Strickland <sstrickl@google.com>
> Reviewed-by: Martin Kustermann <kustermann@google.com>

Issue: #42072
Change-Id: I7c31434e01108487de69a32154bbefd1538c6f0f
Cq-Include-Trybots: luci.dart.try:vm-ffi-qemu-linux-release-arm-try,vm-eager-optimization-linux-release-x64-try,vm-linux-release-x64-try,vm-linux-debug-x64-try,vm-aot-linux-release-x64-try,vm-aot-linux-debug-x64-try
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/330523
Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Commit-Queue: Alexander Aprelev <aam@google.com>
Auto-Submit: Alexander Markov <alexmarkov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
copybara-service bot pushed a commit that referenced this issue Oct 17, 2023
…es."

This is a reland of commit 06d7a23

This version fixes an issue when a phi node has multiple inputs with
different unboxed integer representations. The original CL made a
change where only the representations were considered, not the range
of values for the Phi calculated by range analysis. The reland goes
back to the old behavior for this case.

Also fixes the new tests on 32-bit architectures.

Original change's description:
> [vm/compiler] Change MemoryCopy to also take untagged addresses.
>
> This CL adds the ability to pass the payload address of the source
> and destination directly to the MemoryCopy instruction as an untagged
> value.
>
> The new translation of the _TypedListBase._memMoveN methods use the new
> MemoryCopy constructor, retrieving the untagged value of the data field
> of both the source and destination. This way, if inlining exposes the
> allocation of the object from which the data field is being retrieved,
> then allocation sinking can remove the intermediate allocation if there
> are no escaping uses of the object.
>
> Since Pointer.asTypedList allocates such ExternalTypedData objects,
> this CL makes that method inlined if at all possible, which removes
> the intermediate allocation if the only use of the TypedData object
> is to call setRange for memory copying purposes.
>
> This CL also separates unboxed native slots into two groups: those
> that contain untagged addresses and those that do not. The former
> group now have the kUntagged representation, which mimics the old
> use of LoadUntagged for the PointerBase data field and also ensures
> that any arithmetic operations on untagged addresses must first be
> explicitly converted to an unboxed integer and then explicitly converted
> back to untagged before being stored in a slot that contains untagged
> addresses.
>
> When a unboxed native slot that contains untagged addresses is defined,
> the definition also includes a boolean which represents whether
> addresses that may be moved by the GC can be stored in this slot or not.
> The redundancy eliminator uses this to decide whether it is safe to
> eliminate a duplicate load, replace a load with the value originally
> stored in the slot, or lift a load out of a loop.
>
> In particular, the PointerBase data field may contain GC-moveable
> addresses, but only for internal TypedData objects and views, not
> for external TypedData objects or Pointers. To allow load optimizations
> involving the latter, the LoadField and StoreField instructions now
> take boolean flags for whether loads or stores from the slot are
> guaranteed to not be GC-moveable, to override the information from
> the slot argument.
>
> Notable benchmark changes on x64 (similar for other archs unless noted):
>
> JIT:
> * FfiMemory.PointerPointer: 250.7%
> * FfiStructCopy.Copy1Bytes: -26.73% (only x64)
> * FfiStructCopy.Copy32Bytes: -25.18% (only x64)
> * MemoryCopy.64.setRange.Pointer.Uint8: 19.36%
> * MemoryCopy.64.setRange.Pointer.Double: 18.96%
> * MemoryCopy.8.setRange.Pointer.Double: 17.59%
> * MemoryCopy.8.setRange.Pointer.Uint8: 19.46%
>
> AOT:
> * FfiMemory.PointerPointer: 323.5%
> * FfiStruct.FieldLoadStore: 483.3%
> * FileIO_readwrite_64kb: 15.39%
> * FileIO_readwrite_512kb (Intel Xeon): 46.22%
> * MemoryCopy.512.setRange.Pointer.Uint8: 35.20%
> * MemoryCopy.64.setRange.Pointer.Uint8: 55.40%
> * MemoryCopy.512.setRange.Pointer.Double: 29.45%
> * MemoryCopy.64.setRange.Pointer.Double: 60.37%
> * MemoryCopy.8.setRange.Pointer.Double: 59.54%
> * MemoryCopy.8.setRange.Pointer.Uint8: 55.40%
> * FfiStructCopy.Copy32Bytes: 398.3%
> * FfiStructCopy.Copy1Bytes: 1233%
>
> TEST=vm/dart/address_local_pointer, vm/dart/pointer_as_typed_list
>
> Issue: #42072
> Fixes: #53124
>
> Cq-Include-Trybots: luci.dart.try:vm-ffi-qemu-linux-release-arm-try,vm-eager-optimization-linux-release-x64-try,vm-linux-release-x64-try,vm-linux-debug-x64-try,vm-aot-linux-release-x64-try,vm-aot-linux-debug-x64-try
> Change-Id: I563e0bfac5b1ac6cf1111649934067c12891b631
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/324820
> Reviewed-by: Alexander Markov <alexmarkov@google.com>
> Commit-Queue: Tess Strickland <sstrickl@google.com>
> Reviewed-by: Martin Kustermann <kustermann@google.com>

TEST=vm/dart/address_local_pointer, vm/dart/pointer_as_typed_list

Issue: #42072
Fixes: #53124

Change-Id: Iabb0e910f12636d0ff51e711c8c9c98ad40e5811
Cq-Include-Trybots: luci.dart.try:vm-ffi-qemu-linux-release-arm-try,vm-eager-optimization-linux-release-x64-try,vm-linux-release-x64-try,vm-linux-debug-x64-try,vm-aot-linux-release-x64-try,vm-aot-linux-debug-x64-try,vm-aot-linux-release-simarm_x64-try,vm-aot-linux-debug-simarm_x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/330600
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Tess Strickland <sstrickl@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
copybara-service bot pushed a commit that referenced this issue Oct 17, 2023
Missed a couple of LoadUntaggeds for PointerBase.data in the graph
intrinsifier during 4d1bdaa.

Creates slots for ExternalOneByteString.external_data and
ExternalTwoByteString.external_data and uses LoadField with those slots
instead of LoadUntagged.

TEST=ci

Issue: #42072
Fixes: #53124
Change-Id: I900281644c4c42ad303cc7a6121e4c8bb7852cfe
Cq-Include-Trybots: luci.dart.try:vm-aot-linux-release-simarm_x64-try,vm-aot-linux-release-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/330787
Reviewed-by: Alexander Markov <alexmarkov@google.com>
Commit-Queue: Tess Strickland <sstrickl@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends.
Projects
None yet
Development

No branches or pull requests

3 participants