Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vm/ffi] Provide memcopy function #43968

Closed
dcharkes opened this issue Oct 28, 2020 · 13 comments
Closed

[vm/ffi] Provide memcopy function #43968

dcharkes opened this issue Oct 28, 2020 · 13 comments
Assignees
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-ffi type-enhancement A request for a change that isn't a bug

Comments

@dcharkes
Copy link
Contributor

dcharkes commented Oct 28, 2020

After adressing #43967 we might also expose a memcopy function in the dart:ffi API to provide faster mem copying in and out of TypedData that is still GC-safe.

@dcharkes
Copy link
Contributor Author

dcharkes commented Jun 1, 2023

The internal API:

/// Copies data byte-wise from [source] to [target].
///
/// [source] and [target] should either be [Pointer] or [TypedData].
@pragma("vm:entry-point")
@pragma("vm:recognized", "other")
external void _memCopy(Object target, int targetOffsetInBytes, Object source,
int sourceOffsetInBytes, int lengthInBytes);

A public API should probably have some more guardrails.

@mraleph
Copy link
Member

mraleph commented Jun 2, 2023

FWIW I would prefer that we simply provide fast path in List.setRange (and possibly a TFA based specialization of TypedData.setRange(TypedData). This way we can also improve memory copying for non-FFI scenarios.

@dcharkes
Copy link
Contributor Author

dcharkes commented Jun 2, 2023

I took a stab at this and modelled the API after the asTypedList extension methods on Pointers.

https://dart-review.googlesource.com/c/sdk/+/307040

extension Int64Pointer on Pointer<Int64> {
  /// ...
  external Int64List asTypedList(
    int length, {
    @Since('3.1') Pointer<NativeFinalizerFunction>? finalizer,
    @Since('3.1') Pointer<Void>? token,
  });

  /// Copies memory from `this` to [target].
  ///
  /// Copies `8 * length` bytes of memory.
  ///
  /// The memory backing `this` and [target] must not overlap.
  ///
  /// The [length] must not be larger than [target.length].
  @Since('3.1')
  external memCopyToTypedList(Int64List target, int length);

  /// Copies memory from [source] to `this`.
  ///
  /// Copies `8 * length` bytes of memory.
  ///
  /// The memory backing `this` and [source] must not overlap.
  ///
  /// The [length] must not be larger than [source.length].
  @Since('3.1')
  external memCopyFromTypedList(Int64List source, int length);

  /// Copies memory from `this` to [target].
  ///
  /// Copies `8 * length` bytes of memory.
  ///
  /// The memory backing `this` and [target] must not overlap.
  @Since('3.1')
  external memCopy(Pointer<Int64> target, int length);
}

Any feedback on the API is welcome. @mkustermann @lrhn

Some design decisions:

  • I have aligned the TypedData types and Pointer type arguments so that we can talk about length in element (similar to asTypedList).
  • I did not expose any offsets, rather typed data views and Pointer.elementAt should be used to do this. This makes giving nice error messages when the length/offset is out of range easier.

@dcharkes
Copy link
Contributor Author

dcharkes commented Jun 2, 2023

FWIW I would prefer that we simply provide fast path in List.setRange (and possibly a TFA based specialization of TypedData.setRange(TypedData). This way we can also improve memory copying for non-FFI scenarios.

That is an interesting idea, using asTypedList on the pointer and making memcopy between TypedDatas faster.

@dcharkes
Copy link
Contributor Author

dcharkes commented Jun 2, 2023

FWIW I would prefer that we simply provide fast path in List.setRange (and possibly a TFA based specialization of TypedData.setRange(TypedData). This way we can also improve memory copying for non-FFI scenarios.

The only downside of this is that it's not visible from the Dart code whether it's going to be a fast copy or not. E.g. if the call sites are polymorphic, or have mismatched element sizes. How much do we care about predictable performance? @mraleph @mkustermann?

If we want predictable performance, and support TypedData copies, we could consider introducing a memoryCopy method on Int64List and friends. And then in addition rewrite List.setRange if we can in the TFA to use that memoryCopy method. (Of course that would lead to the question what kind of predictable performance we could get in the JS/WASM backends as well, since it's a shared API. cc @askeksa-google @rakudrama.)

@haberman
Copy link

haberman commented Jun 2, 2023

This would greatly accelerate protobuf parsing using https://github.com/protocolbuffers/upb. Profiles currently show ~40% of the overall parsing cost is just copying the data into an FFI buffer.

Ideally we could just parse directly out of the Uint8List buffer as proposed in #44589, but in cases where that is not possible for whatever reason a memcopy() function would help a lot.

@mkustermann
Copy link
Member

FWIW I would prefer that we simply provide fast path in List.setRange (and possibly a TFA based specialization of TypedData.setRange(TypedData). This way we can also improve memory copying for non-FFI scenarios.

I agree with @mraleph we should make List.setRange() and TypedData.setRange() faster - which is already tracked at #42072.

The only downside of this is that it's not visible from the Dart code whether it's going to be a fast copy or not. E.g. if the call sites are polymorphic, or have mismatched element sizes. How much do we care about predictable performance? @mraleph @mkustermann?

Specifically for memory copies of pointers / typed data, I don't see any such concern. We guarantee any TypedData.setRange(TypedData) will be fast.

@mkustermann
Copy link
Member

@sstrickl is going to work on optimizing memory copies (we may not want a separate API though - as this issue suggests - as one can always make views on ffi memory and use existing TypedData.setRange() calls)

@mkustermann
Copy link
Member

@dcharkes @mraleph if we agree we don't need a new API, maybe we can close this issue in favor of #42072 ?

@dcharkes
Copy link
Contributor Author

dcharkes commented Aug 7, 2023

@dcharkes @mraleph if we agree we don't need a new API, maybe we can close this issue in favor of #42072 ?

sgtm, I can abandon https://dart-review.googlesource.com/c/sdk/+/307040.

@sstrickl you might want to copy the test from that CL (and massage it to use setRange). We should check that the cases covered in the test are all working.

Probably irrelevant but also worth checking: How much overhead does the typed-data object creation add if we're just talking about copying between two pointers? (Or a typed data and a pointer.)

final p1 = malloc<Int8>(1024);
final p2 = malloc<Int8>(1024);
p2.asTypedList(1024).setRange(0, 1024, p1.asTypedList(1024));

@mkustermann
Copy link
Member

How much overhead does the typed-data object creation add if we're just talking about copying between two pointers? (Or a typed data and a pointer.)

We should try to come to a solution where the TypedData object creation will be optimized away (will get there soon).

@mkustermann
Copy link
Member

After all the work @sstrickl has done here, the code

dest.asTypedList(n).setRange(0, n, src.asTypedList(n))

now compiles to a MemoryCopyInstr without typed data view allocation etc. - see IL test

@dcharkes
Copy link
Contributor Author

dcharkes commented Dec 4, 2023

After all the work @sstrickl has done here

Thanks @sstrickl !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-ffi type-enhancement A request for a change that isn't a bug
Projects
None yet
Development

No branches or pull requests

5 participants