Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vm/ffi] Optimize asTypedList() #39843

Closed
dcharkes opened this issue Dec 18, 2019 · 16 comments
Closed

[vm/ffi] Optimize asTypedList() #39843

dcharkes opened this issue Dec 18, 2019 · 16 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-ffi type-enhancement A request for a change that isn't a bug type-performance Issue relates to performance or code size

Comments

@dcharkes
Copy link
Contributor

_asExternalTypedData is implemented as a runtime entry, we should make this a recognized method.

Samples: 27K of event 'cycles:uppp', Event count (approx.): 22172477433
  Overhead  Command     Shared Object                  Symbol
-   13.77%  DartWorker  dart                           [.] dart::BootstrapNatives::DN_Ffi_asExternalTypedData                                                ◆
   - 10.70% dart::BootstrapNatives::DN_Ffi_asExternalTypedData                                                                                               ▒
      - 10.48% dart::NativeEntry::BootstrapNativeCallWrapper                                                                                                 ▒
           CallBootstrapNative                                                                                                                               ▒
         + dart.ffi_::__asExternalTypedData@7050071                                                                                                          ▒
   + 1.05% dart::Type::type_class_id                                                                                                                         ▒
   + 0.64% dart::Smi::AsInt64Value                                                                                                                           ▒
+    6.52%  DartWorker  dart                           [.] dart::ExternalTypedData::New                                                                      ▒
+    6.01%  DartWorker  dart                           [.] dart::VMHandles::AllocateHandle                                                                   ▒
+    4.57%  DartWorker  perf-124944.map                [.] *dart.typed_data___ExternalUint64Array&_TypedList&_IntListMixin&_TypedIntListMixin@6027147_setRang▒
+    4.08%  DartWorker  dart                           [.] dart::Object::Allocate                                                                            ▒
+    3.45%  DartWorker  perf-124944.map                [.] *Lz4Lib_decompressFrame                                                                           ▒
@dcharkes dcharkes added area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-ffi labels Dec 18, 2019
@mkustermann mkustermann added type-enhancement A request for a change that isn't a bug type-performance Issue relates to performance or code size labels Dec 16, 2020
@vaind
Copy link

vaind commented May 20, 2021

Any chance this is going to be addressed? We have had to do various workarounds in pub.dev/packages/objectbox to at least alleviate the issue but there's really no way to completely get around it AFAIK.

And it really is rather slow at about 7.5 million asTypedList() calls per second, taking up a good portion of DB read operations (getting about 5 million per second on the same machine so you get the idea how much of that is asTypedList). See also https://github.com/objectbox/objectbox-dart/blob/main/benchmark/bin/native_pointers.dart

@dcharkes
Copy link
Contributor Author

Your benchmark results are consistent with the ones in the SDK:

class PointerInt8TypedDataNew extends BenchmarkBase {
Pointer<Int8> pointer = nullptr;
PointerInt8TypedDataNew() : super('FfiMemory.PointerInt8TypedDataNew');
@override
void setup() {
pointer = calloc(N);
}
@override
void teardown() {
calloc.free(pointer);
pointer = nullptr;
}
@override
void run() {
final typedData = pointer.asTypedList(N);
doStoreInt8TypedData(typedData, N);
final int x = doLoadInt8TypedData(typedData, N);
if (x != N) {
throw Exception('$name: Unexpected result: $x, expected $N');
}
}
}
final emptyTypedData = Int8List(0);
class PointerInt8TypedDataReuse extends BenchmarkBase {
Pointer<Int8> pointer = nullptr;
Int8List typedData = emptyTypedData;
PointerInt8TypedDataReuse() : super('FfiMemory.PointerInt8TypedDataReuse');
@override
void setup() {
pointer = calloc(N);
typedData = pointer.asTypedList(N);
}
@override
void teardown() {
calloc.free(pointer);
pointer = nullptr;
typedData = emptyTypedData;
}
@override
void run() {
doStoreInt8TypedData(typedData, N);
final int x = doLoadInt8TypedData(typedData, N);
if (x != N) {
throw Exception('$name: Unexpected result: $x, expected $N');
}
}
}

I can't make any statements on prioritization. Currently, we're busy working on other dart:ffi tasks.

Until we get to this, would it be possible to use the Pointer [] and []= operations instead?

@vaind
Copy link

vaind commented May 20, 2021

Until we get to this, would it be possible to use the Pointer [] and []= operations instead?

We're using it to get a ByteData.view() - any chance there's an implementation working with Pointer<Uint8>?

@vaind
Copy link

vaind commented May 20, 2021

I've tried a rough, simple (and probably incorrect) dart implementation of ByteData, with its overrides implemented using byte shifts and indexing a Pointer<>, e.g.

  @override
  int getUint32(int byteOffset, [Endian endian = Endian.big]) {
    assert(endian == Endian.little);
    return (_ptr[byteOffset + 3] << 24) |
        (_ptr[byteOffset + 2] << 16) |
        (_ptr[byteOffset + 1] << 8) |
        _ptr[byteOffset];
  }

but it's even slower than using asTypedList for our use case (in our DB - reads go from those 5M/sec to about 1.3M/sec).

@blaugold
Copy link
Contributor

Any updates on this? I would really appreciate this optimization getting implemented.

It would be quite impactful, for many users of dart:ffi, since it currently dominates Utf8Pointer.toDartString and other methods in package:ffi:
Screenshot from 2021-11-24 12-19-39

@vaind
Copy link

vaind commented Nov 24, 2021

@dcharkes I'm happy to contribute this change if you can point me in the right direction, e.g. what you meant by "make this a recognized method."

@dcharkes
Copy link
Contributor Author

@dcharkes I'm happy to contribute this change if you can point me in the right direction, e.g. what you meant by "make this a recognized method."

That is a fairly complicated process. It involves writing our intermediate representation graph in the compiler. Here is the same process for making Pointer loads and stores recognized methods.

Furthermore, that intermediate representation then generates machine code, which if it is wrong will lead to segfaults which are good fun debugging.

Unless you're already familiar with (1) building the Dart VM, (2) Dart VM's intermediate representation, and (3) debugging segfaults with GDB/RR, I would not recommend it.

@vaind
Copy link

vaind commented Nov 25, 2021

Thanks @dcharkes, the linked MR has convinced me this is not the right time for me to get involved here :)

@blaugold
Copy link
Contributor

I've submitted a PR (#47780) to resolve this issue.

@dcharkes
Copy link
Contributor Author

dcharkes commented Nov 26, 2021

I've submitted a PR (#47780) to resolve this issue.

You made my day! I left a review in Gerrit: https://dart-review.googlesource.com/c/sdk/+/221360.

Upon running the tests I saw 1 test started failing. Let me know if you need help debugging it. (The test doesn't fail on main.) cc @sstrickl who is fluent in the type testing stubs and type argument vectors.

@blaugold
Copy link
Contributor

Glad I can help. 😃
I'll try to address the requested changes later today.

@blaugold
Copy link
Contributor

blaugold commented Nov 27, 2021

@dcharkes
I haven't been able to debug that failing test yet, but it seems to be related to the number of added recognized methods (10). With 8 or less new recognized methods, the TTS test passes. The TTS test fails because it is unable to set up a partial TTS with a TypeTestCache because the TTS respecializes instead. I don't know enough about how those things interact. This line causes the issue.

Update
The issue that makes the test fail seems to be unrelated to the PR, it just happens to be triggered by the code change. Instructions::Equals contains a bug because it potentially compares uninitialized memory. The Instructions header is padded to align the following instructions on word boundaries. The overall size of Instructions is further padded to kObjectAlignment. By using memcmp with the full object size, memory with random content is included in the comparison.

Update
Memory for Instructions actually is properly initialized in Object::InitializeObject... In any case, in the failing test, Instructions::Equals returns false even though the instructions are the same.

Update
Instructions::Equals compares the full memory of two instances, including the header of UntaggedObject. In this case, one instance has been marked by the GC while the other has not.

@sstrickl
Copy link
Contributor

Sorry about the delay in responding, was OOO last week. Thanks for digging further, @blaugold, that sounds right and I can make a quick fix CL if so :)

@sstrickl
Copy link
Contributor

Done in https://dart-review.googlesource.com/c/sdk/+/221467, up for review now.

copybara-service bot pushed a commit that referenced this issue Nov 29, 2021
As discovered by @blaugold on GitHub, Instructions::Equals can give a
false negative on the JIT if the GC flags in the object header are set
differently.

This CL changes Instructions::Equals to only compare the
Instructions-specific portion of the object, namely the size_and_flags_
field and the content bytes. If those match, then the important parts of
the object header match: the cid is always kInstructionsCid, as the
Equals method assumes non-null instances, and the size in the object
header is calculated using the content size from size_and_flags_.

TEST=Manually ran failing tests after cherry picking CL 221360 on top.

Bug: #39843
Change-Id: I25c25bbe6b1bf615d4cd923bfe871da7e929822a
Cq-Include-Trybots: luci.dart.try:vm-kernel-precomp-dwarf-linux-product-x64-try,vm-kernel-precomp-linux-product-x64-try,vm-kernel-precomp-linux-release-x64-try,vm-kernel-linux-product-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/221467
Commit-Queue: Tess Strickland <sstrickl@google.com>
Reviewed-by: Daco Harkes <dacoharkes@google.com>
@sstrickl
Copy link
Contributor

CL has landed, so you should be clear now. Manual running of the tests succeeded on some of the failing build configurations, so if you're still seeing failures in the other CL once rebased, I'd be interested in knowing.

@blaugold
Copy link
Contributor

Thanks for the quick fix @sstrickl.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-ffi type-enhancement A request for a change that isn't a bug type-performance Issue relates to performance or code size
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants