Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminates Dart->Host copy for mutable responses #34018

Closed
wants to merge 3 commits into from

Conversation

gaaclarke
Copy link
Member

@gaaclarke gaaclarke commented Jun 13, 2022

This makes platform_channel_basic_binary_2host_1MB go from 529.2 µs to 412.4 µs (22% improvement) in my local testing.

The test is in the platform channel benchmarks in the flutter repo.

I found this optimization when exploring asset loading, so it should be beneficial there as well by eliminating a copy.

WARNING: Do not merge until flutter/flutter#105982 merges in the framework.

local testing results

Executed on iOS device.

before

BasicMessageChannel/StandardMessageCodec/Flutter->Host/Small: 56.8 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host/Large: 684.4 µs
BasicMessageChannel/BinaryCodec/Flutter->Host/Large: 49.7 µs
BasicMessageChannel/BinaryCodec/Flutter->Host/1MB: 529.2 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host/SmallParallel3: 26.1 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/Small: 39.6 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/SmallParallel3: 22.3 µs

after

BasicMessageChannel/StandardMessageCodec/Flutter->Host/Small: 54.4 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host/Large: 685.3 µs
BasicMessageChannel/BinaryCodec/Flutter->Host/Large: 50.1 µs
BasicMessageChannel/BinaryCodec/Flutter->Host/1MB: 412.4 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host/SmallParallel3: 25.3 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/Small: 39.3 µs
BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/SmallParallel3: 22.4 µs

Pre-launch Checklist

  • I read the Contributor Guide and followed the process outlined there for submitting PRs.
  • I read the Tree Hygiene wiki page, which explains my responsibilities.
  • I read and followed the Flutter Style Guide and the C++, Objective-C, Java style guides.
  • I listed at least one issue that this PR fixes in the description above.
  • I added new tests to check the change I am making or feature I am adding, or Hixie said the PR is test-exempt. See testing the engine for instructions on
    writing and running engine tests.
  • I updated/added relevant documentation (doc comments with ///).
  • I signed the CLA.
  • All existing and new tests are passing.

If you need help, consider asking for advice on the #hackers-new channel on Discord.

@gaaclarke gaaclarke force-pushed the eliminate-copy-response branch from aac6765 to a236ed5 Compare June 14, 2022 00:01
@gaaclarke gaaclarke requested a review from dnfield June 14, 2022 00:07
@gaaclarke gaaclarke marked this pull request as ready for review June 14, 2022 00:07
@flutter-dashboard

This comment was marked as outdated.

@gaaclarke
Copy link
Member Author

I'm not sure if this case requires a test exemption since it has a test but it exists in the other repo.

/*length=*/data_size,
/*peer=*/data.release(),
/*external_allocation_size=*/data_size,
/*callback=*/MappingFinalizer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Positive change. But, the previous version of the response was a copy that the Dart code could potentially modify. Now, modifications will be potentially fatal (if possible at all when using an external typed data, I'm not sure).

I suspect not too many use cases modify the response buffer, but I'm not sure. Does this require a migration?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @dnfield @jonahwilliams who might know more about the mutability of these buffers.

Copy link
Member

@jonahwilliams jonahwilliams Jun 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we had looked into this earlier and concluded we couldn't use this API because we currently expose this buffers as mutable.

If we exposed them as immutable (besides being a breaking change), Dart will choke on them if they end up getting fed back into an engine API: dart-lang/sdk#42785

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need some kind of support from dart for this use case - immutable data that doesn't incur runtime penalties related to using UnmodifiableTypedData. /cc @lhrn - I think we've talked about this before

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it a problem if they are mutated? Since we've migrated the data buffers to unique pointers we can guarantee that Dart is the only one that has access to the data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that's the situation you are talking about @chinmaygarde? What if we expose a FileMapping to read-only memory and Dart tries to modify the data? We have tests that are sending and receiving platform channel messages. If we were modifying the buffers in our code, now or in the future, those tests would fail. Often times that data is just used to read to deserialize the data being passed and another copy is made at deserialization. The user never sees it.
If someone is using a BinaryCodec they may get access to a databuffer that will crash if they try to write to it (I'm not sure if we've eliminated all copies now). It will pretty obvious when that is attempted though. That isn't a breaking change unless it breaks any tests. If a user does run into this problem the solution would be pretty obvious, they need to copy the data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to run 'customer: money' with no problems with this change, but framework tests do fail:

aaclarke-macbookpro2:flutter aaclarke$ flutter --local-engine-src-path ~/dev/engine/src --local-engine=host_debug_unopt test
05:12 +6331 ~15: /Users/aaclarke/dev/flutter/packages/flutter/test/material/tab_bar_theme_test.dart: Tab bar theme - beveled rect indicator
══╡ EXCEPTION CAUGHT BY FLUTTER TEST FRAMEWORK ╞════════════════════════════════════════════════════
The following _Exception was thrown while running async test code:
Exception: Invalid image data

When the exception was thrown, this was the stack:
#0      _futurize (dart:ui/painting.dart:5779:5)
#1      ImageDescriptor.encoded (dart:ui/painting.dart:5634:12)
#2      instantiateImageCodecFromBuffer (dart:ui/painting.dart:2098:60)
#3      instantiateImageCodec (dart:ui/painting.dart:2056:10)
<asynchronous suspension>
<asynchronous suspension>
(elided one frame from package:stack_trace)
════════════════════════════════════════════════════════════════════════════════════════════════════
05:12 +6331 ~15 -1: /Users/aaclarke/dev/flutter/packages/flutter/test/material/tab_bar_theme_test.dart: Tab bar theme - beveled rect indicator [E]
  Test failed. See exception logs above.
  The test description was: Tab bar theme - beveled rect indicator

I'll see if I can address this, dropping back into a Draft. Worst case scenario we can selectively perform a copy for assets. This copy is not required for platform channels.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I can't reproduce this failure again now, investigating to see if it is non-deterministic)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unable to prove if that failure is actually a bug. It's hard when I can't reproduce it. If we are supplying a mmaped file to dart where previously we were supplying a copy, editing the file on disk could create a problem where the ui thread would encounter a file that changed on disk underneath it. It may be purely academic if in practice we are always passing mmaped files that are never modified on disk. Also, that wouldn't be a new bug if the modifier is not the ui thread (this change just makes it more likely).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it a problem if they are mutated? Since we've migrated the data buffers to unique pointers we can guarantee that Dart is the only one that has access to the data.

I was actually thinking about what happens if pages marked readonly are written to by Dart. That would be a segmentation fault. But you're also right about flutter::APKAssetMappings which are backed by AAssets that are not thread safe.

I am positive this is a contrived use case but it ought to be easy to write a test for this in the shell_unittests harness. I am actually curious if we can cause a segfault this way.

@gaaclarke gaaclarke marked this pull request as draft June 14, 2022 18:14
@gaaclarke
Copy link
Member Author

For asset loading the time change for a 1MB payload is: 543.29 µs to 339.95 µs (37%)

Benchmark: flutter/flutter#105982

@chinmaygarde
Copy link
Member

I'm totally convinced that not doing the copy is faster and also saves memory. But I'm not sure if this is going to cause segfaults in cases where a PROT_READ only mmaped buffer response is written to from Dart and also doesn't cause threading violations in cases APKAssetMappings.

I was looking into how to verify that this is safe and the guidance to reproduce the contrived use-case would be to duplicate EmbedderTest.PlatformMessagesCanReceiveResponse. On the native side, create an mmaped buffer that is write protected. On the Dart side, attempt to write to it. If we segfault, that would mean we need a migration. If not, we need to figure out how we can write to a write protected buffer without copies. Then hunt down those copies. Either way, I feel we'd need a different API and a migration.

@gaaclarke
Copy link
Member Author

I had a chat with @jonahwilliams and @dnfield. We decided:

  1. Having a mutable ByteBuffer that crashes when you write to it is unacceptable
  2. Having a private channel that elides the copy and we control all the end points isn't good enough since channels are not ever really private and embedders will have to know to follow the convention of not writing to the ByteBuffer that isn't enforced beyond crashing
  3. Implementing copy-on-write semantics for the ByteBuffer will probably not work because that would introduce an extra layer of indirection, thus harming performance in other ways. Dart could probably implement that but immutable data would be sufficient too.
  4. We should definitely elide the copy in cases where we are sending over read-write memory (like platform channels)
  5. We should land the benchmark for the asset loader
  6. We should audit what is actually being sent with "flutter/assets" in customer: money to potentially build a case for prioritizing the work on the Dart VM.

@gaaclarke
Copy link
Member Author

Hey, can't the operating system do copy-on-write with mmap? https://pythonspeed.com/articles/reduce-memory-array-copies/ This would be something to investigate.

@gaaclarke
Copy link
Member Author

Hey, can't the operating system do copy-on-write with mmap? https://pythonspeed.com/articles/reduce-memory-array-copies/ This would be something to investigate.

Looks like we are already doing copy-on-write (https://github.com/gaaclarke/engine/blob/a236ed5db78c665eb66de540a4fcda57c7862c71/fml/platform/posix/mapping_posix.cc#L75:L75), so this probably isn't a crasher. I'll have to write the test that Chinmay is suggesting tomorrow. Potentially we can remove all copies except apk assets which are not threadsafe.

@gaaclarke
Copy link
Member Author

I was able to verify that copy-on-write works, and it's not what we are currently doing. I had to apply the following patch:

diff --git a/assets/directory_asset_bundle.cc b/assets/directory_asset_bundle.cc
index 7896756614..c0d0e0d844 100644
--- a/assets/directory_asset_bundle.cc
+++ b/assets/directory_asset_bundle.cc
@@ -50,8 +50,12 @@ std::unique_ptr<fml::Mapping> DirectoryAssetBundle::GetAsMapping(
     return nullptr;
   }
 
-  auto mapping = std::make_unique<fml::FileMapping>(fml::OpenFile(
-      descriptor_, asset_name.c_str(), false, fml::FilePermission::kRead));
+  auto mapping = std::make_unique<fml::FileMapping>(
+      fml::OpenFile(descriptor_, asset_name.c_str(), false,
+                    fml::FilePermission::kRead),
+      std::initializer_list<fml::FileMapping::Protection>(
+          {fml::FileMapping::Protection::kRead,
+           fml::FileMapping::Protection::kWrite}));
 
   if (!mapping->IsValid()) {
     return nullptr;
diff --git a/fml/platform/posix/mapping_posix.cc b/fml/platform/posix/mapping_posix.cc
index aff3d645d3..52c878e4f7 100644
--- a/fml/platform/posix/mapping_posix.cc
+++ b/fml/platform/posix/mapping_posix.cc
@@ -72,7 +72,7 @@ FileMapping::FileMapping(const fml::UniqueFD& handle,
 
   auto* mapping =
       ::mmap(nullptr, stat_buffer.st_size, ToPosixProtectionFlags(protection),
-             is_writable ? MAP_SHARED : MAP_PRIVATE, handle.get(), 0);
+             MAP_PRIVATE, handle.get(), 0);
 
   if (mapping == MAP_FAILED) {
     return;

I propose we:

  1. Make FileMapping more expressive to signify we want copy-on-write by adding an extra flag to its constructor
  2. Make DirectoryAssetBundle start returning copy-on-write file mappings
  3. Add a new method to Mapping uint8_t* GetReadWriteMapping(). For in-memory mappings and copy-on-write mmap file mappings, that return a value. For other mappings it will return a nullptr.
  4. Make the dart response code copy the data if GetReadWriteMapping() returns nullptr before handing it to Dart (matching current behavior)

That way we can pass ownership of the read-write memory to Dart and a memcpy will only happen when one is required.

@gaaclarke
Copy link
Member Author

I filed a bug for the fuchsia failure: https://bugs.fuchsia.dev/p/fuchsia/issues/detail?id=102798

Since it works in darwin and android it seems to be a fuchsia bug. We can cut them out of this change for now.

@gaaclarke gaaclarke force-pushed the eliminate-copy-response branch 2 times, most recently from dd4e890 to 394980c Compare June 16, 2022 17:14
@gaaclarke gaaclarke marked this pull request as ready for review June 16, 2022 17:46
Copy link
Member

@chinmaygarde chinmaygarde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this approach is safe for the following reasons:

  • If the size of the buffer is changed from Dart code, there is no way to resize the underlying mapping.
  • For files in read-only storage, you simply cannot get mappings that are read-write. You will get a MAP_FAILED in these circumstances. This is probably what is happening on Fuchsia and Windows. I expected the same thing to happen on iOS too since the assets are in read-only storage too. Perhaps this is in debug mode where the files are synced to writable storage? Not sure. This probably doesn't work on iOS in release mode.
  • On platforms where you can get a mutable mapping (desktops perhaps), applications can now change the files on disk because they can write to mappings they get from platform message responses. This is unexpected to them because they never realized the mappings were backed by files. Oh, and even that is not guaranteed to work because there is no deterministic mflush.
  • You are relying on dodgy pointer const casts and the implementation of NSData on iOS.

The way I read this, this doesn't work on Android, Fuchsia, Windows and probably iOS (really, any platform that has a read-only asset pool), relies on pointer casts and implementation details on Foundation, doesn't handle buffer resizes, allows applications to unexpected change assets on disk where they can get a mapping, and makes the FML Mappings a whole lot more complicated.

For these reasons, I don't think this is the right approach. I would rather have another interface that gives the Dart code a read only mapping and perform a migration to that API. I suspect that is what 99% of the use cases want anyway.

@dnfield
Copy link
Contributor

dnfield commented Jun 16, 2022

Dart can't resize bytedata, but the other points stand

@jason-simmons
Copy link
Member

I agree with #34018 (review). Returning a Dart ByteData backed by a copy-on-write file mapping will produce unexpected behavior.

Is there a specific application scenario that this is targeting?

If the goal is to improve the performance of platform messages sent by host applications, then it might be worthwhile to avoid an extra copy if the message is backed by a malloced buffer or std::vector.

If the issue is related to loading assets, then we probably need a separate Dart API that represents a read-only mapping (as mentioned above).

@gaaclarke
Copy link
Member Author

gaaclarke commented Jun 17, 2022

Thanks for taking a look @chinmaygarde @dnfield @jason-simmons . I'll try to respond to all the feedback. There are 3 optimizations here: assets, platform channels on ios and platform channels on android. No one has a problem with the android platform channels, I'll split my feedback between the optimizations.

Assets

If the size of the buffer is changed from Dart code, there is no way to resize the underlying mapping.

As Dan pointed out, there is no API to do that. It couldn't be safely done by the Dart VM either since it doesn't know how the memory was created.

For files in read-only storage, you simply cannot get mappings that are read-write. You will get a MAP_FAILED in these circumstances. This is probably what is happening on Fuchsia and Windows. I expected the same thing to happen on iOS too since the assets are in read-only storage too. Perhaps this is in debug mode where the files are synced to writable storage? Not sure. This probably doesn't work on iOS in release mode.

I think there is a misunderstanding about how mmap works with copy-on-write. The flag that creates copy-on-write semantics is MAP_PRIVATE which means all the modifications to the data are private to the process, the files on disk are not modified so the file permissions don't matter (more info). The unit test I've provided asserts this by loading the contents in memory again after the memory is edited. The way this works is the OS will switch a VM page from looking at disk to looking at a page that is specific to the process (probably another region of RAM). This is possible on Windows too, see the documentation on FILE_COPY_MAP. It's just a notoriously finicky API and the engine requires a lot of effort getting development setup on Windows, that's why I didn't tackle it up yet.

On platforms where you can get a mutable mapping (desktops perhaps), applications can now change the files on disk because they can write to mappings they get from platform message responses.

This isn't the case because of the previous point.

iOS Platform Channels

You are relying on dodgy pointer const casts and the implementation of NSData on iOS.

The reason here is because it is mostly* safe. In most cases those NSData objects are ephemeral data blobs that we are creating (example). Unfortunately objc lacks the the ability to statically assert that we have one unique owner of the NSData, but that would be the case. When you have one unique owner, reinterpretting const memory to non-const memory is safe.

*) I say mostly safe because there is one case where it could be problematic now that I think of it. If a user is using a BinaryCodec and responds to a message with NSData, then Dart modifies that data, then it is read again in in objc that would be a problem.

This could be solved by querying the Codec if it copies the data when responding and inserting a copy if it isn't.

If you have concerns beyond that we can address them with tests.

Android Platform Channels

If the goal is to improve the performance of platform messages sent by host applications, then it might be worthwhile to avoid an extra copy if the message is backed by a malloced buffer or std::vector.

That's what's happening in the Android case, we can statically assert we have one owner of the read-write data so we are good here.

@gaaclarke
Copy link
Member Author

Here's local testing results of the platform channel benchmarks for Android on a Pixel 4a

before

I/flutter (17630): BasicMessageChannel/StandardMessageCodec/Flutter->Host/Small: 361.6 µs
I/flutter (17630): BasicMessageChannel/StandardMessageCodec/Flutter->Host/Large: 2303.7 µs
I/flutter (17630): BasicMessageChannel/BinaryCodec/Flutter->Host/Large: 142.4 µs
I/flutter (17630): BasicMessageChannel/BinaryCodec/Flutter->Host/1MB: 1483.9 µs
I/flutter (17630): BasicMessageChannel/StandardMessageCodec/Flutter->Host/SmallParallel3: 86.1 µs
I/flutter (17630): BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/Small: 134.4 µs
I/flutter (17630): BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/SmallParallel3: 79.8 µs

after

I/flutter (12863): BasicMessageChannel/StandardMessageCodec/Flutter->Host/Small: 414.1 µs
I/flutter (12863): BasicMessageChannel/StandardMessageCodec/Flutter->Host/Large: 1979.8 µs
I/flutter (12863): BasicMessageChannel/BinaryCodec/Flutter->Host/Large: 119.2 µs
I/flutter (12863): BasicMessageChannel/BinaryCodec/Flutter->Host/1MB: 1024.4 µs
I/flutter (12863): BasicMessageChannel/StandardMessageCodec/Flutter->Host/SmallParallel3: 88.3 µs
I/flutter (12863): BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/Small: 135.5 µs
I/flutter (12863): BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/SmallParallel3: 82.1 µs

Looks like it is slightly slower for small payloads and 30% faster for 1MB payloads.

@gaaclarke
Copy link
Member Author

gaaclarke commented Jun 17, 2022

I've removed the iOS implementation since it will take a bit more work to get it working safely with the BinaryCodec (or custom codecs). I'll land it in a separate PR.

@jonahwilliams
Copy link
Member

jonahwilliams commented Jun 17, 2022

Is it beneficial to trade off the performance of small messages for larger ones?

@gaaclarke
Copy link
Member Author

Is it beneficial to trade off the performance of small messages for larger ones?

The previous implementation was already conditionalizing how data was being treated based on its size for performance reasons: https://github.com/gaaclarke/engine/blob/a01f69234e9cc0d21d4a972e4e22bb19737e00fe/third_party/tonic/typed_data/dart_byte_data.cc#L26:L26

I can duplicate that logic here.

(sorry, i edited your comment on accident instead of replying)

@gaaclarke
Copy link
Member Author

gaaclarke commented Jun 17, 2022

@jonahwilliams I duplicated that logic in the latest patch and here are the results for local testing (on Android pixel 4a):

I/flutter (24598): BasicMessageChannel/StandardMessageCodec/Flutter->Host/Small: 301.4 µs
I/flutter (24598): BasicMessageChannel/StandardMessageCodec/Flutter->Host/Large: 1949.6 µs
I/flutter (24598): BasicMessageChannel/BinaryCodec/Flutter->Host/Large: 124.1 µs
I/flutter (24598): BasicMessageChannel/BinaryCodec/Flutter->Host/1MB: 1055.0 µs
I/flutter (24598): BasicMessageChannel/StandardMessageCodec/Flutter->Host/SmallParallel3: 84.7 µs
I/flutter (24598): BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/Small: 195.6 µs
I/flutter (24598): BasicMessageChannel/StandardMessageCodec/Flutter->Host (background)/SmallParallel3: 127.3 µs

@gaaclarke gaaclarke requested a review from chinmaygarde June 17, 2022 18:42
@gaaclarke gaaclarke force-pushed the eliminate-copy-response branch from 20ba04a to 343c07b Compare June 21, 2022 23:44
@gaaclarke
Copy link
Member Author

@chinmaygarde friendly ping since my response was right before a long weekend.

@gaaclarke
Copy link
Member Author

gaaclarke commented Jun 23, 2022

Chat summary

@chinmay and I had a discussion about this issue last night on Discord. Here was the takeaways, feel free to correct me if I'm misconstruing something.

  • It would be better if Dart could represent const data, because that already matches the fml::Mapping interface. However that has yet to be implemented and is lacking a design to specify how we'd deal with changing the Codec APIs to accommodate it.
  • Right now we are doing the least-common denominator where we copy the data unconditionally. We need to conditionally optimize away the copy. Even if Dart could represent const data, for the case of AAssets they'll still need to be copied since they aren't allowed to be passed between threads.
  • A big point of friction is around modifying the fml::Mapping API. IsDontNeedSafe() is seen as a mistake and we don't want that to happen again. The concern is that when/if dart implements const data representation and we migrate to using that, will could be left with a vestigial API on fml::Mapping that is difficult to remove.
  • Conditionally optimizing away the copy is added complexity, but the gains in performance merit added complexity. (20% improvement in 1MB android platform channel round trip, 30% on iOS asset loading 1MB)

Options to move forward

Here's some options I've thought of that address the concerns.

Mapping->Dart Conversion

Promoting FlutterMapping::GetMutableMapping() is kind of a big change to fml::Mapping. Someone could easily latch onto it and use it for some purpose that isn't moving the fml::Mapping to Dart. That means if we migrate to a Dart const data representation we could end up with a method that would be difficult to remove. Instead of adding GetMutableMapping() we could add a context specific method that couldn't be used for a different purpose, a Dart_Handle MoveToByteBuffer() call.

class Mapping {
  // Move the Mapping to be wrapped in a thread-safe Dart 
  // ByteBuffer.  If this call is successful, calling 
  // "GetMapping()" afterwards will return nullptr.
  virtual std::optional<Dart_Handle> MoveToByteBuffer() =0;
};

The benefit of this is that this method cannot be used for a purpose beyond what we need it for and thus making it easier to modify in the future. It could be converted to Dart_Handle MoveToConstByteBuffer() in the future.

edit: We could also make the function always return something, incurring a copy if it has to. Putting that logic in the Mapping instead of in the handler.

Visitor

If we are concerned about modifying the Mapping API we can add the functionality we need with the Visitor design pattern.

class MappingVisitor {
  virtual void Visit(FileMapping* mapping) =0;
  virtual void Visit(MallocMapping* mapping) =0;
  //...
};
class Mapping {
  virtual void Visit(MappingVisitor* visitor) =0;
};

That way we can write the logic to decide if we can transfer ownership to Dart inside of a Visitor. The downside to this is there are subclasses of fml::Mapping that are only compiled for certain embedders. We'd have to declare them in the core engine so the Visitor can reference them (I'm not sure if we can use forward declarations). This also has a slight performance cost, performing double dispatch instead of single dispatch.

Migrate to PlatformMessageResponse to MallocMapping

Migrating the response class to using MallocMapping would allow us to remove the copy in certain cases without modifying the fml::Mapping interface. One downside is that it would only easily be available to the Android Platform Channel's case since that already has a MallocMappping from the jni layer. It would never be able to pass ownership of mmap'd data to Dart and would unlikely to ever be compatible with iOS (without some trickery).

class PlatformMessageResponse {
  virtual void Complete(std::unique_ptr<fml::MallocMapping> data) =0;
};

The major downside of this though is that it may actually introduce an extra copy in the case where the payload is small so we are performing a copy when passing the data to Dart and the payload wasn't already in a MallocMapping. For example, iOS Platform Channels whose response payload is less than 1000 bytes.

std::initializer_list<fml::FileMapping::Flags>());
#else
// We use a copy-on-write mapping so we can safely pass ownership to Dart
// which currently lacks immutable data buffers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dart has them, but the implementation is problematic:
https://github.com/dart-lang/sdk/blob/main/sdk/lib/typed_data/unmodifiable_typed_data.dart. The problem is that including them causes the regular typed data classes to become non-monomorphic, which breaks inlining optimizations. Might be worth benchmarking with anyway.

@gaaclarke
Copy link
Member Author

One other thought that I didn't communicate well. The data we are actually sending here in a response is one of 4 types:

  1. Ephemeral data that we can prove at compile time the messenger has unique ownership of (ex MallocMapping)
  2. A memory representation of a file (ex mmap)
  3. A non-threadsafe memory representation of a file (ex AAssets)
  4. Data that we can't statically assert is uniquely owned by the messenger. (ex NSDataMapping)

Only type 4 requires a Dart representation that is immutable. It also represents a fraction of the actual usage. Right now it is iOS platform channel usage but I think we can use convention and tests to capture most of iOS's use cases as well by proving that the messenger has unique ownership.

@gaaclarke
Copy link
Member Author

Here is some data supporting the importance of this change. (@zanderso @jonahwilliams @jiahaog )

Microbenchmarks Summary

Here is a reminder of the microbenchmarks results:

  • For iOS asset loading the time change for a 1MB payload is: 543.29 µs to 339.95 µs (37%)
  • For Android Platform Channels 1MB round trip time: 1483.9 µs to 1055.0 µs (29%)

Customer Money

Here is some data that shows the potential gain for customer money.

Sqflite latency breakdown

In b/237035671 customer money asks why the wall time for loading data from sqflite takes 17ms when the actual execution time of java handler takes 1ms.

The reasons why are:

  1. The latency when scheduling on the platform thread (could be removed with Migrate to using Android Background Platform Channels tekartik/sqflite#719)
  2. The latency when scheduling on sqlflite's thread (probably not huge)
  3. The latency when scheduling on the platform thread again (I just removed this in a PR in sqflite: Dropped the extra thread hop when calling results tekartik/sqflite#825)
  4. The latency when scheduling the response on the ui thread (This probably contributes the most)
  5. The payload size of the request
  6. The payload size of the response (The province of this PR if the response is >1MB)

I've asked them measure the size of their responses to understand what portion of the 16ms is the response copy.

Platform channel audit

We do have data from customer money that gives us understanding about how platform channels are being used since the savings from this PR are a function of the size of the response payloads.

Since this improvement is gated on a payload greater than 1000 bytes, here are the platforms channels reported by customer money that are communicating over 1000 bytes on the "downstream" (in the direction of Host->Flutter):

channel down_bytes
plugins.flutter.io/<redacted>#updateConfig 27733
plugins.flutter.io/device_info#getAndroidDeviceInfo 5588
com.tekartik.sqflite#query 1366

When we hear back from customer money about the individual sizes we can make sure some of these payloads are over 1000 bytes.

Asset loading at startup

I also audited asset loading on customer money's prototype app on iOS and these are the assets getting loaded at startup and their size (remember anything over 1000 bytes would receive an improvement):

asset size (bytes)
asset/ic_us_tab_bar_minus_one_light.flr 2972
asset/ic_us_tab_bar_zero_light.flr 2770
asset/ic_us_tab_bar_plus_one_light.flr 3816
asset/network/home_backsplash_us.svg.vg 31718
asset/img_logo_<redacted>.svg.vg 4126
asset/network/homescreen_empty_state_business_v2.svg.vg 2964
asset/network/homescreen_empty_state_people_default_v2.svg.vg 4412
asset/bg_invite_billboard_us.svg.vg 154
asset/img_invite_billboard_illustration_us.svg.vg 11077
asset/toolbar/toolbar_footer_light.svg.vg 69
asset/toolbar/toolbar_footer_light.svg.vg 69
asset/my_money/link_to_stored_value_logo.svg.vg 816
asset/entity/container/img_search_icon.svg.vg 1413
asset/network/remittance_discover_illustration_v3.svg.vg 5470
asset/network/us_businesses_vertical_gas.svg.vg 4628
asset/network/us_businesses_vertical_parking.svg.vg 4178
asset/network/us_businesses_vertical_transit.svg.vg 2596

We cannot easily say how much time this PR would save the app at startup, or generalize for every possible app because it depends on how parallelized the code is and the system is. We'll get a better idea if we can get them to test the PR.

@zanderso
Copy link
Member

@jiahaog may know the most up-to-date way to try out this patch. Suggest following up offline.

@gaaclarke
Copy link
Member Author

gaaclarke commented Jun 24, 2022

I've audited the time spent performing this copy of data in customer money's prototype app on iOS at startup as well. There were 20 instances of the copy called with a total time of 7ms on an iPhone SE 2 (this is the copy that is removed in this PR for iOS assets):
Screen Shot 2022-06-24 at 2 51 36 PM

profile patch
--- a/third_party/tonic/typed_data/dart_byte_data.cc
+++ b/third_party/tonic/typed_data/dart_byte_data.cc
@@ -8,6 +8,18 @@
 
 #include "tonic/logging/dart_error.h"
 
+#include <os/signpost.h>
+
+static os_log_t points_of_interest_logger = NULL;
+
+static os_log_t GetPointsOfInterestLogger() {
+  if (!points_of_interest_logger) {
+    points_of_interest_logger =
+        os_log_create("dev.flutter", OS_LOG_CATEGORY_POINTS_OF_INTEREST);
+  }
+  return points_of_interest_logger;
+}
+
 namespace tonic {
 
 namespace {
@@ -28,9 +40,17 @@ Dart_Handle DartByteData::Create(const void* data, size_t length) {
     // The destructor should release the typed data.
     return handle;
   } else {
+    static uint64_t counter = 1;
+    static uint64_t identifier = counter++;
+    if (__builtin_available(macOS 10.14, iOS 12.0, *)) {
+      os_signpost_interval_begin(GetPointsOfInterestLogger(), identifier, "DartByteData memcpy");
+    }
     void* buf = ::malloc(length);
     TONIC_DCHECK(buf);
     ::memcpy(buf, data, length);
+    if (__builtin_available(macOS 10.14, iOS 12.0, *)) {
+      os_signpost_interval_end(GetPointsOfInterestLogger(), identifier, "DartByteData memcpy");
+    }
     return Dart_NewExternalTypedDataWithFinalizer(
         Dart_TypedData_kByteData, buf, length, buf, length, FreeFinalizer);
   }

@gaaclarke
Copy link
Member Author

@chinmaygarde friendly ping. This is caught up on your feedback. I responded to it with some options to move forward. Can you please capture where your head is at now. I'm back from vacation now and can fill in any gaps, but it seems like discussion on his PR usually comes back to people trying to interpret where your thoughts are now.

FWIW @zanderso told me that const buffers in Dart should be coming this quarter so keep that in mind that there seems to be movement there.

@gaaclarke gaaclarke force-pushed the eliminate-copy-response branch from 343c07b to ac5c3e7 Compare July 27, 2022 20:14
@gaaclarke
Copy link
Member Author

Rebased to help @jiahaog's testing.

@gaaclarke
Copy link
Member Author

Update: The Dart team believes they may have a way to present a const bytebuffer with little runtime performance. Previously they thought the cost to reading and writing would make it not worth it because of some global polymophism optimization that is currently happening.

We got the latests results from customer: money on startup time. This copy occupies 0.9% (9ms) of the ui thread at app startup on Android. We don't have an official measurement for iOS and it would be very different since it is eliminating asset loading copies.

@gaaclarke
Copy link
Member Author

The unmodifiable bytes change for Dart has landed: dart-lang/sdk@c1e67ac

I haven't had a chance to hook it up. This looks like we can make it a non-breaking change compile-wise since UnmodifiableByteDataView implements ByteData (dart-lang/sdk@c1e67ac#diff-ff924e00bfbb53afcdf9b21700dd244df23776793371f29db6c46c47b8987b0fR5487). It will still technically be a breaking change at runtime (that I suspect no one will run into). I've been asked by the Dart team to check out the performance ramifications too.

@gaaclarke gaaclarke changed the title Eliminates Dart->Host response copy Eliminates Dart->Host copy for mutable responses Aug 17, 2022
@gaaclarke
Copy link
Member Author

closed in lieu of #35468

@gaaclarke gaaclarke closed this Aug 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants