Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dart:ffi not working with Isolate.spawn #48532

Closed
xuchaoqian opened this issue Mar 9, 2022 · 13 comments
Closed

dart:ffi not working with Isolate.spawn #48532

xuchaoqian opened this issue Mar 9, 2022 · 13 comments
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-ffi

Comments

@xuchaoqian
Copy link

xuchaoqian commented Mar 9, 2022

We implemented a flutter plugin which uses dart:ffi package to call c++ code.
It works well without Isolate.spawn. But it crashed after moving the code to spawned Isolate.

So, does Isolate.spawn not support dart:ffi?

The error message as following:

signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
Cause: null pointer dereference

My developer tools as following:

* Dart&Flutter SDK Version
Flutter 2.10.3 • channel stable • https://github.com/flutter/flutter.git
Framework • revision 7e9793dee1 (7 days ago) • 2022-03-02 11:23:12 -0600
Engine • revision bd539267b4
Tools • Dart 2.16.1 • DevTools 2.9.2

* Whether you are using Windows, MacOSX, or Linux (if applicable)
MacOSX
@mraleph
Copy link
Member

mraleph commented Mar 9, 2022

You need to provide more information. Right now the only thing we can see is that there is some null pointer dereference. There can be many different reasons why your code is crashing. dart:ffi should work fine with isolates - though you should keep in mind that isolates involve threads, which means that certain things might work differently (e.g. if your C++ library uses TLS storage, etc).

@mraleph mraleph added the needs-info We need additional information from the issue author (auto-closed after 14 days if no response) label Mar 9, 2022
@xuchaoqian
Copy link
Author

@mraleph thanks a lot for your quick response.

Basically, the data flow is as follows:

user code <---dart object---> lib (as a flutter plugin) which calls Isolate.spawn() <---dart object---> spawned isolate which uses dart:ffi to call c++ code <---json---> c++ code which runs heavy task, and also invokes callbacks from spawned isolate.

The code which calls Isolate.spawn() is as follows:

void testSpawn() async {
    Stopwatch stopwatch = Stopwatch()..start();
    final filename = 'json_01.json';
    final result = result = await _spawnAndReceive(filename);s
    print('result: $result');
    print('test spawn: in ${stopwatch.elapsedMilliseconds} ms');
  }

  // Spawns an isolate and sends a [filename] as the first message.
  // Waits to receive a message from the the spawned isolate containing the
  // parsed JSON.
  static Future<String> _spawnAndReceive(String fileName) async {
    final p = ReceivePort();
    await Isolate.spawn(_readAndParseJson, [p.sendPort, fileName]);
    return (await p.first) as String;
  }

  // The entrypoint that runs on the spawned isolate. Reads the contents of
  // fileName, decodes the JSON, and sends the result back the the main
  // isolate.
  static void _readAndParseJson(List<dynamic> args) async {
    SendPort responsePort = args[0];
    String fileName = args[1];
    // 🚀 use dart:ffi to call c++ code here 🚀
    Isolate.exit(responsePort, "1");
  }

If dart:ffi works fine with isolates, I may need look more into c++ code.

BTW, my c++ code uses dart api to release dart object as follows:

  FinalizablePointer *peer = new FinalizablePointer();
  peer->pointer = pointer;
  peer->finalizer = finalizer;
  Dart_FinalizableHandle handle = Dart_NewFinalizableHandle_DL(
      object, (void *)peer, external_allocation_size, finalizer_callback);
  if (handle == NULL) {
    delete peer;
    return 1;
  }

@no-response no-response bot removed the needs-info We need additional information from the issue author (auto-closed after 14 days if no response) label Mar 9, 2022
@srawlins srawlins added area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-ffi labels Mar 9, 2022
@a-siva
Copy link
Contributor

a-siva commented Mar 9, 2022

//cc @aam

@aam
Copy link
Contributor

aam commented Mar 9, 2022

Isolate.spawn and ffi should work without any issues.
There are tests in dart sdk that ensure that, for example https://github.com/dart-lang/sdk/blob/main/tests/ffi/vmspecific_function_callbacks_exit_test.dart

@mraleph
Copy link
Member

mraleph commented Mar 10, 2022

One thing I could think about is that reusing something like a FFI trampoline between two isolates might result in a strange issues due to how this is implemented right now. e.g. the following will break in strange ways:

void callback() {
}

final callbackPtr = Pointer.fromFunction<Void Function()>(callback);

void isolateMain() {
  if (clibrary_needs_init())
    init_clibrary(callbackPtr);
}

e.g. if you spawn isolateMain multiple times but only initialize c library with a callbackPtr from the first isolate - things will go south later if library tries to call the callbackPtr it has. (we should probably fix this /cc @dcharkes @mkustermann)

But it is hard to say if this a problem or not without seeing stack trace from your SIGSEGV and/or poking around in the debugger to understand where exactly the crash occurs.

@dcharkes
Copy link
Contributor

if you spawn isolateMain multiple times but only initialize c library with a callbackPtr from the first isolate - things will go south later if library tries to call the callbackPtr it has.

Callbacks into an isolate can only be invoked when being the mutator thread from that isolate (either by doing an ffi call from that isolate, or by using the Dart API to enter the isolate). This is a known limitation.

Asynchronous callbacks would not have this limitation, when we get around to implementing those:

The known workaround is to use ports for the callbacks rather than ffi callback trampolines. See documentation in the first post in the above issue.

we should probably fix this

@mraleph, how do you propose addressing this known limitation? Always calling to a trampoline which blocks until there's no more mutator running in the isolate and then run it? That is basically what we would do for async callbacks.

@xuchaoqian
Copy link
Author

Maybe this is the key to this problem. I'm going to try long running isolate solution, and will give feedback here soon.

Thank you. @mraleph @dcharkes

e.g. if you spawn isolateMain multiple times but only initialize c library with a callbackPtr from the first isolate - things will go south later if library tries to call the callbackPtr it has. (we should probably fix this /cc @dcharkes @mkustermann)

@xuchaoqian
Copy link
Author

xuchaoqian commented Mar 11, 2022

It still does NOT work stably, even if initialize the native code once in the spawned isolate and remove all callbacks.
@aam @mraleph @dcharkes

I have created a minimal repo(isolate-spawn-demo) to reproduce this:

1. clone the repo
2. cd example && flutter pub get && flutter run -v
3. click the button "call ffi in spawn isolate"
4. sometimes it works, sometimes it will cause an error: "############failed to call ffi: $e ################"

Meanwhile, It works stably if call ffi in the main isolate. Just show the button in example/lib/main.dart and run again(Don't hot reload):

ElevatedButton(
  child: Text('call ffi in main isolate'),
  onPressed: callInMainIsolate,
),
// ElevatedButton(
//   child: Text('call ffi in spawn isolate'),
//   onPressed: callInSpawnedIsolate,
// ),

BTW. the native code is just a v8 hello-world program, see here.

@mraleph
Copy link
Member

mraleph commented Mar 11, 2022

@xuchaoqian My guess would be the following (without looking at this in the debugger). It's the same thing I warned about in my very first comment: V8 uses TLS (thread local storage) to associate v8::Isolate with native threads. However Dart does not guarantee that a particular isolate always runs on a particular thread: isolates run on top of a thread pool, so if Dart isolate suspends and resumes (e.g. due to waiting for some data to arrive via a port) it might end up on a different thread.

Your current code enters v8::Isolate when Runtime is created. However between creating Runtime and then using it for Script the underlying Dart isolate might have ended up on a different thread which does not have v8::Isolate.

You can try to change your code to enter isolate in Script instead, i.e. remove Enter/Exit from Runtime/~Runtime and instead in Script (https://github.com/xuchaoqian/isolate-spawn-demo/blob/main/cpp/Script.cpp#L7) do:

Script::Script() {
  auto isolate = Runtime::GetInstance()->GetIsolate();
  v8::Isolate::Scope isolate_scope(isolate);
  // ...
}

I think this will likely fix the issue.

There is a feature request to allow creating isolates pinned to a specific thread (#46943) but we have not yet have a chance to figure out how it would look like, so right now you have to be very careful using libraries like V8 which rely on TLS for certain things.

@xuchaoqian
Copy link
Author

xuchaoqian commented Mar 11, 2022

@mraleph Still does NOT work correctly. BUT you inspired me just now!

I should NOT use Isolate.spawn in my case, but should use thread in native code directly instead.

I will have a try in this way, and give feedback here soon.

@xuchaoqian
Copy link
Author

xuchaoqian commented Mar 20, 2022

Sorry for late reply. It took me some time to completely rewrite this flutter plugin. Now it works perfectly to use the solution: combine the benefits of native threads and native port.

Thank you for your awesome works on dart vm and dart:ffi. @mraleph @dcharkes @aam

I still have some questions about resource management as follows:

1. Is it safe to malloc native memory on the [dart side] (use String.toNativeUtf8() method) and call the free() function on the [cpp side]?

Dart side

typedef Exec = NativeFunction<Void Function(Pointer<Utf8>)>;
final exec = lib.lookup<Exec>('exec').asFunction();
exec("hello dart:ffi".toNativeUtf8());

Cpp side

void exec(const char *value) {
  // do somethings...
  free(value);
}

2. Is it safe to malloc native memory on the [cpp side] (use malloc() function) and call malloc.free() method on the [dart side]?

It's a little tricky to transfer pointers from cpp side to dart side:

Cpp side

Dart_CObject errmsg_obj;
errmsg_obj.type = Dart_CObject_kInt64;
errmsg_obj.value.as_int64 = reinterpret_cast<intptr_t>(errmsg); // cpp pointer, point to object allocated by malloc!!!

Dart_CObject payload_obj;
payload_obj.type = Dart_CObject_kInt64;
payload_obj.value.as_int64 = reinterpret_cast<intptr_t>(payload); // cpp pointer, point to object allocated by malloc!!!

Dart_CObject* result_values[] = {&errmsg_obj, &payload_obj};
Dart_CObject result_obj;
result_obj.type = Dart_CObject_kArray;
result_obj.value.as_array.values = result_values;
result_obj.value.as_array.length =
    sizeof(result_values) / sizeof(result_values[0]);
Dart_PostCObject_DL(send_port_, &result_obj);

Dart side

malloc.free(Pointer<Void>.fromAddress(errmsgPtr)); // from cpp side's errmsg_obj.value.as_int64
malloc.free(Pointer<Void>.fromAddress(payloadPtr)); // from cpp side's payload_obj.value.as_int64

@dcharkes
Copy link
Contributor

(Note: this is becoming off topic, feel free to open new issues for things not imediately related to isolate.spawn and dart:ffi.)

RE 1 & 2: Yes. As long as you're not on Windows. On Windows package:ffi's malloc uses WinHeapAlloc and WinHeapFree:

https://github.com/dart-lang/ffi/blob/5e801e857662819d9bdb15df7dbb49d9ed60bb66/lib/src/allocation.dart#L90-L93

malloc and free on Windows cannot be paired from different dlls. WinHeapdoesn't have this problem. So make sure to use the WinHeap in your C code when targeting windows.

It's a little tricky to transfer pointers from cpp side to dart side

This is by design. Your code looks exactly as expected. We want to make Pointers disappear completely at runtime.

@xuchaoqian
Copy link
Author

@dcharkes I've learned a lot. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-ffi
Projects
None yet
Development

No branches or pull requests

6 participants