
Follow up work once lightweight isolates are enabled by-default. #46752

Open · 2 of 14 tasks
mkustermann opened this issue Jul 29, 2021 · 10 comments
Labels: area-vm, library-isolate

mkustermann (Member) commented Jul 29, 2021:

After the lightweight isolate support is enabled (see #36097) there are a few tasks we might want to do as follow-up work:

Experiment with more data sharing

  • Enable sending more object kinds (tracked in issue #46623)
  • Explore language-level support for constructing immutable objects.
  • Explore runtime based freezing of object graphs
  • Explore runtime based immutability by-construction

Performance

  • Investigate whether there is lock contention, and if so, find ways to avoid it (already done for symbol lock, ...)
  • Investigate whether we would benefit from preventing multiple mutators from compiling unoptimized code for the same function at the same time
  • Investigate installing code without stopping mutators
  • Investigate whether we can utilize isolate death as a signal for GC
  • Explore making send-and-exit O(1) - transferring receive port ownership as well
  • Add pre-emptive scheduler (parallel mutator count is limited, exceeding it relies on cooperative scheduling)
  • Avoid doing 2 calls to Dart in eventloop (lookupHandler() & invokeMessageHandler())
  • Avoid using OOB messages for hot-reload; instead piggy-back on the safepoint-level mechanism
  • Optimize isolate startup performance (avoid doing many calls to Dart to setup corelibs, possibly bundle them)

Testing

  • Investigate instrumenting non-atomic loads/stores in generated code with TSAN to detect races

/cc @aam

dart-bot pushed a commit that referenced this issue Aug 18, 2021
… call.

This improves performance of the SendPort.Receive.Nop benchmark with isolate groups enabled on Intel Xeon by ~17%.
The benchmark emphasizes the performance of the message-handling flow.

Issue #46752

TEST=ci

Change-Id: I3b9be3283047631e8989bb56f90af2b3b007afe8
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/209642
Commit-Queue: Alexander Aprelev <aam@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
mnordine (Contributor) commented:

> [ ] Explore making send-and-exit O(1) - transferring receive port ownership as well

@mkustermann Wasn't this done?

mkustermann (Member, Author) commented:

> > [ ] Explore making send-and-exit O(1) - transferring receive port ownership as well
>
> @mkustermann Wasn't this done?

@mnordine not afaik (/cc @aam).

Currently we do an O(n) verification pass to check that certain objects aren't part of the transitive object graph. To make it O(1) we'd need to think very carefully about what transferring such objects to another isolate means, also in the context of message passing, where there's a time window in which the sender isolate has exited but the receiver isolate hasn't read the message yet (e.g. for transferred receive ports, timers, etc.).
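For readers following along, here is a minimal sketch of send-and-exit as it exists today (`Isolate.exit` from `dart:isolate`); the O(n) verification pass described above happens when the final message is handed off. The snippet is a generic illustration, not code from this issue:

```dart
import 'dart:isolate';

// Worker: builds a large result and hands it to the spawner via
// Isolate.exit, which transfers the object graph instead of deep-copying
// it. The VM still performs an O(n) walk over the transitive graph here
// to verify it contains no unsendable objects.
void worker(SendPort resultPort) {
  final result = List<int>.generate(1000000, (i) => i * 2);
  Isolate.exit(resultPort, result);
}

Future<void> main() async {
  final resultPort = ReceivePort();
  await Isolate.spawn(worker, resultPort.sendPort);
  final result = await resultPort.first as List<int>;
  print('Received ${result.length} elements without a deep copy.');
}
```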

julemand101 (Contributor) commented:

Maybe an additional TODO would be the following found in the ReloadSources() method in service.cc:

sdk/runtime/vm/service.cc, lines 3765 to 3766 (at b6c8bd7):

// TODO(dartbug.com/36097): We need to change the "reloadSources" service-api
// call to accept an isolate group instead of an isolate.

The comment points to #36097, which is closed. I don't know whether the intention is still to change the service API.
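For context, this is roughly how the call looks from the client side today via package:vm_service - it takes a single isolate id, not an isolate group id. The WebSocket URI below is illustrative:

```dart
import 'package:vm_service/vm_service.dart';
import 'package:vm_service/vm_service_io.dart';

Future<void> main() async {
  // Connect to a running VM's service protocol endpoint (URI illustrative).
  final VmService service =
      await vmServiceConnectUri('ws://127.0.0.1:8181/ws');

  // reloadSources takes a single isolate id, even though a reload actually
  // applies to the whole isolate group - hence the TODO quoted above.
  final vm = await service.getVM();
  final isolateId = vm.isolates!.first.id!;
  final report = await service.reloadSources(isolateId);
  print('Reload success: ${report.success}');
}
```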

maks commented Jul 3, 2023:

@mkustermann I'm very excited about the possibility of:

> Add pre-emptive scheduler (parallel mutator count is limited, exceeding it relies on cooperative scheduling)

Is there a separate issue to track that work? (It would also be great to have an issue as a place to ask questions about the current implementation of Isolate scheduling.)

mkustermann (Member, Author) commented:

> Is there a separate issue to track that work? (It would also be great to have an issue as a place to ask questions about the current implementation of Isolate scheduling.)

Could you give us some context on why this is important for you?

We haven't heard many users ask for it, so we're currently focusing on other things. That said, there are a number of changes planned for our GC which may eventually allow us to remove the current restriction on the number of mutators running in parallel. (/cc @rmacnak-google)

maks commented Jul 3, 2023:

> > Is there a separate issue to track that work?
>
> Could you give us some context on why this is important for you?

Thanks for the quick reply @mkustermann !

My use case is that I'm back working on an "Erlang style" backend system (initially servicing Flutter clients via websockets), at first running workloads developed internally but later able to run workloads submitted by customers. For this, having pre-emptive scheduling across (very) large numbers of Isolates would help with Isolates that end up CPU-bound, whether due to developer bugs or misuse by customer-submitted jobs (unintentional or not). I believe that pre-emptive Isolate scheduling like Erlang's, where all Isolates can keep making some progress at the expense of higher latency across all Isolates, would be beneficial for my use case (the reference for a lot of my design thinking here is this presentation: https://www.youtube.com/watch?v=JvBT4XBdoUE).

Hopefully what I described above is what you had in mind in regard to that item on this issue's list of tasks?

In regards to GC, I've also run into the potential issue of heavy memory churn in some Isolates causing higher latency across all Isolates, but I've asked about that in a separate issue.

I suspect the reason no one has asked about this before is that Dart server-side usage is still in its early stages, and the usage there so far comes from frameworks styled more after NodeJS than Erlang/Elixir. But I definitely see growing use of Dart on the backend (a new commercial Dart server-side SaaS was announced just a few days ago), so I would expect this to become a more important use case in the near future.

mkustermann (Member, Author) commented:

> Hopefully what I described above is what you had in mind in regard to that item on this issue's list of tasks?

A scenario where the number of ready tasks (i.e. tasks that have something to do) vastly exceeds the number of CPU cores available for execution is basically a highly overloaded system, and one cannot expect good application behavior from it. What's more realistic is a high number of tasks of which only a small fraction is actually able to execute (i.e. has something to do) at any given point in time.

The reason we would currently need to do preemptive scheduling in the VM itself is that we cannot have too many isolates running in parallel without degrading performance, due to the way our GC is structured (i.e. it currently cannot take advantage of a high number of cores). If we didn't have that problem, we could simply use OS threads and rely on the OS to do the scheduling.

maks commented Aug 23, 2023:

Hi @mkustermann, I'm circling back to this because I didn't yet want to give up on the idea of having large numbers of concurrent Isolates in a manageable way, so I've tried a different tack: using scripting for the workloads running inside each Isolate. Using an interpreter written in Dart, I'm then able to do my own preemptive scheduling simply by counting instruction executions inside the interpreter's for(;;) loop and time-slicing that way (roughly as in the sketch below).
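A minimal sketch of that instruction-counting ("gas metering") idea; `Instruction` and `MeteredInterpreter` are illustrative placeholder names, not LuaDardo API:

```dart
// Placeholder for a decoded interpreter instruction (not LuaDardo API).
abstract class Instruction {
  void execute();
}

// Hypothetical metered interpreter: the run loop counts executed
// instructions and hands control back to the caller once the budget for
// this time slice is spent, instead of running to completion.
class MeteredInterpreter {
  MeteredInterpreter(this.program);

  final List<Instruction> program;
  int pc = 0; // program counter

  /// Executes at most [budget] instructions. Returns true if the program
  /// still has work left (i.e. the slice expired before completion).
  bool runSlice(int budget) {
    for (var remaining = budget; remaining > 0; remaining--) {
      if (pc >= program.length) return false; // program finished
      program[pc++].execute();
    }
    return pc < program.length;
  }
}
```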

But what I think I now need to do is essentially the same as you proposed in #51261, but for Dart instead of FFI calls: I need to be able to have the current thread leave the current Isolate when the "timeslice" runs out for its interpreter, as otherwise I can run into the same kind of max-16-mutator-thread deadlocks. I was guessing that doing this would then put each of those threads back (hopefully at the back) of the "ready to run" Isolates list?

I had a quick look but couldn't easily spot where in the VM source the code lives that allocates mutator threads from the pool to "ready to run" Isolates, so I'm not sure whether this is already possible and I'm missing something on how to do it in just Dart. Or, if not, could the mechanism you are planning for FFI be put to use for this scenario too?

mkustermann (Member, Author) commented:

> I need to be able to have the current thread leave the current Isolate when the "timeslice" runs out ... I was guessing that doing this would then put each of those threads back (hopefully at the back) of the "ready to run" Isolates list?

If a Dart isolate calls C which then leaves the isolate via Dart_ExitIsolate(), that C code decides what it does next before re-entering the isolate and continuing.

What our Dart_ExitIsolate() implementation does is two things:

  • if it runs on the VM's thread pool, it tells that thread pool that we don't occupy a slot at the moment and therefore don't count towards the maximum size of the pool
  • it reduces the number of entered isolates in the isolate group, allowing some other thread to enter another isolate

Both of these happen in Thread::ExitIsolate, which calls IsolateGroup::DecreaseMutatorCount().

If it wasn't a nested exit (i.e. Dart calling into C which then exits) but rather Dart just returning to the event loop, then we do indeed re-use the thread, possibly for another isolate. Isolates are by default started on a thread pool the IsolateGroup owns, see Isolate::run(). On that thread pool the isolate's messages are handled; once there are no more messages, the worker returns to the thread pool. The thread pool will then drain the next task (which may be another isolate's message-handler task) or put the thread on an idle list, see ThreadPool::WorkerLoop.
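A Dart-level sketch of that message-driven lifecycle (illustrative code, not VM internals): the worker below only ever runs in response to messages; between messages it returns to the event loop, freeing its pool thread for other isolates:

```dart
import 'dart:async';
import 'dart:isolate';

// Illustrative worker driven purely by its message queue. After each
// message is handled, control returns to the event loop; with no pending
// messages the worker's task completes and the pool thread is free to
// run another isolate's message-handler task (cf. ThreadPool::WorkerLoop).
void worker(SendPort replyTo) {
  final inbox = ReceivePort();
  replyTo.send(inbox.sendPort);
  inbox.listen((message) {
    replyTo.send('handled: $message');
  });
}

Future<void> main() async {
  final replies = ReceivePort();
  await Isolate.spawn(worker, replies.sendPort);

  final events = StreamIterator(replies);
  await events.moveNext();
  final inbox = events.current as SendPort;

  inbox.send('ping');
  await events.moveNext();
  print(events.current); // handled: ping
  replies.close();
}
```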

@maks Do I correctly understand: each interpreted Dart program runs in its own lightweight isolate; your interpreter loop will let you know when a time slice ran out; at that point you can simply go back to the event loop, and other isolates will run. If you have an interpreter-to-Dart call that may be blocking, and you cannot control what that Dart call does, you essentially need VM support for preemptive scheduling.

maks commented Aug 23, 2023:

Thanks for the detailed reply and pointers to the VM code @mkustermann, that's very helpful! 🙏🏻

Apologies for not providing more details on the approach I'm trying with the interpreter; I'll try to explain it better below.
What I've done is use the LuaDardo package, a (not quite complete) Lua 5.3 implementation in pure Dart, and patch it in my fork to count operations in its own run loop, allowing me to suspend it after a certain number of them - somewhat the way I've read some WASM runtimes implement "gas metering". This is along the same lines as what canonical C Lua provides with its debug library, and I hope to eventually clean up my hacky code and add that to LuaDardo.

But getting back to the topic: I was hoping to use the above approach of only running Lua instances in each "lua worker" Isolate, thereby allowing me to have preemptive scheduling of those Isolates.

> @maks Do I correctly understand: each interpreted Dart program runs in its own lightweight isolate; your interpreter loop will let you know when a time slice ran out; at that point you can simply go back to the event loop, and other isolates will run.

Yes, that's exactly it Martin. I'd like some way, in the Lua run loop's Dart code, to go back to the Dart event loop to allow other isolates to run, and then at some point come back to running this isolate from that "yield point" in the Lua run loop (the Dart for(;;) loop).

But that seems to require doing something like await Future.delayed(Duration(seconds: 0)), and the LuaDardo codebase is currently completely synchronous, so I guess I need to rewrite it to be async, since I can't find any way in synchronous Dart code to "return to the run loop and also schedule an event to start executing again from this point"? (Something like the sketch below.)
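A sketch of what that async rewrite could look like, assuming the interpreter's operations can be modeled as a list of callbacks (illustrative names, not LuaDardo code):

```dart
import 'dart:async';

// Illustrative async run loop: every `budget` operations it awaits a
// zero-length delay, which returns control to the event loop (a timer
// event is queued), letting other pending work - and, at the VM level,
// other isolates - run before execution resumes here.
Future<void> runMetered(List<void Function()> ops,
    {int budget = 10000}) async {
  for (var i = 0; i < ops.length; i++) {
    ops[i]();
    if ((i + 1) % budget == 0) {
      await Future<void>.delayed(Duration.zero);
    }
  }
}
```

Because the awaited Future.delayed(Duration.zero) goes through a timer event rather than a microtask, execution only resumes after the event loop has had a chance to process other queued events.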

What I essentially want is the equivalent of dart:io's sleep(), but without blocking the currently executing thread: it would instead leave the Isolate with an event placed on the event queue that would resume execution from that point on. That's why I thought Thread::ExitIsolate might be of help, but now that you've explained how it works I can see it's just about accounting for the thread pool and won't help me here.

Really what I want here is exactly what, ironically, Lua coroutines provide: a blocking yield (to allow the thread to go off and service other Isolates' events).

I have a bad feeling this is along the lines of the debate that was had around the use of waitFor(), and that my only real option here is to change the LuaDardo implementation to be async and then rely on Dart's existing ability to "pause" Dart code at awaits without blocking the underlying thread - though I've got my fingers crossed that you can tell me I'm wrong here 🙂

> If you have an interpreter-to-Dart call that may be blocking, and you cannot control what that Dart call does, you essentially need VM support for preemptive scheduling.

And yes, I very much take your point that if I have code that calls out to Dart which then does a long-running CPU-bound call, I'm back to square one! That's one of the reasons I chose Lua, and LuaDardo specifically: it's possible to limit every external call available to a Lua instance, as even its standard library import is optional, so I can choose not to even provide a print() to each Lua state instance. I'm planning to tightly control what Dart code can be called out to, so that CPU-bound operations only ever happen within Lua code and are therefore preemptable. That does of course greatly set back performance, having to run business logic in a Lua interpreter vs AOT-compiled code, but I'm going for the engineering trade-off of "promote the progress of the system as a whole at the expense of the efficiency of any single activity", as Sasa Juric puts it. For my project, the benefits of preemption, vastly improved runtime introspection, and hot-restart of the business logic - while still writing the "infrastructure" in Dart and having the whole system AOT-compiled into a single executable - outweigh the perf penalty.
And I also treat this as a temporary perf setback, as I'm looking towards the near future when I could swap out the Lua VM for a WASM+GC runtime and keep all of the above benefits with excellent performance, running Dart code for the business logic 🙂.
