[Isolate.exit] Unpredictable performance. #47508
Comments
@modulovalue Some colleagues also looked into your example, rewrote it a little and measured; see https://gist.github.com/mit-mit/ffa606e17569cecbd2c47c234ac1be21

I've looked at that particular example already, and come to the following conclusion: the benchmark itself produces a 14 MB string and then decodes that into a 50+ MB in-memory representation of that JSON string. Doing so creates a lot of objects that the garbage collector needs to churn through. JSON decoding of such a large message is close to a worst-case scenario for a generational GC, because objects will be created in the young generation, survive, and need to be copied and promoted to old space.

Our young-generation collections are stop-the-world. Usually they are very quick, due to the generational hypothesis holding (i.e. most objects die young), though in this example it doesn't hold.

BUT: the situation in Flutter is slightly different, because Flutter often collects the young generation before it's actually full (it does so in between frames if there's idle time). Doing so reduces those pause times. In the standalone VM this doesn't happen.

There is an additional, very long pause in this particular benchmark caused by waiting for a concurrent marking of old space to finish. This pause time is probably feasible to address (I've filed

So in summary: those pause times are due to GC. The work for #36097 removed the 50 MB of object allocations on the receiver side (we no longer pay that cost), but all isolates can have pauses due to GC.
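The worst case described here is easy to reproduce in a few lines. The sketch below (my own illustration; the element count is arbitrary, not the exact 14 MB payload from the benchmark) decodes a large JSON string, which allocates an object graph that mostly survives the nursery:

```dart
import 'dart:convert';

void main() {
  // Build a large JSON string. Decoding it allocates a big object graph
  // whose objects survive the young generation and must be copied and
  // promoted to old space - the generational-GC worst case described above.
  final bigJson = jsonEncode(List.generate(
      200000, (i) => {'id': i, 'name': 'item $i', 'values': [i, i + 1]}));
  final sw = Stopwatch()..start();
  final decoded = jsonDecode(bigJson) as List;
  print('decoded ${decoded.length} entries in ${sw.elapsedMilliseconds} ms');
}
```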
Not certain I should be adding to this issue, as I don't think my issue is GC, but it does relate to unexpected lag in a call to Isolate.exit. Dart SDK version: 2.19.5 (stable) (Mon Mar 20 17:09:37 2023 +0000) on "linux_x64". From the documentation:
I read this as: the time to exit and return data should be consistent regardless of the size of the data being returned. I've written a test to explore this:
import 'dart:isolate';
void main() async {
await test(size: 1);
await test(size: 10);
await test(size: 20);
await test(size: 30);
await test(size: 40);
await test(size: 50);
await test(size: 60);
await test(size: 70);
await test(size: 80);
await test(size: 90);
await test(size: 100);
}
Future<void> test({required int size}) async {
final result = await Isolate.run(() {
// allocate `size` MB.
final list = List.filled(1024 * 1024 * size, 0);
return Tuple(DateTime.now(), list);
});
final completed = DateTime.now();
final interval = completed.difference(result.t1);
print('Time for Isolate.exit size: $size, ms: ${interval.inMilliseconds} ');
print('');
}
class Tuple<T1, T2> {
Tuple(this.t1, this.t2);
T1 t1;
T2 t2;
}
This is a fairly typical run of the benchmark:
Whilst we have some lumps, there is a fairly linear relationship between the size of the memory allocated and the time it takes isolate.exit to return it.

In general Isolate.exit() is, just like SendPort.send(), an O(n) operation, i.e. linear in message size. That's because it has to verify that the message being sent doesn't contain objects that are unsendable. Though compared to SendPort.send() it does avoid the copy. This is vaguely mentioned in the API docs of Isolate.exit() as: "The system may be able to send this final message more efficiently than normal port communication between live isolates. In these cases this final message object graph will be reassigned to the receiving isolate without copying." As part of #46752 there's a bullet point:

- Explore making send-and-exit O(1) - transferring receive port ownership as well

The issue right now is that certain native objects (e.g. sockets, timers, ...) cannot be sent or moved from isolate to isolate.
That is disappointing.
I would suggest the current documentation is misleading, in that it describes Isolate.exit as operating in constant time, which it clearly doesn't. Perhaps we could get this updated.
I was rather excited by Isolate.exit, but it appears we are still left with isolates being of very limited utility, given that any sizable response will cause frame drops.
Oh but for a thread.
@bsutton the O(n) overhead is entirely on the sending side, which does the verification inside Isolate.exit. That being said, building large payloads likely comes with some GC costs (e.g. due to the cost of evacuating a large number of survivors from the nursery, or due to the GC's inability to make appropriate progress when the allocation rate is too high), and GCs will pause all isolates in a group.
@bsutton an alternative solution to this issue is to allocate unmanaged memory via ffi and to pass around a pointer to that memory between isolates (see: #50457).
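As a rough sketch of that ffi approach (my own illustration; it assumes package:ffi for the `malloc` allocator, which is not part of the core libraries), the worker returns only the integer address of an unmanaged buffer, which is O(1) to send:

```dart
import 'dart:ffi';
import 'dart:isolate';

import 'package:ffi/ffi.dart'; // provides the `malloc` allocator

Future<void> main() async {
  const size = 1024 * 1024; // 1 MB of unmanaged (GC-invisible) memory
  // The worker returns a plain int (the address), so Isolate.exit has
  // no large Dart object graph to verify or transfer.
  final address = await Isolate.run(() {
    final buffer = malloc.allocate<Uint8>(size);
    buffer.asTypedList(size)[0] = 42;
    return buffer.address;
  });
  // Reconstruct the pointer in the receiving isolate; same process,
  // same address space.
  final buffer = Pointer<Uint8>.fromAddress(address);
  print(buffer.asTypedList(size)[0]); // prints 42
  malloc.free(buffer); // the receiver now owns the memory and must free it
}
```

Note the ownership inversion: the GC never sees this buffer, so whoever ends up holding the address is responsible for freeing it.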
This sounds very exciting to me. @mkustermann, thank you for the update (and your work on these issues!). Is the work on
The pauses may still be GC related (see my older comment at #47508 (comment)). Though we can investigate it again with the newest VM - @aam, could you have a look?
It looks like Michael has a more up-to-date benchmark related to this issue here.
So I rewrote my own benchmark. It spawns the isolate and then starts a loop checking for any gaps in the performance of the loop:
Here is a test run:
This test still shows a linear relationship. Here are the results of the reversed run:
As you can see, we still have a linear relationship, which suggests this isn't a GC issue and that at least some of the performance cost is borne by the calling isolate.
@bsutton What Dart version are you running? I get no frame drops neither on
Interesting, I can indeed reproduce this on Linux, and it is GC related. On Mac I get:
on Linux I get:
@rmacnak-google any ideas? It seems like we are hitting the mark-sweep finalize at some very unfortunate moment, which causes a large synchronous pause.
Here is a 3.x run:
@mraleph how do you dump out the GC stats?
You provide the --verbose-gc flag.
FYI: if I AOT-compile the benchmark, then there are no more dropped frames, and the raw exit time (time from the last line of the lambda to the primary isolate resuming) drops to a consistent 1-2 ms.

I've added warmup logic for calls to Isolate.run and the GC, to get the JIT compiler primed. Of course, I would assume that the GC is written in C, so the concept of warming up the GC is probably moot. With the JIT run, we do see an improvement to the first 100 MB call (110 ms down to 70 ms). I can see from the --verbose-gc output that the warm-up causes the GC to run hard.

JIT test:

warming: 0
warming: 100
warming: 200
warmup complete
Time for Isolate.exit size: 101 MB, raw Exit time: 99ms
ERROR: frame drops: 1 average duration: 73
Time for Isolate.exit size: 91 MB, raw Exit time: 151ms
ERROR: frame drops: 1 average duration: 64
Time for Isolate.exit size: 81 MB, raw Exit time: 88ms
ERROR: frame drops: 1 average duration: 67
Time for Isolate.exit size: 71 MB, raw Exit time: 122ms
ERROR: frame drops: 1 average duration: 82
Time for Isolate.exit size: 61 MB, raw Exit time: 106ms
ERROR: frame drops: 1 average duration: 64
Time for Isolate.exit size: 51 MB, raw Exit time: 89ms
ERROR: frame drops: 1 average duration: 59
Time for Isolate.exit size: 41 MB, raw Exit time: 72ms
ERROR: frame drops: 1 average duration: 37
Time for Isolate.exit size: 31 MB, raw Exit time: 55ms
All Good
Time for Isolate.exit size: 21 MB, raw Exit time: 36ms
ERROR: frame drops: 1 average duration: 36
Time for Isolate.exit size: 11 MB, raw Exit time: 16ms
ERROR: frame drops: 1 average duration: 16
Time for Isolate.exit size: 1 MB, raw Exit time: 2ms
All Good

AOT run

Note: for a direct comparison, the AOT run included the warm-up logic; however, it didn't need it, as it delivered the same results without the warmup (as expected).

warming: 0
warming: 100
warming: 200
warmup complete
Time for Isolate.exit size: 101 MB, raw Exit time: 0ms
All Good
Time for Isolate.exit size: 91 MB, raw Exit time: 1ms
All Good
Time for Isolate.exit size: 81 MB, raw Exit time: 1ms
All Good
Time for Isolate.exit size: 71 MB, raw Exit time: 0ms
All Good
Time for Isolate.exit size: 61 MB, raw Exit time: 2ms
All Good
Time for Isolate.exit size: 51 MB, raw Exit time: 0ms
All Good
Time for Isolate.exit size: 41 MB, raw Exit time: 0ms
All Good
Time for Isolate.exit size: 31 MB, raw Exit time: 0ms
All Good
Time for Isolate.exit size: 21 MB, raw Exit time: 0ms
All Good
Time for Isolate.exit size: 11 MB, raw Exit time: 0ms
ERROR: frame drops: 1 average duration: 14
Time for Isolate.exit size: 1 MB, raw Exit time: 1ms
All Good

Updated benchmark with warmup logic:

import 'dart:async';
import 'dart:isolate';
void main() async {
await warmup();
print('warmup complete');
for (var size = 101; size > 0; size -= 10) {
await test(size: size);
}
}
Future<void> test({required int size}) async {
final completer = Completer<bool>();
final future = Isolate.run(() {
// allocate size MB.
final list = List.filled(1024 * 1024 * size, 0);
return Tuple(DateTime.now(), list);
});
unawaited(future.whenComplete(() {
completer.complete(true);
}));
final frameDrops = <Duration>[];
final stopwatch = Stopwatch()..start();
while (!completer.isCompleted) {
final elapsed = stopwatch.elapsed;
if (elapsed > const Duration(milliseconds: 10)) {
frameDrops.add(elapsed);
}
stopwatch.reset();
// give async tasks a chance to run.
await Future.delayed(const Duration(milliseconds: 1), () => null);
}
final completed = DateTime.now();
final interval = completed.difference((await future).t1);
print('Time for Isolate.exit size: $size MB, raw Exit time: '
'${interval.inMilliseconds}ms');
final totalDuration = frameDrops.isNotEmpty
? frameDrops.reduce((a, b) => a + b)
: Duration.zero;
final averageDuration = frameDrops.isNotEmpty
? Duration(
microseconds: totalDuration.inMicroseconds ~/ frameDrops.length)
: Duration.zero;
if (frameDrops.isEmpty) {
print('All Good');
} else {
print(
'ERROR: frame drops: ${frameDrops.length} average duration: ${averageDuration.inMilliseconds}');
}
print('');
}
Future<void> warmup() async {
for (var i = 0; i < 300; i++) {
// 10 MB
List.filled(1024 * 1024 * 10, 0);
if (i % 100 == 0) {
print('warming: $i');
}
// hopefully the gc will run
await Future.delayed(const Duration(milliseconds: 1), () => null);
}
}
class Tuple<T1, T2> {
Tuple(this.t1, this.t2);
T1 t1;
T2 t2;
}
I'm wondering: if the observations here are related to GC, then it might make sense to first establish what the expected behavior is. @mraleph wrote:
This sounds to me like it's currently technically impossible (ignoring ffi-related techniques) to, e.g., parse arbitrarily large JSON in a separate isolate B and pass that back to the main isolate without the main isolate freezing (because of operations that happened on isolate B). So the skipped frames that are being reported by bsutton's benchmark and my benchmark are expected behavior? If that's true:
Note: This issue was motivated by the following use-case:
That's not what I was trying to say. I simply wanted to make it simpler to interpret the results of the benchmark: if one isolate is not doing anything and its timer "jitters", then this jitter is most likely introduced by GC in another isolate. We aim for most GC pauses to be small; if you see a large GC pause (like the one reproduced in this example on Linux), I would interpret it as an issue with the GC. I think one common source of larger GC pauses when doing JSON parsing is the cost of evacuating large numbers of survivors from the nursery. I think @rmacnak-google was looking at non-moving promotion before to address this sort of issue, but I am not sure about the current status.
Can you provide any clarity on when Isolate.exit() can use the fast return path (no copy required)?
It will never copy atm. It will do an O(n) verification pass and exit successfully by sending a pointer to the other isolate (or it will throw an exception if it encountered an object during the verification pass that cannot be sent). The verification pass has some optimizations in it (e.g. it will never recurse into deeply immutable/constant object graphs, as those can always be shared). (The phrasing of the API docs came from #47164 (comment) IIRC.) The current documentation would enable us to allow
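For illustration, here is a minimal sketch of that failure path (my own example, not from the thread): returning an object backed by a native resource, such as a ReceivePort, makes the verification pass reject the exit message, so the call completes with an error instead of transferring the result:

```dart
import 'dart:isolate';

Future<void> main() async {
  try {
    // A ReceivePort is backed by a native object, so the O(n)
    // verification pass rejects it when Isolate.run tries to send
    // the result back via Isolate.exit.
    await Isolate.run(() => ReceivePort());
    print('unexpected: result was sent successfully');
  } catch (e) {
    print('result could not be sent: ${e.runtimeType}');
  }
}
```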
@mraleph For me the lightweight isolates feature has been a huge benefit for Dart server-side applications! But what I'm also concerned about is that GC affects all the isolates in a group. I (vaguely) understand that all the isolates in a group share a heap, but they are still all isolated afaik, so I'm not clear on why a long-running GC needs to stop all isolates in a group. I.e. an isolate in a group creating a lot of garbage can still cause performance issues for other isolates in the group, so there seems to be no way at the moment to offload memory-intensive work from isolate(s) that are latency sensitive?
@maks isolates in a group only look isolated to the Dart developer, which means that at the Dart language level there is no way to observe that two isolates work on the same heap. They do exist in the same heap (which is what enables things like Isolate.exit). We do want our GC pauses to be as short as possible (and if you see large pauses you should report this), but today we don't have any plans for providing any sort of hard or soft real-time guarantees.
Thanks @mraleph for the detailed explanation! That makes things a lot clearer for me, as I hadn't considered that there is no bookkeeping of memory allocations per isolate, so the GC has to run over the entire heap and hence applies to all isolates. I definitely agree that GC pauses should be as short as possible. In fact, for my use case (a backend system using large numbers of isolates) it's really only the stop-the-world young-generation GC that is of concern, and not any slowdown from the concurrent old-gen one, as that would still allow all my isolates to make some progress, which is what I'm most interested in. I'll try to get some benchmarks done with my current WIP application and report back. Btw, I found this quite detailed doc about GC; would that be the best reference at the moment on how the Dart GC currently works?
Yes.
This issue is about the efficiency guarantees of the new Isolate.exit feature.
Isolate.exit (via #36097) as a possible solution to #40653 seems to work ~50% of the time.
The following example uses the new Isolate.exit functionality to encode and decode JSON on a different isolate so as not to drop any frames (Timer.tick is used as a proxy for frames).
Code
Output
The Frame Info section shows that some runs caused ~40 dropped frames and some didn't. I expected to see no large ranges of dropped frames.
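The original Code block didn't survive here, but the Timer.tick frame-proxy idea can be sketched as follows (a minimal reconstruction assuming ~60 fps timing, not the original benchmark):

```dart
import 'dart:async';

void main() {
  // A periodic timer at ~16 ms stands in for a 60 fps frame callback.
  // If the event loop is blocked (e.g. by a GC pause or Isolate.exit
  // overhead), `timer.tick` advances by more than one between
  // callbacks; each skipped tick counts as a dropped frame.
  var lastTick = 0;
  Timer.periodic(const Duration(milliseconds: 16), (timer) {
    final missed = timer.tick - lastTick - 1;
    if (missed > 0) print('dropped $missed frame(s) at tick ${timer.tick}');
    lastTick = timer.tick;
    if (timer.tick >= 300) timer.cancel(); // stop after ~5 seconds
  });
}
```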