
[Isolate.exit] Unpredictable performance. #47508

Open
modulovalue opened this issue Oct 20, 2021 · 26 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends.

Comments

@modulovalue
Contributor

This issue is about the efficiency guarantees of the new Isolate.exit feature.

Isolate.exit (via #36097), as a possible solution to #40653, seems to work only ~50% of the time.

The following example uses the new Isolate.exit functionality to encode and decode JSON on a separate isolate so that no frames are dropped (Timer.tick is used as a proxy for frames).

Code
import 'dart:async';
import 'dart:convert';
import 'dart:isolate';
import 'dart:math';

Future<void> main() async {
  for (int i = 0; i < 10; i++) {
    await run();
  }
}

Future<void> run() async {
  final frames = <int>[];
  final timer = Timer.periodic(
    const Duration(milliseconds: frameDurationIsMS),
    (final timer) => frames.add(timer.tick),
  );
  final jsonString = () {
    final rnd = Random();
    final map = <String, dynamic>{
      for (int i = 0; i < 500000; i++)
        i.toString(): [
          rnd.nextInt(1000),
          () {
            final rand = Random();
            final codeUnits = List.generate(
              10,
              (final index) => rand.nextInt(33) + 89,
            );
            return String.fromCharCodes(codeUnits);
          }()
        ],
    };
    return json.encode(map);
  }();
  final map = await runAsyncOperation<dynamic>(
    description: "json.decode",
    operation: () => asyncCompute<String, dynamic>(
      fn: json.decode,
      input: jsonString,
    ),
    getTimerTick: () => timer.tick,
  );
  await Future<void>(() {});
  final encodedMap = await runAsyncOperation<dynamic>(
    description: "json.encode",
    operation: () => asyncCompute<Map<dynamic, dynamic>, String>(
      fn: json.encode,
      input: map.result as Map<dynamic, dynamic>,
    ),
    getTimerTick: () => timer.tick,
  );
  timer.cancel();
  output(
    ops: [
      map,
      encodedMap,
    ],
    ticks: frames,
  );
}

Future<O> asyncCompute<I, O>({
  required final FutureOr<O> Function(I) fn,
  required final I input,
}) async {
  final receivePort = ReceivePort();
  final inbox = StreamIterator<dynamic>(
    receivePort,
  );
  await Isolate.spawn<_IsolateMessage<I, O>>(
    (final message) => Isolate.exit(
      message.sendPort,
      message.fn(
        message.input,
      ),
    ),
    _IsolateMessage(
      sendPort: receivePort.sendPort,
      fn: fn,
      input: input,
    ),
  );
  final movedNext = await inbox.moveNext();
  assert(
    movedNext,
    "Call to moveNext is expected to return true.",
  );
  final typedResult = inbox.current as O;
  receivePort.close();
  await inbox.cancel();
  return typedResult;
}

class _IsolateMessage<I, O> {
  final SendPort sendPort;
  final FutureOr<O> Function(I) fn;
  final I input;

  const _IsolateMessage({
    required final this.sendPort,
    required final this.fn,
    required final this.input,
  });
}

const int frameDurationIsMS = 1000 ~/ 120;

Future<Operation<T>> runAsyncOperation<T>({
  required final String description,
  required final FutureOr<T> Function() operation,
  required final int Function() getTimerTick,
}) async {
  final startTick = getTimerTick();
  final value = await operation();
  final endTick = getTimerTick();
  return Operation<T>(
    description: description,
    frameInfo: FrameInfo(
      startTick: startTick,
      endTick: endTick,
    ),
    result: value,
  );
}

void output({
  required final List<Operation<Object?>> ops,
  required final List<int> ticks,
}) {
  print("== Operation Info ==");
  print(
    "Operation".padRight(30) + "Start Frame".padRight(30) + "End Frame".padRight(30),
  );
  for (final op in ops) {
    print(
      _operationToText(
        operation: op,
      ),
    );
  }
  print("== Frame Info ==");
  print("Skipped Frames".padRight(30) + "Frame Range".padRight(30));
  for (final skippedFrame in _findSkippedFrames(
    frames: ticks,
  )) {
    print(
      ((skippedFrame.endTick - skippedFrame.startTick).toString()).padRight(30) +
          (skippedFrame.startTick.toString() + " to " + skippedFrame.endTick.toString()).padRight(30),
    );
  }
}

String _operationToText<T>({
  required final Operation<T> operation,
}) =>
    operation.description.padRight(30) +
    (operation.frameInfo.startTick.toString()).padRight(30) +
    (operation.frameInfo.endTick.toString()).padRight(30);

class Operation<T> {
  final String description;
  final FrameInfo frameInfo;
  final T result;

  const Operation({
    required final this.description,
    required final this.frameInfo,
    required final this.result,
  });
}

List<FrameInfo> _findSkippedFrames({
  required final List<int> frames,
}) {
  final skippedFrames = <FrameInfo>[];
  frames.fold<int?>(
    null,
    (final previousValue, final element) {
      if (previousValue == null) {
        return element;
      } else if (previousValue + 1 != element) {
        skippedFrames.add(
          FrameInfo(
            startTick: previousValue,
            endTick: element,
          ),
        );
        // Keep tracking from the current tick.
        return element;
      } else {
        return element;
      }
    },
  );
  return skippedFrames;
}

class FrameInfo {
  final int startTick;
  final int endTick;

  const FrameInfo({
    required final this.startTick,
    required final this.endTick,
  });
}
Output
/usr/local/opt/dart-beta/libexec/bin/dart --version
Dart SDK version: 2.15.0-178.1.beta (beta) (Tue Oct 12 11:11:28 2021 +0200) on "macos_x64"

/usr/local/opt/dart-beta/libexec/bin/dart snippet.dart
snippet.dart: Warning: Interpreting this as package URI, (...)
== Operation Info ==
Operation                     Start Frame                   End Frame                     
json.decode                   0                             255                           
json.encode                   257                           347                           
== Frame Info ==
Skipped Frames                Frame Range                   
3                             132 to 135                    
4                             149 to 153                    
4                             159 to 163                    
2                             170 to 172                    
5                             175 to 180                    
4                             193 to 197                    
3                             202 to 205                    
2                             255 to 257                    
2                             337 to 339                    
== Operation Info ==
Operation                     Start Frame                   End Frame                     
json.decode                   0                             228                           
json.encode                   234                           337                           
== Frame Info ==
Skipped Frames                Frame Range                   
2                             128 to 130                    
4                             141 to 145                    
3                             151 to 154                    
3                             166 to 169                    
2                             180 to 182                    
6                             228 to 234                    
2                             325 to 327                    
== Operation Info ==
Operation                     Start Frame                   End Frame                     
json.decode                   0                             225                           
json.encode                   225                           311                           
== Frame Info ==
Skipped Frames                Frame Range                   
2                             122 to 124                    
2                             131 to 133                    
5                             142 to 147                    
4                             171 to 175                    
3                             180 to 183                    
44                            225 to 269                    
2                             300 to 302                    
== Operation Info ==
Operation                     Start Frame                   End Frame                     
json.decode                   0                             225                           
json.encode                   226                           325                           
== Frame Info ==
Skipped Frames                Frame Range                   
2                             126 to 128                    
4                             139 to 143                    
3                             149 to 152                    
3                             163 to 166                    
2                             177 to 179                    
58                            226 to 284                    
2                             312 to 314                    
== Operation Info ==
Operation                     Start Frame                   End Frame                     
json.decode                   0                             227                           
json.encode                   227                           315                           
== Frame Info ==
Skipped Frames                Frame Range                   
3                             122 to 125                    
2                             132 to 134                    
5                             143 to 148                    
4                             172 to 176                    
2                             181 to 183                    
46                            227 to 273                    
2                             304 to 306                    
== Operation Info ==
Operation                     Start Frame                   End Frame                     
json.decode                   0                             224                           
json.encode                   225                           324                           
== Frame Info ==
Skipped Frames                Frame Range                   
2                             125 to 127                    
3                             138 to 141                    
4                             147 to 151                    
3                             162 to 165                    
3                             176 to 179                    
59                            225 to 284                    
== Operation Info ==
Operation                     Start Frame                   End Frame                     
json.decode                   0                             225                           
json.encode                   226                           313                           
== Frame Info ==
Skipped Frames                Frame Range                   
2                             122 to 124                    
2                             131 to 133                    
4                             143 to 147                    
4                             172 to 176                    
2                             181 to 183                    
45                            226 to 271                    
2                             303 to 305                    
== Operation Info ==
Operation                     Start Frame                   End Frame                     
json.decode                   0                             226                           
json.encode                   231                           329                           
== Frame Info ==
Skipped Frames                Frame Range                   
2                             126 to 128                    
4                             139 to 143                    
4                             149 to 153                    
3                             164 to 167                    
2                             178 to 180                    
5                             226 to 231                    
== Operation Info ==
Operation                     Start Frame                   End Frame                     
json.decode                   0                             224                           
json.encode                   224                           311                           
== Frame Info ==
Skipped Frames                Frame Range                   
3                             121 to 124                    
2                             131 to 133                    
5                             142 to 147                    
4                             171 to 175                    
2                             180 to 182                    
45                            224 to 269                    
2                             301 to 303                    
== Operation Info ==
Operation                     Start Frame                   End Frame                     
json.decode                   0                             226                           
json.encode                   231                           330                           
== Frame Info ==
Skipped Frames                Frame Range                   
2                             127 to 129                    
5                             139 to 144                    
3                             150 to 153                    
3                             164 to 167                    
3                             178 to 181                    
5                             226 to 231      

The Frame Info section shows that some runs caused ~40 dropped frames and some didn't. I expected to see no large ranges of dropped frames.

@mkustermann
Member

@modulovalue Some colleagues also looked into your example, rewrote it a little and measured, see https://gist.github.com/mit-mit/ffa606e17569cecbd2c47c234ac1be21

I've looked at that particular example already, and come to the following conclusion:

The benchmark itself produces a 14 MB string and then decodes that into a 50+ MB in-memory representation of that JSON string. Doing so creates a lot of objects that the garbage collector needs to churn through. JSON decoding of such a large message is especially close to a worst-case scenario for a generational GC, because objects are created in the young generation, survive, and need to be copied and promoted to old space.

Our young-generation collections are stop-the-world. Usually they are very quick because the generational hypothesis holds (i.e. most objects die young), though in this example it doesn't.

BUT: The situation in Flutter is slightly different, because Flutter collects the young generation often before it's actually full (it does so in-between frames if there's idle time). Doing so will reduce those pause times. In the standalone VM this doesn't happen.

There is an additional very long pause in this particular benchmark caused by waiting for a concurrent marking of old-space to finish. This pause time is probably feasible to address (I've filed #47492 for this, @rmacnak-google has some ideas in this area).

So in summary: Those pause times are due to GC. The work for #36097 removed the 50MB of object allocations on the receiver side (we no longer pay that cost), but all isolates can have pauses due to GC.

@vsmenon vsmenon added the area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. label Oct 20, 2021
@bsutton

bsutton commented Mar 24, 2023

Not certain I should be adding to this issue, as I don't think my issue is GC-related, but it does relate to unexpected lag with a call to Isolate.exit.

Dart SDK version: 2.19.5 (stable) (Mon Mar 20 17:09:37 2023 +0000) on "linux_x64"

From the documentation:

Further, the receiving isolate will in most cases be able to receive the message in constant time.

I read this as: the time to exit and return data should be consistent regardless of the size of the data being returned.

I've written a test to explore this:

import 'dart:isolate';

void main() async {
  await test(size: 1);
  await test(size: 10);
  await test(size: 20);
  await test(size: 30);
  await test(size: 40);
  await test(size: 50);
  await test(size: 60);
  await test(size: 70);
  await test(size: 80);
  await test(size: 90);
  await test(size: 100);
}

Future<void> test({required int size}) async {
  final result = await Isolate.run(() {
    // allocate size * MB.
    final list = List.filled(1024 * 1024 * size, 0);
    return Tuple(DateTime.now(), list);
  });
  final completed = DateTime.now();
  final interval = completed.difference(result.t1);
  print('Time for Isolate.exit size: $size, ms:  ${interval.inMilliseconds} ');
  print('');
}

class Tuple<T1, T2> {
  Tuple(this.t1, this.t2);

  T1 t1;
  T2 t2;
}

This is a fairly typical run of the benchmark:

Time for Isolate.exit size: 1, ms:  2 

Time for Isolate.exit size: 10, ms:  9 

Time for Isolate.exit size: 20, ms:  17 

Time for Isolate.exit size: 30, ms:  45 

Time for Isolate.exit size: 40, ms:  33 

Time for Isolate.exit size: 50, ms:  73 

Time for Isolate.exit size: 60, ms:  49 

Time for Isolate.exit size: 70, ms:  103 

Time for Isolate.exit size: 80, ms:  66 

Time for Isolate.exit size: 90, ms:  128 

Time for Isolate.exit size: 100, ms:  82 

Whilst we have some lumps, there is a fairly linear relationship between the size of the memory allocated and the time it takes Isolate.exit to return it.
I would expect some bumps (due to GC), but I wouldn't expect the linear relationship we are seeing.

@mkustermann
Member

Whilst we have some lumps, there is a fairly linear relationship between the size of the memory allocated and the time it takes isolate.exit to return it.

In general, Isolate.exit() is, just like SendPort.send(), an O(n) operation, i.e. linear in message size. That's because it has to verify that the message being sent doesn't contain objects that are unsendable. Compared to SendPort.send(), though, it does avoid the copy.

This is vaguely mentioned in the API docs of Isolate.exit() as: The system may be able to send this final message more efficiently than normal port communication between live isolates. In these cases this final message object graph will be reassigned to the receiving isolate without copying.

As part of #46752 there's a bullet point:

  • Explore making send-and-exit O(1) - transferring receive port ownership as well

The issue right now is that certain native objects (e.g. sockets, timers, ...) cannot be sent or moved from isolate to isolate.
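For illustration, a minimal sketch of the kind of object that fails that verification (a ReceivePort is bound to its isolate, so the returned future is expected to complete with an error rather than a value; the example is illustrative, not from this codebase):

import 'dart:isolate';

Future<void> main() async {
  try {
    // The result contains a ReceivePort, which cannot leave the worker
    // isolate, so the verification pass on exit rejects the message and
    // the future returned by Isolate.run completes with an error.
    await Isolate.run(() => [ReceivePort(), 'payload']);
  } catch (e) {
    print('unsendable result: $e');
  }
}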

@bsutton

bsutton commented Mar 24, 2023 via email

@mraleph
Member

mraleph commented Mar 24, 2023

@bsutton the O(n) overhead is entirely on the sending side, which does verification inside the Isolate.exit call. The documentation does not mention that, but it correctly says that the receiving side pays only O(1) per message. Your benchmark is measuring both the cost of Isolate.exit and the cost of receiving the message, so you see both sending and receiving costs, which add up to O(n) + O(1) = O(n). If you want to measure just the receiving costs, you could set a timer that fires, say, every 10 ms in the main isolate and then observe that there should be no significant drift between events even when you send large payloads.
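A minimal sketch of that receiving-side measurement (the 10 ms period and the payload size are illustrative):

import 'dart:async';
import 'dart:isolate';

Future<void> main() async {
  final drifts = <int>[];
  var last = DateTime.now();
  // Fires every 10 ms on the main isolate; any extra delay shows up as drift.
  final timer = Timer.periodic(const Duration(milliseconds: 10), (_) {
    final now = DateTime.now();
    drifts.add(now.difference(last).inMilliseconds - 10);
    last = now;
  });

  // Meanwhile a worker isolate builds a large payload and exits with it.
  final payload = await Isolate.run(() => List<int>.filled(10 * 1024 * 1024, 0));

  timer.cancel();
  print('payload length: ${payload.length}');
  final maxDrift = drifts.isEmpty ? 0 : drifts.reduce((a, b) => a > b ? a : b);
  print('max timer drift: $maxDrift ms');
}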

That being said - building large payloads likely comes with some GC costs (e.g. due to the cost of evacuating a large number of survivors from the nursery, or due to the GC's inability to make appropriate progress when the allocation rate is too high), and GCs will pause all isolates in a group.

@modulovalue
Contributor Author

That is disappointing.

@bsutton an alternative solution to this issue is to allocate unmanaged memory via FFI and pass a pointer to that memory between isolates (see: #50457).
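A minimal sketch of that FFI approach (assuming package:ffi for allocation; the buffer size and ownership handling are illustrative):

import 'dart:ffi';
import 'dart:isolate';
import 'dart:typed_data';

import 'package:ffi/ffi.dart'; // provides calloc

Future<void> main() async {
  const length = 1024;

  // Worker isolate fills a native buffer and returns only its address (an int).
  final address = await Isolate.run(() {
    final Pointer<Uint8> buffer = calloc<Uint8>(length);
    buffer.asTypedList(length).fillRange(0, length, 42);
    return buffer.address;
  });

  // Main isolate reinterprets the address; the native memory is not managed
  // by the Dart GC, so no Dart-heap copy of the data is involved.
  final Pointer<Uint8> buffer = Pointer.fromAddress(address);
  final Uint8List view = buffer.asTypedList(length);
  print(view.first); // 42

  calloc.free(buffer); // the receiving side now owns the native memory
}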


As part of #46752 there's a bullet point:

  • Explore making send-and-exit O(1) - transferring receive port ownership as well

This sounds very exciting to me. @mkustermann, thank you for the update (and your work on these issues!).

Is the work on making send-and-exit O(1) intended to help with the "skipped frames" issue here? (I've run the code in the issue description and it continues to report some skipped frames.)

@mkustermann
Member

mkustermann commented Mar 24, 2023

Is the work on making send-and-exit O(1) intended to help with the "skipped frames" issue here? (I've run the code in the issue description and it continues to report some skipped frames.)

The pauses may still be GC related (see my older comment at #47508 (comment)). Though we can investigate it again with newest VM - @aam could you have a look?

@modulovalue
Contributor Author

modulovalue commented Mar 24, 2023

It looks like Michael has a more up-to-date benchmark related to this issue here.

@bsutton

bsutton commented Mar 24, 2023

@mraleph

So I rewrote my own benchmark.

It spawns the isolate and then starts a loop checking for any gaps in the performance of the loop:

import 'dart:async';
import 'dart:isolate';

void main() async {
  await test(size: 1);
  await test(size: 10);
  await test(size: 20);
  await test(size: 30);
  await test(size: 40);
  await test(size: 50);
  await test(size: 60);
  await test(size: 70);
  await test(size: 80);
  await test(size: 90);
  await test(size: 100);
}

Future<void> test({required int size}) async {
  final completer = Completer<bool>();

  final future = Isolate.run(() {
    // allocate size MB.
    final list = List.filled(1024 * 1024 * size, 0);
    return Tuple(DateTime.now(), list);
  });

  unawaited(future.whenComplete(() {
    completer.complete(true);
  }));

  final frameDrops = <Duration>[];
  final stopwatch = Stopwatch()..start();

  while (!completer.isCompleted) {
    final elapsed = stopwatch.elapsed;
    if (elapsed > const Duration(milliseconds: 10)) {
      frameDrops.add(elapsed);
    }
    stopwatch.reset();
    // give async tasks a chance to run.
    await Future.delayed(const Duration(milliseconds: 1), () => null);
  }

  final completed = DateTime.now();
  final interval = completed.difference((await future).t1);
  print(
      'Time for Isolate.exit size: $size MB, ms:  ${interval.inMilliseconds} ');

  final totalDuration = frameDrops.isNotEmpty
      ? frameDrops.reduce((a, b) => a + b)
      : Duration.zero;

  final averageDuration = frameDrops.isNotEmpty
      ? Duration(
          microseconds: totalDuration.inMicroseconds ~/ frameDrops.length)
      : Duration.zero;

  if (frameDrops.isEmpty) {
    print('All Good');
  } else {
    print(
        'ERROR: frame drops: ${frameDrops.length} average duration: ${averageDuration.inMilliseconds}');
  }
  print('');
}

class Tuple<T1, T2> {
  Tuple(this.t1, this.t2);

  T1 t1;
  T2 t2;
}

Here is a test run:

Time for Isolate.exit size: 1 MB, ms:  4 
All Good

Time for Isolate.exit size: 10 MB, ms:  17 
All Good

Time for Isolate.exit size: 20 MB, ms:  34 
ERROR: frame drops: 1 average duration: 21

Time for Isolate.exit size: 30 MB, ms:  50 
ERROR: frame drops: 1 average duration: 34

Time for Isolate.exit size: 40 MB, ms:  69 
ERROR: frame drops: 1 average duration: 47

Time for Isolate.exit size: 50 MB, ms:  87 
ERROR: frame drops: 1 average duration: 56

Time for Isolate.exit size: 60 MB, ms:  104 
ERROR: frame drops: 1 average duration: 60

Time for Isolate.exit size: 70 MB, ms:  64 
ERROR: frame drops: 1 average duration: 60

Time for Isolate.exit size: 80 MB, ms:  140 
ERROR: frame drops: 1 average duration: 92

Time for Isolate.exit size: 90 MB, ms:  152 
ERROR: frame drops: 1 average duration: 102

Time for Isolate.exit size: 100 MB, ms:  174 
ERROR: frame drops: 2 average duration: 65

This test still shows a linear relationship.
I don't think this is a GC issue.
Each isolate does a single allocation, so the number of objects that need to be GC'd is constant for each test.
My assumption from this is that GC overhead should be constant.
Of course, the timing of when GC kicks in will be somewhat variable and the larger block allocations will be more likely to cause a GC.
So I did a run reversing the order of the sizes. The idea is that if we do the largest allocation first, we should be less likely to see a GC whilst it runs. My assumption (?) is that a single 100MB allocation shouldn't cause a GC to run.

Here are the results of the reversed run:

Time for Isolate.exit size: 101 MB, ms:  173 
ERROR: frame drops: 1 average duration: 111

Time for Isolate.exit size: 91 MB, ms:  153 
ERROR: frame drops: 1 average duration: 91

Time for Isolate.exit size: 81 MB, ms:  139 
ERROR: frame drops: 1 average duration: 90

Time for Isolate.exit size: 71 MB, ms:  125 
ERROR: frame drops: 2 average duration: 46

Time for Isolate.exit size: 61 MB, ms:  106 
ERROR: frame drops: 1 average duration: 69

Time for Isolate.exit size: 51 MB, ms:  84 
ERROR: frame drops: 1 average duration: 57

Time for Isolate.exit size: 41 MB, ms:  69 
ERROR: frame drops: 1 average duration: 46

Time for Isolate.exit size: 31 MB, ms:  53 
ERROR: frame drops: 1 average duration: 36

Time for Isolate.exit size: 21 MB, ms:  37 
ERROR: frame drops: 1 average duration: 24

Time for Isolate.exit size: 11 MB, ms:  16 
ERROR: frame drops: 1 average duration: 14

Time for Isolate.exit size: 1 MB, ms:  1 
All Good

As you can see, we still have a linear relationship, which suggests this isn't a GC issue and that at least some of the performance cost is borne by the calling isolate.

@mraleph
Member

mraleph commented Mar 24, 2023

@bsutton What Dart version are you running? I get no frame drops on either 2.19.5 or 3.0.0-336.0.dev, even if I go as high as 300 MB.

@bsutton

bsutton commented Mar 24, 2023

@mraleph
Dart SDK version: 2.19.5 (stable) (Mon Mar 20 17:09:37 2023 +0000) on "linux_x64"

@bsutton

bsutton commented Mar 24, 2023

@mraleph
I am running it from the VS Code debugger, but I've just done a second run from the CLI and get the same results:

dart bin/simple.dart

@bsutton

bsutton commented Mar 24, 2023

@mraleph
AMD® Ryzen 7 3700x 8-core processor × 16
AMD® Radeon rx 5500 xt
Memory: 32.0 GiB
Ubuntu 22.04.2 LTS (64 bits)

@mraleph
Member

mraleph commented Mar 24, 2023

Interesting, I can indeed reproduce this on Linux and it is GC related:

On Mac I get:

[ main         ,  StartCMark(   new space),   28,   2.18,   0.0,   0.0,   0.0,   2.0,   2.0, 0.0, 0.0, 1524.6, 1524.6, 1525.9, 1525.9,   0.0,   0.0,   0,   0,   0.0,    0.0, ]
main: hard_threshold=8796093022208MB, soft_threshold=1005MB, idle_threshold=806MB, reason=gc
[ main         ,   MarkSweep(    finalize),   29,   2.29,   0.2,   0.1,   0.1,   2.0,   2.0, 0.0, 0.0, 1524.6,  804.6, 1525.9, 1525.9,   0.0,   0.0,   0,   1,   0.0, -720.0, ]
Time for Isolate.exit size: 100 MB, ms:  77
All Good 

on Linux I get:

 main         ,  StartCMark(   new space),   28,   7.44,   0.1,   0.0,   0.0,   2.0,   2.0, 0.0, 0.0, 1524.6, 1524.6, 1526.8, 1526.8,   0.0,   0.0,   1,   1,   0.0,    0.0, ]
main: hard_threshold=8796093022208MB, soft_threshold=1005MB, idle_threshold=806MB, reason=gc
[ main         ,   MarkSweep(    finalize),   29,   8.37, 107.1,   0.4,   0.4,   2.0,   2.0, 0.0, 0.0, 1524.6,  804.6, 1526.8, 1526.8,   0.0,   0.0,   0,   1,   0.0, -720.0, ]
Time for Isolate.exit size: 100 MB, ms:  140
ERROR: frame drops: 1 average duration: 109 

@rmacnak-google any ideas? It seems like we are hitting MS finalize at some very unfortunate moment, which causes a large synchronous pause.

@bsutton

bsutton commented Mar 24, 2023

Here is a 3.x run:

~/git/onepub_isolates$ dswitch use dev
Switched to dev (3.0.0-263.0.dev)

You may need to restart your terminal session.
~/git/onepub_isolates$ dart bin/simple.dart
Time for Isolate.exit size: 101 MB, ms:  326 
ERROR: frame drops: 1 average duration: 117

Time for Isolate.exit size: 91 MB, ms:  295 
ERROR: frame drops: 2 average duration: 65

Time for Isolate.exit size: 81 MB, ms:  208 
ERROR: frame drops: 1 average duration: 57

Time for Isolate.exit size: 71 MB, ms:  230 
ERROR: frame drops: 2 average duration: 49

Time for Isolate.exit size: 61 MB, ms:  131 
ERROR: frame drops: 2 average duration: 47

Time for Isolate.exit size: 51 MB, ms:  163 
ERROR: frame drops: 1 average duration: 61

Time for Isolate.exit size: 41 MB, ms:  131 
ERROR: frame drops: 1 average duration: 46

Time for Isolate.exit size: 31 MB, ms:  98 
ERROR: frame drops: 1 average duration: 39

Time for Isolate.exit size: 21 MB, ms:  47 
ERROR: frame drops: 1 average duration: 48

Time for Isolate.exit size: 11 MB, ms:  41 
ERROR: frame drops: 1 average duration: 15

Time for Isolate.exit size: 1 MB, ms:  3 
All Good

@bsutton

bsutton commented Mar 24, 2023

@mraleph how do you dump out the gc stats?

@aam
Contributor

aam commented Mar 25, 2023

how do you dump out the gc stats?

You pass the --verbose-gc command line option to dart.
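For example (the path is illustrative):

dart --verbose-gc bin/simple.dart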

@bsutton

bsutton commented Mar 26, 2023

FYI: if I AOT-compile the benchmark, then there are no more dropped frames, and the raw exit time (time from the last line of the lambda to the primary isolate resuming) drops to a consistent 1-2 ms.
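An AOT build can be produced with something like the following (path illustrative):

dart compile exe bin/simple.dart
./bin/simple.exe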

I've added warmup logic for calls to Isolate.run and the GC, to get the JIT compiler primed.

Of course, I would assume that the GC is written in C so the concept of warming up the GC is probably moot.

With the JIT run, we do see an improvement in the first 100 MB call (110 ms down to 70 ms).
But essentially the remaining stats are the same.

I can see from the --verbose-gc output that the warm-up causes the GC to run hard.

Jit test:

warming: 0
warming: 100
warming: 200
warmup complete
Time for Isolate.exit size: 101 MB, raw Exit time: 99ms
ERROR: frame drops: 1 average duration: 73

Time for Isolate.exit size: 91 MB, raw Exit time: 151ms
ERROR: frame drops: 1 average duration: 64

Time for Isolate.exit size: 81 MB, raw Exit time: 88ms
ERROR: frame drops: 1 average duration: 67

Time for Isolate.exit size: 71 MB, raw Exit time: 122ms
ERROR: frame drops: 1 average duration: 82

Time for Isolate.exit size: 61 MB, raw Exit time: 106ms
ERROR: frame drops: 1 average duration: 64

Time for Isolate.exit size: 51 MB, raw Exit time: 89ms
ERROR: frame drops: 1 average duration: 59

Time for Isolate.exit size: 41 MB, raw Exit time: 72ms
ERROR: frame drops: 1 average duration: 37

Time for Isolate.exit size: 31 MB, raw Exit time: 55ms
All Good

Time for Isolate.exit size: 21 MB, raw Exit time: 36ms
ERROR: frame drops: 1 average duration: 36

Time for Isolate.exit size: 11 MB, raw Exit time: 16ms
ERROR: frame drops: 1 average duration: 16

Time for Isolate.exit size: 1 MB, raw Exit time: 2ms
All Good

AOT run

Note: for a direct comparison, the AOT run included the warm-up logic; however, it didn't need it, as it delivered the same results without the warmup (as expected).

warming: 0
warming: 100
warming: 200
warmup complete
Time for Isolate.exit size: 101 MB, raw Exit time: 0ms
All Good

Time for Isolate.exit size: 91 MB, raw Exit time: 1ms
All Good

Time for Isolate.exit size: 81 MB, raw Exit time: 1ms
All Good

Time for Isolate.exit size: 71 MB, raw Exit time: 0ms
All Good

Time for Isolate.exit size: 61 MB, raw Exit time: 2ms
All Good

Time for Isolate.exit size: 51 MB, raw Exit time: 0ms
All Good

Time for Isolate.exit size: 41 MB, raw Exit time: 0ms
All Good

Time for Isolate.exit size: 31 MB, raw Exit time: 0ms
All Good

Time for Isolate.exit size: 21 MB, raw Exit time: 0ms
All Good

Time for Isolate.exit size: 11 MB, raw Exit time: 0ms
ERROR: frame drops: 1 average duration: 14

Time for Isolate.exit size: 1 MB, raw Exit time: 1ms
All Good

Updated benchmark with warmup logic.

import 'dart:async';
import 'dart:isolate';

void main() async {
  await warmup();
  print('warmup complete');

  for (var size = 101; size > 0; size -= 10) {
    await test(size: size);
  }
}

Future<void> test({required int size}) async {
  final completer = Completer<bool>();

  final future = Isolate.run(() {
    // allocate size MB.
    final list = List.filled(1024 * 1024 * size, 0);
    return Tuple(DateTime.now(), list);
  });

  unawaited(future.whenComplete(() {
    completer.complete(true);
  }));

  final frameDrops = <Duration>[];
  final stopwatch = Stopwatch()..start();

  while (!completer.isCompleted) {
    final elapsed = stopwatch.elapsed;
    if (elapsed > const Duration(milliseconds: 10)) {
      frameDrops.add(elapsed);
    }
    stopwatch.reset();
    // give async tasks a chance to run.
    await Future.delayed(const Duration(milliseconds: 1), () => null);
  }

  final completed = DateTime.now();
  final interval = completed.difference((await future).t1);
  print('Time for Isolate.exit size: $size MB, raw Exit time: '
      '${interval.inMilliseconds}ms');

  final totalDuration = frameDrops.isNotEmpty
      ? frameDrops.reduce((a, b) => a + b)
      : Duration.zero;

  final averageDuration = frameDrops.isNotEmpty
      ? Duration(
          microseconds: totalDuration.inMicroseconds ~/ frameDrops.length)
      : Duration.zero;

  if (frameDrops.isEmpty) {
    print('All Good');
  } else {
    print(
        'ERROR: frame drops: ${frameDrops.length} average duration: ${averageDuration.inMilliseconds}');
  }
  print('');
}

Future<void> warmup() async {
  for (var i = 0; i < 300; i++) {
    // 10 MB
    List.filled(1024 * 1024 * 10, 0);
    if (i % 100 == 0) {
      print('warming: $i');
    }
    // hopefully the gc will run
    await Future.delayed(const Duration(milliseconds: 1), () => null);
  }
}

class Tuple<T1, T2> {
  Tuple(this.t1, this.t2);

  T1 t1;
  T2 t2;
}

@modulovalue
Contributor Author

I'm wondering: if the observations here are related to GC, then it might make sense to first establish what the expected behavior is.

@mraleph wrote:

That being said - building large payloads likely comes with some GC costs (e.g. due to the cost of evacuation of large number of survivors from the nursery or due to GC inability to make appropriate progress when allocation rate is too high) and GCs will pause all isolates in a group.

This sounds to me like it's currently technically impossible (ignoring FFI-related techniques) to e.g. parse arbitrarily large JSON in a separate isolate B and pass the result back to the main isolate without having the main isolate freeze (because of operations that happened on isolate B)?

So the skipped frames that are being reported by bsutton's benchmark and my benchmark are expected behavior?

If that's true:

  • are there plans to improve GC-related performance for inter-isolate communication? (does it even make sense to optimize for this?)
  • are there plans to work on removing that fundamental limitation?

Note: This issue was motivated by the following use-case:

  • I needed to parse a large chunk of json
  • doing that on the main isolate caused my Flutter UI to freeze and therefore drop frames.
  • I decided to offload parsing to a separate isolate
  • The main isolate was still dropping frames despite the work happening on a separate isolate.
  • I was hoping to be able to parse arbitrarily large JSON strings on a separate isolate and have the main isolate never drop any frames. (The skipped frames here were meant to be a proxy metric for the dropped frames that I would observe in Flutter apps.)

@mraleph
Member

mraleph commented Mar 27, 2023

This sounds to me like it's currently technically impossible (ignoring ffi-related techniques) to e.g. parse arbitrarily large json in a separate isolate B and to pass that back to the main isolate without having the main isolate freeze (because of operations that happened on isolate B)?

That's not what I was trying to say. I simply wanted to make it easier to interpret the results of the benchmark: if one isolate is not doing anything and its timer "jitters", then this jitter is most likely introduced by GC in another isolate.

We aim for most GC pauses to be small - if you see a large GC pause (like the one reproduced in this example on Linux), I would interpret that as an issue with the GC. I think one common source of larger GC pauses when doing JSON parsing is the cost of evacuating large numbers of survivors from the nursery. I think @rmacnak-google was looking at non-moving promotion before to address this sort of issue, but I am not sure about the current status.

@bsutton

bsutton commented Mar 27, 2023

@mkustermann

This is vaguely mentioned in the API docs of Isolate.exit() as: The system may be able to send this final message more efficiently than normal port communication between live isolates. In these cases this final message object graph will be reassigned to the receiving isolate without copying.

Can you provide any clarity on when Isolate.exit() can use the fast return path (no copy required)?
I've not been able to find any documentation that clarifies what qualifies and what doesn't, which makes it a bit hard to design a return message that we know will qualify for the fast return path.

@mkustermann
Member

Can you provide any clarity on when Isolate.exit() can use the fast return path (no copy required)?
I've not been able to find any documentation that clarifies what qualifies and what doesn't, which makes it a bit hard to design a return message that we know will qualify for the fast return path.

It will never copy at the moment. It will do an O(n) verification pass and exit successfully by sending a pointer to the other isolate (or it will throw an exception if it encounters an object during the verification pass that cannot be sent). The verification pass has some optimizations (e.g. it will never recurse into deeply immutable/constant object graphs, as those can always be shared). (The phrasing of the API docs came from #47164 (comment) IIRC.)

The current documentation would allow us to support Isolate.exit(<port>, <message>) with a <port> that comes from another app (one spawned via Isolate.spawnUri()) - in which case we'd restrict <message> to be JSON-like and copy it (only isolates spawned via Isolate.spawn() / Isolate.run() share the same managed heap).

@maks

maks commented Jul 3, 2023

if one isolate is not doing anything and it's timer "jitters" then this jitter is most likely introduced by GC in another isolate.

@mraleph For me the lightweight Isolates feature has been a huge benefit for Dart server-side applications! But what I'm also concerned about is that GC affects all the Isolates in a group. I (vaguely) understand that all the Isolates in a group share a heap, but they are still all isolated afaik, so I'm not clear as to why a long-running GC needs to stop all Isolates in a group. I.e. an Isolate in a group creating a lot of garbage can still cause performance issues for other Isolates in the group, so there seems to be no way at the moment to offload memory-intensive work away from Isolate(s) that are latency sensitive?

@mraleph
Member

mraleph commented Jul 3, 2023

@maks isolates in a group only look isolated to the Dart developer, which means that at the Dart language level there is no way to observe that two isolates work on the same heap. They do exist in the same heap (which is what enables things like Isolate.exit and sharing of various internal data structures and immutable objects like String, int, double instances, deeply immutable Lists, and compile-time constants), which means that you can't reliably partition the graph of objects into isolate-specific parts (i.e. given an arbitrary object in the heap, you can't say which isolate owns it). GC is a heap-level operation, which means all isolates that share the heap need to cooperate. We did consider whether we want to experiment with implementing isolate-local young-generation collection, but so far this has never bubbled to the top of the list.

We do want our GC pauses to be as short as possible (and if you see large pauses you should report this), but today we don't have any plans for providing any sort of hard or soft real-time guarantees.

@maks

maks commented Jul 3, 2023

Thanks @mraleph for the detailed explanation! That makes things a lot clearer for me, as I hadn't considered that there is no bookkeeping for memory allocations per Isolate, so the GC has to run over the entire heap and hence applies to all Isolates.

I definitely agree that GC pauses should be as short as possible. In fact, for my use case (a backend system using large numbers of Isolates), it's really only the stop-the-world young-generation GC that is of concern, not any slowdown from the concurrent old-gen one, as that would still allow all my Isolates to make some progress, which is what I'm most interested in.

I'll try to get some benchmarks done with my current WIP application and report back.

Btw, I found this quite detailed doc about GC; would that be the best reference at the moment on how the Dart GC currently works?

@mraleph
Member

mraleph commented Jul 4, 2023

Btw I found this quite detailed doc about GC, would that be the best reference at the moment on how Dart GC currently works?

Yes.
