-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance degradation after upgrading to newer Flutter with Isolate Groups enabled by default #47672
Comments
I think we need timelines to see what triggers these GCs. |
/cc @mit-mit |
Here're the timelines from Flutter Dev Tools: |
Thanks for the timeline. There is one obvious problem - it seems that we receive low memory notification from Flutter (I am not entirely sure how to figure out why we would receive one? do we just get it by default when coming from background? @dnfield might now) - and then we tell all individual isolates to do low memory GCs. (see One of the GCs is disproportionally long as well - it spends 3 seconds of CPU time to chew through ~500Mb heap where other GCs chew through it in ~200ms. I think that is related to CPU scheduling - we probably get sent to a slow / down-clocked core. @rmacnak-google Do you feel that maybe we should land https://dart-review.googlesource.com/c/sdk/+/203304 and apply it also to LowMemory GCs? |
JIC, data is not evenly distributed between isolates, with 80+ dictionary files they are being loaded to isolates one-by-one in whatever order FS enums file names, some files can be bigger than others, it is possible that one isolate occasionally chews more than others. |
@maxim-saplin the heap is shared between isolates (the whole point of lightweight isolates was to share the heap) so all of these GCs collect exactly the same heap. |
Suspected that yet decided to mention) After all GC is supposed to sweep the entire isolate group in one pass, yet it is not the case.. |
Low memory notifications are forwarded from the OS. Making them work per heap rather than per isolate seems like a good improvement. |
In my case GC is triggered by mere minimize/resume loop with nothing happening in the app. Even with one GC per isolate group it might be reasonable to optimize further and avoid unnecessary GC wasting CPU cycles and not giving any free bytes. Just thinking out loud. |
@dnfield based on the cursory glance at the source I am starting to suspect that these low memory notifications might be related to (potentially when UI goes to background OS might be also moving process to a LITTLE cores - which would explain the slowness of the GC, though it is a pure speculation on my part). |
TRIM_MEMORY_UI_HIDDEN is forwarded on as a Dart_NotifyLowMemory. This seems like desirable behavior to me though - when we go to background, we should try to shrink our memory usage as much as possible to avoid the OS killing us for using too much memory as a background process. This is also likely a good time to take our time doing a GC because it won't interrupt user experience of the app. However, the assumption here is that a synchronous GC will only take somewhere in the 200ms range (maybe 400ms...?). Obviously if GC is going to take 10s that will be a problem when the user comes back. But the fact that this wasn't a problem before isolate groups were on by default is suggesting to me that if we just GC the heap once on this request we should be in better shape again. Would need tracing to see though. I'm wondering if part of what we're seeing here is scheduling contention where you have a large number of parallel GC threads fighting for core time and not making much progress due to all the scheduling interruptions... It might be easier to see that if we had a full systrace/perfetto trace though. |
If I understand correctly all worker isolates keep large amount of (same?) data in memory: dictionary. I'm not sure whether having same dictionary data loaded into every worker isolate is ideal solution. Perhaps sharding dictionary data by first letter and distributing work across isolates that are responsible for particular shards might have worked better. |
@mraleph I think we should not take marking speed into account during |
…olate group. When this mechanism was first added, each isolate had its own heap that needed to GC on low memory. Now that isolates share a heap with other members of the group, broadcasting to each isolate does redundant work in groups that have more than one isolate. TEST=ci Bug: #47672 Change-Id: I3b71c19a2362f5a3274238e590bbf73e37aa2b4e Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/220064 Commit-Queue: Ryan Macnak <rmacnak@google.com> Reviewed-by: Alexander Aprelev <aam@google.com> Reviewed-by: Martin Kustermann <kustermann@google.com>
Thanks @rmacnak-google for fixing the low-memory signal issue. I've filed #47701 to get this cherry-picked into the beta (soon to be stable) |
I don't fully agree with this reasoning. I think we should be following "When in Rome, do as the Romans do" mantra here. I have reviewed the code in Android that is associated with dispatching and handling trim-memory and low-memory requests, most specifically AppProfiler.updateLowMemStateLSP as well as ActivityThread.handleLowMemory and ActivityThread.handleTrimMemory. Based on the source code I have reviewed I believe that Android is not going to request a GC on This reinforces my belief that our current handling of trim notifications (and potentially low memory notifications) is overly conservative and rigid. I suggest we stop doing synchronous GCs at least for trimming and instead use these messages simply as a signal to drive our concurrent GC. Furthermore, Flutter Engine should probably react to trim-notifications by purging various non-essential caches and releasing native resources, similar to what Android framework does (unless engine already does it). |
Flutter does free caches in the framework and on the engine side on these memory events. I don't have any strong opinion on what kind of GC gets run, but we do need to run a GC when this happens or the OS won't see us actually freeing memory and will be more likely to kill us. So if |
…olate group. When this mechanism was first added, each isolate had its own heap that needed to GC on low memory. Now that isolates share a heap with other members of the group, broadcasting to each isolate does redundant work in groups that have more than one isolate. TEST=ci Bug: #47672 Change-Id: I3b71c19a2362f5a3274238e590bbf73e37aa2b4e Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/220064 Commit-Queue: Ryan Macnak <rmacnak@google.com> Reviewed-by: Alexander Aprelev <aam@google.com> Reviewed-by: Martin Kustermann <kustermann@google.com>
@maxim-saplin we landed the fix to only interrupt one isolate member (#47701), so could you please try out your app with an updated Flutter |
One thing we could try to do is use a Alternatively, we could try to add a level to Dart_NotifyLowMemory that we could set higher when Android tells us it is about to kill us (TRIM_LEVEL_COMPLETE), and lower in other cases. |
It would also help to know if this issue is still reproducing on master. In general, going to background is a good time to free resources and if we can still get away with it we'd like to. |
Just checked Profile build on emulator, all seems good - 1-2 seconds of delay on master compared to ±17 seconds on stable version 2.5.3. Thanks a lot for the quick fix! Here's how master looks on my end:
Also checked Release build on Galaxy Note 10, now there's hardly any visible flicker/stutter when returning to the app. Feels way better. |
Can you by any chance share an updated timeline of this behavior? |
Here we go |
It seems like that timeline doesn't have very much in it... |
Tried once again with more events toggled in DevTools config, though it seems there're no isolate events displayed in the UI anymore - might be smth is broken in Flutter tooling on master. Here's another timeline: dart_devtools_2021_11_19-1637308513596000.json.zip |
That one has a bit more, great. I'm not quite sure what would be up with the tooling though. It looks like the GC is still taking 3.76s of wall time (2.5 of CPU). It's also not actually freeing much memory. I think this overall makes sense - it's a big heap to scan and there isn't much it can free so it takes a little while. Most of that time is taken by parallel marking, and a good chunk of the parallel marking is where we're losing the wall vs. CPU time. I'm still not super inclined to change the current behavior of doing a big GC when we go to background - although it'd help to look at more apps, since I suspect this trace is a bit unusual by in terms of its heap size and the amount of it wants to retain. |
I have read a bit more code and done some experiments and my initial hypothesis that an app going into background also changes how threads of that app are scheduled was indeed correct. Once you switch away from the app it gets put into a special cgroup which limits CPU cores the app can get scheduled on to slower cores (among other things - there is also some scheduler policies changes which I suspect instruct scheduler to allocate less quants to the app) - which means you are basically guaranteed to have much-much longer GC pause if you attempt to perform a full GC while in background. This explains 3.7s of GC. Collecting the same heap in foreground will take ~200-300ms.
To me this strongly indicates that the current approach needs to change - we should use |
What value are we supposed to pass to |
I'm not opposed to making some different request of the VM when we go to background than |
@dnfield to make things cleaner we could add |
An alternative would be to add some kind of level parameter to |
Update on this: Recently 81143e1 landed which removes running long synchronous GC as response to |
A dictionary app with lots of long living objects (millions of strings, hundreds of megabytes in the heap) and multiple isolates (7+) sees 10 second freezes (on Galaxy Note 10) when minimizing the app and returning back to it. Investigating the issues I've identified the following cause:
Isolate.spawn()
, GC on child isolates blocks UIflutter doctor
[✓] Flutter (Channel stable, 2.5.3, on macOS 12.0.1 21A559 darwin-x64, locale en-GB)
• Flutter version 2.5.3 at /Users/maxim_saplin/flutter
• Upstream repository https://github.com/flutter/flutter.git
• Framework revision 18116933e7 (4 weeks ago), 2021-10-15 10:46:35 -0700
• Engine revision d3ea636dc5
• Dart version 2.14.4
[!] Android toolchain - develop for Android devices (Android SDK version 31.0.0)
• Android SDK at /Users/maxim_saplin/Library/Android/sdk
✗ cmdline-tools component is missing
Run
path/to/sdkmanager --install "cmdline-tools;latest"
See https://developer.android.com/studio/command-line for more details.
✗ Android license status unknown.
Run
flutter doctor --android-licenses
to accept the SDK licenses.See https://flutter.dev/docs/get-started/install/macos#android-setup for more details.
[✓] Xcode - develop for iOS and macOS
• Xcode at /Applications/Xcode.app/Contents/Developer
• Xcode 13.1, Build version 13A1030d
• CocoaPods version 1.11.2
[✓] Chrome - develop for the web
• Chrome at /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
[✓] Android Studio (version 2020.3)
• Android Studio at /Applications/Android Studio.app/Contents
• Flutter plugin can be installed from:
🔨 https://plugins.jetbrains.com/plugin/9212-flutter
• Dart plugin can be installed from:
🔨 https://plugins.jetbrains.com/plugin/6351-dart
• Java version OpenJDK Runtime Environment (build 11.0.10+0-b96-7281165)
[✓] VS Code (version 1.62.0)
• VS Code at /Applications/Visual Studio Code.app/Contents
• Flutter extension version 3.28.0
[✓] Connected device (2 available)
• macOS (desktop) • macos • darwin-x64 • macOS 12.0.1 21A559 darwin-x64
• Chrome (web) • chrome • web-javascript • Google Chrome 95.0.4638.69
What is concerning about that issue:
Below are screenshots from Observatory comparing app running 1 and 10 isolates, same data loaded in both cases.
Observatory, home pages, 1 isolate vs 10 isolates
Observatory, allocation profile, 1 isolate vs 10 isolates
Attached file contains 2 screen recordings which show Memory Performance tab of Flutter DevTools. To the left is a piece of the app. When the app is opened and input field is focused keyboard is shown. When minimizing the app and returning back to it UI is frozen, keyboard stays open until UI thread is freed. In the memory profiler (though with some delay) it is clear that 10 isolates generate way more GC events (blue circles).
DevTools, memory, 1 isolates vs 10.zip
App details
The app seeing the performance issues is this one: https://github.com/maxim-saplin/dikt It is a simple 'lookup word -> display article' kind of app, basically a dictionary. With performance being top priority a number of NFR were set (such as loading 10mil words in 90 dictionary files in 2-3 seconds on Galaxy Note 10, conducting a case insensitive lookup within 30ms) a separate package was created (https://github.com/maxim-saplin/ikvpack) which uses custom binary format to store key/values pairs (word/compressed article). The package allows to spawn a number of isolates (isolate pool) which host dictionary handling logic (loading files, searching words) and surfaces API via a sort of proxy/RPC allowing cross-isolate calls. When launching the app N-1 isolates are spawned (N being the number of CPU cores), .ikv files are loaded in parallel and stay in isolates, all lookups happen in isolates with the main thread just consolidating lookup results received from isolates. The general idea is to benefit from multi-core CPUs, off-load all heavy opps to isolates and never allow dictionary operations to slow down UI thread for longer than 10ms.Following up with @aam and @mkustermann as mentioned in issue #46754.
The text was updated successfully, but these errors were encountered: