Generational plans are slower than their non-generational counterparts #594
Comments
See the discussion here: https://mmtk.zulipchat.com/#narrow/stream/262673-mmtk-core/topic/Generational.20GC.20performance
Results: What we need to do, @qinsoon, is increase the min nursery size. That will fix some of the problems. What size to use is a good question. The broader point is that we need to evaluate our nursery sizing heuristics. There are still some slowdowns that we will have to look into.
Steve also mentioned that we should change all references to "Appel-style nursery" to "variable-size nursery" in comments (as "variable-size nursery" is more descriptive), which I will do at some point.
Re-evaluated recently: http://squirrel.anu.edu.au/plotty/angusa/benchmarks/p/W4UHbJ
This PR introduces different kinds of nursery size options, and by default, we use a proportion of the heap size as the min nursery. This PR should generally improve the generational plans' performance by triggering a full heap GC more promptly. This PR mitigates the issue identified in #594, but does not fully fix the problem.
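The idea of a proportional min nursery can be sketched as follows. This is only an illustration under stated assumptions, not MMTk's implementation: the function names and the 0.25 proportion are hypothetical, and the rule "run a full heap GC once the space left for the nursery drops below the min nursery" is the general variable-size nursery heuristic rather than code taken from the PR.

```python
MIB = 1024 * 1024


def min_nursery_bytes(heap_bytes: int, proportion: float) -> int:
    """Min nursery size derived from the heap size (proportion is hypothetical)."""
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must be in (0, 1]")
    return int(heap_bytes * proportion)


def next_gc_is_full_heap(available_bytes: int, heap_bytes: int, proportion: float) -> bool:
    """Assumed heuristic: when the space left for the nursery falls below the
    min nursery, the next collection is a full heap GC instead of a nursery GC."""
    return available_bytes < min_nursery_bytes(heap_bytes, proportion)


if __name__ == "__main__":
    heap = 34 * MIB  # heap size used for the OpenJDK runs in this thread
    print(min_nursery_bytes(heap, 0.25))
    print(next_gc_is_full_heap(4 * MIB, heap, 0.25))
```

A larger min nursery makes the second function return true earlier, which is the "more prompt" full heap GC described above.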
I ran some benchmarks after #1087. I am using the stock JikesRVM + Java MMTk from upstream, and OpenJDK + Rust MMTk, on DaCapo 2006 benchmarks (so they can run on JikesRVM).
Min heap
The min heap values for the two systems are quite different. However, the measured min heap values may not be very reliable, as JikesRVM crashes frequently. I used 20 attempts.
Performance
I measured performance for each system at 2x its own min heap value, using a bounded [2M,32M) nursery size (the default for JikesRVM). The results showed that both generational plans are slower than Immix. The reason seems similar for both systems: the GC time increases a lot for generational plans.
Results for JikesRVM + Java MMTk
/home/yilin/Code/jikesrvm/dist/FastAdaptiveImmix_x86_64_m32-linux/rvm -X:gc:ignoreSystemGC=true -Dprobes=MMTk -X:gc:variableSizeHeap=false -X:aos:enable_bulk_compile=true -X:aos:enable_recompilation=false -Xms106M -Xmx106M -cp /usr/share/benchmarks/dacapo/dacapo-2006-10-MR2.jar:/home/yilin/running-ng-configs/probes/probes-java6.jar Harness -c probe.Dacapo2006Callback -n 2 fop
Results for OpenJDK + Rust MMTk
MMTK_PLAN=Immix MMTK_NURSERY="Bounded:2097152,33554432" /home/yilin/Code/openjdk/build/jdk-mmtk/images/jdk/bin/java -XX:MetaspaceSize=500M -XX:+DisableExplicitGC -server -XX:-TieredCompilation -Xcomp -XX:+UseThirdPartyHeap -Dprobes=RustMMTk -Djava.library.path=/home/yilin/running-ng-configs/probes -Xms34M -Xmx34M -cp /usr/share/benchmarks/dacapo/dacapo-2006-10-MR2.jar:/home/yilin/running-ng-configs/probes:/home/yilin/running-ng-configs/probes/probes.jar Harness -c probe.Dacapo2006Callback -n 2 fop
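The MMTK_NURSERY value in the OpenJDK command gives the nursery bounds in bytes: 2097152 is 2 MiB and 33554432 is 32 MiB. A small helper can build that string from MiB values. The helper name is hypothetical, and it assumes only the "Bounded:min,max" form seen in the command above.

```python
MIB = 1024 * 1024


def bounded_nursery_option(min_mib: int, max_mib: int) -> str:
    """Format a 'Bounded:<min>,<max>' value in bytes, as in the command above.

    Hypothetical helper for illustration; it is not part of MMTk.
    """
    if not 0 < min_mib <= max_mib:
        raise ValueError("expected 0 < min_mib <= max_mib")
    return f"Bounded:{min_mib * MIB},{max_mib * MIB}"


if __name__ == "__main__":
    # The [2M, 32M) bounds used in the evaluation.
    print(bounded_nursery_option(2, 32))
```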
The original Immix paper showed that at 2x min heap, GenImmix is faster than Immix, but at 3x min heap, GenImmix is actually slower. So I ran the above JikesRVM evaluation again with a smaller heap size (1.5x). GenImmix is still slower in that evaluation.
The following is the evaluation on JikesRVM using DaCapo Bach lusearch with different numbers of mutator threads. One of the hypotheses was that the number of mutator threads may affect the time spent in stack scanning, which is a fixed cost for every GC (nursery or mature), and that this may slow down generational plans when many more GCs happen.
Min heap
Performance
Run with 2x min heap for different numbers of mutator threads. There seems to be no correlation between generational performance and the number of mutator threads.
During some performance evaluation, I have noticed that our generational GC plans (GenCopy and GenImmix) are slower than their non-generational counterparts (SemiSpace and Immix). Immix has the best GC time as well as the best overall benchmark execution time. This is quite alarming, as generational garbage collection should improve the STW time instead of increasing it.
Results
Table 1: STW time for different MMTk GC plans
Note the geomean (emphasis mine) for the generational plans in comparison to the non-generational plans.
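For readers unfamiliar with the geomean, the sketch below shows how a geometric mean over per-benchmark normalized STW times is typically computed. The input values are placeholders for illustration only, not measurements from this issue, and the function name is an assumption.

```python
import math
from typing import Iterable


def geomean(values: Iterable[float]) -> float:
    """Geometric mean, computed in log space to avoid overflow on long lists."""
    items = list(values)
    if not items:
        raise ValueError("geomean of an empty sequence is undefined")
    if any(v <= 0 for v in items):
        raise ValueError("geomean requires strictly positive values")
    return math.exp(sum(math.log(v) for v in items) / len(items))


if __name__ == "__main__":
    # Placeholder ratios (plan STW time / baseline STW time), NOT real results.
    normalized_stw = [0.5, 2.0, 1.0]
    print(geomean(normalized_stw))
```

A geomean above 1.0 over normalized times means the plan is slower than the baseline on average across benchmarks, which is the comparison the table draws between generational and non-generational plans.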
Revisions used
mmtk-core: 3dbdd7ae + feature perf_counter
mmtk-openjdk: 3cc0d71
openjdk: ca90b43f0f5
DaCapo Chopin: f480064
Benchmark results were gathered on an i9-9900K Coffee Lake machine.