Generational plans are slower than their non-generational counterparts #594

Open
k-sareen opened this issue May 16, 2022 · 7 comments
Labels
P-high Priority: High. A high-priority issue should be fixed as soon as possible.

Comments

k-sareen commented May 16, 2022

During some performance evaluation, I noticed that our generational GC plans (GenCopy and GenImmix) are slower than their non-generational counterparts (SemiSpace and Immix). Immix has the best GC time as well as the best overall benchmark execution time. This is quite alarming, as generational garbage collection should improve the STW time, not increase it.

Results

Table 1: STW time for different MMTk GC plans, normalized to Immix = 1.000 (parenthesized ranges are the reported error margins; * = no result; all plans built from core_3dbdd7ae / jdk_3cc0d71)

| Benchmark | GenCopy | GenImmix | Immix | MarkCompact | NoGC | SemiSpace |
| --- | --- | --- | --- | --- | --- | --- |
| avrora | 4.535 (-1.83%, +1.86%) | 3.118 (-8.83%, +8.86%) | 1.000 (-1.79%, +1.82%) | 3.774 (-2.12%, +2.15%) | 0.000 | 4.288 (-1.99%, +2.02%) |
| cassandra | 2.698 (-3.38%, +3.49%) | 1.936 (-3.40%, +3.51%) | 1.000 (-3.28%, +3.39%) | 5.089 (-2.38%, +2.49%) | 0.000 | 2.934 (-2.54%, +2.65%) |
| eclipse | 7.899 (-0.47%, +0.48%) | 6.293 (-3.46%, +3.47%) | 1.000 (-0.49%, +0.49%) | 9.403 (-0.58%, +0.58%) | 0.000 | 8.122 (-0.43%, +0.43%) |
| fop | 2.710 (-2.24%, +2.33%) | 3.491 (-9.58%, +9.67%) | 1.000 (-2.98%, +3.07%) | 4.497 (-2.09%, +2.19%) | 0.000 | 2.540 (-2.14%, +2.24%) |
| h2o | 4.539 (-1.34%, +1.36%) | 2.381 (-2.07%, +2.08%) | 1.000 (-1.23%, +1.24%) | 7.424 (-1.16%, +1.18%) | 0.000 | 3.991 (-1.44%, +1.45%) |
| jython | * | 1.564 (-0.76%, +0.76%) | 1.000 (-0.46%, +0.46%) | 5.559 (-0.34%, +0.34%) | 0.000 | 2.921 (-0.41%, +0.41%) |
| luindex | 7.452 (-0.83%, +0.84%) | * | 1.000 (-0.73%, +0.73%) | 5.258 (-0.52%, +0.52%) | 0.000 | 6.639 (-0.72%, +0.73%) |
| lusearch | 1.944 (-2.94%, +3.12%) | 1.281 (-2.96%, +3.14%) | 1.000 (-4.17%, +4.35%) | 5.877 (-2.93%, +3.12%) | * | 2.269 (-3.01%, +3.19%) |
| pmd | * | * | 1.000 (-0.81%, +0.82%) | 21.066 (-0.58%, +0.59%) | 0.000 | 1.461 (-0.64%, +0.65%) |
| sunflow | 2.913 (-0.87%, +0.89%) | 2.949 (-0.96%, +0.97%) | 1.000 (-1.13%, +1.15%) | 13.042 (-0.82%, +0.83%) | 0.000 | 1.933 (-0.84%, +0.86%) |
| tomcat | 2.104 (-7.33%, +8.58%) | 1.425 (-7.33%, +8.59%) | 1.000 (-10.60%, +11.86%) | 4.735 (-7.32%, +8.58%) | 0.000 | 2.616 (-7.32%, +8.57%) |
| zxing | 5.492 (-2.66%, +2.74%) | 4.487 (-10.90%, +10.99%) | 1.000 (-2.88%, +2.96%) | 14.138 (-2.08%, +2.17%) | 0.000 | 4.616 (-2.10%, +2.18%) |
| min | 2.104 | 1.425 | 1.000 | 3.774 | 0.000 | 1.933 |
| max | 7.899 | 6.293 | 1.000 | 14.138 | 0.000 | 8.122 |
| mean | 4.111 | 3.260 | 1.000 | 7.763 | 0.000 | 3.880 |
| **geomean** | **3.761** | **2.964** | **1.000** | **6.921** | **0.000** | **3.531** |

Note the geomean (emphasis mine) for the generational plans in comparison to the non-generational plans.

Revisions used

mmtk-core: 3dbdd7ae + feature perf_counter
mmtk-openjdk: 3cc0d71
openjdk: ca90b43f0f5

DaCapo Chopin: f480064

Benchmark results were gathered on an i9-9900K Coffee Lake machine.

k-sareen commented Apr 4, 2023

See discussions here: https://mmtk.zulipchat.com/#narrow/stream/262673-mmtk-core/topic/Generational.20GC.20performance

Results:
Ryzen 9 5950X http://squirrel.anu.edu.au/plotty/kunals/bench-test/p/S2cc6q
Intel i9 9900K http://squirrel.anu.edu.au/plotty/kunals/bench-test/p/4SQyvC

What we need to do, @qinsoon, is increase the min nursery size; that will fix some of the problems. What size to use is a good question. The meta point is that we need to evaluate our nursery sizing heuristics (see the example after the list below).

There are still some slowdowns that we will have to look into, namely:

  1. StickyImmix for eclipse (in principle we can fix the write barrier to have the object);
  2. Why do the generational GCs with a 2 MB min nursery size perform so much better in mutator time for lusearch and sunflow? We should be able to get the same mutator time performance for the other GCs as well.
  3. I had previously noticed some variance in the nursery GC time for our generational GCs, but I think increasing the min nursery size fixed that. However, we still need to look into why generational GCs are slower than non-generational ones for lusearch and most other benchmarks.
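
For reference, a minimal sketch of how a larger bounded nursery can be requested at run time with the MMTk OpenJDK binding. It uses the same `MMTK_NURSERY="Bounded:<min>,<max>"` form that appears in the benchmark commands later in this thread; the 8 MB minimum, plan, heap size, and classpath below are illustrative placeholders, not recommendations, and availability of the option depends on the mmtk-core revision in use.

```console
# Illustrative only: ask for a bounded nursery of 8 MB..32 MB instead of the default.
# Plan, heap size, and classpath are placeholders for whatever is being measured.
MMTK_PLAN=GenImmix MMTK_NURSERY="Bounded:8388608,33554432" \
    java -XX:+UseThirdPartyHeap -Xms100M -Xmx100M -cp ... Harness lusearch
```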

k-sareen commented Apr 4, 2023

Steve also mentioned that we should change all references to "Appel-style nursery" to "variable-size nursery" in comments (as "variable-size nursery" is more descriptive), which I will do at some point.

@udesou udesou added the P-normal Priority: Normal. label Nov 15, 2023
@qinsoon qinsoon added P-high Priority: High. A high-priority issue should be fixed as soon as possible. and removed P-normal Priority: Normal. labels Jan 29, 2024
angussidney commented Jan 29, 2024

Re-evaluated recently: http://squirrel.anu.edu.au/plotty/angusa/benchmarks/p/W4UHbJ

|  | GenCopy | GenImmix | Immix | StickyImmix |
| --- | --- | --- | --- | --- |
| geomean | 1.539 | 1.099 | 1.000 | 1.034 |

github-merge-queue bot pushed a commit that referenced this issue Apr 17, 2024
This PR introduces different kinds of nursery size options, and by
default, we use a proportion of the heap size as the min nursery. This
PR should generally improve the generational plans' performance by
triggering a full heap GC more promptly. This PR mitigates the issue
identified in #594, but does not
fully fix the problem.
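
For illustration, here is how such nursery kinds can be selected through the `MMTK_NURSERY` option. Only the bounded form below is taken verbatim from the commands quoted later in this thread; the fixed and proportional spellings are my assumption of the syntax and should be checked against the option parsing in mmtk-core.

```console
# Bounded nursery: min 2 MB, max 32 MB (this form is used in the commands below).
MMTK_NURSERY="Bounded:2097152,33554432" java -XX:+UseThirdPartyHeap ...

# Assumed spellings for the other kinds -- verify against mmtk-core's option parsing:
MMTK_NURSERY="Fixed:4194304" java -XX:+UseThirdPartyHeap ...                 # fixed 4 MB nursery
MMTK_NURSERY="ProportionalBounded:0.2,1.0" java -XX:+UseThirdPartyHeap ...   # min/max as a fraction of the heap
```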
qinsoon commented Apr 22, 2024

I ran some benchmarks after #1087, using the stock JikesRVM + Java MMTk from upstream, and OpenJDK + Rust MMTk, on the DaCapo 2006 benchmarks (so they can run on JikesRVM).

Min heap

The min heap values for the two systems are quite different. However, the measured min heap values may not be very reliable, as JikesRVM crashes frequently. I used 20 attempts.

| Benchmark | JikesRVM Immix (Java MMTk) min heap (MB) | OpenJDK Immix (Rust MMTk) min heap (MB) |
| --- | --- | --- |
| antlr | 34 | 5 |
| eclipse | 198 | 25 |
| fop | 53 | 17 |
| hsqldb | 125 | 113 |
| luindex | 34 | 6 |
| lusearch | 47 | 9 |
| pmd | 74 | 31 |
| xalan | 73 | 54 |

Performance

I measured performance for each system at 2x of its own min heap value, using a bounded [2M, 32M) nursery size (the default for JikesRVM). For example, for fop the OpenJDK min heap is 17 MB, so it runs at 34 MB (-Xms34M -Xmx34M), while the JikesRVM min heap is 53 MB, so it runs at 106 MB, as in the commands below. The results show that both generational plans are slower than Immix. The reason seems similar for both systems: the GC time increases a lot for the generational plans.

Results for JikesRVM + Java MMTk

[Plot: jikesrvm-generational (plotty)]

/home/yilin/Code/jikesrvm/dist/FastAdaptiveImmix_x86_64_m32-linux/rvm -X:gc:ignoreSystemGC=true -Dprobes=MMTk -X:gc:variableSizeHeap=false -X:aos:enable_bulk_compile=true -X:aos:enable_recompilation=false -Xms106M -Xmx106M -cp /usr/share/benchmarks/dacapo/dacapo-2006-10-MR2.jar:/home/yilin/running-ng-configs/probes/probes-java6.jar Harness -c probe.Dacapo2006Callback -n 2 fop

Results for OpenJDK + Rust MMTk

[Plot: jdk-generational (plotty)]

MMTK_PLAN=Immix MMTK_NURSERY="Bounded:2097152,33554432" /home/yilin/Code/openjdk/build/jdk-mmtk/images/jdk/bin/java -XX:MetaspaceSize=500M -XX:+DisableExplicitGC -server -XX:-TieredCompilation -Xcomp -XX:+UseThirdPartyHeap -Dprobes=RustMMTk -Djava.library.path=/home/yilin/running-ng-configs/probes -Xms34M -Xmx34M -cp /usr/share/benchmarks/dacapo/dacapo-2006-10-MR2.jar:/home/yilin/running-ng-configs/probes:/home/yilin/running-ng-configs/probes/probes.jar Harness -c probe.Dacapo2006Callback -n 2 fop

qinsoon commented Apr 23, 2024

The original Immix paper showed that at 2x min heap, GenImmix is faster than Immix, but at 3x min heap, GenImmix is actually slower. So I ran the above JikesRVM evaluation again with a smaller heap size (1.5x min heap). GenImmix is still slower in this evaluation.

[Plot: jikesrvm-generational-1.5x (plotty)]

qinsoon commented Apr 24, 2024

The following is an evaluation on JikesRVM using DaCapo Bach lusearch with different numbers of mutator threads. One of the hypotheses was that the number of mutator threads may affect the time spent in stack scanning, which is a fixed cost for every GC (nursery or mature), and that this may slow down the generational plans because they perform many more GCs.
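
As a rough way to state that hypothesis (my own back-of-envelope model, not something measured here): if every collection pays a fixed stack-scanning cost t_scan(m) that grows with the number of mutator threads m, then

    total stack-scanning time ≈ N_GC × t_scan(m)

A generational plan with a small nursery performs far more (nursery) collections, so N_GC is much larger and any per-GC fixed cost is amplified. If this were the dominant effect, the slowdown should grow with the thread count, which is what the runs below check.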

Min heap

| Threads | Min heap (MB) |
| --- | --- |
| 1 | 23 |
| 2 | 24 |
| 4 | 26 |
| 8 | 30 |
| 16 | 39 |
| 32 | 52 |

Performance

Run with 2x min heap for each of the mutator thread counts. There seems to be no correlation between generational performance and the number of mutator threads.

[Plot: generational-lusearch-scale (plotty)]

k-sareen pushed a commit to k-sareen/mmtk-core that referenced this issue Sep 17, 2024