NULL and movement check in process_edge #1032
This allows the VM binding to report slots that hold tagged non-reference values as `ObjectReference::NULL` so that mmtk-core will not try to trace it.
Update the doc comments of `Edge::load` and `Edge::store` to mention tagged references.
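As a rough illustration of the idea above, the sketch below shows how a binding-side slot type might report a tagged non-reference value as NULL. Everything here is a hypothetical stand-in: `ObjectReference` is a toy newtype rather than the mmtk-core type, `TaggedSlot` is not a real `Edge` implementation, and the low-bit tagging scheme is assumed purely for the example.

```rust
/// Toy stand-in for mmtk-core's `ObjectReference` (hypothetical; 0 plays the role of NULL).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(pub usize);

impl ObjectReference {
    pub const NULL: ObjectReference = ObjectReference(0);

    pub fn is_null(self) -> bool {
        self.0 == 0
    }
}

/// Assumed tagging scheme: a set low bit marks a small integer, not a pointer.
const TAG_MASK: usize = 0b1;

/// Hypothetical slot holding one machine word. A real slot would wrap an address.
pub struct TaggedSlot {
    pub word: usize,
}

impl TaggedSlot {
    /// Report tagged non-reference values as NULL so the tracer skips the slot.
    pub fn load(&self) -> ObjectReference {
        if self.word & TAG_MASK != 0 {
            ObjectReference::NULL
        } else {
            ObjectReference(self.word)
        }
    }

    /// Only reached for slots that held a real reference, so no tag needs preserving here.
    pub fn store(&mut self, object: ObjectReference) {
        self.word = object.0;
    }
}
```

With this scheme, a word such as `0x1001` loads as NULL and is never traced, while `0x1000` loads as an ordinary reference.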
@k-sareen I took your advice and added a check after running some experiments.
I tested with OpenJDK (unmodified). I ran lusearch of DaCapo Chopin on bobcat.moma, 40 invocations, 2.5x min heap size, no PGO.

[Figure: results, normalized to build1]

The effect of this PR (build1 vs build2) is uncertain. It slows down GenCopy, Immix and StickyImmix, but speeds up GenImmix and SemiSpace. The difference in Immix (0.7%) is likely a result of more GCs being triggered for some reason. The differences for other plans are less than 0.3%, and are within the confidence intervals. The effect of checking for object movement after
Oh sorry. I misread the graphs (could I suggest giving them more semantic names instead of "build{1,2,3}"?). Yes okay skipping the object write for unmoved objects is a win which is good. I'll note that the percentage differences are minuscule (the graphs are < 1%) and the error bars are large, so either way this PR does not kill performance. I would recommend running a bigger experiment with all the benchmarks to see the impact across the entire suite. |
This is my favorite test: tracing over a static binary search tree. Every time I ran the test program, it triggered 100 GCs (plus 10 warm-ups) using

In the following figure, each sub-figure window is a Plan*Build combination. Each vertical bar is an execution of the test program, and each dot is one GC, and the y-axis is the time for that GC. The violin plot shows the clustering. The colored horizontal bar in the middle is the median, and the black horizontal bar in the middle is the mean.

Obviously, SemiSpace manifests a bi-modal distribution because the cost of dispatching the object to copyspace0 and copyspace1 is different. (We discussed the bi-modal distribution before. See #952 (comment))

The testing of

And the effect on Immix is obvious. The speed-up is consistent and significant, probably because of the reduced memory traffic.

The effect on StickyImmix is a bit mixed. It reduced the variation of GC time, and reduced the maximum GC time. Although the median (colored bar in the middle) becomes higher, the GCs that are below the median are significantly lower. Therefore, the mean (black bar) is not that high. Overall, the effect on StickyImmix is not significant.

I think the conclusion is that this PR does not have significant impact on performance. I'll test it with more benchmarks.
Could you remove the couple of outliers we have for Gen{Copy,Immix}? I'm just interested to see the actual trend and it's currently being dominated by the outliers. Also just to confirm my understanding, each violin plot (1-5) for each subfigure is the same configuration but a different invocation? |
The following plots have the data points with
Yes. The five violins in each subfigure have exactly the same configuration. They are just five different invocations. |
So avoiding the write is a performance win for all GCs except SS which is juuust slightly worse |
Yes. That's expected. The added check itself has overhead. But if it avoids more overhead from the write, it will still be a win overall. SemiSpace should be the worst case because it copies every single object, and it is just slightly worse. Immix profits from this check the most because Immix is deliberately designed to avoid copying. I guess if we enable the "stress GC" options for the ImmixSpace, it will be slower, too, because it will move every single object, too. |
Okay I just ran into a correctness bug because we overwrote a forwarding pointer in a slot. In ART a slot can be the start of an object since the class pointer is stored at the start of the object (as opposed to OpenJDK where the start of the object is the mark word) and in this case the object was already forwarded but the
This time I ran the three builds (the same binaries as in the previous post) against more benchmarks. 20 invocations, 3x min heap size w.r.t. "vanilla" OpenJDK with G1 (note that the min heap size of some MMTk plans may sometimes be higher than G1's min heap size, so it may be less than 3x min heap size for some benchmarks.) I omitted batik, eclipse, h2, jme and zxing because the min heap sizes of those benchmarks increase as the number of iterations increases, which could indicate a memory leak, possibly due to the lack of (soft/weak/phantom) reference processing. Tradebeans, tradesoap and tomcat were omitted, too, due to unexpected crashes and hangs during execution. I am not sure about the reason, but it may be due to some anomaly in the communication between the benchmark threads and the network server running during the benchmark. From the data, we see that
Some benchmarks, including biojava, graphchi, jython, luindex and lusearch, benefit from the check of object movement when using generational plans. Others are not that sensitive. From the data, I think the check for object movement after
`Edge::load()` returns NULL

    let new_object = self.trace_object(object);
    if Self::OVERWRITE_REFERENCE {
    debug_assert!(!new_object.is_null());
    if Self::OVERWRITE_REFERENCE && new_object != object {
We don't need `OVERWRITE_REFERENCE` anymore. It's always `true` anyway.
`SanityGCProcessEdges` sets that constant to `false`.
There is no harm in checking it, and we should check it unless we remove `OVERWRITE_REFERENCE`. Otherwise, if there is a new implementation of `ProcessEdgesWork` that sets `OVERWRITE_REFERENCE` to `false`, we have a bug here.
`OVERWRITE_REFERENCE` was a hack to implement sanity GC. Sanity GC does not move objects anyway and the check can be obviated now. I'm guessing the compiler is smart enough to figure out that the constant is either always true in the case of real GC work packets and always false for sanity, so there is no performance overhead of the check. So at the end of the day, it's fine to keep it, but I just think it's a bit redundant.
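To make the point about the constant concrete, here is a minimal sketch of a trait with an associated `const` that each implementation fixes at compile time. It is not the mmtk-core `ProcessEdgesWork` trait: `ProcessEdges`, `CopyingTrace` and `SanityTrace` are hypothetical names, and slots and objects are modelled as plain `usize` values. Because the constant is known per implementation, the compiler can be expected to fold the `Self::OVERWRITE_REFERENCE &&` test away, although the sketch does not measure that.

```rust
/// Hypothetical, simplified analogue of a process-edges work packet.
pub trait ProcessEdges {
    /// Real GC packets keep the default; a sanity packet overrides it to `false`.
    const OVERWRITE_REFERENCE: bool = true;

    /// Returns the (possibly new) address of the object.
    fn trace_object(&mut self, object: usize) -> usize;

    fn process_edge(&mut self, slot: &mut usize) {
        let object = *slot;
        let new_object = self.trace_object(object);
        // The constant part of the condition is resolved per implementation.
        if Self::OVERWRITE_REFERENCE && new_object != object {
            *slot = new_object;
        }
    }
}

/// Toy tracer that "moves" every object by a fixed offset (assumption for the demo).
pub struct CopyingTrace;

impl ProcessEdges for CopyingTrace {
    fn trace_object(&mut self, object: usize) -> usize {
        object + 0x1000
    }
}

/// Toy sanity tracer: visits objects but never moves them or writes slots.
pub struct SanityTrace {
    pub visited: usize,
}

impl ProcessEdges for SanityTrace {
    const OVERWRITE_REFERENCE: bool = false;

    fn trace_object(&mut self, object: usize) -> usize {
        self.visited += 1;
        object
    }
}
```

In this sketch the sanity tracer never writes to a slot for two independent reasons (the constant is `false` and the object is unmoved), which is the redundancy discussed in the comment above.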
@wks MarkCompact is failing the null object debug check in |
The OpenJDK binding test failed, and the error is in the reference processor. Now |
You need to add the null checks here as well (in the entire file): mmtk-core/src/util/reference_processor.rs Line 225 in 658bce8
Yes. And I think this is one more reason why eliminating |
I don't quite understand the relevance of the bug to that issue. With this change, we need to do a null check before we do |
The code:

    let old_referent = <E::VM as VMBinding>::VMReferenceGlue::get_referent(reference);
    let new_referent = ReferenceProcessor::get_forwarded_referent(trace, old_referent);

The reference processor called
Yes. With #1043 implemented, we will need to do null check before calling

For the Ruby binding, this check always exists. Because a Ruby slot can hold a tagged non-reference value, the idiom is

    if (SPECIAL_CONST_P(*field)) { // Check if it holds "special values".
        continue;
    }
    rb_gc_mark_movable(*field); // This will call `trace_object` underneath

For bindings that do not have NULL references, they will not need to do the checking because there is no NULL. So I would argue that the check of whether a
The commit 9648aed introduced the method

    let referent = <E::VM as VMBinding>::VMReferenceGlue::get_referent(*reference);
    if !<E::VM as VMBinding>::VMReferenceGlue::is_referent_cleared(referent) { // checks cleared (NULL or special values)
        Self::keep_referent_alive(trace, referent); // calls trace_object
    }

This also reflects the fact that
My understanding is that Re: #1043, yes I see. It would be a better API if |
LGTM. Thanks!
LGTM
This PR does two things:

1. `ProcessEdgesWork::process_edge` will skip slots that are not holding references (i.e. if `Edge::load()` returns `ObjectReference::NULL`).
2. `ProcessEdgesWork::process_edge` will skip the `Edge::store()` if the object is not moved.

Doing (1) removes unnecessary invocations of `trace_object()` as well as the subsequent `Edge::store()`. It also allows slots to hold non-reference tagged values. In that case, the VM binding can return `ObjectReference::NULL` in `Edge::load()` so that mmtk-core will simply skip the slot, fixing mmtk#1031

Doing (2) removes unnecessary `Edge::store()` operations in the case where the objects are not moved during `trace_object`. It reduces the STW time in most cases, fixing mmtk#574

Fixes: mmtk#1031
Fixes: mmtk#574
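The two behaviours described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged into mmtk-core: `ObjectReference` is a toy newtype, `CountingSlot` is a hypothetical slot that counts stores so the effect is observable, and `toy_trace` is an invented stand-in for `trace_object` that moves only objects at or above an arbitrary address.

```rust
/// Toy stand-in for mmtk-core's `ObjectReference` (hypothetical; 0 plays the role of NULL).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(pub usize);

impl ObjectReference {
    pub const NULL: Self = Self(0);

    pub fn is_null(self) -> bool {
        self.0 == 0
    }
}

/// Hypothetical slot that records how many times it was written.
pub struct CountingSlot {
    pub value: ObjectReference,
    pub stores: usize,
}

impl CountingSlot {
    pub fn new(raw: usize) -> Self {
        CountingSlot { value: ObjectReference(raw), stores: 0 }
    }

    pub fn load(&self) -> ObjectReference {
        self.value
    }

    pub fn store(&mut self, object: ObjectReference) {
        self.value = object;
        self.stores += 1;
    }
}

/// Invented tracing rule: objects at or above 0x8000 are "moved" down by 0x4000.
pub fn toy_trace(object: ObjectReference) -> ObjectReference {
    if object.0 >= 0x8000 {
        ObjectReference(object.0 - 0x4000)
    } else {
        object
    }
}

/// Sketch of the two checks: skip NULL slots, and skip the store for unmoved objects.
pub fn process_edge(
    slot: &mut CountingSlot,
    mut trace_object: impl FnMut(ObjectReference) -> ObjectReference,
) {
    let object = slot.load();
    if object.is_null() {
        return; // (1) nothing to trace, nothing to store
    }
    let new_object = trace_object(object);
    debug_assert!(!new_object.is_null());
    if new_object != object {
        slot.store(new_object); // (2) write back only when the object moved
    }
}
```

Calling `process_edge` on a NULL slot, an unmoved object and a moved object should invoke the tracing function twice and write to a slot only once, in the moved case.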