Node-enqueuing tracing. #628

wks · 2022-07-21T06:04:46Z

This commit implements node-enqueuing tracing. This is mainly for
supporting VMs that do not support edge-enqueuing for some or all
objects.

I made minimal changes to the overall structure of the current tracing framework. I reused the ScanObjects work packet as the node-processing work packet and added logic for processing the edges of objects that do not support edge-enqueuing.
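The sketch below is a toy model, not mmtk-core code. It only illustrates how a single scanning packet could handle a mix of objects. Objects are indices into a hypothetical Heap, and scan_objects, Heap and ScanResult are hypothetical names. If an object supports edge-enqueuing, its fields are passed on as edges. Otherwise, its children are traced immediately, and newly reached objects become nodes for a later packet.

```rust
use std::collections::HashSet;

/// Hypothetical toy heap: objects are indices, `children[i]` lists the objects
/// referenced by object `i`, and `supports_edge_enqueue[i]` says whether the VM
/// can expose object `i`'s reference fields as edges (slots).
pub struct Heap {
    pub children: Vec<Vec<usize>>,
    pub supports_edge_enqueue: Vec<bool>,
}

/// Output of scanning one batch of nodes.
pub struct ScanResult {
    /// (object, field index) pairs to be handed to an edge-processing step.
    pub edges: Vec<(usize, usize)>,
    /// Objects reached for the first time by direct tracing; in this sketch
    /// they would form the next node batch.
    pub new_nodes: Vec<usize>,
}

/// Scan a batch of already-marked nodes, choosing per object between
/// edge-enqueuing and tracing the children immediately (node-enqueuing).
pub fn scan_objects(heap: &Heap, nodes: &[usize], marked: &mut HashSet<usize>) -> ScanResult {
    let mut result = ScanResult { edges: Vec::new(), new_nodes: Vec::new() };
    for &object in nodes {
        if heap.supports_edge_enqueue[object] {
            // Edge-enqueuing: defer every field to a later edge-processing step.
            for field in 0..heap.children[object].len() {
                result.edges.push((object, field));
            }
        } else {
            // Node-enqueuing: trace each child now.
            for &child in &heap.children[object] {
                // `insert` returns true only the first time an object is marked.
                if marked.insert(child) {
                    result.new_nodes.push(child);
                }
            }
        }
    }
    result
}
```

The point of reusing one packet type is that the per-object choice happens inside the loop. This means a VM where every object supports edge-enqueuing never takes the second branch.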

Existing VMs which do not use node-enqueuing tracing should still work, and the performance impact should be negligible. I'll do experiments to compare the performance.

wks · 2022-07-21T14:43:16Z

NOTE: When I ran this experiment, I had not yet added #[inline(always)] to PlanScanObjects::post_scan_object. This makes the result even stranger, because not inlining a function usually makes the program slower.

I ran lusearch on bobcat.moma for 20 invocations. build1 is master, and build2 is this PR.

From the data, the mean and median STW times are improved for SemiSpace, Immix, GenCopy and GenImmix. Although performance improvement is a good thing, I don't understand why, because this PR does not reduce the amount of work when not using node-enqueuing tracing (OpenJDK never uses node-enqueuing tracing).

I am running a more comprehensive benchmark on bobcat.moma and shrew.moma, with all benchmarks in DaCapo Chopin.

Mean and median STW time (rel. = relative to build1):

| Plan | build1 mean | build1 median | build2 mean | build2 mean (rel.) | build2 median | build2 median (rel.) |
|---|---|---|---|---|---|---|
| SemiSpace | 10574.70 | 10611.32 | 10131.98 | 0.958 | 10079.07 | 0.950 |
| Immix | 5646.33 | 5662.24 | 5115.69 | 0.906 | 5140.95 | 0.908 |
| GenCopy | 11175.94 | 11158.60 | 10768.21 | 0.964 | 10701.51 | 0.959 |
| GenImmix | 6058.96 | 5901.15 | 5614.96 | 0.927 | 5605.49 | 0.950 |

wks · 2022-07-21T17:14:46Z

I ran a micro benchmark that creates a 22-layer binary tree, and executes System.gc() 500 times for each build+plan pair. All data points are shown in the plot below, together with box plots and violin plots. It seems there is no clear winner. perf does not show any difference for the hottest functions, either.

qinsoon

LGTM. Just some minor comments.

qinsoon · 2022-07-22T00:59:58Z

src/vm/scanning.rs

+}
+
+/// Callback trait of scanning functions that directly trace through edges.
+pub trait ObjectTracer {


Why not just name this ObjectVisitor with visit_object()? This would be consistent with EdgeVisitor above.

Good idea. This even makes it possible to reuse this trait in other places where it needs to visit objects.

On second thought, I think I'll keep it as ObjectTracer. The semantics of this trait include the tracing logic: the trace_object method not only visits the object but also returns the updated reference if the object is moved.
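The following sketch shows why the return value matters. trace_object hands back the possibly moved reference, and the scanning function writes it back into the field. This is something a plain visitor would not do. ObjectReference is a simplified stand-in. CopyingTracer and this free-function form of scan_object_and_trace_edges are hypothetical, and the actual mmtk-core signatures may differ.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an object reference (an address).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ObjectReference(pub usize);

/// Shape of the callback discussed above: tracing may move the object,
/// so the tracer returns the (possibly new) reference.
pub trait ObjectTracer {
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference;
}

/// Hypothetical copying tracer: gives each object a new address the first
/// time it is seen and reuses that forwarding entry afterwards.
pub struct CopyingTracer {
    pub forwarding: HashMap<ObjectReference, ObjectReference>,
    pub next_free: usize,
    pub object_size: usize,
}

impl ObjectTracer for CopyingTracer {
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference {
        if let Some(&new_ref) = self.forwarding.get(&object) {
            return new_ref; // already copied: return the forwarded address
        }
        let new_ref = ObjectReference(self.next_free);
        self.next_free += self.object_size;
        self.forwarding.insert(object, new_ref);
        new_ref
    }
}

/// VM-side scanning of an object whose fields cannot be exposed as edges:
/// each field is updated in place with whatever the tracer returns.
pub fn scan_object_and_trace_edges<OT: ObjectTracer>(fields: &mut [ObjectReference], tracer: &mut OT) {
    for field in fields.iter_mut() {
        *field = tracer.trace_object(*field);
    }
}
```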

qinsoon · 2022-07-22T01:01:07Z

src/vm/scanning.rs

+ _object: ObjectReference,
+ _object_tracer: &mut OT,
+ ) {
+ unimplemented!();


Suggested change:

- unimplemented!();
+ unreachable!("scan_object_and_trace_edges() will not be called when support_edge_enqueue() is always true.")

I'll add this message.
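Here is a simplified sketch of how the suggested default could fit together. The Scanning trait below is a hypothetical, cut-down stand-in with no thread or VM-binding parameters. It keeps only two defaults: support_edge_enqueue returns true, and scan_object_and_trace_edges calls unreachable!. EdgeOnlyVM is a hypothetical binding.

```rust
/// Simplified stand-in for an object reference.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(pub usize);

pub trait ObjectTracer {
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference;
}

/// Hypothetical, cut-down version of the binding-facing scanning trait.
pub trait Scanning {
    /// Default: every object supports edge-enqueuing.
    fn support_edge_enqueue(_object: ObjectReference) -> bool {
        true
    }

    /// Default body: a binding that keeps the default above never reaches
    /// this method, so reaching it indicates a bug.
    fn scan_object_and_trace_edges<OT: ObjectTracer>(_object: ObjectReference, _object_tracer: &mut OT) {
        unreachable!("scan_object_and_trace_edges() will not be called when support_edge_enqueue() is always true.")
    }
}

/// Hypothetical binding that only uses edge-enqueuing and overrides nothing.
pub struct EdgeOnlyVM;
impl Scanning for EdgeOnlyVM {}
```

Using unreachable! instead of unimplemented! documents that this path is impossible for such a binding, rather than merely unfinished.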

qinsoon · 2022-07-22T01:05:48Z

src/vm/scanning.rs

+ /// - Otherwise, MMTk core will call `scan_object_and_trace_edges` on the object.
+ ///
+ /// For maximum performance, the VM should support edge-enqueuing for as many objects as
+ /// practical.


It is worth mentioning in the comment that this method is called for every object. So a binding should avoid expensive checks and keep it as efficient as possible.

Yes. I'll add a comment about performance.
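Because the check runs for every object, one cheap option is a single header-bit test. The object model here is entirely hypothetical: the one-word header, NO_EDGE_ENQUEUE_BIT, ScanPath and choose_scan_path. It only shows the shape of the per-object decision described in the doc comment above.

```rust
/// Hypothetical header flag: set means the object's layout cannot expose
/// its reference fields as edges.
pub const NO_EDGE_ENQUEUE_BIT: usize = 1;

/// Hypothetical object: one header word plus reference fields.
pub struct Object {
    pub header: usize,
    pub fields: Vec<usize>,
}

/// Called for every object during tracing, so keep it to one load and a mask.
#[inline(always)]
pub fn support_edge_enqueue(object: &Object) -> bool {
    (object.header & NO_EDGE_ENQUEUE_BIT) == 0
}

/// Which scanning path the core would take, with the number of fields involved.
#[derive(Debug, PartialEq, Eq)]
pub enum ScanPath {
    EnqueueEdges(usize),
    TraceDirectly(usize),
}

/// Sketch of the core-side choice between scan_object and scan_object_and_trace_edges.
pub fn choose_scan_path(object: &Object) -> ScanPath {
    if support_edge_enqueue(object) {
        ScanPath::EnqueueEdges(object.fields.len())
    } else {
        ScanPath::TraceDirectly(object.fields.len())
    }
}
```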

qinsoon · 2022-07-22T01:14:52Z

src/scheduler/gc_work.rs

- fn create_process_node_roots_work(&mut self, _nodes: Vec<ObjectReference>) {
- todo!()
+ fn create_process_node_roots_work(&mut self, nodes: Vec<ObjectReference>) {
+ // We want to use E::create_scan_work.


You can put an assertion here to check if the plan moves objects. So if a binding tries to use this method in a moving plan, they will get a panic immediately.

See mmtk-core/src/plan/plan_constraints.rs, line 11 in 9a3ebff:

pub moves_objects: bool,

I'll add this assertion for now. In the future, when we start to support object pinning, some moving plans, such as Immix, can support node roots, too.
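Below is a minimal sketch of the suggested assertion. It assumes a cut-down PlanConstraints with only the moves_objects flag, and the function signature is hypothetical. As we understand it, the rationale is this: node roots give the GC object references rather than slots. If the GC moved such an object, it would have no place to write the updated address.

```rust
/// Hypothetical subset of plan constraints: only the flag discussed above.
pub struct PlanConstraints {
    pub moves_objects: bool,
}

/// Sketch: accept node roots only in non-moving plans (until pinning exists).
/// In this sketch the "work packet" is just the batch of nodes to scan.
pub fn create_process_node_roots_work(constraints: &PlanConstraints, nodes: Vec<usize>) -> Vec<usize> {
    assert!(
        !constraints.moves_objects,
        "node roots are not supported by plans that move objects"
    );
    nodes
}
```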

qinsoon · 2022-07-22T01:16:32Z

src/scheduler/gc_work.rs

-/// object work (it won't call `scan_object()` in [`policy::gc_work::PolicytraceObject`]).
-/// It should be used only for policies that do not have policy-specific scan_object().
+/// Trait for a work packet that scans objects
+pub trait AbstractScanObjects<VM: VMBinding>: GCWork<VM> + Sized {


The trait name is clear, but it is inconsistent with the name ProcessEdgesWork. I would suggest renaming it to ScanObjectsWork just to be consistent.

Yes. That makes sense.


wks · 2022-07-26T12:38:59Z

The following are the results from shrew.moma. The command run on both machines is runbms -i 40 out full.yml 8 6.

In the following plots, build1 is the master and build2 is this PR.

(plotty link for histograms)

STW time for SemiSpace:

STW time for Immix:

STW time for GenCopy:

STW time for GenImmix:

The distribution of STW times for all plans, with outliers removed:

The distribution of STW times for all plans, with all data points:

The distribution of the numbers of GC for all plans (not removing outliers):

From the histograms, we see that the impact on STW time is very small for most benchmarks, except a few benchmarks that have abnormally high values of mean STW time. Those with high mean STW time also have high variances.

The high variance is the result of two factors:

1. Some individual invocations resulted in abnormally high STW times. One example is avrora running with SemiSpace. Most invocations have STW times between 40 ms and 50 ms, while one invocation of build2 has an STW time of 207 ms (see plotty). This can also be seen in the scatter plot with all data points (the second scatter plot). The outlier pulled the mean up. With the outlier removed (see the first scatter plot), the distributions are similar.
2. Some benchmark+plan combinations perform very few GCs. In these cases, the STW time is largely determined by the number of GCs. One example is Immix-h2. Running the benchmark with this setting executes either 4 or 5 GCs, non-deterministically. With 4 GCs, the STW time is around 1100 ms; with 5 GCs, it is around 1700 ms. (Detailed data can be seen on plotty; please sort by the number of GCs.) In the scatter plot of STW time for Immix-h2, both build1 and build2 show a bimodal distribution.
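To make the first factor concrete, the sketch below uses invented numbers, not data from these runs. There are nine invocations between 40 ms and 50 ms, plus one 207 ms outlier. mean and median are plain helper functions written for this illustration.

```rust
/// Arithmetic mean; assumes a non-empty slice.
pub fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Median; assumes a non-empty slice with no NaN values.
pub fn median(xs: &[f64]) -> f64 {
    let mut v = xs.to_vec();
    v.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = v.len();
    if n % 2 == 1 {
        v[n / 2]
    } else {
        (v[n / 2 - 1] + v[n / 2]) / 2.0
    }
}
```

With these made-up values, the single outlier moves the mean from 45 ms to 61.2 ms, while the median only moves from 45 ms to 45.5 ms. This is why the median and the outlier-free plot are more informative for these benchmarks.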

wks · 2022-07-26T12:55:39Z

FYI, here are the plots for bobcat.moma.

STW time, SemiSpace:

STW time, Immix:

STW time, GenCopy:

STW time, GenImmix:

STW time, all plans, no outliers

STW time, all plans, all data points

Number of GC, all plans, all data points

wks force-pushed the ruby-friendly-tracing-simpler branch from 57716df to 365f49a on July 21, 2022 06:51

wks force-pushed the ruby-friendly-tracing-simpler branch from 365f49a to 8f73a78 on July 21, 2022 15:54

wks requested a review from qinsoon July 21, 2022 17:41

qinsoon approved these changes Jul 22, 2022


qinsoon added the PR-testing Run binding tests for the pull request (deprecated: use PR-extended-testing instead) label Jul 22, 2022

Node-enqueuing tracing.

8c22d04


wks force-pushed the ruby-friendly-tracing-simpler branch from 8f73a78 to 8c22d04 on July 22, 2022 08:43

wks merged commit 38b99e8 into mmtk:master Jul 26, 2022

wks mentioned this pull request Jul 27, 2022

Add ProcessObjectsWork in addition to ProcessEdgesWork #581

Closed

wks mentioned this pull request Aug 29, 2022

Roadmap to the proper support of finalisation mmtk/mmtk-ruby#2

Closed


wks mentioned this pull request Mar 8, 2023

Decuple reference processor from ProcessEdgesWork #604

Closed
