Node-enqueuing tracing. #628

Merged
merged 1 commit into from
Jul 26, 2022

wks
Collaborator

@wks wks commented Jul 21, 2022

This commit implements node-enqueuing tracing. This is mainly for
supporting VMs that do not support edge-enqueuing for some or all
objects.

I made minimal changes to the overall structure of the current tracing framework. I reused the ScanObjects work packet as the node-processing work packet, and added logic for processing the edges of objects that do not support edge enqueuing.
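As a rough illustration of the idea (not the actual mmtk-core code), the sketch below shows one scanning routine handling both kinds of objects. The `Object` struct, its `supports_edge_enqueuing` flag, and the `scan_objects` function are hypothetical names made up for this example.

```rust
// All names here are hypothetical stand-ins, not the mmtk-core API.
type ObjectReference = usize;
/// An edge is identified by (holder object, field index) in this toy model.
type Edge = (ObjectReference, usize);

/// A toy heap object: its outgoing references, plus whether the VM can
/// expose its fields as edges.
struct Object {
    fields: Vec<ObjectReference>,
    supports_edge_enqueuing: bool,
}

/// Scans one packet of objects (nodes). Returns the edges enqueued for
/// later processing and the children that were traced on the spot.
fn scan_objects(heap: &[Object], nodes: &[ObjectReference]) -> (Vec<Edge>, Vec<ObjectReference>) {
    let mut edges = Vec::new();
    let mut traced = Vec::new();
    for &node in nodes {
        let object = &heap[node];
        if object.supports_edge_enqueuing {
            // Edge-enqueuing: record every field as an edge.
            for index in 0..object.fields.len() {
                edges.push((node, index));
            }
        } else {
            // Node-enqueuing: trace each child directly while scanning.
            for &child in &object.fields {
                traced.push(child);
            }
        }
    }
    (edges, traced)
}

fn main() {
    let heap = vec![
        Object { fields: vec![1, 2], supports_edge_enqueuing: true },
        Object { fields: vec![2], supports_edge_enqueuing: false },
        Object { fields: vec![], supports_edge_enqueuing: true },
    ];
    let (edges, traced) = scan_objects(&heap, &[0, 1]);
    println!("edges = {:?}, traced = {:?}", edges, traced);
}
```

In this toy model a VM that supports edge enqueuing for every object never takes the second branch, which is why existing bindings should be unaffected.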

Existing VMs which do not use node-enqueuing tracing should still work, and the performance impact should be negligible. I'll do experiments to compare the performance.

@wks wks force-pushed the ruby-friendly-tracing-simpler branch from 57716df to 365f49a Compare July 21, 2022 06:51
Collaborator Author

wks commented Jul 21, 2022

NOTE: At the time I did this experiment, I did not add #[inline(always)] to PlanScanObjects::post_scan_object. This makes it even stranger, because not inlining a function usually results in the program being slower.

I ran lusearch on bobcat.moma. build1 is master, and build2 is this PR. 20 invocations.

From the data, the mean and median STW times are improved for SemiSpace, Immix, GenCopy and GenImmix. Although performance improvement is a good thing, I don't understand why, because this PR does not reduce the amount of work when not using node-enqueuing tracing (OpenJDK never uses node-enqueuing tracing).

I am running a more comprehensive benchmark on bobcat.moma and shrew.moma, with all benchmarks in DaCapo Chopin.

| plan | build1 mean | (rel) | build1 median | (rel) | build2 mean | (rel) | build2 median | (rel) |
|---|---|---|---|---|---|---|---|---|
| SemiSpace | 10574.702 | - | 10611.315 | - | 10131.9795 | 0.9581 | 10079.07 | 0.9498 |
| Immix | 5646.325 | - | 5662.235 | - | 5115.6935 | 0.9060 | 5140.95 | 0.9079 |
| GenCopy | 11175.937 | - | 11158.595 | - | 10768.2135 | 0.9635 | 10701.505 | 0.9590 |
| GenImmix | 6058.955 | - | 5901.15 | - | 5614.955 | 0.9267 | 5605.49 | 0.9499 |
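
For reference, the (rel) columns appear to be the build2 value divided by the corresponding build1 value, which matches the numbers in the table. A minimal sketch of that calculation follows; the `relative` helper is a name made up for this example.

```rust
/// Ratio of a build2 measurement to its build1 baseline.
/// Values below 1.0 mean build2 spent less STW time.
fn relative(build2: f64, build1: f64) -> f64 {
    build2 / build1
}

fn main() {
    // Mean STW times for SemiSpace, copied from the table above.
    let rel = relative(10131.9795, 10574.702);
    println!("SemiSpace mean, build2 relative to build1: {:.4}", rel);
}
```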

image

@wks wks force-pushed the ruby-friendly-tracing-simpler branch from 365f49a to 8f73a78 Compare July 21, 2022 15:54
Collaborator Author

wks commented Jul 21, 2022

I ran a micro benchmark that creates a 22-layer binary tree, and executes System.gc() 500 times for each build+plan pair. All data points are shown in the plot below, together with box plots and violin plots. It seems there is no clear winner. perf does not show any difference for the hottest functions, either.

image

@wks wks requested a review from qinsoon July 21, 2022 17:41
Member

@qinsoon qinsoon left a comment

LGTM. Just some minor comments.

}

/// Callback trait of scanning functions that directly trace through edges.
pub trait ObjectTracer {
Member

Why not just name this as ObjectVisitor with visit_object()? This would be consistent with the EdgeVisitor above.

Collaborator Author

Good idea. This even makes it possible to reuse this trait in other places where it needs to visit objects.

Collaborator Author

On second thought, I think I'll keep it as ObjectTracer. The semantics of this trait include the logic of tracing: the trace_object method not only visits the object, but also returns the updated reference if the object is moved.
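
To make the distinction concrete, here is a minimal sketch of a tracer whose `trace_object` returns the (possibly new) reference, so the caller can store it back into the field. The types are simplified assumptions: `CopyingTracer`, its `offset` field, and `scan_fields` are invented for this example, and the real trait in the PR may have a different shape.

```rust
use std::collections::HashMap;

// Hypothetical, simplified types; not the actual mmtk-core definitions.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ObjectReference(usize);

trait ObjectTracer {
    /// Traces `object` and returns its reference after tracing,
    /// which differs from the argument if the object was moved.
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference;
}

/// A toy copying tracer: the first time an object is traced it is "moved"
/// to `old address + offset`, and the forwarding is remembered.
struct CopyingTracer {
    forwarding: HashMap<ObjectReference, ObjectReference>,
    offset: usize,
}

impl ObjectTracer for CopyingTracer {
    fn trace_object(&mut self, object: ObjectReference) -> ObjectReference {
        let offset = self.offset;
        *self
            .forwarding
            .entry(object)
            .or_insert(ObjectReference(object.0 + offset))
    }
}

/// What a scanning function might do with the tracer: trace every field
/// and write the returned reference back.
fn scan_fields<OT: ObjectTracer>(fields: &mut [ObjectReference], tracer: &mut OT) {
    for field in fields.iter_mut() {
        *field = tracer.trace_object(*field);
    }
}

fn main() {
    let mut tracer = CopyingTracer { forwarding: HashMap::new(), offset: 0x1000 };
    let mut fields = [ObjectReference(0x10), ObjectReference(0x20), ObjectReference(0x10)];
    scan_fields(&mut fields, &mut tracer);
    println!("{:?}", fields);
}
```

A plain visitor with a `visit_object(object)` method returning nothing could not express the write-back step in `scan_fields`.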

_object: ObjectReference,
_object_tracer: &mut OT,
) {
unimplemented!();
Member

Suggested change
unimplemented!();
unreachable!("scan_object_and_trace_edges() will not be called when support_edge_enqueue() is always true.")

Collaborator Author

I'll add this message.

/// - Otherwise, MMTk core will call `scan_object_and_trace_edges` on the object.
///
/// For maximum performance, the VM should support edge-enqueuing for as many objects as
/// practical.
Member

It is worth mentioning in the comment that this method is called for every object. So a binding should avoid expensive checks and keep it as efficient as possible.

Collaborator Author

Yes. I'll add a comment about performance.

fn create_process_node_roots_work(&mut self, _nodes: Vec<ObjectReference>) {
todo!()
fn create_process_node_roots_work(&mut self, nodes: Vec<ObjectReference>) {
// We want to use E::create_scan_work.
Member

You can put an assertion here to check if the plan moves objects. So if a binding tries to use this method in a moving plan, they will get a panic immediately.

pub moves_objects: bool,

Collaborator Author

I'll add this assertion for now. In the future, when we start to support object pinning, some moving plans, such as Immix, can support node roots, too.
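
A minimal sketch of what such an assertion could look like, assuming a simplified setup: only the `moves_objects` field and the method name come from this thread, while `PlanConstraints` as written here, `RootsWorkFactory`, and its `queued` field are hypothetical.

```rust
// Hypothetical miniature of a plan's constraints.
struct PlanConstraints {
    moves_objects: bool,
}

type ObjectReference = usize;

/// A toy factory that queues packets of node roots for later scanning.
struct RootsWorkFactory<'a> {
    constraints: &'a PlanConstraints,
    queued: Vec<Vec<ObjectReference>>,
}

impl<'a> RootsWorkFactory<'a> {
    fn create_process_node_roots_work(&mut self, nodes: Vec<ObjectReference>) {
        // Fail fast if a binding reports node roots to a moving plan.
        assert!(
            !self.constraints.moves_objects,
            "node roots are not supported by plans that move objects"
        );
        self.queued.push(nodes);
    }
}

fn main() {
    let non_moving = PlanConstraints { moves_objects: false };
    let mut factory = RootsWorkFactory { constraints: &non_moving, queued: Vec::new() };
    factory.create_process_node_roots_work(vec![1, 2, 3]);
    println!("queued packets: {}", factory.queued.len());
}
```

If pinning is added later, the check could presumably be relaxed to accept moving plans that pin the reported roots.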

/// object work (it won't call `scan_object()` in [`policy::gc_work::PolicytraceObject`]).
/// It should be used only for policies that do not have policy-specific scan_object().
/// Trait for a work packet that scans objects
pub trait AbstractScanObjects<VM: VMBinding>: GCWork<VM> + Sized {
Member

The trait name is clear, but it is inconsistent with the name ProcessEdgesWork. I would suggest renaming it to ScanObjectsWork just to be consistent.

Collaborator Author

Yes. That makes sense.

@qinsoon qinsoon added the PR-testing Run binding tests for the pull request (deprecated: use PR-extended-testing instead) label Jul 22, 2022
@wks wks force-pushed the ruby-friendly-tracing-simpler branch from 8f73a78 to 8c22d04 Compare July 22, 2022 08:43
Collaborator Author

wks commented Jul 26, 2022

The following are the results from shrew.moma. The command run on both machines is runbms -i 40 out full.yml 8 6.

In the following plots, build1 is master and build2 is this PR.

(plotty link for histograms)

STW time for SemiSpace:
2022-07-26-shrew-SemiSpace

STW time for Immix:
2022-07-26-shrew-Immix

STW time for GenCopy:
2022-07-26-shrew-GenCopy

STW time for GenImmix:
2022-07-26-shrew-GenImmix

The distribution of STW times for all plans, with outliers removed:
shrew-2022-07-22-Fri-033236-time stw-NoOutliers

The distribution of STW times for all plans, with all data points.

shrew-2022-07-22-Fri-033236-time stw

The distribution of the number of GCs for all plans (not removing outliers):
shrew-2022-07-22-Fri-033236-GC

From the histograms, we see that the impact on STW time is very small for most benchmarks, except a few benchmarks that have abnormally high values of mean STW time. Those with high mean STW time also have high variances.

The high variance is the result of two factors.

  1. Some individual invocations resulted in abnormally high STW time. One example is avrora running with SemiSpace. Most invocations have STW times between 40ms and 50ms, while one invocation run by build2 has an STW time of 207ms (see plotty). This can also be observed in the scatter plot with all data points (the second scatter plot). The outlier pulled the mean up. With the outlier removed (see the first scatter plot), the distributions are similar.
  2. Some benchmark+plan combinations have a very small number of GCs. In these cases, the STW time is determined by the number of GCs. One example is Immix-h2. Running the benchmark with this setting will execute 4 or 5 GCs, non-deterministically. If it executes 4 GCs, the STW time will be around 1100ms; if it executes 5 GCs, the STW time will be around 1700ms. (Detailed data can be seen on plotty. Please sort by the number of GCs.) In the scatter plot of STW time for Immix-h2, both build1 and build2 follow a bimodal distribution.
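
The effect of a single outlier on the mean can be sketched as follows. The sample values are made up for illustration (nine invocations in the 40ms to 50ms range plus one at 207ms) and are not the measured data.

```rust
/// Arithmetic mean of the samples.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Median of the samples (average of the two middle values for even counts).
fn median(xs: &[f64]) -> f64 {
    let mut sorted = xs.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("samples must not be NaN"));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

fn main() {
    // Made-up STW times in ms, not the real measurements.
    let mut samples = vec![41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0];
    println!("without outlier: mean {:.1}, median {:.1}", mean(&samples), median(&samples));

    // Add one abnormal invocation.
    samples.push(207.0);
    println!("with outlier:    mean {:.1}, median {:.1}", mean(&samples), median(&samples));
}
```

The mean shifts noticeably while the median barely moves, which is why the plots with outliers removed look similar for both builds.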

Collaborator Author

wks commented Jul 26, 2022

FYI, here are the plots for bobcat.moma.

STW time, SemiSpace:
2022-07-26-bobcat-SemiSpace

STW time, Immix:
2022-07-26-bobcat-Immix

STW time, GenCopy:
2022-07-26-bobcat-GenCopy

STW time, GenImmix:
2022-07-26-bobcat-GenImmix

STW time, all plans, no outliers
bobcat-2022-07-22-Fri-033228-time stw-NoOutliers

STW time, all plans, all data points
bobcat-2022-07-22-Fri-033228-time stw

Number of GC, all plans, all data points
bobcat-2022-07-22-Fri-033228-GC
