Add ProcessObjectsWork in addition to ProcessEdgesWork #581
Comments
Can this be seen as a special case of #573? For an object/node, the load returns the object reference itself, and store is a no-op.
@qinsoon No. I thought it was, but I then realised it is not. The main difference is the timing of load and store. With #573, the representation of an edge can be customised, but the timing of load and store doesn't change. The process of scanning objects and processing edges can be depicted by the following pseudo-code:
process_object o1 {  // (scan_object)
    enqueue o1.e1
    enqueue o1.e2
    enqueue o1.e3
}
process_object o2 {  // (scan_object)
    enqueue o2.e1
    enqueue o2.e2
}
...
process_edges {
    // Does not necessarily process edges in the same order. May be parallelised.
    process_edge o1.e1 {
        p1 = load o1.e1
        q1 = trace_object p1
        store o1.e1, q1
    }
    process_edge o2.e1 {  // may process edges out of order
        p3 = load o2.e1
        q3 = trace_object p3
        store o2.e1, q3
    }
    process_edge o1.e2 {
        p2 = load o1.e2
        q2 = trace_object p2
        store o1.e2, q2
    }
    ...
}
As we can see, with #573, even though an edge can now be 32-bit or 64-bit, edges are still enqueued to be processed later. However, the constraint in Ruby is that edges must be traced while the object is scanned. That is, the load/trace_object/store operations must be enclosed in the process_object operation of each object, like the following pseudocode:
process_object o1 {
    process_edge o1.e1 {
        p1 = load o1.e1
        run_in_mmtk_core_closure {
            q1 = trace_object p1
        }
        store o1.e1, q1
    }
    process_edge o1.e2 {
        p2 = load o1.e2
        run_in_mmtk_core_closure {
            q2 = trace_object p2
        }
        store o1.e2, q2
    }
    process_edge o1.e3 {
        p3 = load o1.e3
        run_in_mmtk_core_closure {
            q3 = trace_object p3
        }
        store o1.e3, q3
    }
}
process_object o2 {
    process_edge o2.e1 {
        p1 = load o2.e1
        run_in_mmtk_core_closure {
            q1 = trace_object p1
        }
        store o2.e1, q1
    }
    process_edge o2.e2 {
        p2 = load o2.e2
        run_in_mmtk_core_closure {
            q2 = trace_object p2
        }
        store o2.e2, q2
    }
}
In the code above, the load and store operations happen during the processing of each object. Actually, only the portion in
p.s. The restriction that "trace_object must happen immediately when visiting an edge" makes it hard to perform pre-fetching, and that is why it has inferior performance, according to Robin's paper.
Is it possible to have
I think an edge is fundamentally different from a node. I (and the current mmtk-core) define an edge as a slot that holds an object reference. A node, on the other hand, is the object itself. But Robin's paper discusses MarkSweep. In MarkSweep, we never move objects, so we never store forwarded objrefs back to the slot. So in his paper, "enqueuing an edge" means "enqueuing a node (object) without loading from the object header to see if it is already marked". I think this is why you think we need a term to refer to both 'edge' and 'node'. But in copying GC, we can't just enqueue object references. We have to enqueue edges, i.e. pointers to the slots (fields) themselves, because we will need to store into a slot later if the object that the slot's objref points to is moved.
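To make the point about copying GC concrete, here is a minimal toy sketch. Every name in it (`Heap`, `Obj`, `Slot`, `demo`) is made up for illustration and none of it is mmtk-core's actual API. It assumes a heap modelled as a vector of objects, where "copying" an object appends a clone and leaves a forwarding pointer behind. Because the queue holds slots rather than object references, the collector can store the new location back into each slot after tracing.

```rust
use std::collections::VecDeque;

type ObjRef = usize; // hypothetical: an index into Heap::objects

#[derive(Clone, Debug)]
struct Obj {
    fields: Vec<ObjRef>,
    forwarded_to: Option<ObjRef>, // set once the object has been copied
    in_to_space: bool,            // true for the copies made by the GC
}

fn new_obj(fields: Vec<ObjRef>) -> Obj {
    Obj { fields, forwarded_to: None, in_to_space: false }
}

/// A slot ("edge") names a place that holds a reference, not the reference itself.
#[derive(Clone, Copy, Debug)]
enum Slot {
    Root(usize),          // index into Heap::roots
    Field(ObjRef, usize), // (holder object, field index)
}

struct Heap {
    objects: Vec<Obj>,
    roots: Vec<ObjRef>,
}

impl Heap {
    fn load(&self, slot: Slot) -> ObjRef {
        match slot {
            Slot::Root(i) => self.roots[i],
            Slot::Field(holder, i) => self.objects[holder].fields[i],
        }
    }

    fn store(&mut self, slot: Slot, value: ObjRef) {
        match slot {
            Slot::Root(i) => self.roots[i] = value,
            Slot::Field(holder, i) => self.objects[holder].fields[i] = value,
        }
    }

    /// Copy `obj` on its first visit, enqueue the slots of the copy,
    /// and return the object's new location.
    fn trace_object(&mut self, obj: ObjRef, queue: &mut VecDeque<Slot>) -> ObjRef {
        if self.objects[obj].in_to_space {
            return obj;
        }
        if let Some(new) = self.objects[obj].forwarded_to {
            return new;
        }
        let new = self.objects.len();
        let mut copy = self.objects[obj].clone();
        copy.in_to_space = true;
        let field_count = copy.fields.len();
        self.objects.push(copy);
        self.objects[obj].forwarded_to = Some(new);
        for i in 0..field_count {
            queue.push_back(Slot::Field(new, i));
        }
        new
    }

    /// Edge-enqueuing tracing loop: load, trace, then store back into the slot.
    fn collect(&mut self) {
        let mut queue: VecDeque<Slot> = (0..self.roots.len()).map(Slot::Root).collect();
        while let Some(slot) = queue.pop_front() {
            let old = self.load(slot);
            let new = self.trace_object(old, &mut queue);
            self.store(slot, new);
        }
    }
}

/// Object 0 has two fields that both point to object 1; the only root points to 0.
fn demo() -> (ObjRef, Vec<ObjRef>) {
    let mut heap = Heap {
        objects: vec![new_obj(vec![1, 1]), new_obj(vec![])],
        roots: vec![0],
    };
    heap.collect();
    let root = heap.roots[0];
    (root, heap.objects[root].fields.clone())
}
```

If the queue held only the values `0` and `1`, the collector would know where the objects went but not which fields to rewrite. With slots, both fields of the copied object end up pointing at the single copy of object 1.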
Indeed. According to Section 7.1 of Robin's paper, the
fn edge_enqueuing_tracing(root: Vec<Address>) { // our current algorithm
    let mut queue = Queue::new(root);
    while !queue.empty() {
        let edge: Address = queue.pop();
        { // Current process_edge
            let object: ObjectReference = unsafe { edge.load::<ObjectReference>() };
            // This closure represents ProcessEdgesWork.process_node. The |object| is really just the object passed in.
            let new_object = self.trace_object(object, |object| {
                // Our existing mmtk-core code enqueues node into ScanObjects work packet and calls VMScanning::scan_objects
                // We omit those details and look directly into VMScanning::scan_object.
                <E::VM as VMBinding>::VMScanning::scan_object(object, |edge: Address| { // This closure represents ObjectsClosure.
                    queue.push(edge)
                });
            });
            if Self::OVERWRITE_REFERENCE {
                unsafe { edge.store(new_object) };
            }
        }
    }
}
fn node_enqueuing_tracing(root: Vec<ObjectReference>) { // the proposed alternative
    let mut queue = Queue::new(root);
    while !queue.empty() {
        let object: ObjectReference = queue.pop();
        VM::VMScanning::scan_object_and_process_edges(object, |field_value| {
            // implicit load happening before the callback:
            // field_value = slot.load()
            // This closure represents ProcessEdgesWork.process_node.
            let new_value = plan.trace_object(self, field_value, |field_value| {
                queue.push(field_value)
            });
            new_value
            // implicit store will happen after the callback:
            // slot.store(new_value)
        });
    }
}
As you can see, they are really similar. Edge-enqueuing has
Actually we already have two work packets:
Node-enqueuing tracing is implemented in #628
TL;DR: Edge-enqueuing has better performance, but some VMs, such as Ruby, cannot enqueue edges for some reasons. Such VMs can only process one object at a time. Despite inferior performance, it is necessary to support object-enqueuing in order to support such VMs.
Problem
As a graph traversal algorithm, tracing needs a queue. The queue may contain either objects or edges. Research shows that edge enqueuing offers superior performance because it creates opportunities for prefetching.
Currently, the ProcessEdgesWork work packet in mmtk-core is a form of edge enqueuing. A ProcessEdgesWork work packet contains a vector of edges. Edges are discovered in root scanning and ScanObjects, and ProcessEdgesWork is created for each batch of edges, which include edges from many different objects. Then a ProcessEdgesWork work packet is processed on one of the GC worker threads.
This model doesn't work for some VMs including Ruby.
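The batching described above can be pictured with a small toy sketch. The names `EdgeBuffer`, `EdgePacket` and `PACKET_CAPACITY` are hypothetical, as is the capacity of 4; mmtk-core's real work packets are more involved. The sketch only shows how edges discovered while scanning several objects end up in the same packet.

```rust
/// Hypothetical packet size, chosen small so the effect is visible.
const PACKET_CAPACITY: usize = 4;

/// An edge is modelled as (holder object id, field index).
type Edge = (usize, usize);

#[derive(Debug, PartialEq)]
struct EdgePacket {
    edges: Vec<Edge>,
}

#[derive(Default)]
struct EdgeBuffer {
    pending: Vec<Edge>,
    packets: Vec<EdgePacket>,
}

impl EdgeBuffer {
    /// Buffer one edge; emit a packet when the buffer is full.
    fn push(&mut self, edge: Edge) {
        self.pending.push(edge);
        if self.pending.len() == PACKET_CAPACITY {
            self.flush();
        }
    }

    /// Turn whatever is buffered into a packet.
    fn flush(&mut self) {
        if !self.pending.is_empty() {
            let edges = std::mem::take(&mut self.pending);
            self.packets.push(EdgePacket { edges });
        }
    }
}

/// Scan objects given only their field counts and return the packets produced.
fn scan_objects(field_counts: &[usize]) -> Vec<EdgePacket> {
    let mut buffer = EdgeBuffer::default();
    for (object, &count) in field_counts.iter().enumerate() {
        for field in 0..count {
            buffer.push((object, field));
        }
    }
    buffer.flush();
    buffer.packets
}
```

Calling `scan_objects(&[3, 2])` yields two packets, and the first one mixes edges of object 0 and object 1. By the time a packet is processed, the object that owned each edge is no longer "being scanned", which is exactly the property that a VM like Ruby cannot accommodate.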
Ruby
In Ruby, each type has a dedicated procedure for scanning objects of that type. Objects of most built-in types are scanned by the gc_mark_children function. Types in C extensions are scanned by developer-supplied functions. The following shows the basic idea of how Ruby scans an object. Other types, including built-in types, are similar.
During marking, the marking function calls rb_gc_mark or rb_gc_mark_movable to mark fields; during compaction, the compaction function calls rb_gc_location to get the new location of a relocated object.
Note that all of rb_gc_mark, rb_gc_mark_movable and rb_gc_location take the field value rather than the field address as the parameter. This means Ruby doesn't have any representation of "edges". Edges must be updated object by object. So the current edge-enqueuing mechanism doesn't work for Ruby.
This means mmtk-core needs a way to process one object at a time, namely object-enqueuing.
Proposal
ProcessObjectsWork work packet
We need a work packet that processes a list of objects. It should contain at least a list of objects, and a method to process each object.
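As a rough illustration of "a list of objects plus a method to process each object", here is a minimal sketch. `ProcessObjectsPacket`, `do_work` and `mark_reachable` are hypothetical names, objects are plain indices into an adjacency list, and nothing here reflects how mmtk-core actually schedules work packets.

```rust
type ObjectId = usize; // hypothetical: an index into the object graph

/// A packet that carries objects (nodes) rather than edges.
struct ProcessObjectsPacket {
    objects: Vec<ObjectId>,
}

impl ProcessObjectsPacket {
    /// Run `process_object` on each object. Whatever objects it reports as
    /// newly discovered are gathered into a follow-up packet.
    fn do_work<F>(self, mut process_object: F) -> Option<ProcessObjectsPacket>
    where
        F: FnMut(ObjectId) -> Vec<ObjectId>,
    {
        let mut discovered = Vec::new();
        for object in self.objects {
            discovered.extend(process_object(object));
        }
        if discovered.is_empty() {
            None
        } else {
            Some(ProcessObjectsPacket { objects: discovered })
        }
    }
}

/// Mark everything reachable from `roots`, one packet at a time.
/// `graph[i]` lists the objects referenced by object `i`.
fn mark_reachable(graph: &[Vec<ObjectId>], roots: Vec<ObjectId>) -> Vec<bool> {
    let mut marked = vec![false; graph.len()];
    let mut first = Vec::new();
    for root in roots {
        if !marked[root] {
            marked[root] = true;
            first.push(root);
        }
    }
    let mut next = Some(ProcessObjectsPacket { objects: first });
    while let Some(packet) = next {
        next = packet.do_work(|object| {
            // All fields of `object` are handled here, while it is being scanned.
            let mut newly_marked = Vec::new();
            for &child in &graph[object] {
                if !marked[child] {
                    marked[child] = true;
                    newly_marked.push(child);
                }
            }
            newly_marked
        });
    }
    marked
}
```

The design point is that the per-object callback sees every field of the object before the next object is touched, which matches the constraint described in the Ruby section.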
Scanning::scan_object_and_process_edges
The current Scanning::scan_object method is insufficient to support this. It still enumerates edges. I propose another method in Scanning which the VM can implement if it needs object-enqueuing instead of edge-enqueuing.
(Issue #573 contains some example code, but even if it is compatible with Rust's lifetime mechanism, it is too indirect.)
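One possible shape for such a method is sketched below. The signature is an assumption made for illustration: the trait `ObjectScanning`, the type `ToyHeap` and the closure type `FnMut(ObjectId) -> ObjectId` are hypothetical and are not taken from mmtk-core. The idea is that the VM loads each field, hands the value to the closure, and stores whatever the closure returns.

```rust
type ObjectId = usize; // hypothetical object reference

/// Toy heap: `fields[i]` holds the reference fields of object `i`.
struct ToyHeap {
    fields: Vec<Vec<ObjectId>>,
}

/// Hypothetical stand-in for the proposed addition to the Scanning trait.
trait ObjectScanning {
    /// Visit every reference field of `object`. For each field, load its
    /// value, pass it to `process_edge`, and store the returned value back.
    fn scan_object_and_process_edges<F>(&mut self, object: ObjectId, process_edge: F)
    where
        F: FnMut(ObjectId) -> ObjectId;
}

impl ObjectScanning for ToyHeap {
    fn scan_object_and_process_edges<F>(&mut self, object: ObjectId, mut process_edge: F)
    where
        F: FnMut(ObjectId) -> ObjectId,
    {
        for field in self.fields[object].iter_mut() {
            let old = *field; // load
            *field = process_edge(old); // trace in the callback, then store
        }
    }
}

/// Scan object 0, pretending that every referenced object moved by +10.
fn demo() -> (Vec<ObjectId>, Vec<ObjectId>) {
    let mut heap = ToyHeap { fields: vec![vec![1, 2], vec![], vec![]] };
    let mut discovered = Vec::new();
    heap.scan_object_and_process_edges(0, |target| {
        discovered.push(target); // a real collector would call trace_object here
        target + 10
    });
    (heap.fields[0].clone(), discovered)
}
```

Note that the closure only ever sees field values, never field addresses, so a VM whose marking functions take values can drive it directly.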
With this new function, the Ruby binding can implement it like this:
where actual_xxxx are global call-back functions which xxx actually calls when using ruby-mmtk. For example,
In this way, Ruby can call back to the closure for each edge.
Alternative design
Instead of using a closure, Scanning::scan_object_and_process_edges can take a trait object as its parameter.
Which to enqueue, edge or object?
The VMBinding trait should provide a hint to mmtk-core for whether it should use object enqueuing, edge enqueuing, or both.
VMs like Ruby may initially use object enqueuing, and gradually switch to using both methods for better performance (the Ruby VM cannot eliminate object enqueuing because the "mark" and "compact" functions for C extensions are provided by third-party developers).
ProcessObjectsWork replaces the ScanObjects work packet
When supporting both queuing strategies, ProcessObjectsWork may replace the ScanObjects work packet, and take up the role of queuing edges to form ProcessEdgesWork. We need two queues. One is an object queue, and the other is an edge queue.
When scanning an object, if that object only supports object-enqueuing (such as objects from third-party C extensions), we call Scanning::scan_object_and_process_edges and enqueue adjacent objects to the object queue; if that object supports edge-enqueuing (such as built-in objects with well-known layout), we call Scanning::scan_object and enqueue its edges into the edge queue.
When flushing, the objects in the object queue turn into another ScanObjects work packet, and the edges in the edge queue turn into a ProcessEdgesWork work packet.
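A minimal sketch of the two-queue idea follows. All names (`ScanStyle`, `Queues`, `Packet`, `ToyObject`) are hypothetical, marking and tracing are left out, and the two match arms merely stand in for calls to Scanning::scan_object_and_process_edges and Scanning::scan_object respectively.

```rust
type ObjectId = usize;
type Edge = (ObjectId, usize); // (holder object, field index)

/// Hypothetical per-object hint for which scanning method it supports.
#[derive(Clone, Copy, PartialEq)]
enum ScanStyle {
    EdgeEnqueuing,   // e.g. built-in objects with a well-known layout
    ObjectEnqueuing, // e.g. objects from third-party C extensions
}

struct ToyObject {
    style: ScanStyle,
    fields: Vec<ObjectId>,
}

#[derive(Debug, PartialEq)]
enum Packet {
    ScanObjects(Vec<ObjectId>),
    ProcessEdges(Vec<Edge>),
}

#[derive(Default)]
struct Queues {
    objects: Vec<ObjectId>,
    edges: Vec<Edge>,
}

impl Queues {
    /// Scan one object and route its outgoing references to the right queue.
    fn scan(&mut self, heap: &[ToyObject], object: ObjectId) {
        match heap[object].style {
            ScanStyle::ObjectEnqueuing => {
                // Stand-in for scan_object_and_process_edges: the adjacent
                // objects themselves go into the object queue.
                self.objects.extend(heap[object].fields.iter().copied());
            }
            ScanStyle::EdgeEnqueuing => {
                // Stand-in for scan_object: the slots go into the edge queue.
                for field in 0..heap[object].fields.len() {
                    self.edges.push((object, field));
                }
            }
        }
    }

    /// Turn each non-empty queue into a work packet.
    fn flush(&mut self) -> Vec<Packet> {
        let mut packets = Vec::new();
        if !self.objects.is_empty() {
            packets.push(Packet::ScanObjects(std::mem::take(&mut self.objects)));
        }
        if !self.edges.is_empty() {
            packets.push(Packet::ProcessEdges(std::mem::take(&mut self.edges)));
        }
        packets
    }
}

fn demo() -> Vec<Packet> {
    let heap = vec![
        ToyObject { style: ScanStyle::ObjectEnqueuing, fields: vec![2, 3] },
        ToyObject { style: ScanStyle::EdgeEnqueuing, fields: vec![2] },
        ToyObject { style: ScanStyle::EdgeEnqueuing, fields: vec![] },
        ToyObject { style: ScanStyle::ObjectEnqueuing, fields: vec![] },
    ];
    let mut queues = Queues::default();
    queues.scan(&heap, 0);
    queues.scan(&heap, 1);
    queues.flush()
}
```

In this toy run, scanning object 0 fills the object queue and scanning object 1 fills the edge queue, so a single flush produces one packet of each kind.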