Add ProcessObjectsWork in addition to ProcessEdgesWork #581

Closed
wks opened this issue Apr 29, 2022 · 5 comments

@wks
Collaborator

wks commented Apr 29, 2022

TL;DR: Edge-enqueuing has better performance, but some VMs, such as Ruby, cannot enqueue edges for various reasons and can only process one object at a time. Despite its inferior performance, object-enqueuing must be supported for such VMs.

Problem

As a graph traversal algorithm, tracing needs a queue. The queue may contain either objects or edges. Research shows that edge enqueuing offers superior performance because it creates opportunities for prefetching.

Currently, the ProcessEdgesWork work packet in mmtk-core is a form of edge enqueuing. A ProcessEdgesWork work packet contains a vector of edges. Edges are discovered during root scanning and in ScanObjects work packets, and a ProcessEdgesWork packet is created for each batch of edges, which may come from many different objects. Each ProcessEdgesWork work packet is then processed on one of the GC worker threads.

This model doesn't work for some VMs including Ruby.

Ruby

In Ruby, each type has a dedicated procedure for scanning objects of that type. Objects of most built-in types are scanned by the gc_mark_children function, while types defined in C extensions are scanned by developer-supplied functions. The following shows the basic idea of how Ruby scans an object of a hypothetical extension type; other types, including built-in types, work similarly.

struct Foo {
    VALUE field1;
    VALUE field2;
    VALUE field3;
    bool some_condition;
};

// Scan, but does not update fields
void foo_mark(struct Foo *obj) {
    rb_gc_mark_movable(obj->field1);  // Does not pin
    rb_gc_mark(obj->field2);  // Pin
    if (obj->some_condition) {  // Some fields can be visited conditionally
        rb_gc_mark_movable(obj->field3);  // Does not pin
    }
}

// Update fields
void foo_compact(struct Foo *obj) {
    obj->field1 = rb_gc_location(obj->field1);  // Updates field
    // obj->field2 is pinned, therefore no need to update.
    if (obj->some_condition) {  // Some fields can be visited conditionally
        obj->field3 = rb_gc_location(obj->field3);  // Updates field
    }
}

During marking, the marking function calls rb_gc_mark or rb_gc_mark_movable to mark fields; during compaction, the compaction function calls rb_gc_location to get the new location of a relocated object.

Note that rb_gc_mark, rb_gc_mark_movable and rb_gc_location all take the field value rather than the field address as their parameter. This means Ruby has no representation of "edges": fields can only be updated while scanning their containing object, one object at a time. So the current edge-enqueuing mechanism doesn't work for Ruby.

This means mmtk-core needs a way to process one object at a time, namely object-enqueuing.

Proposal

ProcessObjectsWork work packet

We need a work packet that processes a list of objects. It should contain at least a list of objects, and a method to process each object.

struct ProcessObjectsWork {
    objects: Vec<ObjectReference>, // Objects to be processed
    ...
}
impl GCWork for ProcessObjectsWork {
    fn do_work(&mut self, ...) {
        for object in self.objects {
            self.process_object(object); // Process each object
        }
    }
}
impl ProcessObjectsWork {
    fn process_object(&mut self, object: ObjectReference) {
        // We need the VM to call back to us for each reference field,
        // but pass the field value rather than the pointer to the field itself.
        VM::VMScanning::scan_object_and_process_edges(object, |field_value| {
            // We directly call trace_object, skipping both the load and the store.
            let new_value = plan.trace_object(self, field_value);
            new_value // return the forwarded address
        });
    }
}
impl TransitiveClosure for ProcessObjectsWork {
    // The name is a bit confusing because process_object also exists.
    // This function is called back from trace_object when an object is traced for the first time.
    // See: https://github.com/mmtk/mmtk-core/issues/559
    fn process_node(&mut self, object: ObjectReference) {
        self.enqueue(object);
    }
}

Scanning::scan_object_and_process_edges

The current Scanning::scan_object method is insufficient to support this, because it still enumerates edges. I propose another method in Scanning, which a VM can implement if it needs object-enqueuing instead of edge-enqueuing.

(Issue #573 contains some example code, but even if it is compatible with Rust's lifetime mechanism, it is too indirect.)

trait Scanning {
    /// Scan `object`, and call `f` with the value of each reference field.
    /// `f` shall return the value that should be stored back to the field.
    fn scan_object_and_process_edges<F>(object: ObjectReference, f: F)
        where F: FnMut(ObjectReference) -> ObjectReference;
}

With this new function, the Ruby binding can implement it like this:

impl Scanning for Ruby {
    fn scan_object_and_process_edges<F>(object: ObjectReference, f: F)
        where F: FnMut(ObjectReference) -> ObjectReference {
        // Pseudocode: install `f` as the callback behind the rb_gc_* entry points,
        // then let Ruby's own scanning function visit the fields.
        actual_rb_gc_mark_movable = f;
        actual_rb_gc_mark = f;
        actual_rb_gc_location = f;
        gc_mark_children(object);
    }
}

where the actual_* variables are global callback function pointers that the corresponding rb_gc_* functions actually call when running with ruby-mmtk. For example,

VALUE (*actual_rb_gc_location)(VALUE object);
VALUE rb_gc_location(VALUE object) {
#ifdef USE_THIRD_PARTY_HEAP
    return actual_rb_gc_location(object);
#else
    // the original implementation
#endif
}

In this way, Ruby can call back to the closure for each edge.

Alternative design

Instead of using a closure, Scanning::scan_object_and_process_edges can take a visitor object as a parameter.

// Spiritually similar to EdgeVisitor
trait ObjectVisitor {
    fn visit_object(&mut self, object: ObjectReference) -> ObjectReference;
}

trait Scanning {
    /// Scan `object`, and call `object_visitor.visit_object` with the value of each
    /// reference field. The visitor shall return the value to store back to the field.
    fn scan_object_and_process_edges<OV: ObjectVisitor>(object: ObjectReference, object_visitor: &mut OV);
}

Which to enqueue, edge or object?

The VMBinding trait should provide a hint to mmtk-core for whether it should use object enqueuing, edge enqueuing, or both.

VMs like Ruby may initially use only object enqueuing, and gradually switch to using both methods for better performance. (The Ruby VM cannot eliminate object enqueuing entirely, because the "mark" and "compact" functions for C extensions are provided by third-party developers.)
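As a sketch of what such a hint could look like, the binding could declare associated constants on the trait. The trait and constant names below (VMBindingHint, SUPPORTS_EDGE_ENQUEUING, SUPPORTS_OBJECT_ENQUEUING) are purely illustrative, not existing mmtk-core API:

```rust
// Hypothetical hint trait: a VM binding declares which enqueuing
// strategies it supports, and mmtk-core branches on these constants
// when deciding which work packets to create.
trait VMBindingHint {
    const SUPPORTS_EDGE_ENQUEUING: bool = true;
    const SUPPORTS_OBJECT_ENQUEUING: bool = false;
}

struct Ruby;
impl VMBindingHint for Ruby {
    // Ruby initially supports only object enqueuing.
    const SUPPORTS_EDGE_ENQUEUING: bool = false;
    const SUPPORTS_OBJECT_ENQUEUING: bool = true;
}

fn main() {
    // mmtk-core could check the hint like this when scheduling tracing work.
    if <Ruby as VMBindingHint>::SUPPORTS_OBJECT_ENQUEUING {
        println!("use ProcessObjectsWork");
    }
    assert!(!<Ruby as VMBindingHint>::SUPPORTS_EDGE_ENQUEUING);
}
```

A VM that later gains edge-enqueuing support for some types would flip both constants to true and let per-object dispatch (as in the ProcessObjectsWork sketch below) pick the strategy.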

ProcessObjectsWork replaces the ScanObjects work packet

When supporting both enqueuing strategies, ProcessObjectsWork may replace the ScanObjects work packet and take over the role of queuing edges to form ProcessEdgesWork packets. We need two queues: an object queue and an edge queue.

struct ObjectsClosure {
    buffer: Vec<ObjectReference>,
}
struct EdgesClosure {  // The current ObjectsClosure is actually this EdgesClosure
    buffer: Vec<Address>,
}

When scanning an object, if that object only supports object-enqueuing (such as objects from third-party C extensions), we call Scanning::scan_object_and_process_edges and enqueue adjacent objects to the object queue; if that object supports edge-enqueuing (such as built-in objects with well-known layout), we call Scanning::scan_object and enqueue its edges into the edge queue.

impl GCWork for ProcessObjectsWork {
    fn do_work(...) {
        let mut objects_closure = ObjectsClosure::new();
        let mut edges_closure = EdgesClosure::new();
        for object in self.objects {
            if object_supports_edge_enqueuing(object) {
                // This is what the current `ScanObjects` work packet does.
                VM::VMScanning::scan_object(object, &mut edges_closure);
            } else {
                VM::VMScanning::scan_object_and_process_edges(object, &mut objects_closure);
            }
        }
    }
}

When flushing, the objects in the object queue turn into another ProcessObjectsWork work packet, and the edges in the edge queue turn into a ProcessEdgesWork work packet.

impl EdgeVisitor for EdgesClosure {
    fn flush(&mut self) {
        // This is what the current `ObjectClosure` does.
        let mut new_edges = Vec::new();
        mem::swap(&mut new_edges, &mut self.buffer);
        self.worker.add_work(
            WorkBucketStage::Closure,
            ProcessEdgesWork::new(new_edges, false, self.worker.mmtk),
        );
    }        
}
impl ObjectsClosure {
    fn flush(&mut self) {
        // Symmetrical to EdgesClosure::flush above.
        let mut new_objects = Vec::new();
        mem::swap(&mut new_objects, &mut self.buffer);
        self.worker.add_work(
            WorkBucketStage::Closure,
            ProcessObjectsWork::new(new_objects, false, self.worker.mmtk),
        );
    }
}
@qinsoon
Member

qinsoon commented May 2, 2022

    fn process_object(&mut self, object: ObjectReference) {
        // We need the VM to call back to us for each reference field,
        // but pass the field value rather than the pointer to the field itself.
        VM::VMScanning::scan_object_and_process_edges(object, |field_value| {
            // We directly call trace_object, skipping both the load and the store.
            let new_value = plan.trace_object(self, field_value);
            new_value // return the forwarded address
        });
    }

Can this be seen as a special case of #573? For an object/node, the load returns the object reference itself, and store is a noop.

@wks
Collaborator Author

wks commented May 2, 2022

@qinsoon No. I thought it was, but then I realised it is not.

The main difference is the timing of load and store.

With #573, the representation of an edge can be customised, but the timing of load and store doesn't change. The process of scanning objects and processing edges can be depicted by the following pseudo-code:

process_object o1 { // (scan_object)
  enqueue o1.e1
  enqueue o1.e2
  enqueue o1.e3
}
process_object o2 { // (scan_object)
  enqueue o2.e1
  enqueue o2.e2
}
...
process_edges {
  // Does not necessarily process edges in the same order.  May be parallelised.
  process_edge o1.e1 {
    p1 = load o1.e1
    q1 = trace_object p1
    store o1.e1, q1
  }
  process_edge o2.e1 {  // may process edges out of order
    p3 = load o2.e1
    q3 = trace_object p3
    store o2.e1, q3
  }
  process_edge o1.e2 {
    p2 = load o1.e2
    q2 = trace_object p2
    store o1.e2, q2
  }
  ...
}

As we can see, with #573, even though an edge can now be 32-bit or 64-bit, edges are still enqueued to be processed later.

However, the constraint in Ruby is that edges must be traced while the object is scanned. That is, the load/trace_object/store operations must be enclosed in the process_object operation of each object, like the following pseudocode:

process_object o1 {
  process_edge o1.e1 {
    p1 = load o1.e1
    run_in_mmtk_core_closure {
      q1 = trace_object p1
    }
    store o1.e1, q1
  }
  process_edge o1.e2 {
    p2 = load o1.e2
    run_in_mmtk_core_closure {
      q2 = trace_object p2
    }
    store o1.e2, q2
  }
  process_edge o1.e3 {
    p3 = load o1.e3
    run_in_mmtk_core_closure {
      q3 = trace_object p3
    }
    store o1.e3, q3
  }
}
process_object o2 {
  process_edge o2.e1 {
    p1 = load o2.e1
    run_in_mmtk_core_closure {
      q1 = trace_object p1
    }
    store o2.e1, q1
  }
  process_edge o2.e2 {
    p2 = load o2.e2
    run_in_mmtk_core_closure {
      q2 = trace_object p2
    }
    store o2.e2, q2
  }
}

In the code above, the load and store operations happen during the processing of each object. Only the portion inside the run_in_mmtk_core_closure block is defined in mmtk-core; the process_object block is defined by the VM binding in Scanning::scan_object_and_process_edges. In Ruby, process_object corresponds to the foo_mark and foo_compact functions provided by C extensions, which we cannot change, while run_in_mmtk_core_closure corresponds to the rb_gc_mark, rb_gc_mark_movable and rb_gc_location functions, which we can intercept by patching the Ruby source code.

P.S. The restriction that trace_object must happen immediately when visiting an edge makes it hard to perform prefetching, which is why object enqueuing has inferior performance, according to Robin's paper.

@qinsoon
Member

qinsoon commented May 3, 2022

Is it possible to have enqueue_edge() vs scan_edge()? (We would need a better term than 'edge' to refer to both 'edge' and 'node'.) I think it still all depends on the similarity between 'process edges' and 'process objects'. I feel they have more similarity than difference.

@wks
Collaborator Author

wks commented May 3, 2022

I think an edge is fundamentally different from a node. I (and the current mmtk-core) define an edge as a slot that holds an object reference. A node, on the other hand, is the object itself.

But Robin's paper discusses MarkSweep. In MarkSweep, we never move objects, so we never store forwarded object references back to slots. So in his paper, "enqueuing an edge" means "enqueuing a node (object) without loading from the object header to check whether it is already marked". I think this is why you think we need a term that refers to both 'edge' and 'node'.

But in copying GC, we can't just enqueue object references. We have to enqueue edges, i.e. pointers to the slots (fields) themselves, because we will need to store into a slot later if the object that the objref in the slot points to is moved.

I feel they have more similarity than difference.

Indeed. According to Section 7.1 of Robin's paper, the process_edge method we currently have is just the process_object method above with the trace_object call site hoisted out of the loop. If we write the edge-enqueuing and node-enqueuing algorithms in the non-work-packet form, we get:

fn edge_enqueuing_tracing(roots: Vec<Address>) { // our current algorithm
  let mut queue = Queue::new(roots);
  while !queue.is_empty() {
    let edge: Address = queue.pop();

    { // Current process_edge
        let object: ObjectReference = unsafe { edge.load::<ObjectReference>() };
        let new_object = self.trace_object(object, |object| { // this closure represents ProcessEdgesWork::process_node. The |object| is really just the object passed in.
             // Our existing mmtk-core code enqueues the node into a ScanObjects work packet and calls VMScanning::scan_objects.
             // We omit those details and look directly into VMScanning::scan_object.
             <E::VM as VMBinding>::VMScanning::scan_object(object, |edge: Address| { // This closure represents ObjectsClosure.
                 queue.push(edge)
             });
        });
        if Self::OVERWRITE_REFERENCE {
            unsafe { edge.store(new_object) };
        }
    }
  }
}

fn node_enqueuing_tracing(roots: Vec<ObjectReference>) { // the proposed algorithm
  let mut queue = Queue::new(roots);
  while !queue.is_empty() {
    let object: ObjectReference = queue.pop();

    VM::VMScanning::scan_object_and_process_edges(object, |field_value| {
        // implicit load happening before the callback:
        // field_value = slot.load()

        let new_value = plan.trace_object(self, field_value, |field_value| { // this closure represents ProcessEdgesWork::process_node.
             queue.push(field_value)
        });
        new_value

        // implicit store will happen after the callback:
        // slot.store(new_value)
    });
  }
}

As you can see, they are really similar. Edge-enqueuing has trace_object outside and scan_object inside, while node-enqueuing has scan_object outside and trace_object inside.

Actually, we already have two work packets: ProcessEdgesWork and ScanObjects. A ProcessEdgesWork packet generates ScanObjects work packets, and a ScanObjects work packet generates ProcessEdgesWork work packets. This is algorithmically equivalent to the two-level loops above, except that the current ScanObjects postpones the processing of edges to another work packet (ProcessEdgesWork), while my proposal requires edges to be processed immediately.

@wks
Collaborator Author

wks commented Jul 27, 2022

Node-enqueuing tracing is implemented in #628
