Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generational plans set next_gc_full_heap in end_of_gc #714

Merged
merged 4 commits into from
Jan 2, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 2 additions & 5 deletions src/plan/generational/copying/global.rs
Original file line number Diff line number Diff line change
Expand Up @@ -123,12 +123,9 @@ impl<VM: VMBinding> Plan for GenCopy<VM> {
if full_heap {
self.fromspace().release();
}
}

// TODO: Refactor so that we set the next_gc_full_heap in gen.release(). Currently have to fight with Rust borrow checker
// NOTE: We have to take care that the `Gen::should_next_gc_be_full_heap()` function is
// called _after_ all spaces have been released (including ones in `gen`) as otherwise we
// may get incorrect results since the function uses values such as available pages that
// will change dependant on which spaces have been released
fn end_of_gc(&mut self, _tls: VMWorkerThread) {
self.gen
.set_next_gc_full_heap(Gen::should_next_gc_be_full_heap(self));
}
Expand Down
7 changes: 2 additions & 5 deletions src/plan/generational/immix/global.rs
Original file line number Diff line number Diff line change
Expand Up @@ -165,12 +165,9 @@ impl<VM: VMBinding> Plan for GenImmix<VM> {
}
self.last_gc_was_full_heap
.store(full_heap, Ordering::Relaxed);
}

// TODO: Refactor so that we set the next_gc_full_heap in gen.release(). Currently have to fight with Rust borrow checker
// NOTE: We have to take care that the `Gen::should_next_gc_be_full_heap()` function is
// called _after_ all spaces have been released (including ones in `gen`) as otherwise we
// may get incorrect results since the function uses values such as available pages that
// will change dependant on which spaces have been released
fn end_of_gc(&mut self, _tls: VMWorkerThread) {
self.gen
.set_next_gc_full_heap(Gen::should_next_gc_be_full_heap(self));
}
Expand Down
11 changes: 8 additions & 3 deletions src/plan/global.rs
Original file line number Diff line number Diff line change
Expand Up @@ -221,17 +221,22 @@ pub trait Plan: 'static + Sync + Downcast {
}

/// Prepare the plan before a GC. This is invoked in an initial step in the GC.
/// This is invoked once per GC by one worker thread. 'tls' is the worker thread that executes this method.
/// This is invoked once per GC by one worker thread. `tls` is the worker thread that executes this method.
fn prepare(&mut self, tls: VMWorkerThread);

/// Prepare a worker for a GC. Each worker has its own prepare method. This hook is for plan-specific
/// per-worker preparation. This method is invoked once per worker by the worker thread passed as the argument.
fn prepare_worker(&self, _worker: &mut GCWorker<Self::VM>) {}

/// Release the plan after a GC. This is invoked at the end of a GC when most GC work is finished.
/// This is invoked once per GC by one worker thread. 'tls' is the worker thread that executes this method.
/// Release the plan after transitive closure. A plan can implement this method to call each policy's release,
/// or create any work packet that should be done in release.
/// This is invoked once per GC by one worker thread. `tls` is the worker thread that executes this method.
fn release(&mut self, tls: VMWorkerThread);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I find the "release" terminology a bit confusing then. What are the semantics of "release" supposed to be exactly? Most (if not all?) plans just release pages from spaces. Is the issue that the plan.release() is called at the start of the Release work bucket instead of the end? The comment for the Release work packet implies that there should only ever one [1] (and only one is ever made in the scheduler_common_work function). What is the source of the issue you have encountered? Reference processing?

[1]:

/// The global GC release Work
/// This work packet invokes release() for the plan (which will invoke release() for each space), and
/// pushes work packets for releasing mutators and collectors.
/// We should only have one such work packet per GC, after all actual GC work ends.
/// We assume this work packet is the only running work packet that accesses plan, and there should
/// be no other concurrent work packet that accesses plan (read or write). Otherwise, there may
/// be a race condition.

Copy link
Collaborator

@k-sareen k-sareen Dec 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see -- it seems like MarkSweep and Immix uses the Release work bucket for sweeping [1, 2]. My question is then why not have a separate Sweep work bucket like we have with the Compact work bucket [3]?

[1]: https://github.com/mmtk/mmtk-core/blob/master/src/policy/marksweepspace/malloc_ms/global.rs#L472-L501
[2]: https://github.com/mmtk/mmtk-core/blob/master/src/policy/immix/immixspace.rs#L316-L340
[3]:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, right. That is the motivation. The problem is that some plans generate work in the release bucket, and we need a hook to run some stuff after every work packet is done.

What you pointed out reminds me that our current code is actually incorrect. This is a race condition: we run Release work (which calls Plan.release() with a mutable plan) and those sweeping work (which can access Plan).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment for the Release work packet above, seems to imply that it should not be executing with anything else and that "all the GC work" should have been finished before the Release work packet is called. If MS and Immix are performing sweeping in Release, then all the GC work hasn't really been finished right? Either we should fix the comment for the Release work packet, or make a new work bucket called Sweep where MS and Immix generate sweep work packets (in the latter case, the issue of "parallel release work" ceases to exist, unless reference processing is still an issue).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that "fixing" the comment is not actually a fix. Yes -- there is a race condition. The Sweep work bucket is the more sensible solution (unless you have a better proposal).

Copy link
Collaborator

@wks wks Dec 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Prepare and Release work packet, we assume we are the only work that is being executed, thus it is safe for us to get a mutable reference of plan. If we have any other work packet, they can access mmtk in their do_work. And that is racy.

That makes sense.

 #[allow(clippy::cast_ref_to_mut)] 
 let plan_mut: &mut C::PlanType = unsafe { &mut *(self.plan as *const _ as *mut _) }; 
 plan_mut.prepare(worker.tls); 

I think this is the same issue.

It doesn't look like an issue that can be solved in one or two days. To ensure we have only one thread having access to the plan, we must stop spreading &'static Plan (and &'static MMTk, too) everywhere, and have precise control of who holds reference to Plan (and MMTk as well). A key to the solution will be the ability to "revoke" all such references during STW. In fact, this is what STW is really meant to do, i.e. stop mutators from accessing the heap, as a form of synchronisation.

The good thing for RefEnqueue is that it never needs to access the MMTk instance. It can be a pure VM-side operation.

Copy link
Collaborator

@k-sareen k-sareen Dec 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree. Though with regard to this PR (or more specifically, the issue this PR is trying to address), I think the most pragmatic approach is to add a new Sweep or Collect work bucket and make it so that MS and Immix generate their sweeping work packets into this bucket. In that case we don't really need this PR as all the GC work will have been done before the should_next_gc_full_heap() is called.

It will also be helpful to exactly clarify what the semantics of the "Release" work bucket is, i.e. should we send the list of processed references to the binding in the "Release" work bucket or the "Sweep"/"Collect" work bucket, or even the "EndOfGC" work packet.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The good thing for RefEnqueue is that it never needs to access the MMTk instance. It can be a pure VM-side operation.

It technically accesses the mmtk instance (right now at least -- unsure of your proposed changes):

https://github.com/mmtk/mmtk-core/blob/master/src/util/reference_processor.rs#L556

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will also be helpful to exactly clarify what the semantics of the "Release" work bucket is, i.e. should we send the list of processed references to the binding in the "Release" work bucket or the "Sweep"/"Collect" work bucket, or even the "EndOfGC" work packet.

I have made an attempt to do this in my branch (see below), but not as precise as you described.
https://github.com/wks/mmtk-core/blob/165b3fab07cc788ad88876effdeefe567697aaf2/src/scheduler/work_bucket.rs#L258

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The good thing for RefEnqueue is that it never needs to access the MMTk instance. It can be a pure VM-side operation.

It technically accesses the mmtk instance (right now at least -- unsure of your proposed changes):

https://github.com/mmtk/mmtk-core/blob/master/src/util/reference_processor.rs#L556

It's accessing the mmtk.reference_processor field. Currently it is a field of the MMTK struct. In my branch, it is part of the VM binding instance.


/// Inform the plan about the end of a GC. It is guaranteed that there is no further work for this GC.
/// This is invoked once per GC by one worker thread. `tls` is the worker thread that executes this method.
fn end_of_gc(&mut self, _tls: VMWorkerThread) {}

/// Ask the plan if they would trigger a GC. If MMTk is in charge of triggering GCs, this method is called
/// periodically during allocation. However, MMTk may delegate the GC triggering decision to the runtime,
/// in which case, this method may not be called. This method returns true to trigger a collection.
Expand Down
5 changes: 5 additions & 0 deletions src/scheduler/gc_work.rs
Original file line number Diff line number Diff line change
Expand Up @@ -223,6 +223,11 @@ impl<VM: VMBinding> GCWork<VM> for EndOfGC {
fn do_work(&mut self, worker: &mut GCWorker<VM>, mmtk: &'static MMTK<VM>) {
info!("End of GC");

// We assume this is the only running work packet that accesses plan at the point of execution
#[allow(clippy::cast_ref_to_mut)]
let plan_mut: &mut dyn Plan<VM = VM> = unsafe { &mut *(&*mmtk.plan as *const _ as *mut _) };
Copy link
Collaborator

@k-sareen k-sareen Dec 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this everywhere -- maybe we should just change the mmtk variable to mut MMTK or something in the work packets. Or am I missing something which means we can't do that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same pattern as our prepare/release code. This is unfortunate.

#[allow(clippy::cast_ref_to_mut)]
let plan_mut: &mut C::PlanType = unsafe { &mut *(self.plan as *const _ as *mut _) };
plan_mut.prepare(worker.tls);

#[allow(clippy::cast_ref_to_mut)]
let plan_mut: &mut C::PlanType = unsafe { &mut *(self.plan as *const _ as *mut _) };
plan_mut.release(worker.tls);

We can fix it, but not as simple as making it mut mmtk. Basically mmtk: &'static MMTK<VM> is a non mutable reference, but we want to get a mutable reference from it -- we do so under the assumption that we are the only thread that is accessing mmtk. As long as the assumption holds, the code is correct.

Basically what we want is something like RefCell but without overhead on borrowing. I don't think there is a safe way to do it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it not something like an Arc? Multiple work packets should be executing at the same time, all of them accessing the mmtk variable. Overhead of reference counting? I guess if 99% of the times it's only read like most work packets, and this is the hot loop, ugly hacks like this may be required.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically a RWLock could help, but I feel like adding a lock in the hottest part of our code may not be ideal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arc can help manage the lifetime of plan/mmtk. But it comes at the cost of a fat pointer (unknown cost) and reference counting (which should be minimal cost). But it does not solve the issue of mutability.

There are many things that we need to tidy up.

Copy link
Collaborator

@k-sareen k-sareen Dec 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right sorry -- you need a RefCell or Mutex/RWLock to have mutability. Sorry was thinking of Arc<Mutex<T>> which is a probably a bad idea.

I guess we can flag this issue for the future. Could you add a comment here (and maybe the other places as well) saying that this is only safe under the assumption we're the only executing work packet.

plan_mut.end_of_gc(worker.tls);

#[cfg(feature = "extreme_assertions")]
if crate::util::edge_logger::should_check_duplicate_edges(&*mmtk.plan) {
// reset the logging info at the end of each GC
Expand Down