
Generational plans set next_gc_full_heap in end_of_gc #714

Merged (4 commits) on Jan 2, 2023

Conversation

qinsoon (Member) commented Dec 19, 2022

This PR adds Plan.end_of_gc(). Our generational plans currently decide whether the next GC should be a full-heap GC at the end of Plan.release(), which is incorrect: other release work may still be running in parallel while Plan.release() executes. With this PR, next_gc_full_heap is computed in end_of_gc() instead.
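
A rough sketch of the shape of this change (stand-in types and a made-up full-heap heuristic only; the real mmtk-core trait and the generational plans' full-heap predicate differ in detail):

// Sketch only: simplified stand-in types, not the actual mmtk-core definitions.
struct VMWorkerThread; // placeholder for the GC worker thread handle

trait Plan {
    /// Runs in the Release work bucket; other release/sweep packets may run in parallel.
    fn release(&mut self, tls: VMWorkerThread);
    /// New hook: runs once per GC, after all GC work (including sweeping) has finished.
    /// The default is a no-op; generational plans override it.
    fn end_of_gc(&mut self, _tls: VMWorkerThread) {}
}

struct GenPlanState {
    next_gc_full_heap: bool,
    virtual_memory_exhausted: bool, // made-up stand-in for the real heuristic
}

impl Plan for GenPlanState {
    fn release(&mut self, _tls: VMWorkerThread) {
        // Release spaces here, but do NOT decide the next GC kind yet:
        // parallel release/sweep packets may still be updating the state we read.
    }
    fn end_of_gc(&mut self, _tls: VMWorkerThread) {
        // All GC work is done, so the statistics read here are stable.
        self.next_gc_full_heap = self.virtual_memory_exhausted;
    }
}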

qinsoon requested review from k-sareen and wks on December 19, 2022 04:38
qinsoon marked this pull request as ready for review on December 19, 2022 04:38

k-sareen (Collaborator) left a comment:

I'm just a bit confused as to why this change is necessary. What is the "other parallel release work" that could run while we execute Plan.release()? From the scheduler code, it seems like reference processing is done in the Release work bucket as well -- is that the issue? If so, could you document it somewhere?

src/plan/global.rs (Outdated)
@@ -228,10 +228,15 @@ pub trait Plan: 'static + Sync + Downcast {
     /// per-worker preparation. This method is invoked once per worker by the worker thread passed as the argument.
     fn prepare_worker(&self, _worker: &mut GCWorker<Self::VM>) {}
 
-    /// Release the plan after a GC. This is invoked at the end of a GC when most GC work is finished.
+    /// Release the plan after transitive closure. A plan can implement this method to call each policy's release,
+    /// or create any work packet that should be done in release.
+    /// This is invoked once per GC by one worker thread. 'tls' is the worker thread that executes this method.
     fn release(&mut self, tls: VMWorkerThread);

Collaborator
Personally, I find the "release" terminology a bit confusing then. What are the semantics of "release" supposed to be, exactly? Most (if not all?) plans just release pages from spaces. Is the issue that plan.release() is called at the start of the Release work bucket instead of the end? The comment for the Release work packet implies that there should only ever be one [1] (and only one is ever created, in the scheduler_common_work function). What is the source of the issue you have encountered? Reference processing?

[1]:

/// The global GC release Work
/// This work packet invokes release() for the plan (which will invoke release() for each space), and
/// pushes work packets for releasing mutators and collectors.
/// We should only have one such work packet per GC, after all actual GC work ends.
/// We assume this work packet is the only running work packet that accesses plan, and there should
/// be no other concurrent work packet that accesses plan (read or write). Otherwise, there may
/// be a race condition.

k-sareen (Collaborator) commented Dec 19, 2022

Ah I see -- it seems like MarkSweep and Immix use the Release work bucket for sweeping [1, 2]. My question is then: why not have a separate Sweep work bucket, like we have with the Compact work bucket [3]?

[1]: https://github.com/mmtk/mmtk-core/blob/master/src/policy/marksweepspace/malloc_ms/global.rs#L472-L501
[2]: https://github.com/mmtk/mmtk-core/blob/master/src/policy/immix/immixspace.rs#L316-L340
[3]:

Member Author

Yeah, right. That is the motivation. The problem is that some plans generate work in the Release bucket, and we need a hook that runs after all of those work packets are done.

What you pointed out reminds me that our current code is actually incorrect. There is a race condition: we run the Release work packet (which calls Plan.release() through a mutable reference to the plan) in parallel with the sweeping work packets (which can also access the plan).

Collaborator

The comment for the Release work packet above seems to imply that it should not be executing with anything else, and that "all the GC work" should have been finished before the Release work packet is called. If MS and Immix are performing sweeping in Release, then all the GC work hasn't really been finished, right? Either we should fix the comment for the Release work packet, or make a new work bucket called Sweep where MS and Immix generate their sweep work packets (in the latter case, the issue of "parallel release work" ceases to exist, unless reference processing is still an issue).

Collaborator

I think that "fixing" the comment is not actually a fix. Yes -- there is a race condition. The Sweep work bucket is the more sensible solution (unless you have a better proposal).

wks (Collaborator) commented Dec 19, 2022

> In the Prepare and Release work packet, we assume we are the only work that is being executed, thus it is safe for us to get a mutable reference of plan. If we have any other work packet, they can access mmtk in their do_work. And that is racy.

That makes sense.

 #[allow(clippy::cast_ref_to_mut)] 
 let plan_mut: &mut C::PlanType = unsafe { &mut *(self.plan as *const _ as *mut _) }; 
 plan_mut.prepare(worker.tls); 

I think this is the same issue.

It doesn't look like an issue that can be solved in one or two days. To ensure only one thread has access to the plan, we must stop spreading &'static Plan (and &'static MMTk, too) everywhere, and have precise control over who holds references to Plan (and MMTk as well). A key to the solution will be the ability to "revoke" all such references during STW. In fact, this is what STW is really meant to do, i.e. stop mutators from accessing the heap, as a form of synchronisation.

The good thing for RefEnqueue is that it never needs to access the MMTk instance. It can be a pure VM-side operation.

k-sareen (Collaborator) commented Dec 19, 2022

Yes, I agree. Though with regard to this PR (or more specifically, the issue this PR is trying to address), I think the most pragmatic approach is to add a new Sweep or Collect work bucket and make it so that MS and Immix generate their sweeping work packets into this bucket. In that case we don't really need this PR, as all the GC work will have been done before should_next_gc_full_heap() is called.

It would also be helpful to clarify exactly what the semantics of the "Release" work bucket are, i.e. should we send the list of processed references to the binding in the "Release" work bucket, in the "Sweep"/"Collect" work bucket, or even in the "EndOfGC" work packet.
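
(For concreteness, the proposal amounts to inserting a stage into the bucket ordering, roughly as sketched below. This is hypothetical and heavily simplified; mmtk-core's actual WorkBucketStage enum has more stages and no Sweep stage at the time of this discussion.)

// Hypothetical, simplified bucket ordering -- not the real mmtk-core enum.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum WorkBucketStage {
    Prepare,
    Closure,
    Release, // only the single Release packet that mutates the plan
    Sweep,   // proposed: MarkSweep/Immix sweeping packets go here instead of Release
    Final,   // EndOfGC: compute next_gc_full_heap, resume mutators
}

fn main() {
    // Buckets open in declaration order, so Sweep work can no longer overlap with Release.
    assert!(WorkBucketStage::Release < WorkBucketStage::Sweep);
    assert!(WorkBucketStage::Sweep < WorkBucketStage::Final);
}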

Collaborator

> The good thing for RefEnqueue is that it never needs to access the MMTk instance. It can be a pure VM-side operation.

It technically accesses the mmtk instance (right now at least -- unsure of your proposed changes):

https://github.com/mmtk/mmtk-core/blob/master/src/util/reference_processor.rs#L556

Collaborator

> It would also be helpful to clarify exactly what the semantics of the "Release" work bucket are, i.e. should we send the list of processed references to the binding in the "Release" work bucket, in the "Sweep"/"Collect" work bucket, or even in the "EndOfGC" work packet.

I have made an attempt to do this in my branch (see below), but it is not as precise as you described.
https://github.com/wks/mmtk-core/blob/165b3fab07cc788ad88876effdeefe567697aaf2/src/scheduler/work_bucket.rs#L258

Collaborator

>> The good thing for RefEnqueue is that it never needs to access the MMTk instance. It can be a pure VM-side operation.

> It technically accesses the mmtk instance (right now at least -- unsure of your proposed changes):
> https://github.com/mmtk/mmtk-core/blob/master/src/util/reference_processor.rs#L556

It's accessing the mmtk.reference_processor field. Currently it is a field of the MMTK struct. In my branch, it is part of the VM binding instance.

@@ -223,6 +223,11 @@ impl<VM: VMBinding> GCWork<VM> for EndOfGC {
     fn do_work(&mut self, worker: &mut GCWorker<VM>, mmtk: &'static MMTK<VM>) {
         info!("End of GC");
 
+        // We assume this is the only running work packet that accesses plan at the point of execution
+        #[allow(clippy::cast_ref_to_mut)]
+        let plan_mut: &mut dyn Plan<VM = VM> = unsafe { &mut *(&*mmtk.plan as *const _ as *mut _) };

k-sareen (Collaborator) commented Dec 19, 2022

I see this everywhere -- maybe we should just change the mmtk variable to &mut MMTK or something in the work packets. Or am I missing something that means we can't do that?

Member Author

This is the same pattern as our prepare/release code. This is unfortunate.

#[allow(clippy::cast_ref_to_mut)]
let plan_mut: &mut C::PlanType = unsafe { &mut *(self.plan as *const _ as *mut _) };
plan_mut.prepare(worker.tls);

#[allow(clippy::cast_ref_to_mut)]
let plan_mut: &mut C::PlanType = unsafe { &mut *(self.plan as *const _ as *mut _) };
plan_mut.release(worker.tls);

We can fix it, but it is not as simple as making it mut mmtk. Basically, mmtk: &'static MMTK<VM> is an immutable reference, but we want to get a mutable reference from it -- we do so under the assumption that we are the only thread accessing mmtk at that point. As long as the assumption holds, the code is correct.

Basically what we want is something like RefCell, but without the overhead of borrow tracking. I don't think there is a safe way to do that.
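
(As an illustration of that idea, a hypothetical wrapper -- not an existing mmtk-core type -- could look like the sketch below. It merely moves the unsafe cast behind a named API; soundness still rests entirely on the "only one work packet touches the plan" invariant.)

use std::cell::UnsafeCell;

// Hypothetical helper, not part of mmtk-core: interior mutability with no
// runtime borrow tracking, so the exclusivity assumption is the caller's burden.
struct ExclusiveCell<T>(UnsafeCell<T>);

// Sharing this across GC workers is only sound if the documented invariant holds;
// this encodes the same promise as the cast_ref_to_mut pattern above.
unsafe impl<T: Send + Sync> Sync for ExclusiveCell<T> {}

impl<T> ExclusiveCell<T> {
    fn new(value: T) -> Self {
        Self(UnsafeCell::new(value))
    }
    fn get(&self) -> &T {
        unsafe { &*self.0.get() }
    }
    /// Safety: the caller must guarantee that no other reference to the contents
    /// is alive for the returned borrow's lifetime, e.g. "this is the only
    /// running work packet that accesses the plan".
    unsafe fn get_mut_unchecked(&self) -> &mut T {
        &mut *self.0.get()
    }
}

fn main() {
    let plan_state = ExclusiveCell::new(0u32);
    // A single responsible packet (e.g. Prepare/Release/EndOfGC) mutates the state:
    unsafe { *plan_state.get_mut_unchecked() += 1 };
    assert_eq!(*plan_state.get(), 1);
}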

Collaborator

Why is it not something like an Arc? Multiple work packets can be executing at the same time, all of them accessing the mmtk variable. Is it the overhead of reference counting? I guess if 99% of the time it is only read, as in most work packets, and this is the hot loop, ugly hacks like this may be required.

Collaborator

Technically a RWLock could help, but I feel like adding a lock in the hottest part of our code may not be ideal.

Member Author

Arc can help manage the lifetime of the plan/mmtk, but it comes at the cost of a fat pointer (unknown cost) and reference counting (which should be a minimal cost). And it does not solve the issue of mutability.

There are many things that we need to tidy up.

k-sareen (Collaborator) commented Dec 19, 2022

Ah right, sorry -- you need a RefCell or Mutex/RwLock to have mutability. I was thinking of Arc<Mutex<T>>, which is probably a bad idea.

I guess we can flag this issue for the future. Could you add a comment here (and maybe in the other places as well) saying that this is only safe under the assumption that we're the only executing work packet?

k-sareen (Collaborator)

I think it mainly looks good to me -- now that I understand what the issue is (MarkSweep and Immix sweeping in the Release work bucket). But it would be good to document it, if possible. Thanks!

wks (Collaborator) left a comment:

LGTM. Just make sure the spelling issues @k-sareen pointed out are solved.

k-sareen (Collaborator)

I still think we should discuss the semantics of "Release" as a group before we merge this PR.

k-sareen (Collaborator)

Just noticed the commit message says next_gc_full_size instead of next_gc_full_heap -- maybe force push an amended commit when you update some of the documentation as mentioned above ^

k-sareen changed the title from "Generational plans set next_gc_full_size in end_of_gc" to "Generational plans set next_gc_full_heap in end_of_gc" on Dec 21, 2022

qinsoon (Member, Author) commented Dec 21, 2022

> Just noticed the commit message says next_gc_full_size instead of next_gc_full_heap -- maybe force push an amended commit when you update some of the documentation as mentioned above ^

I will just do a squash merge, and that commit message will not be seen in the commit history.

qinsoon merged commit 06237f1 into mmtk:master on Jan 2, 2023
wenyuzhao pushed a commit to wenyuzhao/mmtk-core that referenced this pull request Mar 20, 2023
This PR adds `Plan.end_of_gc()`. Our generational plans compute whether we should do full heap GC at the end of `Plan.release()` which is incorrect. There could be other parallel release work when we execute `Plan.release()`. This PR adds `Plan.end_of_gc()`, and we compute `next_gc_full_heap` in `end_of_gc()`.