Remove the coordinator (controller) thread #1053

wks · 2023-12-20T07:21:07Z

It was created because OpenJDK requires the thread that stops the world to be the same thread that starts the world. But now the OpenJDK binding uses a dedicated "companion" thread for that purpose.

The coordinator thread has brought much trouble to the synchronization with GC workers, and it has been changed many time for that. With the concept of "coordinator work" removed in #794, the coordinator thread is now only used to respond to GC triggers. We realized that we can remove the coordinator and the GC workers can coordinate themselves.

Removing the coordinator thread can also save one context switch between the last GC worker starting sleeping and the coordinator thread opening new buckets.

wks · 2023-12-28T13:47:02Z

We used to believe that the coordinator (controller) thread was introduced to handle STW for OpenJDK (which requires the thread that starts and stops the world to be the same). But from the source code of JikesRVM, it appears that the controller thread existed in JikesRVM, too. The ControllerCollectorContext was the context of the controller thread, and what we currently have as GCRequester originated from that (I did the refactoring in #533).

I think we already agreed that we can remove the coordinator thread. To implement this change, however, we need to acknowledge that we are deviating from JikesRVM. We probably have a good reason to do this, that is, we now use work packets. We still need to implement a GC-triggering algorithm that is equivalent to the old ControllerCollectorContext. For now, it should be as simple as adding the ScheduleCollection work packet. With concurrent GC added, it should handle the case where a mutator triggers a GC while GC is still running, in which case another GC should start when the current GC finishes.

wks · 2024-01-02T11:08:46Z

The coordinator thread behaves like an implicit state machine: WaitForGC -> ScheduleGC -> OpenMoreBuckets -> OpenMoreBuckets -> ... -> GcFinished -> WaitForGC

After removing the coordinator thread, GC workers coordinate themselves. Specifically when all GC workers parked, the last parked worker shall perform actions.

If MMTk has just been initialized, all GC workers should simply park waiting for the first GC.
If GC is requested (poll or handle_user_collection_request) and GC has not been scheduled, it should schedule the GC (by adding the ScheduleCollection work packet). If it is already scheduled, however, it will be a no-op if another mutator also requests for GC.
During GC, the last parked worker should try to open more buckets.
But if all buckets are open and drained, the last parked worker should close all buckets and resume mutators.

We need an explicit state machine for this. The simplest design will have only two states: NotInGC and InGC.

It is tempting to use the GcStatus for this purpose, but the exact timing is not suitable for this purpose. Specifically, we need a state to represent that GC has been scheduled (by mutator requesting GC). Currently I am adding a bool variable in GCRequester for this purpose. We may consider transforming GcStatus into a state machine for this. See #1061

This PR removes the coordinator (a.k.a. controller) thread, and allows temporarily terminating and restarting all GC worker threads in order to support forking. Major changes include: - `GCController` is removed. All synchronization mechanisms involving the controller are removed, too. Important synchronization operations, such as opening buckets and declaring GC finished, are done by the last parked worker. The work packet `EndOfGC` is removed, and its job is now done by the last parked worker. - The `WorkerMonitor`, which previously synchronizes between GC workers, now also synchronizes between mutators and GC workers. This allows mutators to trigger GC by directly communicating with GC workers. - Introduced a new mechanism: "goals". Mutators can now request "goals", and GC workers will work on one goal at a time. Currently, a "goal" can be either GC or StopForFork. Triggering GC is now implemented as requesting the GC goal. - Added a pair of new APIs, namely `MMTK::prepare_to_fork()` and `MMTK::after_fork()`. VM bindings call them before and after forking to let the MMTK instance do proper preparation for forking. Fixes: #1053 Fixes: #1054

wks added G-performance Goal: Performance C-refactoring Category: Refactoring A-work-queue Area: Work queue G-safety Goal: Safety labels Dec 20, 2023

wks mentioned this issue Dec 20, 2023

Supporting the fork() system call #1054

Closed

wks mentioned this issue Jan 2, 2024

Remove or refactor GcStatus #1061

Open

wks mentioned this issue Jan 5, 2024

Remove the coordinator (controller) thread #1062

Closed

qinsoon added the P-normal Priority: Normal. label Jan 17, 2024

wks mentioned this issue Jan 31, 2024

Remove coordinator and support forking #1067

Merged

wks closed this as completed in #1067 Apr 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove the coordinator (controller) thread #1053

Remove the coordinator (controller) thread #1053

wks commented Dec 20, 2023

wks commented Dec 28, 2023

wks commented Jan 2, 2024

Remove the coordinator (controller) thread #1053

Remove the coordinator (controller) thread #1053

Comments

wks commented Dec 20, 2023

wks commented Dec 28, 2023

wks commented Jan 2, 2024