Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the coordinator (controller) thread #1053

Closed
wks opened this issue Dec 20, 2023 · 2 comments · Fixed by #1067
Closed

Remove the coordinator (controller) thread #1053

wks opened this issue Dec 20, 2023 · 2 comments · Fixed by #1067
Labels
A-work-queue Area: Work queue C-refactoring Category: Refactoring G-performance Goal: Performance G-safety Goal: Safety P-normal Priority: Normal.

Comments

@wks
Copy link
Collaborator

wks commented Dec 20, 2023

It was created because OpenJDK requires the thread that stops the world to be the same thread that starts the world. But now the OpenJDK binding uses a dedicated "companion" thread for that purpose.

The coordinator thread has brought much trouble to the synchronization with GC workers, and it has been changed many time for that. With the concept of "coordinator work" removed in #794, the coordinator thread is now only used to respond to GC triggers. We realized that we can remove the coordinator and the GC workers can coordinate themselves.

Removing the coordinator thread can also save one context switch between the last GC worker starting sleeping and the coordinator thread opening new buckets.

@wks wks added G-performance Goal: Performance C-refactoring Category: Refactoring A-work-queue Area: Work queue G-safety Goal: Safety labels Dec 20, 2023
@wks
Copy link
Collaborator Author

wks commented Dec 28, 2023

We used to believe that the coordinator (controller) thread was introduced to handle STW for OpenJDK (which requires the thread that starts and stops the world to be the same). But from the source code of JikesRVM, it appears that the controller thread existed in JikesRVM, too. The ControllerCollectorContext was the context of the controller thread, and what we currently have as GCRequester originated from that (I did the refactoring in #533).

I think we already agreed that we can remove the coordinator thread. To implement this change, however, we need to acknowledge that we are deviating from JikesRVM. We probably have a good reason to do this, that is, we now use work packets. We still need to implement a GC-triggering algorithm that is equivalent to the old ControllerCollectorContext. For now, it should be as simple as adding the ScheduleCollection work packet. With concurrent GC added, it should handle the case where a mutator triggers a GC while GC is still running, in which case another GC should start when the current GC finishes.

@wks
Copy link
Collaborator Author

wks commented Jan 2, 2024

The coordinator thread behaves like an implicit state machine: WaitForGC -> ScheduleGC -> OpenMoreBuckets -> OpenMoreBuckets -> ... -> GcFinished -> WaitForGC

After removing the coordinator thread, GC workers coordinate themselves. Specifically when all GC workers parked, the last parked worker shall perform actions.

  • If MMTk has just been initialized, all GC workers should simply park waiting for the first GC.
  • If GC is requested (poll or handle_user_collection_request) and GC has not been scheduled, it should schedule the GC (by adding the ScheduleCollection work packet). If it is already scheduled, however, it will be a no-op if another mutator also requests for GC.
  • During GC, the last parked worker should try to open more buckets.
  • But if all buckets are open and drained, the last parked worker should close all buckets and resume mutators.

We need an explicit state machine for this. The simplest design will have only two states: NotInGC and InGC.

It is tempting to use the GcStatus for this purpose, but the exact timing is not suitable for this purpose. Specifically, we need a state to represent that GC has been scheduled (by mutator requesting GC). Currently I am adding a bool variable in GCRequester for this purpose. We may consider transforming GcStatus into a state machine for this. See #1061

@qinsoon qinsoon added the P-normal Priority: Normal. label Jan 17, 2024
github-merge-queue bot pushed a commit that referenced this issue Apr 9, 2024
This PR removes the coordinator (a.k.a. controller) thread, and allows
temporarily terminating and restarting all GC worker threads in order to
support forking.

Major changes include:

- `GCController` is removed. All synchronization mechanisms involving
the controller are removed, too. Important synchronization operations,
such as opening buckets and declaring GC finished, are done by the last
parked worker. The work packet `EndOfGC` is removed, and its job is now
done by the last parked worker.
- The `WorkerMonitor`, which previously synchronizes between GC workers,
now also synchronizes between mutators and GC workers. This allows
mutators to trigger GC by directly communicating with GC workers.
- Introduced a new mechanism: "goals". Mutators can now request "goals",
and GC workers will work on one goal at a time. Currently, a "goal" can
be either GC or StopForFork. Triggering GC is now implemented as
requesting the GC goal.
- Added a pair of new APIs, namely `MMTK::prepare_to_fork()` and
`MMTK::after_fork()`. VM bindings call them before and after forking to
let the MMTK instance do proper preparation for forking.

Fixes: #1053
Fixes: #1054
@wks wks closed this as completed in #1067 Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-work-queue Area: Work queue C-refactoring Category: Refactoring G-performance Goal: Performance G-safety Goal: Safety P-normal Priority: Normal.
Projects
None yet
2 participants