
Collect live bytes during GC #768

Merged 8 commits on Aug 8, 2023
Conversation

qinsoon
Member

@qinsoon qinsoon commented Mar 7, 2023

This PR adds a feature, `count_live_bytes_in_gc`. When it is enabled, each worker maintains a counter and adds to it the size of every object it scans; in the Release phase, we collect the counters from all the workers and sum them. We also provide `memory_manager::live_bytes_in_last_gc()` for users to query this value. This is usually used to debug fragmentation.
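The counting scheme described above can be sketched with a self-contained mock (the `WorkerShared` struct and function names here are hypothetical stand-ins, not the actual mmtk-core API): each worker bumps a counter in its shared state while scanning, and the sums are collected and reset at the end of the GC.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for the shared part of a GC worker.
struct WorkerShared {
    live_bytes: AtomicUsize,
}

// Each worker adds the size of every object it scans. Relaxed is enough:
// the values are only read after the workers synchronize at the end of
// the GC.
fn scan_object(shared: &WorkerShared, object_size: usize) {
    shared.live_bytes.fetch_add(object_size, Ordering::Relaxed);
}

// At Release, iterate through all the workers, sum the counters, and
// reset them for the next GC.
fn collect_live_bytes(workers: &[Arc<WorkerShared>]) -> usize {
    workers
        .iter()
        .map(|w| w.live_bytes.swap(0, Ordering::Relaxed))
        .sum()
}

fn main() {
    let workers: Vec<Arc<WorkerShared>> = (0..4)
        .map(|_| Arc::new(WorkerShared { live_bytes: AtomicUsize::new(0) }))
        .collect();

    // Simulate four workers each scanning three objects.
    let handles: Vec<_> = workers
        .iter()
        .cloned()
        .map(|w| {
            thread::spawn(move || {
                for size in [16, 24, 32] {
                    scan_object(&w, size);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    let total = collect_live_bytes(&workers);
    assert_eq!(total, 4 * (16 + 24 + 32));
    println!("live bytes in last GC: {}", total);
}
```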

@qinsoon qinsoon marked this pull request as ready for review March 7, 2023 22:10
@qinsoon qinsoon requested a review from wks March 7, 2023 22:10
/// objects, we increase the live bytes. We get this value from each worker
/// at the end of a GC, and reset this counter.
#[cfg(feature = "count_live_bytes_in_gc")]
live_bytes: AtomicUsize,
Member


Do we need atomic for this? It's worker-local right?

Member Author

@qinsoon qinsoon Mar 8, 2023


> Do we need atomic for this? It's worker-local right?

It is in the shared part of a worker, so we cannot get a mutable reference to it. It needs to be in the shared part because we need to iterate through all the workers and sum up their counters.

Member


Ah OK. I was just thinking that if we could make it cheap enough by not using atomics for worker-local stats, we could enable this feature by default.

Collaborator

@k-sareen k-sareen Mar 8, 2023


Why not make it worker-local and have each worker report its live bytes after its transitive closure? Then we don't have to explicitly iterate through all the workers; we just collate the numbers we get from the transitive closure.

Collaborator

@wks wks Mar 8, 2023


It could be any stage after the transitive closure stages (including weak reference processing stages).

Collaborator


Maybe it's easier to defer the "live byte count" to each policy as opposed to each worker? But then it stops being worker-local so it would need synchronization.

Member Author


> Maybe it's easier to defer the "live byte count" to each policy as opposed to each worker? But then it stops being worker-local so it would need synchronization.

Yeah. It would need at least the same level of synchronization as the current code, and the counting code would be scattered across the policies.

Member Author


We could use `AtomicUsize::as_mut_ptr` and do a non-atomic add with unsafe code when we do the counting. But I still question whether we want to enable this by default.
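A minimal sketch of that idea. `AtomicUsize::as_mut_ptr` was still unstable at the time of this discussion; the sketch below uses `AtomicUsize::as_ptr` (stabilized in Rust 1.70), which yields the same raw pointer from a shared reference. The `WorkerShared` type is a hypothetical stand-in, and the non-atomic add is only sound under the assumption that exactly one worker thread writes the counter while a GC is running, with readers looking at it only after a synchronization point at the end of the GC.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical worker-local counter living in the worker's shared state.
struct WorkerShared {
    live_bytes: AtomicUsize,
}

impl WorkerShared {
    // Non-atomic add through the atomic's raw pointer. This avoids the
    // cost of an atomic RMW, but is a data race (undefined behavior)
    // unless this worker is the only thread touching the counter.
    fn increase_live_bytes_non_atomic(&self, size: usize) {
        unsafe {
            *self.live_bytes.as_ptr() += size;
        }
    }
}

fn main() {
    let shared = WorkerShared { live_bytes: AtomicUsize::new(0) };
    shared.increase_live_bytes_non_atomic(64);
    shared.increase_live_bytes_non_atomic(32);
    // Readers still go through the atomic API after the GC synchronizes.
    assert_eq!(shared.live_bytes.load(Ordering::Relaxed), 96);
}
```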

Collaborator


I personally don't have a strong opinion. Keeping track of fragmentation is definitely useful, though. If it's possible to do it cheaply for every GC, we may as well do it; if it's not, then I don't think we should spend more time on it.

Collaborator


@qinsoon Sorry about the typo. I meant `ReleaseCollector`. Yes, it will be a problem for MarkCompact. If we make the counter a private field of a worker, it will require another rendezvous (designated work) between the marking phase and the forwarding phase. Given this situation, I don't mind if we use `AtomicUsize` for now.

// This also exposes the worker to the callback function. This is not a public method.
fn with_tracer_and_worker<R, F>(&self, worker: &mut GCWorker<E::VM>, func: F) -> R
where
F: FnOnce(&mut ProcessEdgesWorkTracer<E>, &mut GCWorker<E::VM>) -> R,
Collaborator


Strictly speaking, it is unsafe to give the user access to both the tracer and the worker. The tracer is a closure that calls `trace_object` underneath, and the worker is also one of the arguments of `trace_object`. I planned to forbid this by adding a lifetime annotation, but I couldn't do it without Generic Associated Types (GATs), which were only stabilized in Rust 1.65.

See:

/// FIXME: The current code works because of the unsafe method `ProcessEdgesWork::set_worker`.

Collaborator


Another problem with this is that VM bindings can use ObjectTracerContext to get temporary access to trace_object during Scanning::process_weak_refs and Scanning::forward_weak_refs. Adding a with_tracer_and_worker will allow ScanObjects to record object sizes, but not for VM-specific weak reference processing.

Member Author


with_tracer takes a mutable reference to the worker, so I can no longer use the worker in ScanObjects::do_work_common. We need a way to get the worker back from the tracer. Another option is to expose some methods from ObjectTracer so we can get the worker or directly increase the counter.

Member Author


> Another problem with this is that VM bindings can use ObjectTracerContext to get temporary access to trace_object during Scanning::process_weak_refs and Scanning::forward_weak_refs. Adding a with_tracer_and_worker will allow ScanObjects to record object sizes, but not for VM-specific weak reference processing.

The remaining question is how to count live bytes for weak references. As weak reference processing happens entirely on the binding side, we may have to expose a method and let the binding call it when it scans weak refs.

Collaborator


> The remaining question is how to count live bytes for weak references. As weak reference processing happens entirely on the binding side, we may have to expose a method and let the binding call it when it scans weak refs.

If we count the object size at ScanObjectsWork, it will not be a problem. The binding keeps objects alive during the weak reference processing stage using trace_object (via ObjectTracer::trace_object, which is implemented with ProcessEdgesWork::trace_object underneath and eventually gets to the space-specific trace_object). Any object visited by trace_object will eventually be visited by ScanObjectsWork.

Collaborator

@wks wks left a comment


We could count the object sizes in one loop instead of splitting the objects into those that support edge enqueuing and those that don't. By doing this, with_tracer_and_worker also becomes unnecessary.
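The one-loop idea can be sketched with hypothetical types (the real ScanObjectsWork operates on object references, not this toy `Object` struct): the size of every object is counted in the same pass that partitions objects by whether they support edge enqueuing.

```rust
// Hypothetical object record for the sketch.
struct Object {
    size: usize,
    supports_edge_enqueuing: bool,
}

// Count sizes in a single pass over all scanned objects, then split
// them for the two scanning strategies.
fn scan_objects(objects: &[Object]) -> (usize, Vec<&Object>, Vec<&Object>) {
    let mut live_bytes = 0;
    let mut enqueue = Vec::new();
    let mut scan_and_trace = Vec::new();
    for obj in objects {
        live_bytes += obj.size; // counted once for every object, in one loop
        if obj.supports_edge_enqueuing {
            enqueue.push(obj);
        } else {
            scan_and_trace.push(obj);
        }
    }
    (live_bytes, enqueue, scan_and_trace)
}

fn main() {
    let objects = vec![
        Object { size: 16, supports_edge_enqueuing: true },
        Object { size: 48, supports_edge_enqueuing: false },
    ];
    let (live, enq, rest) = scan_objects(&objects);
    assert_eq!(live, 64);
    assert_eq!((enq.len(), rest.len()), (1, 1));
}
```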

Two further review threads on src/scheduler/gc_work.rs (outdated, resolved).
Collaborator

@wks wks left a comment


Only a small problem remains.

@@ -74,7 +74,7 @@ impl ObjectQueue for VectorQueue<ObjectReference> {
/// A transitive closure visitor to collect all the edges of an object.
pub struct ObjectsClosure<'a, E: ProcessEdgesWork> {
buffer: VectorQueue<EdgeOf<E>>,
worker: &'a mut GCWorker<E::VM>,
pub(crate) worker: &'a mut GCWorker<E::VM>,
Collaborator


This no longer needs to be pub(crate).

Member Author


I think we still need this. By the time we call worker.shared.increase_live_bytes() in ScanObjectsWork::do_work_common(), we have already created an ObjectsClosure, which takes &mut GCWorker. We can't use worker directly; we have to access it through the ObjectsClosure.
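The borrow situation can be illustrated with a stripped-down model (hypothetical types, not the actual mmtk-core definitions): once the closure holds the only `&mut GCWorker`, the worker's shared state is reachable only through the closure's `pub(crate)` field.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-ins for the worker types in this sketch.
struct WorkerShared {
    live_bytes: AtomicUsize,
}

struct GCWorker {
    shared: WorkerShared,
}

// Models ObjectsClosure: it exclusively borrows the worker.
struct ObjectsClosure<'a> {
    worker: &'a mut GCWorker, // pub(crate) in the real code
}

fn do_work_common(worker: &mut GCWorker) {
    let closure = ObjectsClosure { worker };
    // `worker` has been moved into `closure`, so using it directly here
    // would not compile; the shared counter must be reached through the
    // closure's field instead.
    closure
        .worker
        .shared
        .live_bytes
        .fetch_add(8, Ordering::Relaxed);
}

fn main() {
    let mut worker = GCWorker {
        shared: WorkerShared { live_bytes: AtomicUsize::new(0) },
    };
    do_work_common(&mut worker);
    assert_eq!(worker.shared.live_bytes.load(Ordering::Relaxed), 8);
}
```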

Collaborator


Oh, sorry. I didn't notice that.

Collaborator

@wks wks left a comment


LGTM

@caizixian
Member

I think we should merge this. We can also add a tracepoint for this.

4 participants