Collect live bytes during GC #768
Conversation
```rust
    /// objects, we increase the live bytes. We get this value from each worker
    /// at the end of a GC, and reset this counter.
    #[cfg(feature = "count_live_bytes_in_gc")]
    live_bytes: AtomicUsize,
```
Do we need atomic for this? It's worker-local right?
It is in the shared part of a worker, so we cannot get a mutable reference to it. It needs to be in the shared part because, at the end of a GC, we iterate through all the workers and sum the counters up.
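For reference, a minimal sketch of the layout being described, assuming a `GCWorkerShared`-style struct holds the counter and a coordinator sums it over all workers at the end of a GC (the names and the exact summation point here are illustrative, not the mmtk-core definitions):

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};

/// The shared part of a worker, visible to other threads (illustrative).
struct GCWorkerShared {
    live_bytes: AtomicUsize,
}

impl GCWorkerShared {
    /// Called by the owning worker whenever it marks an object live.
    fn increase_live_bytes(&self, bytes: usize) {
        // Atomic because we only ever hold `&self` to the shared part.
        self.live_bytes.fetch_add(bytes, Ordering::SeqCst);
    }
}

/// At the end of a GC: read and reset every worker's counter, then sum.
fn collect_live_bytes(workers: &[Arc<GCWorkerShared>]) -> usize {
    workers
        .iter()
        .map(|w| w.live_bytes.swap(0, Ordering::SeqCst))
        .sum()
}
```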
Ah OK. I was just thinking that if we could make it cheap enough by not using atomics for worker-local stats, we could enable this feature by default.
Why not make it worker-local and have each worker report the live bytes after its transitive closure? Then we don't have to explicitly iterate through all the workers; we just collate the numbers we get from the transitive closure.
It could be any stage after the transitive closure stages (including weak reference processing stages).
Maybe it's easier to defer the "live byte count" to each policy as opposed to each worker? But then it stops being worker-local so it would need synchronization.
Yeah. It would need at least the same level of synchronization as the current code, and the counting code would be scattered across the policies.
We could use `AtomicUsize::as_mut_ptr` and do a non-atomic add with unsafe code when we do the counting. But I still question whether we want to enable this by default.
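A rough sketch of that idea, under the assumptions that each counter is written only by its owning worker during tracing and read only after a synchronization point, and that `as_ptr` is the stabilized name of that accessor:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Sketch of the idea above: skip the atomic read-modify-write on the hot path
// by writing through the counter's raw pointer. This relies on the GC schedule
// guaranteeing that only the owning worker touches the counter while tracing.
// `as_ptr` is assumed here as the stabilized form of the `as_mut_ptr` accessor
// mentioned in the comment.
fn add_live_bytes_unsynchronized(counter: &AtomicUsize, bytes: usize) {
    // SAFETY: only the owning worker writes this counter during tracing, and
    // readers look at it only after a later synchronization point (the
    // end-of-GC rendezvous), so there is no concurrent access.
    unsafe {
        *counter.as_ptr() += bytes;
    }
}

// The end-of-GC summation can stay atomic; it runs once per GC, so its cost
// does not matter.
fn sum_live_bytes<'a>(counters: impl Iterator<Item = &'a AtomicUsize>) -> usize {
    counters.map(|c| c.swap(0, Ordering::SeqCst)).sum()
}
```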
I personally don't have a strong opinion. Keeping track of fragmentation is definitely useful though. If it's possible to do it cheaply for every GC, we may as well do it. If it's not possible, then I don't think we should spend more time on it.
@qinsoon Sorry about the typo. I meant `ReleaseCollector`. Yes, it will be a problem for `MarkCompact`. If we make the counter a private field of a `Worker`, it will require another rendezvous (designated work) between the marking phase and the forwarding phase. Given this situation, I don't mind if we use `AtomicUsize` for now.
src/scheduler/gc_work.rs (outdated diff)
```rust
    // This also exposes the worker to the callback function. This is not a public method.
    fn with_tracer_and_worker<R, F>(&self, worker: &mut GCWorker<E::VM>, func: F) -> R
    where
        F: FnOnce(&mut ProcessEdgesWorkTracer<E>, &mut GCWorker<E::VM>) -> R,
```
Strictly speaking, it is unsafe to give the user access to both the tracer and the worker. The tracer is a closure that calls `trace_object` underneath, and the `worker` is also one of the arguments of `trace_object`. I planned to forbid this by adding a lifetime annotation, but I couldn't do it without generic associated types (GATs), which were not available before Rust 1.65.
See line 53 in 813fa99: `/// FIXME: The current code works because of the unsafe method ProcessEdgesWork::set_worker.`
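A simplified sketch of what the lifetime constraint could look like (the types here are stand-ins, not mmtk-core's real API, which puts the tracer behind a trait and therefore needs GATs): because the tracer owns the mutable borrow of the worker, the callback cannot be handed the worker separately.

```rust
// Illustrative only; these are not mmtk-core's real types.
struct GCWorker;

/// A tracer that holds the mutable borrow of the worker for its whole lifetime.
struct Tracer<'w> {
    worker: &'w mut GCWorker,
}

impl<'w> Tracer<'w> {
    fn trace_object(&mut self, _object: usize) {
        // Would call the space-specific trace_object with self.worker here.
    }
}

/// The callback only receives the tracer. Since the tracer captures
/// `&'w mut GCWorker`, the borrow checker forbids also passing the worker.
fn with_tracer<'w, R>(worker: &'w mut GCWorker, func: impl FnOnce(&mut Tracer<'w>) -> R) -> R {
    let mut tracer = Tracer { worker };
    func(&mut tracer)
}
```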
Another problem with this is that VM bindings can use `ObjectTracerContext` to get temporary access to `trace_object` during `Scanning::process_weak_refs` and `Scanning::forward_weak_refs`. Adding a `with_tracer_and_worker` will allow `ScanObjects` to record object sizes, but not for VM-specific weak reference processing.
`with_tracer` takes a mutable reference to the `worker`, so I cannot use the `worker` any more in `ScanObjects::do_work_common`. We need a way to get the `worker` back from the tracer. Another option is to expose some methods from `ObjectTracer` so we can get the worker or directly increase the counter.
> Another problem with this is that VM bindings can use `ObjectTracerContext` to get temporary access to `trace_object` during `Scanning::process_weak_refs` and `Scanning::forward_weak_refs`. Adding a `with_tracer_and_worker` will allow `ScanObjects` to record object sizes, but not for VM-specific weak reference processing.
The remaining question is how to count live bytes for weak references. As the weak reference processing is done entirely on the binding side, we may have to expose a method and let the binding call it when it scans weak refs.
If we count the object size at `ScanObjectsWork`, it will not be a problem. The binding keeps objects alive during the weak reference processing stage using `trace_object` (via `ObjectTracer::trace_object`, which is implemented with `ProcessEdgesWork::trace_object` underneath and eventually gets to the space-specific `trace_object`). Any object visited by `trace_object` will eventually be visited by `ScanObjectsWork`.
We could count the object sizes in one loop instead of splitting the objects into those that support edge enqueuing and those that don't. By doing this, the `with_tracer_and_worker` is also unnecessary.
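A rough sketch of what that single loop could look like inside `ScanObjectsWork::do_work_common` (a fragment, not a standalone program; the `objects` buffer name and the `get_current_size` object-size query are assumptions based on this discussion):

```rust
#[cfg(feature = "count_live_bytes_in_gc")]
{
    // Count every object in the packet in one pass, before splitting the
    // objects into the edge-enqueuing and object-enqueuing groups.
    let live_bytes: usize = objects
        .iter()
        .map(|object| VM::VMObjectModel::get_current_size(*object))
        .sum();
    // The shared-side counter update discussed earlier in this thread.
    closure.worker.shared.increase_live_bytes(live_bytes);
}
```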
Only a small problem remains.
```diff
@@ -74,7 +74,7 @@ impl ObjectQueue for VectorQueue<ObjectReference> {
 /// A transitive closure visitor to collect all the edges of an object.
 pub struct ObjectsClosure<'a, E: ProcessEdgesWork> {
     buffer: VectorQueue<EdgeOf<E>>,
-    worker: &'a mut GCWorker<E::VM>,
+    pub(crate) worker: &'a mut GCWorker<E::VM>,
```
This no longer needs to be `pub(crate)`.
I think we still need this. When we do `worker.shared.increase_live_bytes()` in `ScanObjectsWork::do_work_common()`, we have already created the `ObjectsClosure`, which takes `&mut GCWorker`. We can't use `worker` directly, and we have to go through `ObjectsClosure`.
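A simplified illustration of that borrow situation (stand-in types; the point is only that once `ObjectsClosure` holds `&mut GCWorker`, the counter must be reached through the closure's `worker` field):

```rust
struct GCWorkerShared;
struct GCWorker {
    shared: GCWorkerShared,
}
struct ObjectsClosure<'a> {
    pub(crate) worker: &'a mut GCWorker,
}

fn do_work_common(worker: &mut GCWorker) {
    let closure = ObjectsClosure { worker };
    // `worker` is now mutably borrowed by `closure`, so using it directly
    // (e.g. `worker.shared.increase_live_bytes(size)`) would not compile.
    // The counter has to be reached through the closure instead:
    // `closure.worker.shared.increase_live_bytes(size)`.
    let _ = closure;
}
```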
Oh, sorry. I didn't notice that.
LGTM
I think we should merge this. We can also add a tracepoint for this.
This PR adds a feature, `count_live_bytes_in_gc`. When enabled, each worker maintains a counter of live bytes while scanning objects, and in `Release`, we get the live bytes from all the workers and sum them up. We also provide `memory_manager::live_bytes_in_last_gc()` for users to query this value. This is usually used to debug fragmentation.
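A hedged usage sketch from the user's side, assuming the usual `MMTK<VM>` handle; `live_bytes_in_last_gc` comes from this PR, while `used_bytes` and the exact signatures are assumptions for illustration:

```rust
use mmtk::{memory_manager, vm::VMBinding, MMTK};

/// Illustrative fragmentation report after a GC: compare live bytes against
/// the bytes used by the heap. A large gap suggests fragmentation.
fn report_fragmentation<VM: VMBinding>(mmtk: &MMTK<VM>) {
    // Requires building mmtk-core with the `count_live_bytes_in_gc` feature.
    let live = memory_manager::live_bytes_in_last_gc(mmtk);
    let used = memory_manager::used_bytes(mmtk);
    if used > 0 {
        let frag = 100.0 * (1.0 - live as f64 / used as f64);
        println!("live: {live} B, used: {used} B, ~{frag:.1}% fragmentation");
    }
}
```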