Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark phase prefetching. #73375

Merged
merged 5 commits into from
Aug 9, 2022
Merged

Conversation

PeterSolMS
Copy link
Contributor

This adds prefetching to the mark phase.

The idea is that once we have established that an object is in one of the generations we want to collect, we prefetch its memory before we determine whether we have marked it already. This is because the mark bit is in the object itself, and thus requires accessing the object's memory.

As the prefetching will take some time to take effect, we park the object in a queue (see type mark_queue_t below). We then retrieve an older object from the queue, and test whether it has been marked. This should be faster, because we have issued a prefetch for this older object's memory a while back.

In quite a few places we now need to drain the queue to ensure correctness - see calls to drain_mark_queue().

@ghost
Copy link

ghost commented Aug 4, 2022

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

This adds prefetching to the mark phase.

The idea is that once we have established that an object is in one of the generations we want to collect, we prefetch its memory before we determine whether we have marked it already. This is because the mark bit is in the object itself, and thus requires accessing the object's memory.

As the prefetching will take some time to take effect, we park the object in a queue (see type mark_queue_t below). We then retrieve an older object from the queue, and test whether it has been marked. This should be faster, because we have issued a prefetch for this older object's memory a while back.

In quite a few places we now need to drain the queue to ensure correctness - see calls to drain_mark_queue().

Author: PeterSolMS
Assignees: -
Labels:

area-GC-coreclr

Milestone: -


// retrieve a newly marked object from the queue
// returns nullptr if there is no such object
uint8_t* mark_queue_t::drain()
Copy link
Member

@adityamandaleeka adityamandaleeka Aug 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should this be renamed to something that better implies that it just marks one object and returns it (maybe mark_next or something)? drain implies completely emptying it out IMO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think of this method as draining but just needs to return if there's still objects to mark. it does drain the slot_table at the end when all slots become null (and that's the end goal, to have all slots become null).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about get_next_marked, slight variation on Aditya's suggestion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then I would probably do get_next_to_mark since you are getting an object to do the mark work on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the object returned is already marked, so get_next_to_mark doesn't seem entirely right either. Can't think of anything better than get_next_marked.

#endif
_mm_prefetch((const char*)addr, _MM_HINT_T0);
#elif defined(TARGET_ARM64) && defined(TARGET_WINDOWS)
__prefetch((const char*)addr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__builtin_prefetch should work on non-Windows

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right, on linux we should use __buildin_prefetch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Jan - it looks like calling it with the default arguments should be just fine for our purposes.

@Maoni0
Copy link
Member

Maoni0 commented Aug 5, 2022

some results I measured on a 1st party prod workload, I had 3 machines -

  • baseline is without the change and 2 machines (prefetch0 and prefetch1) were with the change.
  • the workload runs with Server GC 48 heaps, samples/mb is # of CPU samples in mark_phase to promote an mb so lower is better.
gen0 GCs   samples/mb diff %
run0 baseline0 5.49  
  prefetch0 5.29 -3.64%
  prefetch1 5.21 -5.10%
run1 baseline1 5.75  
  prefetch0 5.47 -4.87%
  prefetch1 5.52 -4.00%
       
gen1 GCs   samples/mb diff %
run0 baseline0 3.46  
  prefetch0 3.3 -4.62%
  prefetch1 3.25 -6.07%
run1 baseline1 3.6  
  prefetch0 3.35 -6.94%
  prefetch1 3.39 -5.83%

…ating systems.

Checkin tests showed issue traced to missing drain_mark_queue() call in WKS version of scan_dependent_handles.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants