Incremental "mark alive" pass for cyclic GC #126511
Draft
This adds a "mark alive" pass to the cyclic GC, done incrementally in order to reduce pause times for full GC collections. The "mark alive" pass works by starting from known GC roots and then using `tp_traverse` to mark as alive everything that is reachable from those roots. Those objects will be skipped when the next full (gen 2) collection happens.

Based on my benchmarking, it is quite effective at reducing GC pause times (latency). Here are some timing stats for a benchmark I ran. Timing with the "mark alive" feature turned off:
Meaning of terms:
The "gc timing full" values are the times taken for full (generation 2) GC collections. Qxx is the xx% quantile of those times, in microseconds.
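For anyone who wants to collect comparable numbers from Python code, here is a minimal sketch of timing full collections and computing quantiles. It is not the instrumentation used for the figures in this PR (that is not shown here). The names `_gc_timer`, `full_gc_times_us`, and `quantile` are made up for illustration; `gc.callbacks` and `statistics.quantiles` are standard library features.

```python
import gc
import statistics
import time

full_gc_times_us = []   # durations of generation 2 collections, in microseconds
_start = None


def _gc_timer(phase, info):
    """gc.callbacks hook (hypothetical name): time only full collections."""
    global _start
    if info["generation"] != 2:
        return
    if phase == "start":
        _start = time.perf_counter()
    elif phase == "stop" and _start is not None:
        full_gc_times_us.append((time.perf_counter() - _start) * 1e6)
        _start = None


def quantile(samples, pct):
    """Return the pct-th percentile (1..99) of samples; needs >= 2 samples."""
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[pct - 1]


gc.callbacks.append(_gc_timer)
for _ in range(5):
    gc.collect()            # gc.collect() with no argument runs a full collection
gc.callbacks.remove(_gc_timer)

print("Q50 (us):", quantile(full_gc_times_us, 50))
print("Q99 (us):", quantile(full_gc_times_us, 99))
```

A real benchmark would let collections trigger naturally under an allocation-heavy workload rather than forcing them with `gc.collect()`.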
With the mark alive feature on:
This benchmarking shows that the overall time spent in the GC has slightly increased but the pause times have drastically decreased. The 99% quantile pause time is 19x shorter. It's possible that with additional optimization the overall time can be further reduced. If it can't be made comparable in overall cost, I think it could be turned on via a setting like `PYTHON_GC_PRESET=min-latency`, as proposed in gh-124772.

This is still a WIP. I would like to compare the pause times and overall performance with the incremental GC that is in the 3.14 and main branches.
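As a rough illustration of the marking idea described at the top of this PR, here is a Python-level sketch. It is not the C implementation in this change: it runs in one go rather than incrementally, it records liveness in a Python set, and the function name `mark_alive` is hypothetical. It uses `gc.get_referents()`, which calls each object's `tp_traverse` slot, as a stand-in for the traversal step.

```python
import gc


def mark_alive(roots):
    """Return the ids of all objects reachable from `roots` (sketch only)."""
    alive = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in alive:
            continue                      # already marked, do not revisit
        alive.add(id(obj))
        # gc.get_referents() visits the object's direct referents via tp_traverse.
        stack.extend(gc.get_referents(obj))
    return alive


# A cycle that hangs off a root, and a cycle that does not.
a = []
b = [a]
a.append(b)
root = {"k": a}

c = []
d = [c]
c.append(d)

alive = mark_alive([root])
# Objects whose ids are in `alive` could be skipped by a later full collection;
# c and d are not marked, so they would still be examined.
print(id(a) in alive, id(c) in alive)
```

The point of the sketch is only the reachability walk: anything marked from the roots cannot be cyclic garbage, so the collector does not need to examine it again in the next full collection.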