Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-126491: Lower heap size limit with faster marking #127519

Merged
merged 24 commits into from
Dec 6, 2024

Conversation

markshannon
Copy link
Member

@markshannon markshannon commented Dec 2, 2024

With marking added to the cyclic GC (#127110) we spend a lot of the time in the GC forming transitive closures, both for marking and for the increments of the incremental GC.

Unfortunately the current algorithm has a couple of mistakes in it. One harmful, one beneficial.

  • The beneficial one is counting the initial mark twice. This helps because it reduces the cost of GC on heaps with little or no garbage
  • The harmful one is allowing the amount of work done to grow in proportion to the heap size.

These more or less cancel out.
This PR deliberately counts marking as twice as effective as normal collection, but limits the amount of work done.
To do so, we need to increase the typical amount of work done a bit.
This has the advantage of limiting the amount of garbage to (roughly) 1/3 of the heap.

This PR does two things:

  • Speeds up the marking and increment creation phases
  • Visits objects a bit faster to maintain a lower heap size.

@markshannon
Copy link
Member Author

!buildbot iOS

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @markshannon for commit 8893cf5 🤖

The command will test the builders whose names match following regular expression: iOS

The builders matched are:

  • iOS ARM64 Simulator PR

@markshannon
Copy link
Member Author

!buildbot Android

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @markshannon for commit 8893cf5 🤖

The command will test the builders whose names match following regular expression: Android

The builders matched are:

  • aarch64 Android PR
  • AMD64 Android PR

@markshannon markshannon changed the title GH-126491: Faster marking GH-126491: Lower heap size limit with faster marking Dec 4, 2024
@markshannon
Copy link
Member Author

!buildbot Android|iOS

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @markshannon for commit 8262bf0 🤖

The command will test the builders whose names match following regular expression: Android|iOS

The builders matched are:

  • iOS ARM64 Simulator PR
  • aarch64 Android PR
  • AMD64 Android PR

@markshannon
Copy link
Member Author

Performance is a wash overall, but I think that is an artifact of our benchmarks. I would expect this to perform better on larger heaps and consume less memory, although the benchmarks show no overall change in memory consumption.

Note that the "create gc cycles" benchmark shows a 10% speedup and "gc traversal" an 8% speedup, but there is an equivalent slowdown on the "xml etree" benchmarks.

@markshannon markshannon marked this pull request as ready for review December 4, 2024 14:03
InternalDocs/garbage_collector.md Outdated Show resolved Hide resolved
InternalDocs/garbage_collector.md Outdated Show resolved Hide resolved
InternalDocs/garbage_collector.md Outdated Show resolved Hide resolved

To work out how much work we need to do, consider a heap with `L` live objects
and `G0` garbage objects at the start of a full scavenge and `G1` garbage objects
at the end of the scavenge. We don't want amount of garbage to grow, `G1 ≤ G0`, and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
at the end of the scavenge. We don't want amount of garbage to grow, `G1 ≤ G0`, and
at the end of the scavenge. We don't want the amount of garbage to grow, `G1 ≤ G0`, and

The number of new objects created `N` must be at least the new garbage created, `N ≥ G1`,
assuming that the number of live objects remains roughly constant.
If we set `T == 4*N` we get `T > 4*G1` and `T = L + G0 + G1` => `L + G0 > 3G1`
For a steady state heap `G0 == G1` we get `L > 2G` and the desired garbage ratio.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For a steady state heap `G0 == G1` we get `L > 2G` and the desired garbage ratio.
For a steady state heap `G0 == G1` we get `L > 2*G0` and the desired garbage ratio.

The number of new objects created `N` must be at least the new garbage created, `N ≥ G1`,
assuming that the number of live objects remains roughly constant.
If we set `T == 4*N` we get `T > 4*G1` and `T = L + G0 + G1` => `L + G0 > 3G1`
For a steady state heap `G0 == G1` we get `L > 2G` and the desired garbage ratio.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For a steady state heap `G0 == G1` we get `L > 2G` and the desired garbage ratio.
For a steady state heap (`G0 == G1`) we get `L > 2G` and the desired garbage ratio.


If we choose the amount of work done such that `2*M + I == 6N` then we can do
less work in most cases, but are still guaranteed to keep up.
Since `I ≥ G0 + G1` (not strictly true, but close enough)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Since `I G0 + G1` (not strictly true, but close enough)
Since `I G0 + G1` (not strictly true, but close enough)

Copy link
Member Author

@markshannon markshannon Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The increments (I) can include some of the live heap, depending on the how much is keep alive by C extensions.
So is more correct. Although is even more correct.

If we choose the amount of work done such that `2*M + I == 6N` then we can do
less work in most cases, but are still guaranteed to keep up.
Since `I ≥ G0 + G1` (not strictly true, but close enough)
`T == M + I == (6N + I)/2` and `(6N + I)/2 ≥ 4G`, so we can keep up.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`T == M + I == (6N + I)/2` and `(6N + I)/2 4G`, so we can keep up.
`T == M + I == (6N + I)/2` and `(6N + I)/2 4G`, so we can keep up.

`T == M + I == (6N + I)/2` and `(6N + I)/2 ≥ 4G`, so we can keep up.

The reason that this improves performance is that `M` is usually much larger
than `I` Suppose `M == 10I`, then `T ≅ 3N`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
than `I` Suppose `M == 10I`, then `T ≅ 3N`.
than `I`. If `M == 10I`, then `T ≅ 3N`.

@@ -1558,24 +1547,26 @@ mark_at_start(PyThreadState *tstate)
static intptr_t
assess_work_to_do(GCState *gcstate)
{
/* The amount of work we want to do depends on three things.
/* The amount of work we want to do depends on two things.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth linking to the doc from here?

Python/gc.c Outdated
*/
intptr_t scale_factor = gcstate->old[0].threshold;
if (scale_factor < 2) {
scale_factor = 2;
}
intptr_t new_objects = gcstate->young.count;
intptr_t max_heap_fraction = new_objects*3/2;
intptr_t max_heap_fraction = new_objects*5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this called fraction?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for any good reason. I'll rename it.

@markshannon markshannon merged commit 023b7d2 into python:main Dec 6, 2024
43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants