GH-126491: Lower heap size limit with faster marking #127519

Merged

markshannon merged 24 commits into python:main from faster-cpython:faster-marking

Dec 6, 2024

+208 −243

Member

markshannon commented Dec 2, 2024

With marking added to the cyclic GC (#127110) we spend a lot of the time in the GC forming transitive closures, both for marking and for the increments of the incremental GC.

Unfortunately, the current algorithm has a couple of mistakes in it: one harmful, one beneficial.

The beneficial one is counting the initial mark twice. This helps because it reduces the cost of GC on heaps with little or no garbage.
The harmful one is allowing the amount of work done to grow in proportion to the heap size.

These more or less cancel out.
This PR deliberately counts marking as twice as effective as normal collection, but limits the amount of work done.
To do so, we need to increase the typical amount of work done a bit.
This has the advantage of limiting the amount of garbage to (roughly) 1/3 of the heap.

This PR does two things:

Speeds up the marking and increment creation phases.
Visits objects a bit faster to maintain a lower heap size.

Issue: Mark all objects reachable from roots as live before doing main cyclic GC pass #126491
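
For readers unfamiliar with the "transitive closure" step mentioned in the description, here is a minimal sketch of marking everything reachable from a root with an explicit worklist. It is a toy model only: `ToyObject`, `mark_reachable`, `MAX_NODES` and `MAX_REFS` are hypothetical names invented for illustration, and none of this is the code in `Python/gc.c`, which works on real `PyObject` headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8   /* toy heap capacity (assumption for this sketch) */
#define MAX_REFS  4   /* max outgoing references per toy object */

/* Hypothetical toy object: indices of the objects it references. */
typedef struct {
    int refs[MAX_REFS];
    int nrefs;
    bool marked;
} ToyObject;

/* Mark every object reachable from `root`.
 * Returns the number of objects newly marked by this call. */
size_t
mark_reachable(ToyObject *heap, size_t nobjects, int root)
{
    assert(nobjects <= MAX_NODES);
    assert(root >= 0 && (size_t)root < nobjects);
    if (heap[root].marked) {
        return 0;
    }
    /* Each object is pushed at most once (it is marked before the push),
     * so MAX_NODES slots are always enough. */
    int worklist[MAX_NODES];
    size_t top = 0;
    size_t newly_marked = 0;
    heap[root].marked = true;
    worklist[top++] = root;
    newly_marked++;
    while (top > 0) {
        const ToyObject *obj = &heap[worklist[--top]];
        for (int i = 0; i < obj->nrefs; i++) {
            int child = obj->refs[i];
            if (!heap[child].marked) {
                heap[child].marked = true;
                worklist[top++] = child;
                newly_marked++;
            }
        }
    }
    return newly_marked;
}
```

In a heap where objects 0, 1 and 2 reference each other in a ring and objects 3 and 4 only reference each other, calling `mark_reachable(heap, 5, 0)` marks the first three and leaves 3 and 4 unmarked, so only those two would remain candidates for the later cyclic pass.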

markshannon added 7 commits

November 18, 2024 14:32


          Faster marking of reachable objects

3038a78


          Handle more classes in fast marking

c024484


          Add support for async generators on fast path. Simplify counting

e8497ae


          Check stackref before converting to PyObject *

4c1a6bc


          Rename stuff

6efb4c0


          Remove expand_region_transitively_reachable and use move_all_transitively_reachable.

b1c7ab0


          Merge branch 'main' into faster-marking

07f228b

bedevere-app bot mentioned this pull request

Mark all objects reachable from roots as live before doing main cyclic GC pass #126491

Open


          Fix compiler warnings and linkage

51ff78e

markshannon added the skip news label

markshannon added 10 commits

December 2, 2024 16:12


          Fix another linkage issue

df907b5


          Try 'extern'

9ca64f5


          Go back to PyAPI_FUNC and move functions together

bda13f4


          Use _Py_FALLTHROUGH

d9d63c8


          Add necessary #ifndef Py_GIL_DISABLED

57b8820


          Go back to using tp_traverse, but make traversal more efficient

a607059


          Tidy up


          A bit more tidying up

a1a38c8


          Move all work to do calculations to one place

68fc90b


          Assume that increments are 50% garbage for work done calculation

8893cf5

Member Author

markshannon commented Dec 3, 2024

!buildbot iOS

bedevere-bot commented Dec 3, 2024

🤖 New build scheduled with the buildbot fleet by @markshannon for commit 8893cf5 🤖

The command will test the builders whose names match following regular expression: iOS

The builders matched are:

iOS ARM64 Simulator PR

Member Author

markshannon commented Dec 3, 2024

!buildbot Android

bedevere-bot commented Dec 3, 2024

🤖 New build scheduled with the buildbot fleet by @markshannon for commit 8893cf5 🤖

The command will test the builders whose names match following regular expression: Android

The builders matched are:

aarch64 Android PR
AMD64 Android PR


          Elaborate comment

ba20c7c

markshannon changed the title ~~GH-126491: Faster marking~~ GH-126491: Lower heap size limit with faster marking

mhsmith mentioned this pull request

Two selective !buildbot commands in quick succession causes ALL buildbots to run python/buildmaster-config#566

Open


          More tweaking of thresholds

8262bf0

Member Author

markshannon commented Dec 4, 2024

!buildbot Android|iOS

bedevere-bot commented Dec 4, 2024

🤖 New build scheduled with the buildbot fleet by @markshannon for commit 8262bf0 🤖

The command will test the builders whose names match following regular expression: Android|iOS

The builders matched are:

iOS ARM64 Simulator PR
aarch64 Android PR
AMD64 Android PR

Member Author

markshannon commented Dec 4, 2024

Performance is a wash overall, but I think that is an artifact of our benchmarks. I would expect this to perform better on larger heaps and consume less memory, although the benchmarks show no overall change in memory consumption.

Note that the "create gc cycles" benchmark shows a 10% speedup and "gc traversal" an 8% speedup, but there is an equivalent slowdown on the "xml etree" benchmarks.

markshannon marked this pull request as ready for review

December 4, 2024 14:03

markshannon requested a review from methane as a code owner

December 4, 2024 14:03

bedevere-app bot added the awaiting core review label

markshannon added 2 commits

December 4, 2024 16:54


          Do some algebra

3c2116e


          Revert to 2M+I from 3M+I

72d0284

iritkatriel reviewed

InternalDocs/garbage_collector.md Outdated (3 resolved review threads)


          Address review comments

0f182e2

iritkatriel reviewed

InternalDocs/garbage_collector.md Outdated

+              To work out how much work we need to do, consider a heap with `L` live objects
+              and `G0` garbage objects at the start of a full scavenge and `G1` garbage objects
+              at the end of the scavenge. We don't want amount of garbage to grow, `G1 ≤ G0`, and

Member

iritkatriel Dec 5, 2024

Suggested change

-             at the end of the scavenge. We don't want amount of garbage to grow, `G1 ≤ G0`, and
+             at the end of the scavenge. We don't want the amount of garbage to grow, `G1 ≤ G0`, and

InternalDocs/garbage_collector.md Outdated

+              The number of new objects created `N` must be at least the new garbage created, `N ≥ G1`,
+              assuming that the number of live objects remains roughly constant.
+              If we set `T == 4*N` we get `T > 4*G1` and `T = L + G0 + G1` => `L + G0 > 3G1`
+              For a steady state heap `G0 == G1` we get `L > 2G` and the desired garbage ratio.

Member

iritkatriel Dec 5, 2024

Suggested change

-             For a steady state heap `G0 == G1` we get `L > 2G` and the desired garbage ratio.
+             For a steady state heap `G0 == G1` we get `L > 2*G0` and the desired garbage ratio.

InternalDocs/garbage_collector.md Outdated

+              The number of new objects created `N` must be at least the new garbage created, `N ≥ G1`,
+              assuming that the number of live objects remains roughly constant.
+              If we set `T == 4*N` we get `T > 4*G1` and `T = L + G0 + G1` => `L + G0 > 3G1`
+              For a steady state heap `G0 == G1` we get `L > 2G` and the desired garbage ratio.

Member

iritkatriel Dec 5, 2024

Suggested change

-             For a steady state heap `G0 == G1` we get `L > 2G` and the desired garbage ratio.
+             For a steady state heap (`G0 == G1`) we get `L > 2G` and the desired garbage ratio.
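
The quoted doc lines compress several steps, so a worked version may help. This is a reader's reconstruction, not text from the PR: it reads `T` as the total number of objects visited in a full scavenge and writes `G` for the steady-state value `G0 == G1`. With `N ≥ G1` the inequalities come out non-strict.

```latex
\begin{align*}
T &= 4N \ge 4G_1 && \text{since } N \ge G_1 \\
T &= L + G_0 + G_1 \\
\Rightarrow\quad L + G_0 &\ge 3G_1 \\
\Rightarrow\quad L &\ge 2G && \text{steady state, } G_0 = G_1 = G \\
\Rightarrow\quad \frac{G}{L + G} &\le \frac{G}{2G + G} = \frac{1}{3}
\end{align*}
```

The last line is the "roughly 1/3 of the heap" garbage bound mentioned in the PR description.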

InternalDocs/garbage_collector.md Outdated

+              If we choose the amount of work done such that `2*M + I == 6N` then we can do
+              less work in most cases, but are still guaranteed to keep up.
+              Since `I ≥ G0 + G1` (not strictly true, but close enough)

Member

iritkatriel Dec 5, 2024

Suggested change

-             Since `I ≥ G0 + G1` (not strictly true, but close enough)
+             Since `I ≈ G0 + G1` (not strictly true, but close enough)

Member Author

markshannon Dec 5, 2024

The increments (I) can include some of the live heap, depending on how much is kept alive by C extensions.
So ≥ is more correct. Although ≳ is even more correct.

InternalDocs/garbage_collector.md Outdated

+              If we choose the amount of work done such that `2*M + I == 6N` then we can do
+              less work in most cases, but are still guaranteed to keep up.
+              Since `I ≥ G0 + G1` (not strictly true, but close enough)
+              `T == M + I == (6N + I)/2` and `(6N + I)/2 ≥ 4G`, so we can keep up.

Member

iritkatriel Dec 5, 2024

Suggested change

-             `T == M + I == (6N + I)/2` and `(6N + I)/2 ≥ 4G`, so we can keep up.
+             `T == M + I == (6N + I)/2` and `(6N + I)/2 ≳ 4G`, so we can keep up.
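
Spelled out, the step from the `2*M + I == 6N` budget to the `4G` bound looks like this. It is a sketch that assumes the steady state `G0 == G1 == G` and `N ≥ G` from the earlier doc lines, and uses `≳` for the increment estimate as discussed in the thread above.

```latex
\begin{align*}
2M + I &= 6N \;\Rightarrow\; M = \frac{6N - I}{2} \\
T = M + I &= \frac{6N + I}{2} \\
&\gtrsim \frac{6G + 2G}{2} = 4G && \text{using } N \ge G,\; I \gtrsim G_0 + G_1 = 2G
\end{align*}
```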

InternalDocs/garbage_collector.md Outdated

+              `T == M + I == (6N + I)/2` and `(6N + I)/2 ≥ 4G`, so we can keep up.
+              The reason that this improves performance is that `M` is usually much larger
+              than `I` Suppose `M == 10I`, then `T ≅ 3N`.

Member

iritkatriel Dec 5, 2024

Suggested change

-             than `I` Suppose `M == 10I`, then `T ≅ 3N`.
+             than `I`. If `M == 10I`, then `T ≅ 3N`.
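
To check the `T ≅ 3N` figure, the small function below solves the budget `2*M + I == 6N` for a given ratio `M/I` and returns `T == M + I`. The function name and parameters are made up for this illustration and do not appear in CPython.

```c
#include <assert.h>

/* Total objects visited, T = M + I, under the budget 2*M + I == 6*N
 * when the marked set is `mark_to_increment_ratio` times the increment,
 * i.e. M == ratio * I. Hypothetical helper for illustration only. */
double
total_work(double new_objects, double mark_to_increment_ratio)
{
    assert(new_objects >= 0.0);
    assert(mark_to_increment_ratio >= 0.0);
    /* Substitute M = ratio * I into 2*M + I == 6*N and solve for I. */
    double increment = 6.0 * new_objects / (2.0 * mark_to_increment_ratio + 1.0);
    double marked = mark_to_increment_ratio * increment;
    return marked + increment;
}
```

With a ratio of 10 this gives `T = 66N/21`, about `3.14 * N`. With no marking at all (ratio 0) it gives the full `6 * N`, and as the ratio grows the result approaches `3 * N` from above.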

Python/gc.c

@@ -1558,24 +1547,26 @@ mark_at_start(PyThreadState *tstate)
               static intptr_t
               assess_work_to_do(GCState *gcstate)
               {
-                  /* The amount of work we want to do depends on three things.
+                  /* The amount of work we want to do depends on two things.

Member

iritkatriel Dec 5, 2024

Is it worth linking to the doc from here?

Python/gc.c Outdated

                    */
                   intptr_t scale_factor = gcstate->old[0].threshold;
                   if (scale_factor < 2) {
                       scale_factor = 2;
                   }
                   intptr_t new_objects = gcstate->young.count;
-                  intptr_t max_heap_fraction = new_objects*3/2;
+                  intptr_t max_heap_fraction = new_objects*5;

Member

iritkatriel Dec 5, 2024

Why is this called fraction?

Member Author

markshannon Dec 5, 2024

Not for any good reason. I'll rename it.
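
The hunk above only shows part of the function, so the following is a guess at the general shape of such a calculation, not the merged code. It sketches a work budget made of the new-object count plus a heap-proportional term that is capped at a multiple of the new objects, which is one way to stop the work growing with heap size. `SketchGCState`, `sketch_work_to_do`, `SKETCH_HEAP_DIVISOR`, the `heap_size` field and the name `heap_term_cap` are all assumptions; only the `scale_factor` floor of 2 and the `new_objects*5` cap are taken from the quoted lines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the collector state, with only the fields
 * this sketch needs. The real GCState in Python/gc.c differs. */
typedef struct {
    intptr_t young_count;    /* objects allocated since the last increment */
    intptr_t old_threshold;  /* plays the role of old[0].threshold */
    intptr_t heap_size;      /* total tracked objects (assumed field) */
} SketchGCState;

/* Assumed divisor turning heap size into a per-increment share. */
#define SKETCH_HEAP_DIVISOR 10

/* Work budget = new objects + a heap-proportional term capped at five
 * times the new objects, so the budget stops growing with the heap. */
intptr_t
sketch_work_to_do(const SketchGCState *st)
{
    assert(st->young_count >= 0 && st->heap_size >= 0);
    intptr_t scale_factor = st->old_threshold;
    if (scale_factor < 2) {
        scale_factor = 2;
    }
    intptr_t new_objects = st->young_count;
    intptr_t heap_term_cap = new_objects * 5;
    intptr_t heap_term = st->heap_size / SKETCH_HEAP_DIVISOR / scale_factor;
    if (heap_term > heap_term_cap) {
        heap_term = heap_term_cap;
    }
    return new_objects + heap_term;
}
```

Under these assumptions, 1000 new objects on a 20,000-object heap with a threshold of 10 yield a budget of 1200, while the same 1000 new objects on a 100,000,000-object heap hit the cap and yield 6000, i.e. six times the new-object count.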


          Address review comments and clarify code a bit

d3c21bb

iritkatriel approved these changes

bedevere-app bot added awaiting merge and removed awaiting core review labels

markshannon merged commit 023b7d2 into python:main

43 checks passed

bedevere-app bot removed the awaiting merge label
