GH-126491: GC: Mark objects reachable from roots before doing cycle collection #126502

markshannon · 2024-11-06T14:23:37Z

This PR:

Performs a marking step before the incremental cycle collection.
Rescans the stack before each increment.
Removes lazy dict tracking, as that optimization no longer pays off.
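
As a rough illustration of the marking idea only (this is not the code in Python/gc.c), the sketch below marks everything reachable from a set of roots in a toy object graph, so that a later cycle-collection pass would only need to look at the unmarked remainder. The names `mark_reachable`, `heap`, and the string node labels are hypothetical and exist only for this example.

```python
from collections import deque


def mark_reachable(graph, roots):
    """Toy mark phase: return the set of nodes reachable from `roots`.

    `graph` maps each node to the list of nodes it references.
    This is an illustrative model, not CPython's implementation.
    """
    marked = set()
    pending = deque(r for r in roots if r in graph)
    while pending:
        node = pending.popleft()
        if node in marked:
            continue
        marked.add(node)
        for child in graph.get(node, ()):
            if child not in marked:
                pending.append(child)
    return marked


# Hypothetical heap: "a" and "b" form a cycle that is still reachable from
# the root, while "x" and "y" form a cycle that nothing live refers to.
heap = {
    "root": ["a"],
    "a": ["b"],
    "b": ["a"],
    "x": ["y"],
    "y": ["x"],
}

live = mark_reachable(heap, ["root"])
# Only the unmarked objects remain as candidates for cycle collection.
candidates = set(heap) - live
```

In this model the reachable cycle `a`/`b` is never examined by the cycle collector, which is one way to read the claim that the GC does less work overall.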

Performance is excellent. Speedup is 4%, with a 100% speedup on one GC benchmark.

Stats show the GC is run more frequently and collects more objects, but does only ~60% of the work.

This PR also:

Makes detaching an object's dict a bit more robust in case of a memory error.
Increases the threshold in a GC test. This test is to check for uncontrolled growth, so the exact threshold doesn't matter.

Issue: Mark all objects reachable from roots as live before doing main cyclic GC pass #126491

Python/gc.c

Include/internal/pycore_object.h

iritkatriel

Could you update https://github.com/python/cpython/blob/main/InternalDocs/garbage_collector.md to explain how this works?

bedevere-app · 2024-11-11T16:24:04Z

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

iritkatriel · 2024-11-18T13:07:47Z

Python/gc.c

+                }
+            }
+            if (!start && frame->visited) {
+                // If this frame has already been visited, then the lower frames


There's a warning here.

It is a false positive. I think it is complaining because we don't set visited here, but this line is unreachable for entry frames.

I'll initialize visited to keep it quiet and in case the code gets reordered.

hugovk · 2024-11-18T20:40:42Z

Hmm, looks like this commit b0fcc2c has caused test.test_gc.IncrementalGCTests.test_incremental_gc_handles_fast_cycle_creation failures for iOS and Android:

iOS ARM64 Simulator: https://buildbot.python.org/#/builders/1380/builds/1907
AMD64 Android: https://buildbot.python.org/#/builders/1591/builds/516
aarch64 Android: https://buildbot.python.org/#/builders/1594/builds/600

Reminder: 3.14 alpha 2 is due tomorrow (2024-11-19): https://buildbot.python.org/#/release_status

cc @freakboy3742 @mhsmith

nascheme · 2024-11-18T20:57:11Z

Extracted from the Android failure:

FAIL: test_incremental_gc_handles_fast_cycle_creation (test.test_gc.IncrementalGCTests.test_incremental_gc_handles_fast_cycle_creation)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/user/0/org.python.testbed/files/python/lib/python3.14/contextlib.py", line 85, in inner
    return func(*args, **kwds)
  File "/data/user/0/org.python.testbed/files/python/lib/python3.14/test/test_gc.py", line 1175, in test_incremental_gc_handles_fast_cycle_creation
    self.assertLess(new_objects, 25_000)
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
AssertionError: 25032 not less than 25000

----------------------------------------------------------------------
Ran 53 tests in 3.744s

FAILED (failures=1, skipped=8)
test test_gc failed
test_gc failed (1 failure)
1 test failed again:
    test_gc

hugovk · 2024-11-18T21:17:19Z

Also, some tier 1 and 2 refleak buildbots whose builds include this change (and some others) are now failing with "test_import failed (reference leak)":

AMD64 RHEL8 Refleaks: https://buildbot.python.org/#/builders/259/builds/1712
aarch64 RHEL8 Refleaks: https://buildbot.python.org/#/builders/551/builds/348

Plus s390x RHEL8 + RHEL9 and AMD64 FreeBSD refleaks buildbots.

I'll open a draft revert PR so we can run the buildbots on it.

Edit: #126983

markshannon · 2024-11-18T21:55:12Z

Hmm, looks like this commit b0fcc2c has caused test.test_gc.IncrementalGCTests.test_incremental_gc_handles_fast_cycle_creation failures for iOS and Android:

I don't have ready access to an iOS or Android device to test on.
The failing test is designed to check that the heap doesn't keep growing. If the initial heap size is significantly smaller than on Linux/Windows, the heap would grow a bit more before stabilizing. Setting the limit to 27k or 30k should work.
It might be worth double-checking that the heap doesn't continue to grow by increasing the number of iterations from 20k to 100k+.
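
A minimal sketch of that kind of check, assuming a simplified stand-in for the real test: it creates self-referential objects in a loop and periodically samples how many GC-tracked objects exist relative to a baseline. `Node` and `sample_tracked_counts` are hypothetical names, not the helpers used in test_gc.py, and no particular bound is asserted because the numbers depend on the interpreter and on what ran earlier in the process.

```python
import gc


class Node:
    """Hypothetical object that immediately forms a reference cycle."""

    def __init__(self):
        self.ref = self


def sample_tracked_counts(iterations=20_000, sample_every=2_000):
    """Create `iterations` cyclic objects and sample tracked-object growth.

    Returns a list of (tracked objects now - tracked objects at start),
    taken every `sample_every` iterations.
    """
    baseline = len(gc.get_objects())
    samples = []
    for i in range(1, iterations + 1):
        Node()  # becomes cyclic garbage as soon as the call returns
        if i % sample_every == 0:
            samples.append(len(gc.get_objects()) - baseline)
    return samples


samples = sample_tracked_counts()
```

If the collector keeps up, the sampled values should level off rather than climb with the iteration count; raising `iterations` to 100k or more is the double check suggested above.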

freakboy3742 · 2024-11-18T22:16:31Z

I've done some local testing on iOS. The test_gc module passes as-is if you run it by itself. If I increase the loop count in the failing test to 100k, the test fails with exactly the same object count (25,032).

My guess is that this is one of those issues that iOS/Android expose because they don't run the test suite in parallel; you'd likely be able to reproduce this failure on a desktop machine by running the test suite with --single-process.

On that basis, following @markshannon's suggestion, I'll increase the object count to 26k. PR incoming.

freakboy3742 · 2024-11-18T22:23:31Z

#126984 is a PR increasing the test threshold.

freakboy3742 · 2024-11-19T02:47:06Z

I've abandoned the "increase the threshold" approach; that's clearly not enough to avoid the issue. The threshold needed to pass the test is much higher than 30k.

The good news (well... good for me, not so good for @markshannon 😄 ) is that I can reproduce this bug on macOS. If you run the test suite sequentially in a single process (./python.exe -m test -W --single-process), test_gc fails with the same error on macOS as is being reported on iOS and Android.

You don't see this error if you run test_gc in isolation - the 25k check passes. This is also consistent with the behavior on iOS and Android. You have to run the full sequence of (alphabetical) tests up to test_gc to see the failure. I'll take a quick pass to see if I can narrow down a smaller (and faster) reproduction case.

freakboy3742 · 2024-11-19T03:13:21Z

I'm able to reliably reproduce the failure on macOS with ./python.exe -m test test_email test_fnmatch test_frame test_gc.

hugovk · 2024-11-19T09:30:08Z

I've reverted this because it's release day and we need to keep tier-1 green anyway: #126983.

Let's make sure to summon the buildbots next time :)

nascheme · 2024-11-19T21:25:40Z

I'm able to reproduce this on Linux with this set of tests: test_dis test_email test_funcattrs test_functools test_gc.

…ycle collection (pythonGH-126502) * Mark almost all reachable objects before doing collection phase * Add stats for objects marked * Visit new frames before each increment * Remove lazy dict tracking * Update docs * Clearer calculation of work to do.

markshannon added 8 commits November 4, 2024 11:42

GC experiment: mark almost all reachable objects before doing collect…

2ec8d8a

…ion phase

Add stats for objects marked

1fdf00e

Start with mark phase

5e813c5

Add stats for visits during marking

8bd7606

Visit new frames before each increment

3513da2

Redo stats

ab1faec

Fix freezing and GC untracking

9e2d93c

Don't untrack dicts

3c18fc8

markshannon requested review from ericsnowcurrently and methane as code owners November 6, 2024 14:23

bedevere-app bot added the awaiting core review label Nov 6, 2024

bedevere-app bot mentioned this pull request Nov 6, 2024

Mark all objects reachable from roots as live before doing main cyclic GC pass #126491

Open

github-advanced-security bot found potential problems Nov 6, 2024

Python/gc.c (fixed)

markshannon added 9 commits November 6, 2024 14:41

Remove lazy dict tracking from no-gil build

94da963

Remove unused variable

659fd1e

Add news

4cfbc4f

Fix use after free

8c92ca6

Attempt more careful fix of use-after-free

12d7f7c

Typo

1f619d7

Fix use of uninitialized variable

b55fe37

Fix compiler warnings

73b7f52

Tweak test

33f6386

iritkatriel reviewed Nov 11, 2024

Include/internal/pycore_object.h (resolved)

iritkatriel requested changes Nov 11, 2024

bedevere-app bot removed the awaiting core review label Nov 11, 2024

bedevere-app bot added the awaiting changes label Nov 11, 2024

markshannon added 2 commits November 11, 2024 16:41

Add section to internal docs

8574d00

Rephrase new docs

70007b0

iritkatriel reviewed Nov 11, 2024

Include/internal/pycore_gc.h (resolved)

Include/internal/pycore_runtime_init.h (outdated, resolved)

Make sure tuples are untracked and avoid quadratic time validation

278059b

AlexWaygood reviewed Nov 16, 2024

InternalDocs/garbage_collector.md (outdated, resolved)

markshannon and others added 4 commits November 18, 2024 09:43

Update InternalDocs/garbage_collector.md

f186b4a

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>

Remove unused variable

5f6d04e

Tweak work to do calculation

9cfb5f0

Explain work to do calculation

c7683a4

iritkatriel reviewed Nov 18, 2024

Initialize field to prevent code analyzer warning.

170ea6d

iritkatriel approved these changes Nov 18, 2024

markshannon merged commit b0fcc2c into python:main Nov 18, 2024
59 checks passed

bedevere-app bot removed the awaiting merge label Nov 18, 2024

markshannon deleted the mark-first-gc branch November 18, 2024 14:36

AlexWaygood mentioned this pull request Nov 18, 2024

Incremental GC causes a significant slowdown for Sphinx #124567

Closed

hugovk added a commit to hugovk/cpython that referenced this pull request Nov 18, 2024

Revert "pythonGH-126491: GC: Mark objects reachable from roots before…

2ac5c48

… doing cycle collection (pythonGH-126502)" This reverts commit b0fcc2c.

freakboy3742 mentioned this pull request Nov 18, 2024

GH-126491: Increase the threshold for the GC fast cycle test. #126984

Closed

hugovk added a commit that referenced this pull request Nov 19, 2024

Revert "GH-126491: GC: Mark objects reachable from roots before doing…

899fdb2

… cycle collection (GH-126502)" (#126983)

markshannon mentioned this pull request Nov 19, 2024

Remove lazy dictionary tracking #127010

Closed

markshannon mentioned this pull request Nov 21, 2024

GH-126491: GC: Mark objects reachable from roots before doing cycle collection #127110

Merged

ebonnal pushed a commit to ebonnal/cpython that referenced this pull request Jan 12, 2025

Revert "pythonGH-126491: GC: Mark objects reachable from roots before…

53b8263

… doing cycle collection (pythonGH-126502)" (python#126983)
