Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-126491: GC: Mark objects reachable from roots before doing cycle collection #126502

Merged
merged 35 commits into from
Nov 18, 2024

Conversation

markshannon
Copy link
Member

@markshannon markshannon commented Nov 6, 2024

This PR:

  • Performs a marking step before the incremental cycle collection
  • Rescans the stack before each increment
  • Removes lazy dict tracking as the lazy tracking optimization no longer pays off.

Performance is excellent. Speedup is 4%, with a 100% speedup on one GC benchmark.

Stats show the GC is run more frequently and collects more objects, but does only ~60% of the work.

This PR also:

  • Makes detaching an object's dict a bit more robust in case of a memory error.
  • Increases the threshold in a GC test. This test is to check for uncontrolled growth, so the exact threshold doesn't matter.

Python/gc.c Fixed Show fixed Hide fixed
Copy link
Member

@iritkatriel iritkatriel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bedevere-app
Copy link

bedevere-app bot commented Nov 11, 2024

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

Include/internal/pycore_gc.h Show resolved Hide resolved
Include/internal/pycore_runtime_init.h Outdated Show resolved Hide resolved
}
}
if (!start && frame->visited) {
// If this frame has already been visited, then the lower frames
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a warning here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a false positive. I think it is complaining because we don't set visited here, but this line is unreachable for entry frames.

I'll initialize visited to keep it quiet and in case the code gets reordered.

@markshannon markshannon merged commit b0fcc2c into python:main Nov 18, 2024
59 checks passed
@markshannon markshannon deleted the mark-first-gc branch November 18, 2024 14:36
@hugovk
Copy link
Member

hugovk commented Nov 18, 2024

Hmm, looks like this commit b0fcc2c has caused test.test_gc.IncrementalGCTests.test_incremental_gc_handles_fast_cycle_creation failures for iOS and Android:

Reminder the 3.14 alpha 2 is due tomorrow (2024-11-19): https://buildbot.python.org/#/release_status

cc @freakboy3742 @mhsmith

@nascheme
Copy link
Member

Extracted from the Android failure:

FAIL: test_incremental_gc_handles_fast_cycle_creation (test.test_gc.IncrementalGCTests.test_incremental_gc_handles_fast_cycle_creation)
----------------------------------------------------------------------          
Traceback (most recent call last):                                              
  File "/data/user/0/org.python.testbed/files/python/lib/python3.14/contextlib.py", line 85, in inner
    return func(*args, **kwds)                                                  
  File "/data/user/0/org.python.testbed/files/python/lib/python3.14/test/test_gc.py", line 1175, in test_incremental_gc_handles_fast_cycle_creation 
    self.assertLess(new_objects, 25_000)                                        
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^                                        
AssertionError: 25032 not less than 25000                                       
                                                                                
----------------------------------------------------------------------          
Ran 53 tests in 3.744s                                                          
                                                                                
FAILED (failures=1, skipped=8)                                                  
test test_gc failed                                                             
test_gc failed (1 failure)                                                      
1 test failed again:                                                            
    test_gc         

@hugovk
Copy link
Member

hugovk commented Nov 18, 2024

Also some tier 1 and 2 "test_import failed (reference leak)" that include this change (and some others) are now failing:

Plus s390x RHEL8 + RHEL9 and AMD64 FreeBSD refleaks buildbots.

I'll open a draft revert PR so we can run the buildbots on it.

Edit: #126983

hugovk added a commit to hugovk/cpython that referenced this pull request Nov 18, 2024
@markshannon
Copy link
Member Author

Hmm, looks like this commit b0fcc2c has caused test.test_gc.IncrementalGCTests.test_incremental_gc_handles_fast_cycle_creation failures for iOS and Android:

I don't have ready access to an iOS or Android device to test on.
The failing test is designed to check that the heap doesn't keep growing. If the initial heap size is significantly smaller than on linux/windows then the heap would grow a bit more before stabilizing. Setting the limit to 27k or 30k should work.
It might be worth double checking that the heap doesn't continue to grow by increasing the number of iterations from 20k to 100k+

@freakboy3742
Copy link
Contributor

I've done some local testing on iOS. The test_gc module passes as-is if you run it by itself. If I increase the loop count in the failing test to 100k, the test fails with exactly the same object count (25,032).

My guess is that this one of those issues that iOS/Android expose because they don't run the test suite in parallel - my guess is that you'd likely be able to reproduce this failure on a desktop machine if you run the test suite with --single-process.

On that basis, following @markshannon's suggestion I'll increase the object count to 26k. PR incoming.

@freakboy3742
Copy link
Contributor

#126984 is a PR increasing the test threshold.

@freakboy3742
Copy link
Contributor

I've abandoned the "increase the threshold" aapproach - that's clearly not enough to avoid the issue. The threshold needed to pass the test is clearly much higher than 30k.

The good news (well... good for me, not so good for @markshannon 😄 ) is that I can reproduce this bug on macOS. If you run the test suite sequentially in a single process (./python.exe -m test -W --single-process), test_gc fails with the same error on macOS as is being reported on iOS and Android.

You don't see this error if you run test_gc in isolation - the 25k check passes. This is also consistent with the behavior on iOS and Android. You have to run the full sequence of (alphabetical) tests up to test_gc to see the failure. I'll take a quick pass to see if I can narrow down a smaller (and faster) reproduction case.

@freakboy3742
Copy link
Contributor

I'm able to reliably reproduce the failure on macOS with ./python.exe -m test test_email test_fnmatch test_frame test_gc.

hugovk added a commit that referenced this pull request Nov 19, 2024
@hugovk
Copy link
Member

hugovk commented Nov 19, 2024

I've reverted this because it's release day and we need to keep tier-1 green anyway: #126983.

Let's make sure to summon the buildbots next time :)

@nascheme
Copy link
Member

I'm able to re-produce on Linux with this set of tests: test_dis test_email test_funcattrs test_functools test_gc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants