Execute the garbage collector only on the eval breaker #97922
Comments
One small consideration: We can opportunistically check if the GC is scheduled to run and run it if we have a request in |
* main:
  * pythongh-68686: Retire eptag ptag scripts (python#98064)
  * pythongh-97922: Run the GC only on eval breaker (python#97920)
  * GitHub Workflows security hardening (python#96492)
  * Add `@ezio-melotti` as codeowner for `.github/`. (python#98079)
  * pythongh-97913 Docs: Add walrus operator to the index (python#97921)
  * [doc] Fix broken links to C extensions accelerating stdlib modules (python#96914)
  * pythongh-97822: Fix http.server documentation reference to test() function (python#98027)
  * pythongh-91052: Add PyDict_Unwatch for unwatching a dictionary (python#98055)
  * pythonGH-98023: Change default child watcher to PidfdChildWatcher on supported systems (python#98024)
  * pythonGH-94182: Run the PidfdChildWatcher on the running loop (python#94184)
* main: (5519 commits)
  * Minor edits to the Descriptor HowTo Guide (pythonGH-24901)
  * Fix link to Lifecycle of a Pull Request in CONTRIBUTING (python#98102)
  * pythonGH-94597: deprecate `SafeChildWatcher`, `FastChildWatcher` and `MultiLoopChildWatcher` child watchers (python#98089)
  * Auto-cancel old builds when new commit pushed to branch (python#98009)
  * pythongh-95011: Migrate syslog module to Argument Clinic (pythonGH-95012)
  * pythongh-68686: Retire eptag ptag scripts (python#98064)
  * pythongh-97922: Run the GC only on eval breaker (python#97920)
  * GitHub Workflows security hardening (python#96492)
  * Add `@ezio-melotti` as codeowner for `.github/`. (python#98079)
  * pythongh-97913 Docs: Add walrus operator to the index (python#97921)
  * [doc] Fix broken links to C extensions accelerating stdlib modules (python#96914)
  * pythongh-97822: Fix http.server documentation reference to test() function (python#98027)
  * pythongh-91052: Add PyDict_Unwatch for unwatching a dictionary (python#98055)
  * pythonGH-98023: Change default child watcher to PidfdChildWatcher on supported systems (python#98024)
  * pythonGH-94182: Run the PidfdChildWatcher on the running loop (python#94184)
  * pythongh-92886: make test_ast pass with -O (assertions off) (pythonGH-98058)
  * pythongh-92886: make test_coroutines pass with -O (assertions off) (pythonGH-98060)
  * pythongh-57179: Add note on symlinks for os.walk (python#94799)
  * pythongh-94808: Fix regex on exotic platforms (python#98036)
  * pythongh-90085: Remove vestigial -t and -c timeit options (python#94941)
  * ...
* Rename Lib/test/crashers/ to Lib/test/test_crashers/.
* Move Lib/test/test_crashers.py to Lib/test/test_crashers/__init__.py.
* test_crashers is no longer skipped and makes sure that scripts do crash, and do not simply fail with a non-zero exit code.
* Update bogus_code_obj.py to use CodeType.replace().
* Remove Lib/test/crashers/ scripts which no longer crash:
  * recursive_call.py: fixed by pythongh-89419
  * mutation_inside_cyclegc.py: fixed by pythongh-97922
  * trace_at_recursion_limit.py: fixed by Python 3.7
* Rename Lib/test/crashers/ to Lib/test/test_crashers/.
* Move Lib/test/test_crashers.py to Lib/test/test_crashers/__init__.py.
* test_crashers is no longer skipped and makes sure that scripts do crash, and do not simply fail with a non-zero exit code.
* Update bogus_code_obj.py to use CodeType.replace().
* Scripts crashing Python now use SuppressCrashReport of test.support to not create coredump files.
* Remove Lib/test/crashers/ scripts which no longer crash:
  * recursive_call.py: fixed by pythongh-89419
  * mutation_inside_cyclegc.py: fixed by pythongh-97922
  * trace_at_recursion_limit.py: fixed by Python 3.7
Add a regression test for races in the memory allocation profiler. The test is marked skip for now, for a few reasons:

- It doesn't trigger the crash in a deterministic amount of time, so it's not really reasonable for CI or the local dev loop as-is.
- It probably benefits more from having the thread sanitizer enabled, which we don't currently do for the memalloc extension.

I'm adding the test so that any dd-trace-py developer has an actual reproducer of the problem that is easy to run and is committed somewhere people can find it. It's currently only really useful for local development. I plan to tweak and optimize some of the synchronization code to reduce memalloc overhead, and we need a reliable reproducer of the crashes the synchronization was meant to fix in order to be confident we don't reintroduce them.

The test reproduces the crash fixed by #11460, as well as the exception fixed by #12075. Both issues stem from the same problem: at one point, memalloc had no synchronization beyond the GIL protecting its internal state. It turns out that calling back into CPython APIs, as we do when collecting tracebacks, can in some cases lead to the GIL being released. So we need additional synchronization for state modification that straddles CPython API calls. We previously only reliably saw this in a demo program but weren't able to reproduce it locally. Now that I understand the crash much better, I was able to create a standalone reproducer. The key elements are: allocate a lot, trigger GC a lot (including from memalloc traceback collection), and release the GIL during GC.

Important note: this only reliably crashes on Python 3.11. The very specific path to releasing the GIL that we hit was modified in 3.12 and later (see python/cpython#97922). We will probably support 3.11 for a while longer, so it's still worth having this test.
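The three key elements named above can be sketched in pure Python. This is not the dd-trace-py regression test, and on its own it is not expected to crash anything, since no allocation profiler is installed. The names `Cyclic`, `churn` and `stress` are made up for illustration, and the threshold values are arbitrary assumptions.

```python
import gc
import threading
import time


class Cyclic:
    """Object in a reference cycle, so only the cyclic GC can free it."""

    def __init__(self):
        self.me = self  # self-reference creates the cycle

    def __del__(self):
        # Sleeping, even for 0 seconds, releases the GIL, so another
        # thread can run while this collection is still in progress.
        time.sleep(0)


def churn(iterations):
    """Allocate a lot: each instance is garbage as soon as it is created."""
    for _ in range(iterations):
        Cyclic()


def stress(threads=4, iterations=2000):
    """Run several allocating threads with very low GC thresholds.

    Returns the number of objects the GC reports as collected.
    """
    old_threshold = gc.get_threshold()
    before = sum(s["collected"] for s in gc.get_stats())
    gc.set_threshold(10, 10, 10)  # trigger GC a lot (arbitrary low values)
    try:
        workers = [
            threading.Thread(target=churn, args=(iterations,))
            for _ in range(threads)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        gc.collect()  # sweep up whatever is left
    finally:
        gc.set_threshold(*old_threshold)
    return sum(s["collected"] for s in gc.get_stats()) - before


if __name__ == "__main__":
    print("objects collected:", stress())
```

A sketch like this only recreates the scheduling conditions (heavy allocation, frequent collections, GIL released inside the collector). Whether anything actually races depends on native code that keeps state across Python API calls.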
Currently, the GC can run on any object allocation. This has historically been the source of many problems, because a collection can be triggered at points where the VM is in an inconsistent state. These include critical points in the eval loop, and also complex object creation: the GC can run while the sub-elements of the final result are being created, before the object itself is fully initialized.
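One way to see that plain allocation drives automatic collections is to register a hook in `gc.callbacks` and count how often it fires while a large container is being built. This is a rough sketch: the helper name `count_automatic_collections` and the object count are arbitrary, and the exact number of collections depends on the interpreter version and its thresholds.

```python
import gc


def count_automatic_collections(n_objects=100_000):
    """Count GC runs that start while many container objects are allocated."""
    runs = []

    def on_gc(phase, info):
        # The collector calls this with phase "start" and then "stop".
        if phase == "start":
            runs.append(info["generation"])

    was_enabled = gc.isenabled()
    gc.enable()  # automatic collection must be on for this to show anything
    gc.callbacks.append(on_gc)
    try:
        # Each empty list is a GC-tracked allocation; none are freed,
        # so the allocation counter keeps crossing the threshold.
        keep = [[] for _ in range(n_objects)]
    finally:
        gc.callbacks.remove(on_gc)
        if not was_enabled:
            gc.disable()
    return len(runs)


if __name__ == "__main__":
    print("automatic collections:", count_automatic_collections())
```

Nothing in the loop calls `gc.collect()`, so every counted run was triggered by the allocations themselves.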
To improve the situation, an object allocation can merely schedule a GC run; the collection itself then runs only on the eval breaker, in the same way that we currently run signal handlers, switch the GIL, and run pending callbacks.
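The proposal can be pictured with a toy model. This is not CPython code: `ToyVM`, its methods and the threshold are hypothetical, and the model checks the breaker after every instruction, whereas the real eval loop only checks it at certain points.

```python
class ToyVM:
    """Toy interpreter: allocation requests a GC, the eval breaker runs it."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # allocations before a GC is requested
        self.allocations = 0
        self.gc_scheduled = False    # the pending request flag
        self.log = []

    def allocate(self, name):
        self.allocations += 1
        self.log.append(f"alloc {name}")
        if self.allocations >= self.threshold:
            # Only record the request; never collect in the middle of
            # an instruction, where objects may be half-built.
            self.gc_scheduled = True

    def eval_breaker(self):
        # Safe point between instructions: state is consistent here.
        if self.gc_scheduled:
            self.gc_scheduled = False
            self.allocations = 0
            self.log.append("gc run")

    def run(self, program):
        # Each instruction is modelled as the list of objects it allocates.
        for instruction in program:
            for name in instruction:
                self.allocate(name)
            self.eval_breaker()
        return self.log


if __name__ == "__main__":
    vm = ToyVM(threshold=2)
    for entry in vm.run([["a", "b", "c"], ["d"]]):
        print(entry)
```

In this model the threshold is crossed while the first instruction is still allocating, but the collection is logged only after that instruction has finished, which is the property the proposal is after.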