GC aborts in debug no-gil build #126312

Closed
devdanzin opened this issue Nov 1, 2024 · 7 comments
Labels
3.13 bugs and security fixes 3.14 new features, bugs and security fixes extension-modules C modules in the Modules dir topic-free-threading type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@devdanzin
Contributor

devdanzin commented Nov 1, 2024

Crash report

What happened?

In debug no-gil builds with PYTHON_GIL=0, it's possible to trigger the three assertions here with simple code:

_PyObject_ASSERT_WITH_MSG(op, !gc_is_unreachable(op),
                          "object should not be marked as unreachable yet");
if (_Py_REF_IS_MERGED(op->ob_ref_shared)) {
    _PyObject_ASSERT_WITH_MSG(op, op->ob_tid == 0,
                              "merged objects should have ob_tid == 0");
}
else if (!_Py_IsImmortal(op)) {
    _PyObject_ASSERT_WITH_MSG(op, op->ob_tid != 0,
                              "unmerged objects should have ob_tid != 0");
}

For this code:

import gc
gc.freeze()
gc.is_finalized(lambda: None)
gc.collect()
gc.unfreeze()
gc.collect()

We get:

Python/gc_free_threading.c:550: validate_refcounts: Assertion "!gc_is_unreachable(op)" failed: object should not be marked as unreachable yet
Enable tracemalloc to get the memory block allocation traceback

object address  : 0x200007373f0
object refcount : 1152921504606846977
object type     : 0x55adadc26660
object type name: dict
object repr     : {'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x20000330a20>, '__spec__': None, '__builtins__': <module 'builtins' (built-in)>, '__file__': '/home/fusil/python-61/gc-assertion-abort-2/source.py', '__cached__': None, 'gc': <module 'gc' (built-in)>}

Fatal Python error: _PyObject_AssertFailed: _PyObject_AssertFailed
Python runtime state: initialized

Current thread 0x00007f1b47a64740 (most recent call first):
  Garbage-collecting
  File "/home/fusil/python-61/gc-assertion-abort-2/source.py", line 6 in <module>
Aborted

For this code:

from threading import Thread
import gc

gc.freeze()
alive = [Thread(target=gc.freeze, args=())]
gc.collect()
gc.unfreeze()
gc.collect()

We get:

Python/gc_free_threading.c:554: validate_refcounts: Assertion "op->ob_tid == 0" failed: merged objects should have ob_tid == 0
Enable tracemalloc to get the memory block allocation traceback

object address  : 0x20000278210
object refcount : 1152921504606846994
object type     : 0x5616274befc0
object type name: type
object repr     : <class '_thread.lock'>

Fatal Python error: _PyObject_AssertFailed: _PyObject_AssertFailed
Python runtime state: initialized

Current thread 0x00007f34ed160740 (most recent call first):
  Garbage-collecting
  File "/home/fusil/python-104/gc-abort-assertion/source.py", line 8 in <module>
Aborted

Lastly, for code I haven't minimized yet (but can, if it helps), we get:

Python/gc_free_threading.c:558: validate_refcounts: Assertion "op->ob_tid != 0" failed: unmerged objects should have ob_tid != 0
Enable tracemalloc to get the memory block allocation traceback

object address  : 0x200007980b0
object refcount : 1152921504606846980
object type     : 0x5637720b4f40
object type name: builtin_function_or_method
object repr     : <built-in function collect>

Fatal Python error: _PyObject_AssertFailed: _PyObject_AssertFailed
Python runtime state: initialized

Current thread 0x00007f63aef4c740 (most recent call first):
  Garbage-collecting
  File "/home/fusil/python-106/gc-abort-assertion-sigabrt-37/source.py", line 38 in callMethod
  File "/home/fusil/python-106/gc-abort-assertion-sigabrt-37/source.py", line 42 in callFunc
  File "/home/fusil/python-106/gc-abort-assertion-sigabrt-37/source.py", line 247 in <module>
Aborted
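
Each of the reports above ends with the interpreter aborting, so when replaying reproducers like these it can help to run them in a child process and inspect the exit status. The following is only a sketch: `run_snippet` and `aborted` are hypothetical helper names, not part of fusil or CPython, and the sketch assumes a POSIX system where death by signal is reported as a negative return code. It does not set PYTHON_GIL itself; pass that through `extra_env` only when the interpreter is a free-threaded build.

```python
import os
import signal
import subprocess
import sys


def run_snippet(source, extra_env=None, timeout=60.0):
    """Run `source` in a fresh interpreter and return its exit status."""
    env = dict(os.environ)
    if extra_env:
        env.update(extra_env)
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        env=env,
        timeout=timeout,
    )
    return proc.returncode


def aborted(returncode):
    """True if the child was killed by SIGABRT (POSIX convention)."""
    return returncode == -signal.SIGABRT


# First reproducer from this report, kept as a string so it runs in a child.
REPRODUCER = """\
import gc
gc.freeze()
gc.is_finalized(lambda: None)
gc.collect()
gc.unfreeze()
gc.collect()
"""

if __name__ == "__main__":
    status = run_snippet(REPRODUCER)
    print("aborted" if aborted(status) else f"exit status {status}")
```

On a build that is not affected (for example a release build, or one that includes the fix), the child is expected to exit normally rather than abort.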

Found using fusil by @vstinner.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.14.0a1+ experimental free-threading build (heads/main:d467d9246c, Nov 1 2024, 09:05:56) [GCC 11.4.0]
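
Since the crashes only show up on a debug, free-threaded build running with the GIL disabled, it may be worth confirming all three conditions before trying the reproducers. A minimal sketch follows; `build_info` is a hypothetical helper name, and it assumes that `sys._is_gil_enabled()` is available (it was added in 3.13) and otherwise treats the GIL as enabled.

```python
import sys
import sysconfig


def build_info():
    """Report the build properties that matter for this issue."""
    # Py_GIL_DISABLED is set to 1 for free-threaded builds.
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

    # sys._is_gil_enabled() exists on 3.13+; assume the GIL is on otherwise.
    gil_check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(gil_check()) if gil_check is not None else True

    # sys.gettotalrefcount() is only present in debug builds.
    debug = hasattr(sys, "gettotalrefcount")

    return {
        "free_threaded": free_threaded,
        "gil_enabled": gil_enabled,
        "debug": debug,
    }


if __name__ == "__main__":
    for key, value in build_info().items():
        print(f"{key}: {value}")
```

The assertions in this report would only be expected to fire when `free_threaded` and `debug` are true and `gil_enabled` is false.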


@devdanzin devdanzin added the type-crash A hard crash of the interpreter, possibly with a core dump label Nov 1, 2024
@ZeroIntensity ZeroIntensity added topic-free-threading 3.14 new features, bugs and security fixes extension-modules C modules in the Modules dir 3.13 bugs and security fixes labels Nov 1, 2024
@ZeroIntensity
Member

Confirmed on the main branch and 3.13. Interestingly, this doesn't seem to happen with any other object passed to is_finalized, not even other objects of the same type; only lambdas trigger it. I did see that the lambda's reference count is unusually high, but I'm guessing that's just a result of the deferred reference counting (DRC) bits. I'm investigating to see what's going on.
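
The "object refcount" values printed in the assertion reports in this issue show a similar pattern: each one is a small number plus 2**60. The arithmetic can be checked as below. This is only an observation about the printed numbers; treating 2**60 as the offset is an assumption made here for illustration, and the thread does not state the exact bias CPython uses. `split_refcount` is a hypothetical helper name.

```python
# Refcounts copied from the three assertion reports in this issue.
REPORTED = {
    "dict": 1152921504606846977,
    "type": 1152921504606846994,
    "builtin_function_or_method": 1152921504606846980,
}

# Assumed offset, chosen only because the reported values sit just above it.
OFFSET = 1 << 60


def split_refcount(value):
    """Split a reported refcount into (multiples of OFFSET, remainder)."""
    return divmod(value, OFFSET)


if __name__ == "__main__":
    for type_name, value in REPORTED.items():
        high, low = split_refcount(value)
        print(f"{type_name}: {high} * 2**60 + {low}")
```

Under that assumption the remainders are 1, 18, and 4, which look like ordinary small reference counts.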

@colesbury
Contributor

The easiest fix would be to disallow gc.freeze() while a GC is in progress.

@ZeroIntensity
Member

That doesn't seem to fix it on my end. It looks like freezing is entirely broken on the free-threaded builds:

import gc
gc.freeze()
0/0  # Abort

@rruuaanng
Contributor

I will test it and try to fix it when I have time. (This shouldn't stop anyone else from fixing it.)

@ZeroIntensity
Member

Ah, I found the problem. The free-threaded GC only skips a block if the object in it is itself frozen, but calling tp_traverse on a non-frozen object can still visit a frozen one, and that breaks the frozen object's reference count. I've created #126338 as a fix.
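
The failure mode described here can be illustrated with a toy model in pure Python. This is a loose sketch and not CPython's actual implementation: `Obj`, `gc_refs`, and `subtract_internal_refs` are hypothetical names, and the model only mimics the idea of a collector that initialises per-object bookkeeping for non-frozen objects and then subtracts internal references while traversing them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Obj:
    name: str
    refcount: int
    frozen: bool = False
    refs: List["Obj"] = field(default_factory=list)
    # None means the collector never initialised bookkeeping for this object.
    gc_refs: Optional[int] = None


def subtract_internal_refs(objects, skip_frozen_referents):
    """Toy version of the 'subtract internal references' step.

    Returns the names of frozen objects whose bookkeeping was touched.
    """
    # Step 1: only non-frozen objects get their bookkeeping initialised.
    for obj in objects:
        obj.gc_refs = None if obj.frozen else obj.refcount

    # Step 2: traverse non-frozen objects and decrement their referents.
    touched_frozen = []
    for obj in objects:
        if obj.frozen:
            continue  # frozen objects themselves are skipped
        for target in obj.refs:
            if target.frozen:
                if skip_frozen_referents:
                    continue  # behaviour after the (modelled) fix
                # Modelled bug: a frozen referent is modified even though
                # its bookkeeping was never initialised.
                touched_frozen.append(target.name)
                target.gc_refs = (target.gc_refs or 0) - 1
            else:
                target.gc_refs -= 1
    return touched_frozen


if __name__ == "__main__":
    frozen = Obj("frozen_dict", refcount=2, frozen=True)
    live = Obj("live_func", refcount=1, refs=[frozen])

    print(subtract_internal_refs([frozen, live], skip_frozen_referents=False))
    print(frozen.gc_refs)
    print(subtract_internal_refs([frozen, live], skip_frozen_referents=True))
    print(frozen.gc_refs)
```

In the model, skipping frozen referents during traversal leaves the frozen object untouched, which matches the spirit of the linked change's title ("Don't traverse frozen objects on the free-threaded build"), though the real change operates on CPython's C-level structures.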

@ZeroIntensity
Member

ZeroIntensity commented Nov 2, 2024

So, there are actually two bugs at play here. One is the problem that I described above, but then another is that objects that have deferred reference counting enabled seem to have some weird issues when getting frozen. A small reproducer for that:

import gc
import unittest


class Test(unittest.TestCase):
    def test_something(self):
        gc.freeze()
        gc.collect()
        gc.unfreeze()


if __name__ == "__main__":
    unittest.main()

I'd like my PR to land first before fixing that.

Edit: OK, never mind. As it turns out, the fix was easy. My PR now fixes freezing for DRC as well.

@python python deleted a comment from LocoabordoR8 Nov 2, 2024
@python python deleted a comment from LocoabordoR8 Nov 2, 2024
vstinner pushed a commit that referenced this issue Nov 15, 2024
…126338)

Also, _PyGC_Freeze() no longer freezes unreachable objects.

Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com>
ZeroIntensity added a commit to ZeroIntensity/cpython that referenced this issue Nov 15, 2024
…eaded build (pythonGH-126338)

Also, _PyGC_Freeze() no longer freezes unreachable objects.

(cherry picked from commit d4c72fe)

Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com>
vstinner pushed a commit that referenced this issue Nov 15, 2024
…build (GH-126338) (#126866)

* Fix merge conflicts.

* [3.13] gh-126312: Don't traverse frozen objects on the free-threaded build (GH-126338)

Also, _PyGC_Freeze() no longer freezes unreachable objects.

(cherry picked from commit d4c72fe)

Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com>

---------

Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com>
@vstinner
Member

The two reproducer scripts from the first message no longer crash Python, so I'm closing the issue.

Thanks @devdanzin for the bug report and @ZeroIntensity for the fix.
