Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TraceRefs build] Tests fail: _PyRefchain_Remove: Assertion `value == REFCHAIN_VALUE' failed #124785

Closed
vstinner opened this issue Sep 30, 2024 · 10 comments
Assignees
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes release-blocker tests Tests in the Lib/test dir

Comments

@vstinner
Copy link
Member

vstinner commented Sep 30, 2024

Multiple tests fail on a TraceRefs build: https://buildbot.python.org/#/builders/484/builds/5981

10 tests failed:
    test_atexit test_capi test_datetime test_embed test_exceptions
    test_import test_interpreters test_multibytecodec test_syslog
    test_threading

Examples of errors:

FAIL: test_memory_error_in_subinterp (test.test_exceptions.ExceptionTests.test_memory_error_in_subinterp)

0:03:13 load avg: 1.23 [ 92/479/2] test_datetime worker non-zero exit code (Exit code -6 (SIGABRT))

python: Objects/object.c:196: _PyRefchain_Remove: Assertion `value == REFCHAIN_VALUE' failed.

0:08:52 load avg: 1.82 [217/479/5] test_atexit worker non-zero exit code (Exit code -6 (SIGABRT))

Linked PRs

@vstinner vstinner added the tests Tests in the Lib/test dir label Sep 30, 2024
@vstinner
Copy link
Member Author

The bug can be reproduced with:

import _testcapi; _testcapi.run_in_subinterp(f"raise ValueError")

The regression was introduced by commit 98b2ed7:

commit 98b2ed7e239c807f379cd2bf864f372d79064aac
Author: Neil Schemenauer <nas-github@arctrix.com>
Date:   Thu Sep 26 19:16:51 2024 -0700

    gh-116510: Fix crash due to shared immortal interned strings. (gh-124646)

cc @nascheme @ericsnowcurrently @encukou

@nascheme
Copy link
Member

My guess would be it's due to the incref on the interned_dict from the main interpreter. Perhaps we should remove that incref and corresponding decref. Eric told me to leave it out when he was helping with the patch.

@vstinner vstinner changed the title [TraceRefs build] [TraceRefs build] Tests fail: _PyRefchain_Remove: Assertion `value == REFCHAIN_VALUE' failed Sep 30, 2024
@nascheme
Copy link
Member

Unfortunately (because that should be a simple fix), that doesn't seem to be the problem. The traceback I see:

0x00007f162838aeb2 in __GI___assert_fail (
    assertion=assertion@entry=0x5625f4bf4592 "value == REFCHAIN_VALUE", 
    file=file@entry=0x5625f4bf457e "../Objects/object.c", line=line@entry=195, 
    function=function@entry=0x5625f4bf52c0 <__PRETTY_FUNCTION__.12> "_PyRefchain_Remove")
    at ./assert/assert.c:101
#6  0x00005625f49e4cbf in _PyRefchain_Remove (interp=<optimized out>, obj=obj@entry=0x7f1626d4afc0)
    at ../Objects/object.c:195
#7  0x00005625f49e686f in _Py_ForgetReference (op=op@entry=0x7f1626d4afc0)
    at ../Objects/object.c:2501
#8  0x00005625f49e5ab4 in _Py_Dealloc (op=op@entry=0x7f1626d4afc0) at ../Objects/object.c:2876
#9  0x00005625f49ce56b in Py_DECREF (filename=filename@entry=0x5625f4bbc781 "../Include/object.h", 
    lineno=lineno@entry=1042, op=0x7f1626d4afc0) at ../Include/object.h:934
#10 0x00005625f49ce58a in Py_XDECREF (op=<optimized out>) at ../Include/object.h:1042
#11 0x00005625f49ce63e in dictkeys_decref (interp=interp@entry=0x5625f4e2f7e0 <_PyRuntime+104352>, 
    dk=dk@entry=0x5625f6fba730, use_qsbr=use_qsbr@entry=false) at ../Objects/dictobject.c:496
#12 0x00005625f49d6aa5 in clear_lock_held (op=op@entry=0x7f162824c050)
    at ../Objects/dictobject.c:2780
#13 0x00005625f49d6b7f in PyDict_Clear (op=op@entry=0x7f162824c050) at ../Objects/dictobject.c:2804
#14 0x00005625f4a3154b in clear_interned_dict (
    interp=interp@entry=0x5625f4e2f7e0 <_PyRuntime+104352>) at ../Objects/unicodeobject.c:323

That suggests to me that when the main interpreter is cleaning up, it's confused by the immortal interned strings coming from sub-interpreters. I'm still working on it.

@nascheme nascheme added 3.13 bugs and security fixes 3.14 new features, bugs and security fixes 3.12 bugs and security fixes labels Sep 30, 2024
@nascheme
Copy link
Member

nascheme commented Sep 30, 2024

@pablogsal thinks breaking the trace-refs build could be a release blocker for 3.13 so let's try to fix this quickly. I noticed that the logic that decides to copy the extension module dict is not the same as my has_shared_intern_dict() function. I check for interp->feature_flags & Py_RTFLAGS_USE_MAIN_OBMALLOC while import.c looks at Py_RTFLAGS_MULTI_INTERP_EXTENSIONS.

Changing my function to look at that flag doesn't fix it though. It seems run_in_subinterp_with_config() used by the unit tests will set both those flags sometimes. What i really want to know is if PyDict_Update() is going to be used when the extension module is loaded (the m_size == -1 case). However, you don't know that until the module is actually imported and that's too late. So I think you have to check if basic single-phase init modules are allowed. If they are, you have to share the intern_dict with the main interpreter or you will have refcount bugs due to immortalization.

@ericsnowcurrently
Copy link
Member

Would it help to make use of _Py_ForgetReference() in clear_interned_dict(), to forget the dict and/or each interned string?

@gpshead
Copy link
Member

gpshead commented Sep 30, 2024

(labelling it as release blocker to force the RM to decide, sorry =)

@nascheme
Copy link
Member

I think commit 98b2ed7 should get reverted if trace-refs failures is a release blocker. I'm not coming up with a simple fix for that. Reverting means that Python embedding programs like Weechat and Kodi will likely crash due to them using "basic single-phase init" extensions. So, I'm not really sure the cure is worse than the disease, so to speak.

I'm working on an alternative to 98b2ed7 that just leaks those strings. I think that could be simpler and safer to include in 3.12 and 3.13. Sharing the interned_strings with the main interpreter could have other unexpected side-effects and leaking is perhaps better.

@nascheme
Copy link
Member

Would it help to make use of _Py_ForgetReference() in clear_interned_dict(), to forget the dict and/or each interned string?

I'm not sure but i don't think so. The problem is likely here:

#ifdef Py_DEBUG
            // Skip the Immortal Instance check and restore
            // the two references (key and value) ignored
            // by PyUnicode_InternInPlace().
            _Py_SetMortal(s, 2);
#ifdef Py_REF_DEBUG
            /* let's be pedantic with the ref total */
            _Py_IncRefTotal(_PyThreadState_GET());
            _Py_IncRefTotal(_PyThreadState_GET());
#endif

when this runs in the main interpreter. I think those _Py_IncRefTotal calls are not correct since the string might have been interned in a subinterpreter. The best fix I've devised for this is to put the subinterpreter interned strings into their own dict, owned by the main interpreter, e.g. interp_strings_legacy. This would get handled differently when the main interpreter shutdowns. The code is fairly complex though and I think it's too scary to put in the 3.13 RC.

nascheme added a commit to nascheme/cpython that referenced this issue Sep 30, 2024
The pythongh-124646 fix has the issue that it breaks the trace-refs debug
build.  This is a simpler fix that avoids the use-after-free crashes by
just leaking immortal interned strings allocated by legacy
subinterpreters.
nascheme added a commit to nascheme/cpython that referenced this issue Sep 30, 2024
The pythonGH-124646 fix has the issue that it breaks the trace-refs debug
build.  This is a simpler fix that avoids the use-after-free crashes by
just leaking immortal interned strings allocated by legacy
subinterpreters.
nascheme added a commit to nascheme/cpython that referenced this issue Sep 30, 2024
nascheme pushed a commit that referenced this issue Sep 30, 2024
…ed strings (gh-124646)" (gh-124807)

Revert "gh-116510: Fix crash due to shared immortal interned strings. (gh-124646)"

This reverts commit 98b2ed7.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 30, 2024
…ortal interned strings (pythongh-124646)" (pythongh-124807)

Revert "pythongh-116510: Fix crash due to shared immortal interned strings. (pythongh-124646)"

This reverts commit 98b2ed7.
(cherry picked from commit 7bdfabe)

Co-authored-by: T. Wouters <thomas@python.org>
Yhg1s added a commit that referenced this issue Oct 1, 2024
… interned strings (gh-124646)" (gh-124807) (#124812)

gh-124785: Revert "gh-116510: Fix crash due to shared immortal interned strings (gh-124646)" (gh-124807)

Revert "gh-116510: Fix crash due to shared immortal interned strings. (gh-124646)"

This reverts commit 98b2ed7.
(cherry picked from commit 7bdfabe)

Co-authored-by: T. Wouters <thomas@python.org>
Yhg1s added a commit to Yhg1s/cpython that referenced this issue Oct 1, 2024
…red immortal interned strings (pythongh-124646)" (pythongh-124807)

Revert "pythongh-116510: Fix crash due to shared immortal interned strings. (pythongh-124646)"

This reverts commit 98b2ed7.
(cherry picked from commit 7bdfabe)

Co-authored-by: T. Wouters <thomas@python.org>
Yhg1s pushed a commit that referenced this issue Oct 1, 2024
… immortal interned strings. (gh-124541)" (#124814)

Revert "[3.12] gh-116510: Fix a crash due to shared immortal interned strings. (gh-124541)"

This reverts commit 5dd07eb.
@Yhg1s
Copy link
Member

Yhg1s commented Oct 1, 2024

This should now be fixed, but of course the original issue is still open.

@Yhg1s Yhg1s closed this as completed Oct 1, 2024
ericsnowcurrently added a commit to ericsnowcurrently/cpython that referenced this issue Oct 1, 2024
@ericsnowcurrently
Copy link
Member

FYI, it looks like the problem is that the immortal interned strings were added to the subinterpreter's "refchain" but should have been on the main interpreter's, since they were now tied to the main interpreter's lifecycle. The failing assertion1 was hit while finalizing the main interpreter because the refchain the object was originally bound to belonged to the subinterpreter, which was already finalized.

I've posted a fix: gh-124865.

(This does make we wonder if all immortal objects from legacy interpreters (sharing the main interpreter's obmalloc) should be forced onto the main interpreter's refchain. We can work that out separately.)

Footnotes

  1. python: Objects/object.c:196: _PyRefchain_Remove: Assertion 'value == REFCHAIN_VALUE' failed.

ericsnowcurrently added a commit that referenced this issue Oct 9, 2024
…4865)

Fix a crash caused by immortal interned strings being shared between
sub-interpreters that use basic single-phase init. In that case, the string
can be used by an interpreter that outlives the interpreter that created and
interned it. For interpreters that share obmalloc state, also share the
interned dict with the main interpreter.

This is an un-revert of gh-124646 that then addresses the Py_TRACE_REFS
failures identified by gh-124785.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 9, 2024
…ythongh-124865)

Fix a crash caused by immortal interned strings being shared between
sub-interpreters that use basic single-phase init. In that case, the string
can be used by an interpreter that outlives the interpreter that created and
interned it. For interpreters that share obmalloc state, also share the
interned dict with the main interpreter.

This is an un-revert of pythongh-124646 that then addresses the Py_TRACE_REFS
failures identified by pythongh-124785.
(cherry picked from commit f2cb399)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 9, 2024
…ythongh-124865)

Fix a crash caused by immortal interned strings being shared between
sub-interpreters that use basic single-phase init. In that case, the string
can be used by an interpreter that outlives the interpreter that created and
interned it. For interpreters that share obmalloc state, also share the
interned dict with the main interpreter.

This is an un-revert of pythongh-124646 that then addresses the Py_TRACE_REFS
failures identified by pythongh-124785.
(cherry picked from commit f2cb399)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes release-blocker tests Tests in the Lib/test dir
Projects
Development

No branches or pull requests

5 participants