-
-
Notifications
You must be signed in to change notification settings - Fork 31.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CPython 3.12 embedded in WeeChat causes segfault on subsequent calls to Py_EndInterpreter #116510
Comments
The same thing happens in Kodi in various situations (e.g. xbmc/xbmc#24440 and reports on https://forum.kodi.tv/), the stack trace is basically the same: ASAN output
GDB (with debug symbols)
Are there plans to address this in the 3.12 release cycle? |
I tried unsuccessfully backporting 7a7bce5 as threading tests failed. If someone could attach a 3.12 backport patch I could test Kodi. |
I have a backport of the patch mostly done. I can finish it and then you can test. I think backporting this change would be a good idea given that it seems to work without issue in 3.13 and would solve a few problems for users of Python 3.12. |
ping @Yhg1s |
Hi, |
I can investigate but I need some direction on how to compile weechat with a specific version of Python. E.g. if I have Python installed with prefix /usr/local/python-3.12.4, how do I build weechat that uses it? I'm not familiar with CMake and my attempts at making it use that Python failed. It uses /usr/bin/python3.11 from my OS. |
@nascheme: you must compile Python with
|
I managed to get weechat running with the Python plugin today and did some debugging. The problem occurs during the teardown of the Python interpreter (within
The gc_next and gc_prev pointers on that object (a module method for the "weechat" module) are not valid. Using
The type object being deallocated doesn't seem to be associated with weechat. It is https://gist.github.com/nascheme/e807f03fd15312bbae52595f21ad0957 The idea is to break the reference cycles in the This is a band-aid and not fixing the real problem. More investigation would be needed to find it. It could be a bug in Python or perhaps a reference counting bug in weechat. |
3.12 is going to go security-only relatively soon (sometime in the next few months, according to PEP 693), and this is still causing many problems with subinterpreters (namely, things like lists are completely broken under multithreaded isolated interpreters). If we don't backport the fix, 3.12 will remain with this problem forever. Re-pinging @Yhg1s |
I did a lot of debugging on this crash and here's what I found.
It seems this bug can be avoided by setting According to some discussion with Eric Snow, if |
Fix a bug that can cause a crash when sub-interpreters use "basic" single-phase extension modules. Shared objects could refer to PyGC_Head nodes that had been freed as part of interpreter shutdown.
Fix a crash caused by immortal interned strings being shared between sub-interpreters that use basic single-phase init. In that case, the string can be used by an interpreter that outlives the interpeter that created and interned it. For interpreters that share obmalloc state, also share the interned dict with the main interpreter.
FTR, this case of single-phase init modules with |
… (#124649) gh-116510: Fix crash during sub-interpreter shutdown (gh-124645) Fix a bug that can cause a crash when sub-interpreters use "basic" single-phase extension modules. Shared objects could refer to PyGC_Head nodes that had been freed as part of interpreter shutdown. (cherry picked from commit 6f9525d) Co-authored-by: Neil Schemenauer <nas-github@arctrix.com>
…rned strings. (pythongh-124646)" This reverts commit dc09a0c.
…rings. (pythongh-124646)" This reverts commit 98b2ed7.
…ortal interned strings (pythongh-124646)" (pythongh-124807) Revert "pythongh-116510: Fix crash due to shared immortal interned strings. (pythongh-124646)" This reverts commit 98b2ed7. (cherry picked from commit 7bdfabe) Co-authored-by: T. Wouters <thomas@python.org>
…terned strings. (pythongh-124541)" This reverts commit 5dd07eb.
… interned strings (gh-124646)" (gh-124807) (#124812) gh-124785: Revert "gh-116510: Fix crash due to shared immortal interned strings (gh-124646)" (gh-124807) Revert "gh-116510: Fix crash due to shared immortal interned strings. (gh-124646)" This reverts commit 98b2ed7. (cherry picked from commit 7bdfabe) Co-authored-by: T. Wouters <thomas@python.org>
…red immortal interned strings (pythongh-124646)" (pythongh-124807) Revert "pythongh-116510: Fix crash due to shared immortal interned strings. (pythongh-124646)" This reverts commit 98b2ed7. (cherry picked from commit 7bdfabe) Co-authored-by: T. Wouters <thomas@python.org>
…ared immortal interned strings (pythongh-124646)" (pythongh-124807)" This reverts commit 7bdfabe.
@nascheme: Thanks a lot for your debugging and findings! I'll try to check if it's safe to set In the last comment you linked to this issue twice. Which PRs did you mean to link to? |
Yes, using I fixed the links to the PRs. There one to fix the GC object headers and one for the immortalized interned strings. The second one is still waiting on the fix to be merged. According to the Kodi folk, both are need to fix the crash there. For Weechat, I think only the first might be needed but maybe it's only luck that the second bug doesn't cause it to crash. |
…4865) Fix a crash caused by immortal interned strings being shared between sub-interpreters that use basic single-phase init. In that case, the string can be used by an interpreter that outlives the interpreter that created and interned it. For interpreters that share obmalloc state, also share the interned dict with the main interpreter. This is an un-revert of gh-124646 that then addresses the Py_TRACE_REFS failures identified by gh-124785.
…ythongh-124865) Fix a crash caused by immortal interned strings being shared between sub-interpreters that use basic single-phase init. In that case, the string can be used by an interpreter that outlives the interpreter that created and interned it. For interpreters that share obmalloc state, also share the interned dict with the main interpreter. This is an un-revert of pythongh-124646 that then addresses the Py_TRACE_REFS failures identified by pythongh-124785. (cherry picked from commit f2cb399) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
…ythongh-124865) Fix a crash caused by immortal interned strings being shared between sub-interpreters that use basic single-phase init. In that case, the string can be used by an interpreter that outlives the interpreter that created and interned it. For interpreters that share obmalloc state, also share the interned dict with the main interpreter. This is an un-revert of pythongh-124646 that then addresses the Py_TRACE_REFS failures identified by pythongh-124785. (cherry picked from commit f2cb399) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
…gh-124865) (gh-125709) (GH-125204) * gh-116510: Fix a Crash Due to Shared Immortal Interned Strings (gh-124865) Fix a crash caused by immortal interned strings being shared between sub-interpreters that use basic single-phase init. In that case, the string can be used by an interpreter that outlives the interpreter that created and interned it. For interpreters that share obmalloc state, also share the interned dict with the main interpreter. This is an un-revert of gh-124646 that then addresses the Py_TRACE_REFS failures identified by gh-124785. (cherry picked from commit f2cb399) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com> * [3.13] gh-125286: Share the Main Refchain With Legacy Interpreters (gh-125709) They used to be shared, before 3.12. Returning to sharing them resolves a failure on Py_TRACE_REFS builds. --------- Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
I'll wait with 3.12 until the buildbots run on 3.13. |
…gh-125205) Fix a crash caused by immortal interned strings being shared between sub-interpreters that use basic single-phase init. In that case, the string can be used by an interpreter that outlives the interpreter that created and interned it. For interpreters that share obmalloc state, also share the interned dict with the main interpreter. This is an un-revert of gh-124646 that then addresses the Py_TRACE_REFS failures identified by gh-124785 (i.e. backporting gh-125709 too). (cherry picked from commit f2cb399, AKA gh-124865) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
This value determines the size of the per-module memory area. Setting this value to -1 as it was before this change means that the module has global state and therefore does not support subinterpreters. However, subinterpreters are used to run the Python scripts, so the weechat module has to support subinterpreters. Therefore we should set this value to 0 as no per-module memory is required. This seems to fix the crash reported in weechat#2046 without the need for the workaround added in commit 85c7494 (it does for me when testing with Python 3.12.0 at least). This change came up as a suggestion in cpython's issue tracker where it was pointed out that using modules with m_size set to -1 is not supported in subinterpreters. See these two comments: python/cpython#116510 (comment) python/cpython#116510 (comment) It's not completely clear to me what is required for a module to support subinterpreters and re-initialization (which is required for setting m_size to 0), but https://peps.pythondiscord.com/pep-0489/ says: A simple rule of thumb is: Do not define any static data, except built-in types with no mutable or user-settable class attributes. The only static data we define is of type int and str, so I think it should be fine.
This value determines the size of the per-module memory area. Setting this value to -1 as it was before this change means that the module has global state and therefore does not support subinterpreters. However, subinterpreters are used to run the Python scripts, so the weechat module has to support subinterpreters. Therefore we should set this value to 0 as no per-module memory is required. This seems to fix the crash reported in #2046 without the need for the workaround added in commit 85c7494 (it does for me when testing with Python 3.12.0 at least). This change came up as a suggestion in cpython's issue tracker where it was pointed out that using modules with m_size set to -1 is not supported in subinterpreters. See these two comments: python/cpython#116510 (comment) python/cpython#116510 (comment) It's not completely clear to me what is required for a module to support subinterpreters and re-initialization (which is required for setting m_size to 0), but https://peps.pythondiscord.com/pep-0489/ says: A simple rule of thumb is: Do not define any static data, except built-in types with no mutable or user-settable class attributes. The only static data we define is of type int and str, so I think it should be fine.
Crash report
What happened?
WeeChat embeds CPython in order to run Python scripts inside WeeChat. It can load multiple scripts and they each get their own interpreter. When a script is loaded
Py_NewInterpreter
is called, and when it's unloadedPy_EndInterpreter
is called.With CPython 3.12 loading two scripts and then unloading them in the same order causes a segmentation fault. Interestingly, the segmentation fault doesn't happen if the script that was loaded last is unloaded first.
I bisected this and found it was introduced in commit de64e75. I also noticed that the crash doesn't occur in the main branch, and did another bisect and found it was fixed in commit 7a7bce5.
This issue seems similar to the one reported in #115649 which is also introduced by the same commit, but that one still crashes on the main branch (commit 735fc2c).
I haven't been able to reproduce this outside of WeeChat unfortunately, but here is a backtrace from the crash with WeeChat, with commit de64e75 of CPython and commit ec56a1103f47b15a641ff93528fd6f50025dd524 of WeeChat.
This was produced by creating these two python scripts:
dummy1.py
:dummy2.py
And then running
weechat -t -r '/script load dummy1.py; /script load dummy2.py; /quit'
.Also, here is the issue report for WeeChat: weechat/weechat#2046
Since it's fixed in main it seems there won't be a problem with 3.13, but I wonder if the fix can be backported to 3.12?
CPython versions tested on:
3.12, CPython main branch
Operating systems tested on:
Linux
Output from running 'python -VV' on the command line:
Python 3.12.0a7+ (tags/v3.12.0a7-340-gde64e75616:de64e75616, Mar 8 2024, 19:43:39) [GCC 13.2.1 20230801]
Linked PRs
The text was updated successfully, but these errors were encountered: