faulthandler will hang the process with a TSAN and free-thread build Python #120696

Open
aisk opened this issue Jun 18, 2024 · 6 comments
Labels
topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@aisk
Contributor

aisk commented Jun 18, 2024

Bug report

Bug description:

I tried to run the TSAN check on macOS: #120502

However, test_capi.test_mem hangs indefinitely with the TSAN build, both on my local machine and on GHA: https://github.com/python/cpython/actions/runs/9517250225/job/26235426291?pr=120502

After some investigation, I found that test_pymem_malloc_without_gil and test_pyobject_malloc_without_gil are causing the hang. These tests simply execute ./python.exe -X faulthandler -c "import _testcapi; _testcapi.pymem_malloc_without_gil()" under the hood.
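
For anyone trying to reproduce this outside the test suite, here is a minimal sketch of running the same kind of child process with a timeout, so a hang is reported rather than blocking the shell. The helper name run_child and the 30-second default are assumptions for illustration, not something taken from the CPython tests.

```python
import subprocess
import sys


def run_child(code, *, faulthandler=True, timeout=30.0):
    """Run `code` in a child interpreter (hypothetical helper).

    Returns the child's exit code, or None if the child did not finish
    within `timeout` seconds and had to be killed.
    """
    cmd = [sys.executable]
    if faulthandler:
        cmd += ["-X", "faulthandler"]
    cmd += ["-c", code]
    try:
        # subprocess.run() kills the child before raising TimeoutExpired.
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return proc.returncode
```

On an affected build, run_child("import _testcapi; _testcapi.pymem_malloc_without_gil()") would be expected to return None, while passing faulthandler=False should give an exit code instead. On a debug build that exit code would probably be nonzero, since the stack below shows the call ending in a fatal error.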

The call stack for the hung process:

Call graph:
    2035 Thread_345029   DispatchQueue_1: com.apple.main-thread  (serial)
      2035 start  (in dyld) + 1942  [0x7ff8106a3386]
        2035 main  (in python.exe) + 33  [0x102ef5421]  python.c:15
          2035 Py_BytesMain  (in python.exe) + 74  [0x1033b47da]  main.c:773
            2035 pymain_main  (in python.exe) + 414  [0x1033b475e]  main.c:749
              2035 Py_RunMain  (in python.exe) + 3602  [0x1033b3992]  main.c:719
                2035 _PyRun_SimpleStringFlagsWithName  (in python.exe) + 215  [0x103375f37]  pythonrun.c:516
                  2035 run_mod  (in python.exe) + 2199  [0x103379f17]  pythonrun.c:1377
                    2035 run_eval_code_obj  (in python.exe) + 265  [0x10337a359]  pythonrun.c:1292
                      2035 PyEval_EvalCode  (in python.exe) + 198  [0x103252f06]  ceval.c:599
                        2035 _PyEval_Vector  (in python.exe) + 773  [0x1032532b5]  ceval.c:1819
                          2035 _PyEval_EvalFrameDefault  (in python.exe) + 24778  [0x1032595da]  generated_cases.c.h:813
                            2035 PyObject_Vectorcall  (in python.exe) + 76  [0x102ff6d0c]  call.c:327
                              2035 _PyObject_VectorcallTstate  (in python.exe) + 270  [0x102ff50ae]  pycore_call.h:168
                                2035 cfunction_vectorcall_NOARGS  (in python.exe) + 620  [0x1030baf5c]  methodobject.c:484
                                  2035 pymem_malloc_without_gil  (in _testcapi.cpython-314td-darwin.so) + 34  [0x107f1ad12]  mem.c:510
                                    2035 PyMem_Malloc  (in python.exe) + 78  [0x1030edbfe]  obmalloc.c:981
                                      2035 _PyMem_DebugMalloc  (in python.exe) + 79  [0x1030f080f]  obmalloc.c:2875
                                        2035 _Py_FatalErrorFunc  (in python.exe) + 72  [0x103339a48]  pylifecycle.c:3093
                                          2035 fatal_error  (in python.exe) + 1287  [0x10333a3f7]  pylifecycle.c:3059
                                            2035 _Py_DumpExtensionModules  (in python.exe) + 198  [0x103339b76]  pylifecycle.c:2929
                                              2035 ???  (in <unknown binary>)  [0xcdcdcdcdcdcdcdcd]
                                                2035 _sigtramp  (in libsystem_platform.dylib) + 29  [0x7ff810a5c37d]
                                                  2035 sighandler(int, __sanitizer::__sanitizer_siginfo*, void*)  (in libclang_rt.tsan_osx_dynamic.dylib) + 377  [0x1041e83f9]
                                                    2035 __tsan::CallUserSignalHandler(__tsan::ThreadState*, bool, bool, int, __sanitizer::__sanitizer_siginfo*, void*)  (in libclang_rt.tsan_osx_dynamic.dylib) + 255  [0x1041e7f0f]
                                                      2035 faulthandler_fatal_error  (in python.exe) + 638  [0x1033baaee]  faulthandler.c:326
                                                        2035 _Py_DumpExtensionModules  (in python.exe) + 198  [0x103339b76]  pylifecycle.c:2929
                                                          2035 PyDict_Next  (in python.exe) + 182  [0x1030863e6]  dictobject.c:2886
                                                            2035 _PyCriticalSection_BeginSlow  (in python.exe) + 80  [0x1032bac30]  critical_section.c:14

Removing the -X faulthandler option resolves the hang. Also notable: the hung process consumes 100% CPU, but CPU usage drops back to normal once lldb is attached to it.
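
Since the hang depends on whether faulthandler is active, it can help to confirm which configuration a child interpreter actually runs with. A minimal sketch, assuming a hypothetical helper named faulthandler_enabled: it uses faulthandler.is_enabled() from the standard library and clears PYTHONFAULTHANDLER and PYTHONDEVMODE, which can also turn the handler on.

```python
import os
import subprocess
import sys


def faulthandler_enabled(*extra_args):
    """Report whether faulthandler is enabled in a child started with extra_args."""
    env = dict(os.environ)
    # These variables can enable faulthandler even without -X faulthandler.
    env.pop("PYTHONFAULTHANDLER", None)
    env.pop("PYTHONDEVMODE", None)
    cmd = [
        sys.executable,
        *extra_args,
        "-c",
        "import faulthandler; print(faulthandler.is_enabled())",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, env=env, check=True)
    return out.stdout.strip() == "True"
```

Here faulthandler_enabled("-X", "faulthandler") corresponds to the hanging configuration and faulthandler_enabled() to the one that does not hang.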

CPython versions tested on:

CPython main branch

Operating systems tested on:

macOS

@aisk aisk added the type-bug label on Jun 18, 2024
@colesbury
Contributor

This is in the free-threaded build? (I see _PyCriticalSection_BeginSlow in the call stack)

@aisk
Contributor Author

aisk commented Jun 18, 2024

> This is in the free-threaded build? (I see _PyCriticalSection_BeginSlow in the call stack)

Yes. I haven't tried it without the free-threaded build yet; I'll try that now.

@aisk
Contributor Author

aisk commented Jun 18, 2024

The free-thread build doesn't have this issue. @colesbury

@aisk aisk changed the title from "faulthandler will hang the process with a TSAN build Python" to "faulthandler will hang the process with a TSAN and free-thread build Python" on Jun 18, 2024
@colesbury
Contributor

Faulthandler has other issues with the free-threaded build as well. Dumping tracebacks for other threads is a lot more likely to itself crash when the GIL is disabled.

@aisk
Contributor Author

aisk commented Jun 18, 2024

> The free-thread build doesn't have this issue. @colesbury

Sorry, I meant that the non-free-threaded build doesn't have this issue 😂

@kumaraditya303
Contributor

I think this was fixed by #128400
