faulthandler hangs the process on a TSAN, free-threaded build of Python #120696

Open
@aisk

Bug report

Bug description:

I tried to run the TSAN check on macOS: #120502

However, test_capi.test_mem hangs indefinitely with the TSAN build, both on my local machine and on GHA: https://github.com/python/cpython/actions/runs/9517250225/job/26235426291?pr=120502

After some investigation, I found that test_pymem_malloc_without_gil and test_pyobject_malloc_without_gil are causing the hang. These tests simply execute ./python.exe -X faulthandler -c "import _testcapi; _testcapi.pymem_malloc_without_gil()" under the hood.

The call stack for the hung process:

Call graph:
    2035 Thread_345029   DispatchQueue_1: com.apple.main-thread  (serial)
      2035 start  (in dyld) + 1942  [0x7ff8106a3386]
        2035 main  (in python.exe) + 33  [0x102ef5421]  python.c:15
          2035 Py_BytesMain  (in python.exe) + 74  [0x1033b47da]  main.c:773
            2035 pymain_main  (in python.exe) + 414  [0x1033b475e]  main.c:749
              2035 Py_RunMain  (in python.exe) + 3602  [0x1033b3992]  main.c:719
                2035 _PyRun_SimpleStringFlagsWithName  (in python.exe) + 215  [0x103375f37]  pythonrun.c:516
                  2035 run_mod  (in python.exe) + 2199  [0x103379f17]  pythonrun.c:1377
                    2035 run_eval_code_obj  (in python.exe) + 265  [0x10337a359]  pythonrun.c:1292
                      2035 PyEval_EvalCode  (in python.exe) + 198  [0x103252f06]  ceval.c:599
                        2035 _PyEval_Vector  (in python.exe) + 773  [0x1032532b5]  ceval.c:1819
                          2035 _PyEval_EvalFrameDefault  (in python.exe) + 24778  [0x1032595da]  generated_cases.c.h:813
                            2035 PyObject_Vectorcall  (in python.exe) + 76  [0x102ff6d0c]  call.c:327
                              2035 _PyObject_VectorcallTstate  (in python.exe) + 270  [0x102ff50ae]  pycore_call.h:168
                                2035 cfunction_vectorcall_NOARGS  (in python.exe) + 620  [0x1030baf5c]  methodobject.c:484
                                  2035 pymem_malloc_without_gil  (in _testcapi.cpython-314td-darwin.so) + 34  [0x107f1ad12]  mem.c:510
                                    2035 PyMem_Malloc  (in python.exe) + 78  [0x1030edbfe]  obmalloc.c:981
                                      2035 _PyMem_DebugMalloc  (in python.exe) + 79  [0x1030f080f]  obmalloc.c:2875
                                        2035 _Py_FatalErrorFunc  (in python.exe) + 72  [0x103339a48]  pylifecycle.c:3093
                                          2035 fatal_error  (in python.exe) + 1287  [0x10333a3f7]  pylifecycle.c:3059
                                            2035 _Py_DumpExtensionModules  (in python.exe) + 198  [0x103339b76]  pylifecycle.c:2929
                                              2035 ???  (in <unknown binary>)  [0xcdcdcdcdcdcdcdcd]
                                                2035 _sigtramp  (in libsystem_platform.dylib) + 29  [0x7ff810a5c37d]
                                                  2035 sighandler(int, __sanitizer::__sanitizer_siginfo*, void*)  (in libclang_rt.tsan_osx_dynamic.dylib) + 377  [0x1041e83f9]
                                                    2035 __tsan::CallUserSignalHandler(__tsan::ThreadState*, bool, bool, int, __sanitizer::__sanitizer_siginfo*, void*)  (in libclang_rt.tsan_osx_dynamic.dylib) + 255  [0x1041e7f0f]
                                                      2035 faulthandler_fatal_error  (in python.exe) + 638  [0x1033baaee]  faulthandler.c:326
                                                        2035 _Py_DumpExtensionModules  (in python.exe) + 198  [0x103339b76]  pylifecycle.c:2929
                                                          2035 PyDict_Next  (in python.exe) + 182  [0x1030863e6]  dictobject.c:2886
                                                            2035 _PyCriticalSection_BeginSlow  (in python.exe) + 80  [0x1032bac30]  critical_section.c:14

Removing the -X faulthandler option resolves the hang. Also, the hung process consumes 100% CPU, but once lldb is attached to it the CPU usage drops back to normal.
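For anyone who wants to check this locally, here is a minimal sketch that runs the same child command with and without -X faulthandler under a timeout, so a hang shows up as a timeout instead of blocking the terminal. The helper names (build_command, run_with_timeout, compare) and the 30-second timeout are my own choices for illustration, not anything from the test suite. It assumes the interpreter you pass in is the TSAN free-threaded build and that _testcapi is importable there.

```python
import subprocess
import sys

# The snippet the hanging tests run in a child interpreter (taken from the report).
SNIPPET = "import _testcapi; _testcapi.pymem_malloc_without_gil()"


def build_command(python, use_faulthandler):
    """Return the argv for the child interpreter, optionally with -X faulthandler."""
    cmd = [python]
    if use_faulthandler:
        cmd += ["-X", "faulthandler"]
    cmd += ["-c", SNIPPET]
    return cmd


def run_with_timeout(cmd, timeout):
    """Run cmd; return ("hung", None) on timeout, else ("exited", returncode)."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return ("hung", None)
    return ("exited", proc.returncode)


def compare(python, timeout=30):
    """Hypothetical helper: run both variants and report what happened to each."""
    results = {}
    for use_faulthandler in (True, False):
        cmd = build_command(python, use_faulthandler)
        results[use_faulthandler] = run_with_timeout(cmd, timeout)
    return results
```

Calling compare("./python.exe") from the build directory returns a dict keyed by whether faulthandler was enabled. Based on what I described above, I would expect the True entry to be ("hung", None) on the TSAN build and the False entry to be an ("exited", ...) tuple, but I have only observed this on my own machine and on GHA.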

CPython versions tested on:

CPython main branch

Operating systems tested on:

macOS
