race condition in threading when interpreter finalized while daemon thread runs (thread sanitizer identified) #124878

Open
gpshead opened this issue Oct 2, 2024 · 3 comments
Labels
3.13 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@gpshead
Member

gpshead commented Oct 2, 2024

Bug report

Bug description:

Using the code in #105805 with the newly added test.test_threading.ThreadTests.test_finalize_daemon_thread_hang test enabled you can reproduce this thread sanitizer crash as follows (I used clang 18):

This also happens if I take just the new test and the corresponding Modules/_testcapimodule.c change and patch them on top of main, so it is a pre-existing bug unrelated to my PR adding the new test. (Filing this now, before I check the test in decorated to be skipped under sanitizers, so that I can reference the issue number in a comment.)

CC=clang LD=lld ./configure --with-thread-sanitizer --with-pydebug && make -j8
./python -m test test_threading -v
...
======================================================================
FAIL: test_finalize_daemon_thread_hang (test.test_threading.ThreadTests.test_finalize_daemon_thread_hang)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/greg/oss/cpython/Lib/test/test_threading.py", line 1236, in test_finalize_daemon_thread_hang
    assert_python_ok("-c", script)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/home/greg/oss/cpython/Lib/test/support/script_helper.py", line 182, in assert_python_ok
    return _assert_python(True, *args, **env_vars)
  File "/home/greg/oss/cpython/Lib/test/support/script_helper.py", line 167, in _assert_python
    res.fail(cmd_line)
    ~~~~~~~~^^^^^^^^^^
  File "/home/greg/oss/cpython/Lib/test/support/script_helper.py", line 80, in fail
    raise AssertionError(f"Process return code is {exitcode}\n"
    ...<10 lines>...
                         f"---")
AssertionError: Process return code is 66
command line: ['/home/greg/oss/b/python', '-X', 'faulthandler', '-I', '-c', "\nimport os\nimport sys\nimport threading\nimport time\nimport _testcapi\n\nlock = threading.Lock()\nlock.acquire()\nthread_started_event = threading.Event()\ndef thread_func():\n    try:\n        thread_started_event.set()\n        _testcapi.finalize_thread_hang(lock.acquire)\n    finally:\n        # Control must not reach here.\n        os._exit(2)\n\nt = threading.Thread(target=thread_func)\nt.daemon = True\nt.start()\nthread_started_event.wait()\n# Sleep to ensure daemon thread is blocked on `lock.acquire`\n#\n# Note: This test is designed so that in the unlikely case that\n# `0.1` seconds is not sufficient time for the thread to become\n# blocked on `lock.acquire`, the test will still pass, it just\n# won't be properly testing the thread behavior during\n# finalization.\ntime.sleep(0.1)\n\ndef run_during_finalization():\n    # Wake up daemon thread\n    lock.release()\n    # Sleep to give the daemon thread time to crash if it is going\n    # to.\n    #\n    # Note: If due to an exceptionally slow execution this delay is\n    # insufficient, the test will still pass but will simply be\n    # ineffective as a test.\n    time.sleep(0.1)\n    # If control reaches here, the test succeeded.\n    os._exit(0)\n\n# Replace sys.stderr.flush as a way to run code during finalization\norig_flush = sys.stderr.flush\ndef do_flush(*args, **kwargs):\n    orig_flush(*args, **kwargs)\n    if not sys.is_finalizing:\n        return\n    sys.stderr.flush = orig_flush\n    run_during_finalization()\n\nsys.stderr.flush = do_flush\n\n# If the follow exit code is retained, `run_during_finalization`\n# did not run.\nsys.exit(1)\n"]

stdout:
---

---

stderr:
---
==================
WARNING: ThreadSanitizer: data race (pid=2184927)
  Write of size 8 at 0x724800000028 by main thread:
    #0 __tsan_memset <null> (python+0xdc23d) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #1 fill_mem_debug /home/greg/oss/b/../cpython/Objects/obmalloc.c:2637:5 (python+0x31bc3a) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #2 _PyMem_DebugRawFree /home/greg/oss/b/../cpython/Objects/obmalloc.c:2766:5 (python+0x31bc3a)
    #3 PyMem_RawFree /home/greg/oss/b/../cpython/Objects/obmalloc.c:971:5 (python+0x319498) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #4 free_threadstate /home/greg/oss/b/../cpython/Python/pystate.c:1455:9 (python+0x54ba27) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #5 _PyThreadState_DeleteList /home/greg/oss/b/../cpython/Python/pystate.c:1933:9 (python+0x54ba27)
    #6 _Py_Finalize /home/greg/oss/b/../cpython/Python/pylifecycle.c:2043:5 (python+0x522649) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #7 Py_Exit /home/greg/oss/b/../cpython/Python/pylifecycle.c:3390:9 (python+0x5253a5) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #8 handle_system_exit /home/greg/oss/b/../cpython/Python/pythonrun.c:635:9 (python+0x550ae6) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #9 _PyErr_PrintEx /home/greg/oss/b/../cpython/Python/pythonrun.c:644:5 (python+0x550ae6)
    #10 PyErr_PrintEx /home/greg/oss/b/../cpython/Python/pythonrun.c:721:5 (python+0x55027c) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #11 PyErr_Print /home/greg/oss/b/../cpython/Python/pythonrun.c:727:5 (python+0x55027c)
    #12 _PyRun_SimpleStringFlagsWithName /home/greg/oss/b/../cpython/Python/pythonrun.c:552:9 (python+0x55027c)
    #13 pymain_run_command /home/greg/oss/b/../cpython/Modules/main.c:253:11 (python+0x58f2e4) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #14 pymain_run_python /home/greg/oss/b/../cpython/Modules/main.c:687:21 (python+0x58f2e4)
    #15 Py_RunMain /home/greg/oss/b/../cpython/Modules/main.c:775:5 (python+0x58f2e4)
    #16 pymain_main /home/greg/oss/b/../cpython/Modules/main.c:805:12 (python+0x58f8e9) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #17 Py_BytesMain /home/greg/oss/b/../cpython/Modules/main.c:829:12 (python+0x58f969) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #18 main /home/greg/oss/b/../cpython/Programs/python.c:15:12 (python+0x15e810) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)

  Previous atomic read of size 8 at 0x724800000028 by thread T1:
    #0 _Py_atomic_load_uintptr_relaxed /home/greg/oss/b/../cpython/Include/cpython/pyatomic_gcc.h:347:10 (python+0x4de525) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #1 _Py_eval_breaker_bit_is_set /home/greg/oss/b/../cpython/Include/internal/pycore_ceval.h:307:19 (python+0x4de525)
    #2 drop_gil /home/greg/oss/b/../cpython/Python/ceval_gil.c:259:9 (python+0x4de525)
    #3 _PyEval_ReleaseLock /home/greg/oss/b/../cpython/Python/ceval_gil.c:596:5 (python+0x4de6fb) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #4 detach_thread /home/greg/oss/b/../cpython/Python/pystate.c:2144:5 (python+0x54c1ec) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #5 _PyThreadState_Detach /home/greg/oss/b/../cpython/Python/pystate.c:2150:5 (python+0x548adb) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #6 PyEval_SaveThread /home/greg/oss/b/../cpython/Python/ceval_gil.c:640:5 (python+0x4de922) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #7 PyThread_acquire_lock_timed_with_retries /home/greg/oss/b/../cpython/Python/thread.c:148:13 (python+0x573cbc) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #8 acquire_timed /home/greg/oss/b/../cpython/Modules/_threadmodule.c:737:12 (python+0x61dbc0) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #9 lock_PyThread_acquire_lock /home/greg/oss/b/../cpython/Modules/_threadmodule.c:793:22 (python+0x61dbc0)
    #10 cfunction_call /home/greg/oss/b/../cpython/Objects/methodobject.c:540:18 (python+0x2e9488) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #11 _PyObject_MakeTpCall /home/greg/oss/b/../cpython/Objects/call.c:242:18 (python+0x2539d2) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #12 _PyObject_VectorcallTstate /home/greg/oss/b/../cpython/Include/internal/pycore_call.h:165:16 (python+0x2532ed) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #13 PyObject_CallNoArgs /home/greg/oss/b/../cpython/Objects/call.c:106:12 (python+0x2531bd) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #14 finalize_thread_hang /home/greg/oss/b/../cpython/Modules/_testcapimodule.c:3332:5 (_testcapi.cpython-314d-x86_64-linux-gnu.so+0x1e204) (BuildId: 4bdac866b639cba10e0265b34477a0f6dd6d394c)
    #15 cfunction_vectorcall_O /home/greg/oss/b/../cpython/Objects/methodobject.c:512:24 (python+0x2e873d) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #16 _PyObject_VectorcallTstate /home/greg/oss/b/../cpython/Include/internal/pycore_call.h:167:11 (python+0x25328b) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #17 PyObject_Vectorcall /home/greg/oss/b/../cpython/Objects/call.c:327:12 (python+0x2549a0) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #18 _PyEval_EvalFrameDefault /home/greg/oss/b/../cpython/Python/generated_cases.c.h:920:35 (python+0x459d5a) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #19 _PyEval_EvalFrame /home/greg/oss/b/../cpython/Include/internal/pycore_ceval.h:119:16 (python+0x453c62) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #20 _PyEval_Vector /home/greg/oss/b/../cpython/Python/ceval.c:1852:12 (python+0x453c62)
    #21 _PyFunction_Vectorcall /home/greg/oss/b/../cpython/Objects/call.c (python+0x254ebc) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #22 _PyObject_VectorcallTstate /home/greg/oss/b/../cpython/Include/internal/pycore_call.h:167:11 (python+0x25a35b) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #23 method_vectorcall /home/greg/oss/b/../cpython/Objects/classobject.c:70:20 (python+0x258a85) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #24 _PyVectorcall_Call /home/greg/oss/b/../cpython/Objects/call.c:273:16 (python+0x2548a7) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #25 _PyObject_Call /home/greg/oss/b/../cpython/Objects/call.c:348:16 (python+0x254abb) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #26 PyObject_Call /home/greg/oss/b/../cpython/Objects/call.c:373:12 (python+0x254ce7) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #27 thread_run /home/greg/oss/b/../cpython/Modules/_threadmodule.c:337:21 (python+0x61c1e8) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #28 pythread_wrapper /home/greg/oss/b/../cpython/Python/thread_pthread.h:242:5 (python+0x5740bb) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)

  Location is heap block of size 360 at 0x724800000000 allocated by main thread:
    #0 calloc <null> (python+0xdeaaa) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #1 _PyMem_RawCalloc /home/greg/oss/b/../cpython/Objects/obmalloc.c:76:12 (python+0x3174cb) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #2 _PyMem_DebugRawAlloc /home/greg/oss/b/../cpython/Objects/obmalloc.c:2696:24 (python+0x31babe) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #3 _PyMem_DebugRawCalloc /home/greg/oss/b/../cpython/Objects/obmalloc.c:2741:12 (python+0x31babe)
    #4 PyMem_RawCalloc /home/greg/oss/b/../cpython/Objects/obmalloc.c:957:12 (python+0x3193cb) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #5 alloc_threadstate /home/greg/oss/b/../cpython/Python/pystate.c:1440:12 (python+0x549de1) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #6 new_threadstate /home/greg/oss/b/../cpython/Python/pystate.c:1549:38 (python+0x549de1)
    #7 _PyThreadState_New /home/greg/oss/b/../cpython/Python/pystate.c:1632:12 (python+0x54a7de) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #8 ThreadHandle_start /home/greg/oss/b/../cpython/Modules/_threadmodule.c:405:20 (python+0x61bb6a) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #9 do_start_new_thread /home/greg/oss/b/../cpython/Modules/_threadmodule.c:1882:9 (python+0x61bb6a)
    #10 thread_PyThread_start_joinable_thread /home/greg/oss/b/../cpython/Modules/_threadmodule.c:2005:14 (python+0x61aacc) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #11 cfunction_call /home/greg/oss/b/../cpython/Objects/methodobject.c:540:18 (python+0x2e9488) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #12 _PyObject_MakeTpCall /home/greg/oss/b/../cpython/Objects/call.c:242:18 (python+0x2539d2) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #13 _PyObject_VectorcallTstate /home/greg/oss/b/../cpython/Include/internal/pycore_call.h:165:16 (python+0x2532ed) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #14 PyObject_Vectorcall /home/greg/oss/b/../cpython/Objects/call.c:327:12 (python+0x2549a0) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #15 _PyEval_EvalFrameDefault /home/greg/oss/b/../cpython/Python/generated_cases.c.h:1831:35 (python+0x45ec3e) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #16 _PyEval_EvalFrame /home/greg/oss/b/../cpython/Include/internal/pycore_ceval.h:119:16 (python+0x453a19) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #17 _PyEval_Vector /home/greg/oss/b/../cpython/Python/ceval.c:1852:12 (python+0x453a19)
    #18 PyEval_EvalCode /home/greg/oss/b/../cpython/Python/ceval.c:650:21 (python+0x453a19)
    #19 run_eval_code_obj /home/greg/oss/b/../cpython/Python/pythonrun.c:1323:9 (python+0x55356d) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #20 run_mod /home/greg/oss/b/../cpython/Python/pythonrun.c:1408:19 (python+0x552f7e) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #21 _PyRun_StringFlagsWithName /home/greg/oss/b/../cpython/Python/pythonrun.c:1207:15 (python+0x550012) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #22 _PyRun_SimpleStringFlagsWithName /home/greg/oss/b/../cpython/Python/pythonrun.c:547:15 (python+0x550012)
    #23 pymain_run_command /home/greg/oss/b/../cpython/Modules/main.c:253:11 (python+0x58f2e4) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #24 pymain_run_python /home/greg/oss/b/../cpython/Modules/main.c:687:21 (python+0x58f2e4)
    #25 Py_RunMain /home/greg/oss/b/../cpython/Modules/main.c:775:5 (python+0x58f2e4)
    #26 pymain_main /home/greg/oss/b/../cpython/Modules/main.c:805:12 (python+0x58f8e9) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #27 Py_BytesMain /home/greg/oss/b/../cpython/Modules/main.c:829:12 (python+0x58f969) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #28 main /home/greg/oss/b/../cpython/Programs/python.c:15:12 (python+0x15e810) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)

  Thread T1 (tid=2184929, running) created by main thread at:
    #0 pthread_create <null> (python+0xe01ff) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #1 do_start_joinable_thread /home/greg/oss/b/../cpython/Python/thread_pthread.h:289:14 (python+0x572deb) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #2 PyThread_start_joinable_thread /home/greg/oss/b/../cpython/Python/thread_pthread.h:313:9 (python+0x572c0a) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #3 ThreadHandle_start /home/greg/oss/b/../cpython/Modules/_threadmodule.c:422:9 (python+0x61bc7b) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #4 do_start_new_thread /home/greg/oss/b/../cpython/Modules/_threadmodule.c:1882:9 (python+0x61bc7b)
    #5 thread_PyThread_start_joinable_thread /home/greg/oss/b/../cpython/Modules/_threadmodule.c:2005:14 (python+0x61aacc) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #6 cfunction_call /home/greg/oss/b/../cpython/Objects/methodobject.c:540:18 (python+0x2e9488) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #7 _PyObject_MakeTpCall /home/greg/oss/b/../cpython/Objects/call.c:242:18 (python+0x2539d2) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #8 _PyObject_VectorcallTstate /home/greg/oss/b/../cpython/Include/internal/pycore_call.h:165:16 (python+0x2532ed) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #9 PyObject_Vectorcall /home/greg/oss/b/../cpython/Objects/call.c:327:12 (python+0x2549a0) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #10 _PyEval_EvalFrameDefault /home/greg/oss/b/../cpython/Python/generated_cases.c.h:1831:35 (python+0x45ec3e) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #11 _PyEval_EvalFrame /home/greg/oss/b/../cpython/Include/internal/pycore_ceval.h:119:16 (python+0x453a19) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #12 _PyEval_Vector /home/greg/oss/b/../cpython/Python/ceval.c:1852:12 (python+0x453a19)
    #13 PyEval_EvalCode /home/greg/oss/b/../cpython/Python/ceval.c:650:21 (python+0x453a19)
    #14 run_eval_code_obj /home/greg/oss/b/../cpython/Python/pythonrun.c:1323:9 (python+0x55356d) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #15 run_mod /home/greg/oss/b/../cpython/Python/pythonrun.c:1408:19 (python+0x552f7e) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #16 _PyRun_StringFlagsWithName /home/greg/oss/b/../cpython/Python/pythonrun.c:1207:15 (python+0x550012) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #17 _PyRun_SimpleStringFlagsWithName /home/greg/oss/b/../cpython/Python/pythonrun.c:547:15 (python+0x550012)
    #18 pymain_run_command /home/greg/oss/b/../cpython/Modules/main.c:253:11 (python+0x58f2e4) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #19 pymain_run_python /home/greg/oss/b/../cpython/Modules/main.c:687:21 (python+0x58f2e4)
    #20 Py_RunMain /home/greg/oss/b/../cpython/Modules/main.c:775:5 (python+0x58f2e4)
    #21 pymain_main /home/greg/oss/b/../cpython/Modules/main.c:805:12 (python+0x58f8e9) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #22 Py_BytesMain /home/greg/oss/b/../cpython/Modules/main.c:829:12 (python+0x58f969) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)
    #23 main /home/greg/oss/b/../cpython/Programs/python.c:15:12 (python+0x15e810) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5)

SUMMARY: ThreadSanitizer: data race (/home/greg/oss/b/python+0xdc23d) (BuildId: f07474199a50b9dfeeb5b474be4e3a4079af30a5) in __tsan_memset
==================
ThreadSanitizer: reported 1 warnings
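
The script embedded in the command line above gets code to run during finalization by wrapping `sys.stderr.flush`. Here is a minimal sketch of just that hook, without the daemon thread or `_testcapi`. It assumes the interpreter flushes `sys.stderr` after finalization has begun, which the test relies on. The names `CHILD_SCRIPT` and `run_child` are made up for this illustration. Unlike the logged script, which tests the function object `sys.is_finalizing` itself, this sketch calls `sys.is_finalizing()`.

```python
import subprocess
import sys
import textwrap

# Child script: wrap sys.stderr.flush so that code runs once the runtime has
# started finalizing. Assumption: the interpreter flushes sys.stderr during
# finalization, as the test in this report relies on.
CHILD_SCRIPT = textwrap.dedent("""
    import os
    import sys

    orig_flush = sys.stderr.flush

    def do_flush(*args, **kwargs):
        orig_flush(*args, **kwargs)
        if not sys.is_finalizing():
            return
        # Finalization is under way: report success and stop immediately.
        os._exit(0)

    sys.stderr.flush = do_flush

    # If this exit code survives, the hook never ran during finalization.
    sys.exit(1)
""")


def run_child():
    """Run the child script in a fresh interpreter and return its exit code."""
    proc = subprocess.run([sys.executable, "-c", CHILD_SCRIPT])
    return proc.returncode
```

An exit code of 0 from `run_child()` means the hook ran while the interpreter was finalizing; 1 means it never did.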

Examining the code in question where the race occurs... it's this block https://github.com/python/cpython/blob/v3.13.0rc3/Python/ceval_gil.c#L258

#ifdef FORCE_SWITCHING
    /* We might be releasing the GIL for the last time in this thread.  In that
       case there's a possible race with tstate->interp getting deleted after
       gil->mutex is unlocked and before the following code runs, leading to a
       crash.  We can use final_release to indicate the thread is done with the
       GIL, and that's the only time we might delete the interpreter.  See
       https://github.com/python/cpython/issues/104341. */
    if (!final_release &&
        _Py_eval_breaker_bit_is_set(tstate, _PY_GIL_DROP_REQUEST_BIT))

looping in @ericsnowcurrently for #104341 context.

In that call stack the int final_release value is 0, so the next step tries to load the eval breaker bit. But per the test, the thread was woken up by Python code executing in the main thread during finalization.

How'd thread T1 ever obtain the GIL upon waking up in the first place given finalization had started?

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Linked PRs

@gpshead gpshead added the type-bug An unexpected behavior, bug, or error label Oct 2, 2024
@gpshead
Member Author

gpshead commented Oct 2, 2024

Quoting a comment from @mpage on #105805, as this is somewhat related to tstate use after it has been freed:

"""
I think this might be a good time to also fix the issues with threads referencing their PyThreadStates after they have been freed. This would allow us to avoid having to introduce new APIs to put threads to sleep. Instead, threads could block on a lock in their PyThreadState that is never released (also what the JVM does). I believe @colesbury suggested refcounting PyThreadStates as a solution to the PyThreadState lifetime problem.
"""

A reference count on the tstate is interesting, presuming it is not something that would need to change frequently. Mostly I see it as an "if the thread still exists, the PyThreadState should never be freed" marker. Do we have a single clear marker of who owns deallocation of thread states? For daemon threads that we are abandoning during finalization, we clearly should not be freeing their PyThreadStates today, if that is what the race above actually shows happening.

@gpshead
Member Author

gpshead commented Oct 2, 2024

This bug probably exists on older versions as well; I didn't try testing further back. In general: Friends don't let friends spawn daemon threads. For the health of the process and all code maintainers.
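
As a rough illustration of the alternative to daemon threads, here is a sketch of a non-daemon worker that is told to stop and then joined, so no thread is still running when finalization starts. The names `worker` and `run_and_shutdown` and the polling interval are assumptions for this example, not anything from the test suite.

```python
import threading


def worker(stop_event, ticks):
    """Do a unit of work until asked to stop (illustrative placeholder work)."""
    # Event.wait returns True once the event is set, which ends the loop.
    while not stop_event.wait(timeout=0.01):
        ticks.append(1)


def run_and_shutdown():
    """Start a non-daemon worker, signal it to stop, and wait for it to exit."""
    stop_event = threading.Event()
    ticks = []
    thread = threading.Thread(target=worker, args=(stop_event, ticks), daemon=False)
    thread.start()
    stop_event.set()   # request shutdown explicitly
    thread.join()      # the thread is finished before the interpreter exits
    return thread.is_alive()
```

Because the thread is joined before exit, it never touches interpreter state during finalization, which is the situation this issue is about.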

colesbury added a commit to colesbury/cpython that referenced this issue Feb 26, 2025
The race condition with `free_threadstate` and daemon threads exists in
both the free threading and default builds. We were missing a
suppression in the default build.
@colesbury
Contributor

I'm going to put up a PR soon that I think addresses the race conditions during interpreter finalization. The main ideas are:

PyThreadState reference count

The PyThreadState struct gains a reference count field, so that a PyThreadState pointer can no longer dangle into freed memory. The count starts at 2: one reference is owned by the interpreter's linked list of thread states and one is owned by the OS thread. The _PyThreadState_RemoveExcept() call in _Py_Finalize decrements the reference counts of the thread states it removes. The OS thread also decrements the reference count before calling PyThread_hang_thread. Whichever caller decrements the refcount to zero frees the memory for the PyThreadState struct.

Those are the only reference count operations. We never need to increment the reference count of the PyThreadState, and we don't bother with it during the common PyThreadState_DeleteCurrent() or PyThreadState_Delete() calls.
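
The two-owner scheme described above can be modeled roughly as follows. This is a toy Python model, not the C implementation: `ThreadStateModel`, `decref`, and `simulate` are hypothetical names, and a `threading.Lock` stands in for an atomic decrement.

```python
import threading


class ThreadStateModel:
    """Toy stand-in for a thread state with exactly two owners."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for an atomic decrement
        self.refcount = 2              # interpreter's list + OS thread
        self.freed_by = None

    def decref(self, owner):
        """Drop one reference; the caller that reaches zero frees the state."""
        with self._lock:
            self.refcount -= 1
            is_last = self.refcount == 0
        if is_last:
            self.freed_by = owner      # models freeing the struct exactly once
        return is_last


def simulate():
    """Let both owners drop their reference concurrently."""
    state = ThreadStateModel()
    results = {}

    def drop(owner):
        results[owner] = state.decref(owner)

    owners = ("interpreter_list", "os_thread")
    threads = [threading.Thread(target=drop, args=(name,)) for name in owners]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state, results
```

Whichever owner runs last sees the count hit zero and performs the free, so neither side can release memory the other may still read.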

_Py_THREAD_SHUTTING_DOWN state

_Py_THREAD_SHUTTING_DOWN is a new value for PyThreadState.state. _PyThreadState_MustExit(tstate) is now just a check if tstate.state == _Py_THREAD_SHUTTING_DOWN. This is important for the free threading build because the relevant synchronization happens on PyThreadState.state instead of when acquiring the GIL.

During runtime finalization, we set the state of non-finalizing threads to _Py_THREAD_SHUTTING_DOWN.
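
A small model of that state check, under stated assumptions: the enum members other than `SHUTTING_DOWN` and the helper names `must_exit` and `begin_finalization` are illustrative, and the enum values carry no meaning.

```python
import enum
from dataclasses import dataclass


class State(enum.Enum):
    # Illustrative states; only SHUTTING_DOWN is named in the comment above.
    DETACHED = enum.auto()
    ATTACHED = enum.auto()
    SHUTTING_DOWN = enum.auto()


@dataclass
class ThreadStateModel:
    name: str
    state: State = State.DETACHED


def must_exit(tstate):
    """Model of the check: a thread must exit once it is marked shutting down."""
    return tstate.state is State.SHUTTING_DOWN


def begin_finalization(all_states, finalizing):
    """Mark every thread except the finalizing one as shutting down."""
    for tstate in all_states:
        if tstate is not finalizing:
            tstate.state = State.SHUTTING_DOWN
```

In this model, after `begin_finalization([main, daemon], finalizing=main)`, `must_exit(daemon)` is true while `must_exit(main)` stays false.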

colesbury added a commit to colesbury/cpython that referenced this issue Feb 27, 2025
The PyThreadState struct gains a reference count field to avoid
issues with PyThreadState being a dangling pointer to freed memory.
The refcount starts with a value of two: one reference is owned by the
interpreter's linked list of thread states and one reference is owned by
the OS thread. The reference count is decremented when the thread state
is removed from the interpreter's linked list and before the OS thread
calls `PyThread_hang_thread()`. The thread that decrements it to zero
frees the `PyThreadState` memory.

The `holds_gil` field is moved out of the `_status` bit field, to avoid
a data race where one thread calls `PyThreadState_Clear()`, modifying the
`_status` bit field while the OS thread reads `holds_gil` when
attempting to acquire the GIL.

The `PyThreadState.state` field now has `_Py_THREAD_SHUTTING_DOWN` as a
possible value. This corresponds to the `_PyThreadState_MustExit()`
check. This avoids race conditions in the free threading build when
checking `_PyThreadState_MustExit()`.
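
To illustrate why moving `holds_gil` out of the bit field helps, here is a simplified model of the data-race rule: two accesses from different threads conflict when they touch the same memory location and at least one is a write. Adjacent bit fields share a location. The model ignores synchronization and atomics, and the layouts and the `cleared` field name are assumptions for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    thread: str
    field: str
    is_write: bool


# Hypothetical layouts mapping each field to the memory location holding it.
PACKED_LAYOUT = {"cleared": "_status", "holds_gil": "_status"}
SPLIT_LAYOUT = {"cleared": "_status", "holds_gil": "holds_gil"}


def is_data_race(a, b, layout):
    """Unsynchronized accesses race if they share a location and one writes."""
    return (
        a.thread != b.thread
        and layout[a.field] == layout[b.field]
        and (a.is_write or b.is_write)
    )


# One thread modifies a status bit while the OS thread reads holds_gil.
CLEAR_WRITE = Access("clearing_thread", "cleared", is_write=True)
GIL_READ = Access("os_thread", "holds_gil", is_write=False)
```

With `PACKED_LAYOUT` the write and the read land on the same location and count as a race; with `SPLIT_LAYOUT` they do not.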
colesbury added a commit that referenced this issue Feb 28, 2025
…0602)

The race condition with `free_threadstate` and daemon threads exists in
both the free threading and default builds. We were missing a
suppression in the default build.
colesbury added a commit to colesbury/cpython that referenced this issue Feb 28, 2025
…dstate (pythongh-130602)

The race condition with `free_threadstate` and daemon threads exists in
both the free threading and default builds. We were missing a
suppression in the default build.
(cherry picked from commit cc17307)

Co-authored-by: Sam Gross <colesbury@gmail.com>
colesbury added a commit to colesbury/cpython that referenced this issue Feb 28, 2025
colesbury added a commit that referenced this issue Feb 28, 2025
…gh-130602) (gh-130687)

The race condition with `free_threadstate` and daemon threads exists in
both the free threading and default builds. We were missing a
suppression in the default build.
(cherry picked from commit cc17307)
@picnixz picnixz added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Feb 28, 2025
colesbury added a commit to colesbury/cpython that referenced this issue Mar 3, 2025
colesbury added a commit that referenced this issue Mar 6, 2025