diff --git a/peps/pep-0788.rst b/peps/pep-0788.rst index 0f5b58e1804..0b7347af066 100644 --- a/peps/pep-0788.rst +++ b/peps/pep-0788.rst @@ -1,5 +1,5 @@ PEP: 788 -Title: Reimagining native threads +Title: Reimagining Native Threads Author: Peter Bierma Sponsor: Victor Stinner Discussions-To: https://discuss.python.org/t/89863 @@ -14,85 +14,77 @@ Post-History: `10-Mar-2025 `__, Abstract ======== -:c:func:`PyGILState_Ensure`, :c:func:`PyGILState_Release`, and other related -functions in the ``PyGILState`` family are the most common way to create -native threads that interact with Python. They have been the standard for over -twenty years (:pep:`311`). But, over time, these functions have -become problematic: - -- They aren't safe for finalization, either causing the calling thread to hang or - crashing it with a segmentation fault, preventing further execution. -- When they're called before finalization, they force the thread to be - "daemon", meaning that an interpreter won't wait for it to reach any point - of execution. This is mostly frustrating for developers, but can lead to - deadlocks! -- Subinterpreters don't play nicely with them, because they all assume that - the main interpreter is the only one that exists. A fresh thread (that is, - has never had a thread state) that calls :c:func:`PyGILState_Ensure` will - always be for the main interpreter. -- The term "GIL" in the name is quite confusing for users of free-threaded - Python. There isn't a GIL, why do they still have to call it? - -This PEP intends to fix all of these issues by providing two new functions, -:c:func:`PyThreadState_Ensure` and :c:func:`PyThreadState_Release`, as a more -correct and safer replacement for :c:func:`PyGILState_Ensure` and -:c:func:`PyGILState_Release`. For example: - -.. code-block:: c - - if (PyThreadState_Ensure(interp) < 0) { - fputs("Python is shutting down", stderr); - return; - } - - /* Interact with Python, without worrying about finalization. */ - // ... 
- - PyThreadState_Release(); +In the C API, threads are able to interact with an interpreter by holding an +:term:`attached thread state` for the current thread. This works well, but +can get complicated when it comes to creating and attaching +:term:`thread states ` in a thread-safe manner. + +Specifically, the C API doesn't have any way to ensure that an interpreter +is in a state where it can be called when creating and/or attaching a thread +state. As such, attachment might hang the thread, or it might flat-out crash +due to the interpreter's structure being deallocated in subinterpreters. +This can be a frustrating issue to deal with in large applications that +want to execute Python code alongside some other native code. + +In addition, assumptions about which interpreter to use tend to be wrong +inside of subinterpreters, primarily because :c:func:`PyGILState_Ensure` +always creates a thread state for the main interpreter in threads where +Python hasn't ever run. + +This PEP intends to solve these kinds of issues by *reimagining* how we approach +thread states in the C API. This is done through the introduction of interpreter +references that prevent an interpreter from finalizing (or more technically, +entering a stage in which attachment of a thread state hangs). +This allows for more structure and reliability when it comes to thread state +management, because it forces a layer of synchronization between the +interpreter and the caller. + +With this new system, there are a lot of changes needed in CPython and +third-party libraries to adopt it. For example, in APIs that don't require +the caller to hold an attached thread state, a strong interpreter reference +should be passed to ensure that it targets the correct interpreter, and that +the interpreter doesn't concurrently deallocate itself. The best example of +this in CPython is :c:func:`PyGILState_Ensure`. 
As part of this proposal, +:c:func:`PyThreadState_Ensure` is provided as a modern replacement that +takes a strong interpreter reference. + +Terminology +=========== -This is achieved by introducing two concepts into the C API: +Interpreters +------------ -- "Daemon" and "non-daemon" threads, similar to how it works in the - :mod:`threading` module. -- Interpreter reference counts which prevent an interpreter from finalizing. +In this proposal, "interpreter" refers to a singular, isolated interpreter +(see :pep:`684`), with its own :c:type:`PyInterpreterState` pointer (referred +to as an "interpreter-state"). "Interpreter" *does not* refer to the entirety +of a Python process. -In :c:func:`PyThreadState_Ensure`, both of these ideas are applied. The -calling thread is to store a reference to an interpreter via -:c:func:`PyInterpreterState_Hold`. :c:func:`PyInterpreterState_Hold` -increases the reference count of an interpreter, requiring the thread -to finish (by eventually calling :c:func:`PyThreadState_Release`) before -beginning finalization. +The "current interpreter" refers to the interpreter-state +pointer on an :term:`attached thread state`, as returned by +:c:func:`PyThreadState_GetInterpreter`. -For example, creating a native thread with this API would look something -like this: +Native and Python Threads +------------------------- -.. code-block:: c +This PEP refers to a thread created using the C API as a "native thread", +also sometimes referred to as a "non-Python created thread", where a "Python +created" thread is one created by the :mod:`threading` module. 
- static PyObject * - my_method(PyObject *self, PyObject *unused) - { - PyThread_handle_t handle; - PyThead_indent_t indent; - - PyInterpreterState *interp = PyInterpreterState_Hold(); - if (PyThread_start_joinable_thread(thread_func, interp, &ident, &handle) < 0) { - PyInterpreterState_Release(interp); - return NULL; - } - /* The thread will always attach and finish, because we increased - the reference count of the interpreter. */ - Py_RETURN_NONE; - } +A native thread is typically registered with the interpreter by +:c:func:`PyGILState_Ensure`, but any thread with an :term:`attached thread state` +qualifies as a native thread. Motivation ========== -Native threads will always hang during finalization ---------------------------------------------------- +Native Threads Always Hang During Finalization +---------------------------------------------- -Many codebases might need to call Python code in highly-asynchronous -situations where the interpreter is already finalizing, or might finalize, and -want to continue running code after the Python call. This desire has been +Many large libraries might need to call Python code in highly-asynchronous +situations where the desired interpreter +(:ref:`typically the main interpreter `) +could be finalizing or deleted, but want to continue running code after +invoking the interpreter. This desire has been `brought up by users `_. For example, a callback that wants to call Python code might be invoked when: @@ -101,23 +93,41 @@ For example, a callback that wants to call Python code might be invoked when: - A thread has quit, and a native library is executing static finalizers of thread local storage. -In the current C API, any non-Python thread (one not created via the +Generally, this pattern would look something like this: + +.. code-block:: c + + static void + some_callback(void *closure) + { + /* Do some work */ + /* ... 
*/ + + PyGILState_STATE gstate = PyGILState_Ensure(); + /* Invoke the C API to do some computation */ + PyGILState_Release(gstate); + + /* ... */ + } + +In the current C API, any "native" thread (one not created via the :mod:`threading` module) is considered to be "daemon", meaning that the interpreter -won't wait on that thread to finalize. Instead, the interpreter will hang the +won't wait on that thread before shutting down. Instead, the interpreter will hang the thread when it goes to :term:`attach ` a :term:`thread state`, -making it unusable past that point. Attaching a thread state can happen at -any point when invoking Python, such as releasing the GIL in-between bytecode -instructions, or when a C function exits a :c:macro:`Py_BEGIN_ALLOW_THREADS` -block. (Note that hanging the thread is relatively new behavior; in prior -versions, the thread would terminate, but the issue is the same.) - -This means that any non-Python thread may be terminated at any point, which +making the thread unusable past that point. Attaching a thread state can happen at +any point when invoking Python, such as in-between bytecode instructions +(to yield the :term:`GIL` to a different thread), or when a C function exits a +:c:macro:`Py_BEGIN_ALLOW_THREADS` block, so simply guarding against whether the +interpreter is finalizing isn't enough to safely call Python code. (Note that hanging +the thread is relatively new behavior; in prior versions, the thread would exit, +but the issue is the same.) + +This means that any non-Python/native thread may be terminated at any point, which is severely limiting for users who want to do more than just execute Python -code in their stream of calls (for example, C++ executing finalizers in -*addition* to calling Python). +code in their stream of calls. 
-Using ``Py_IsFinalizing`` is insufficient -***************************************** +``Py_IsFinalizing`` is Insufficient +*********************************** The :ref:`docs ` currently recommend :c:func:`Py_IsFinalizing` to guard against termination of @@ -131,55 +141,100 @@ the thread: Unfortunately, this isn't correct, because of time-of-call to time-of-use issues; the interpreter might not be finalizing during the call to -:c:func:`Py_IsFinalizing`, but it might start finalizing immediately afterwards, which -would cause the attachment of a thread state (typically via -:c:func:`PyGILState_Ensure`) to hang the thread. +:c:func:`Py_IsFinalizing`, but it might start finalizing immediately +afterwards, which would cause the attachment of a thread state to hang the +thread. -Daemon threads can cause finalization deadlocks -*********************************************** +Daemon Threads Can Break Finalization +************************************* When acquiring locks, it's extremely important to detach the thread state to prevent deadlocks. This is true on both the with-GIL and free-threaded builds. -When the GIL is enabled, a deadlock can occur pretty easily when acquiring a -lock if the GIL wasn't released, and lock-ordering deadlocks can still occur -free-threaded builds if the thread state wasn't detached. -So, all code that needs to work with locks need to detach the thread state. -In C, this is almost always done via :c:macro:`Py_BEGIN_ALLOW_THREADS` and -:c:macro:`Py_END_ALLOW_THREADS`, in a code block that looks something like this: +When the GIL is enabled, a deadlock can occur pretty easily when acquiring a +lock if the GIL wasn't released; thread A grabs a lock, and starts waiting on +its thread state to attach, while thread B holds the GIL and is waiting on the +lock. A similar deadlock can occur on the free-threaded build during stop-the-world +pauses when running the garbage collector. -.. 
code-block:: c +This affects CPython itself, and there's not much that can be done +to fix it with the current API. For example, +`python/cpython#129536 `_ +remarks that the :mod:`ssl` module can emit a fatal error when used at +finalization, because a daemon thread got hung while holding the lock. + + +Daemon Threads are not the Problem +********************************** + +Prior to this PEP, deprecating daemon threads was discussed +`extensively `_. Daemon threads technically +cause many of the issues outlined in this proposal, so removing daemon threads +could be seen as a potential solution. The main argument for removing daemon +threads is that they're a large cause of problems in the interpreter: + + Except that daemon threads don’t actually work reliably. They’re attempting + to run and use Python interpreter resources after the runtime has been shut + down upon runtime finalization. As in they have pointers to global state for + the interpreter. + +In practice, daemon threads are useful for simplifying many threading applications +in Python, and since the program is about to close in most cases, it's not worth +the added complexity to try and gracefully shut down a thread. + + When I’ve needed daemon threads, it’s usually been the case of “Long-running, + uninterruptible, third-party task” in terms of the examples in the linked issue. + Basically I’ve had something that I need running in the background, but I have + no easy way to terminate it short of process termination. Unfortunately, I’m on + Windows, so ``signal.pthread_kill`` isn’t an option. I guess I could use the + Windows Terminate Thread API, but it’s a lot of work to wrap it myself compared + to just letting process termination handle things. + +Finally, removing Python-level daemon threads does not fix the whole problem. +As noted by this PEP, extension modules are free to create their own threads +and attach thread states for them. 
Similar to daemon threads, Python doesn't +try and join them during finalization, so trying to remove daemon threads +as a whole would involve trying to remove them from the C API, which would +require a massive API change. + + Realize however that even if we get rid of daemon threads, extension + module code can and does spawn its own threads that are not tracked by + Python. ... Those are realistically an alternate form of daemon thread + ... and those are never going to be forbidden. + +Joining the Thread isn't Always a Good Idea +******************************************* - Py_BEGIN_ALLOW_THREADS - acquire_lock(); - Py_END_ALLOW_THREADS +Even in daemon threads, it's generally *possible* to prevent hanging of +native threads through :mod:`atexit` functions. +A thread could be started by some C function, and then as long as +that thread is joined by :mod:`atexit`, then the thread won't hang. -Again, in a daemon thread, :c:macro:`Py_END_ALLOW_THREADS` will hang the thread -if the interpreter is finalizing. But, :c:macro:`Py_BEGIN_ALLOW_THREADS` will -*not* hang the thread; the lock will be acquired, and *then* the thread will -be hung! Once that happens, nothing can try to acquire that lock without -deadlocking. The main thread will continue to run finalizers past that point, -though. If any of those finalizers try to acquire the lock, deadlock ensues. +:mod:`atexit` isn't always an option for a function, because to call it, it +needs to already have an :term:`attached thread state` for the thread. If +there's no guarantee of that, then :func:`atexit.register` cannot be safely +called without the risk of hanging the thread. This shifts the contract +of joining the thread to the caller rather than the callee, which again, +isn't done in practice. -This affects CPython itself, and there's not much that can be done -to fix it. 
For example, `python/cpython#129536 `_ -remarks that the :mod:`ssl` module can emit a fatal error when used at -finalization, because a daemon thread got hung while holding the lock. There -are workarounds for this for pure-Python code, but native threads don't have -such an option. +For example, large C++ applications might want to expose an interface that can +call Python code. To do this, a C++ API would take a Python object, and then +call :c:func:`PyGILState_Ensure` to safely interact with it (for example, by +calling it). If the interpreter is finalizing or has shut down, then the thread +is hung, disrupting the C++ stream of calls. .. _pep-788-hanging-compat: -We can't change finalization behavior for ``PyGILState_Ensure`` -*************************************************************** +Finalization Behavior for ``PyGILState_Ensure`` Cannot Change +************************************************************* There will always have to be a point in a Python program where -:c:func:`PyGILState_Ensure` can no longer acquire the GIL (or more correctly, -attach a thread state). If the interpreter is long dead, then Python -obviously can't give a thread a way to invoke it. -:c:func:`PyGILState_Ensure` doesn't have any meaningful way to return a -failure, so it has no choice but to terminate the thread or emit a fatal -error, as noted in `python/cpython#124622 `_: +:c:func:`PyGILState_Ensure` can no longer attach a thread state. +If the interpreter is long dead, then Python obviously can't give a +thread a way to invoke it. :c:func:`PyGILState_Ensure` doesn't have any +meaningful way to return a failure, so it has no choice but to terminate +the thread or emit a fatal error, as noted in +`python/cpython#124622 `_: I think a new GIL acquisition and release C API would be needed. The way the existing ones get used in existing C code is not amenible to suddenly @@ -189,30 +244,28 @@ error, as noted in `python/cpython#124622 `_. 
+There are currently two public ways for a user to create and attach a +:term:`thread state` for their thread; manual use of :c:func:`PyThreadState_New` +and :c:func:`PyThreadState_Swap`, and :c:func:`PyGILState_Ensure`. The latter, +:c:func:`PyGILState_Ensure`, is `the most common `_. -``PyGILState_Ensure`` generally crashes during finalization +``PyGILState_Ensure`` Generally Crashes During Finalization *********************************************************** At the time of writing, the current behavior of :c:func:`PyGILState_Ensure` does not -match the documentation. Instead of hanging the thread during finalization -as previously noted, it's extremely common for it to crash with a segmentation +always match the documentation. Instead of hanging the thread during finalization +as previously noted, it's possible for it to crash with a segmentation fault. This is a `known issue `_ -that could, in theory, be fixed in CPython, but it's definitely worth noting +that could be fixed in CPython, but it's definitely worth noting here. Incidentally, acceptance and implementation of this PEP will likely fix the existing crashes caused by :c:func:`PyGILState_Ensure`. -The term "GIL" is tricky for free-threading +The Term "GIL" is Tricky for Free-threading ******************************************* A large issue with the term "GIL" in the C API is that it is semantically @@ -224,19 +277,40 @@ created by the authors of this PEP: erroneously call the C API inside ``Py_BEGIN_ALLOW_THREADS`` blocks or omit ``PyGILState_Ensure`` in fresh threads. -Since Python 3.12, it is an :term:`attached thread state` that lets a thread -invoke the C API. On with-GIL builds, holding an attached thread state -implies holding the GIL, so only one thread can have one at a time. 
Free-threaded -builds achieve the effect of multi-core parallism while remaining -ackwards-compatible by simply removing that limitation: threads still need a -thread state (and thus need to call :c:func:`PyGILState_Ensure`), but they -don't need to wait on one another to do so. +Again, :c:func:`PyGILState_Ensure` gets an :term:`attached thread state` +for the thread on both with-GIL and free-threaded builds. To demonstrate, +:c:func:`PyGILState_Ensure` is very roughly equivalent to the following: -Subinterpreters don't work with ``PyGILState_Ensure`` ------------------------------------------------------ +.. code-block:: c + + PyGILState_STATE + PyGILState_Ensure(void) + { + PyThreadState *existing = PyThreadState_GetUnchecked(); + if (existing == NULL) { + // Chooses the interpreter of the last attached thread state + // for this thread. If Python has never run in this thread, the + // main interpreter is used. + PyInterpreterState *interp = guess_interpreter(); + PyThreadState *tstate = PyThreadState_New(interp); + PyThreadState_Swap(tstate); + return opaque_tstate_handle(tstate); + } else { + return opaque_tstate_handle(existing); + } + } + +An attached thread state is always needed to call the C API, so +:c:func:`PyGILState_Ensure` still needs to be called on free-threaded builds, +but with a name like "ensure GIL", it's not immediately clear that that's true. + +.. _pep-788-subinterpreters-gilstate: + +``PyGILState_Ensure`` Doesn't Guess the Correct Interpreter +----------------------------------------------------------- As noted in the :ref:`documentation `, -``PyGILState`` APIs aren't officially supported in subinterpreters: +the ``PyGILState`` functions aren't officially supported in subinterpreters: Note that the ``PyGILState_*`` functions assume there is only one global interpreter (created automatically by ``Py_Initialize()``). 
Python @@ -244,46 +318,73 @@ As noted in the :ref:`documentation `, ``Py_NewInterpreter()``), but mixing multiple interpreters and the ``PyGILState_*`` API is unsupported. -More technically, this is because ``PyGILState_Ensure`` doesn't have any way +This is because :c:func:`PyGILState_Ensure` doesn't have any way to know which interpreter created the thread, and as such, it has to assume that it was the main interpreter. There isn't any way to detect this at runtime, so spurious races are bound to come up in threads created by subinterpreters, because synchronization for the wrong interpreter will be used on objects shared between the threads. +For example, if the thread had access to object A, which belongs to a +subinterpreter, but then called :c:func:`PyGILState_Ensure`, the thread would +have an :term:`attached thread state` pointing to the main interpreter, +not the subinterpreter. This means that any :term:`GIL` assumptions about the +object are wrong! There isn't any synchronization between the two GILs, so both +the thread (who thinks it's in the subinterpreter) and the main thread could try +to increment the reference count at the same time, causing a data race! -Interpreters can concurrently shut down -*************************************** +Concurrent Interpreter Deallocation is Frustrating +-------------------------------------------------- The other way of creating a native thread that can invoke Python, -:c:func:`PyThreadState_New` / :c:func:`PyThreadState_Swap`, is a lot better +:c:func:`PyThreadState_New` and :c:func:`PyThreadState_Swap`, is a lot better for supporting subinterpreters (because :c:func:`PyThreadState_New` takes an -explicit interpreter, rather than assuming that the main interpreter was intended), -but is still limited by the current API. +explicit interpreter, rather than assuming that the main interpreter was +requested), but is still limited by the current hanging problems in the C API. 
-In particular, subinterpreters typically have a much shorter lifetime than the -main interpreter, and as such, there's not necessarily a guarantee that a -:c:type:`PyInterpreterState` (acquired by :c:func:`PyInterpreterState_Get`) -passed to a fresh thread will still be alive. Similarly, a -:c:type:`PyInterpreterState` pointer could have been replaced with a *new* -interpreter, causing all sorts of unknown issues. They are also subject to -all the finalization related hanging mentioned previously. +In addition, subinterpreters typically have a much shorter lifetime than the +main interpreter, so there's a much higher chance that an interpreter passed +to a thread will have already finished and have been deallocated. So, passing +that interpreter to :c:func:`PyThreadState_New` will most likely crash the program +because of a use-after-free on the interpreter-state. Rationale ========= -This PEP includes several new APIs that intend to fix all of the issues stated -above. +So, how do we address all of this? The best way seems to be starting from +scratch and "reimagining" how to create, acquire and attach +:term:`thread states ` in the C API. + +Preventing Interpreter Shutdown with Reference Counting +------------------------------------------------------- -Replacing the old APIs ----------------------- +This PEP takes an approach where an interpreter is given a reference count +that prevents it from shutting down. So, holding a "strong reference" to the +interpreter will make it safe to call the C API without worrying about the +thread being hung. -As made clear in Motivation_, ``PyGILState`` is already pretty buggy, and -even if it was magically fixed, the current behavior of hanging the thread is -beyond repair. In turn, this PEP intends to completely deprecate the existing -``PyGILState`` APIs and provide better alternatives. 
However, even if this PEP -is rejected, all of the APIs can be replaced with more correct ``PyThreadState`` -functions in the current C API: +This means that interfacing Python (for example, in a C++ library) will need +a reference to the interpreter in order to safely call the object, which is +definitely more inconvenient than assuming the main interpreter is the right +choice, but there's not really another option. + +Weak References +*************** + +This proposal also comes with weak references to an interpreter that don't +prevent it from shutting down, but can be promoted to a strong reference when +the user decides that they want to call the C API. Promotion of a weak reference +to a strong reference can fail if the interpreter has already finalized, or +reached a point during finalization where it can't be guaranteed that the +thread won't hang. + +Deprecation of the GIL-state APIs +--------------------------------- + +Due to the plethora of issues with ``PyGILState``, this PEP intends to do away +with them entirely. In today's C API, all ``PyGILState`` functions are +replaceable with ``PyThreadState`` counterparts that are compatible with +subinterpreters: - :c:func:`PyGILState_Ensure`: :c:func:`PyThreadState_Swap` & :c:func:`PyThreadState_New` - :c:func:`PyGILState_Release`: :c:func:`PyThreadState_Clear` & :c:func:`PyThreadState_Delete` @@ -291,187 +392,179 @@ functions in the current C API: - :c:func:`PyGILState_Check`: ``PyThreadState_GetUnchecked() != NULL`` This PEP specifies a ten-year deprecation for these functions (while remaining -in the stable ABI), primarily because it's expected that the migration won't be -seamless, due to the new requirement of storing an interpreter state. The -exact details of this deprecation are currently unclear, see -:ref:`pep-788-deprecation`. 
- -A light layer of magic ----------------------- - -The APIs proposed by this PEP intentionally have a layer of abstraction that is -hidden from the user and offloads complexity onto CPython. This is done -primarily to help ease the transition from ``PyGILState`` for existing -codebases, and for ease-of-use to those who provide wrappers the C API, such -as Cython or PyO3. - -In particular, the API hides details about the lifetime of the thread state -and most of the details with interpreter references. - -See also :ref:`pep-788-activate-deactivate-instead`. - -Bikeshedding and the ``PyThreadState`` namespace ------------------------------------------------- - -To solve the issue with "GIL" terminology, the new functions described by this -PEP intended as replacements for ``PyGILState`` will go under the existing -``PyThreadState`` namespace. In Python 3.14, the documentation has been -updated to switch over to terms like -:term:`"attached thread state" ` instead of -:term:`"global interpreter lock" `, so this namespace -seems to fit well for this PEP. - -Preventing interpreter finalization with references ---------------------------------------------------- - -Several iterations of this API have taken an approach where -:c:func:`PyThreadState_Ensure` can return a failure based on the state of -the interpreter. Instead, this PEP takes an approach where an interpreter -keeps track of the number of non-daemon threads, which inherently prevents -it from beginning finalization. - -The main upside with this approach is that there's more consistency with -attaching threads. Using an interpreter reference from the calling thread -keeps the interpreter from finalizing before the thread starts, ensuring -that it always works. An approach that were to return a failure based on -the start-time of the thread could cause spurious issues. 
- -In the case where it is useful to let the interpreter finalize, such as in -an asynchronous callback where there's no guarantee that the thread will start, -strong references to an interpreter can be acquired through -:c:func:`PyInterpreterState_Lookup`. +in the stable ABI), mainly because it's expected that the migration will be a +little painful, because :c:func:`PyThreadState_Ensure` and +:c:func:`PyThreadState_Release` aren't drop-in replacements for +:c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`, due to the +requirement of a specific interpreter. The exact details of this deprecation +aren't too clear, see :ref:`pep-788-deprecation`. Specification ============= -Daemon and non-daemon threads ------------------------------ +Interpreter References to Prevent Shutdown +------------------------------------------ + +An interpreter will keep a reference count that's managed by users of the +C API. When the interpreter starts finalizing, it will wait until its reference count +reaches zero before proceeding to a point where threads will be hung. This will +happen around the same time when :class:`threading.Thread` objects are joined, +but note that this *is not* the same as joining the thread; the interpreter will +only wait until the reference count is zero, and then proceed. The interpreter +must not hang threads until this reference count has reached zero. +After the reference count has reached zero, threads can no longer prevent the +interpreter from shutting down. + +A weak reference to the interpreter won't prevent it from finalizing, but can +be safely accessed after the interpreter no longer supports strong references, +and even after the interpreter has been deleted. But, at that point, the weak +reference can no longer be promoted to a strong reference. + +Strong Interpreter References +***************************** + +.. c:type:: PyInterpreterRef + + An opaque, strong reference to an interpreter. 
+ The interpreter will wait until a strong reference has been released + before shutting down. + + This type is guaranteed to be pointer-sized. + +.. c:function:: PyInterpreterRef PyInterpreterRef_Get(void) + + Acquire a strong reference to the current interpreter. + + This function cannot fail, other than with a fatal error when the caller + doesn't hold an :term:`attached thread state`. + +.. c:function:: int PyInterpreterState_AsStrong(PyInterpreterState *interp, PyInterpreterRef *ref_ptr) + + Acquire a strong reference to *interp*. + + Unless *interp* is the main interpreter, this function can cause crashes + if *interp* shuts down in another thread! Prefer safely acquiring a + reference through :c:func:`PyInterpreterRef_Get` whenever possible. -This PEP introduces the concept of non-daemon thread states. By default, all -threads created without the :mod:`threading` module will hang when trying to -attach a thread state for a finalizing interpreter (in fact, daemon threads -that *are* created with the :mod:`threading` module will hang in the same -way). This generally happens when a thread calls :c:func:`PyEval_RestoreThread` -or in between bytecode instructions, based on :func:`sys.setswitchinterval`. + On success, this function will return ``0`` and set *ref_ptr* to a strong + reference, and on failure, this function will return ``-1``. + (Failure typically indicates that *interp* has already finished + waiting on its reference count.) -A new, internal field will be added to the ``PyThreadState`` structure that -determines if the thread is daemon. Before finalization, an interpreter -will wait until all non-daemon threads call :c:func:`PyThreadState_Delete`. + The caller does not need to hold an :term:`attached thread state`. -For backwards compatibility, all thread states created by existing APIs, -including :c:func:`PyGILState_Ensure`, will remain daemon by default. -See :ref:`pep-788-hanging-compat`. +.. 
c:function:: PyInterpreterState *PyInterpreterRef_AsInterpreter(PyInterpreterRef ref) -.. c:function:: int PyThreadState_SetDaemon(int is_daemon) + Return the interpreter denoted by *ref*. - Set the :term:`attached thread state` as non-daemon or daemon. + This function cannot fail, and the caller doesn't need to hold an + :term:`attached thread state`. - The attached thread state must not be the main thread for the - interpreter. All thread states created without - :c:func:`PyThreadState_Ensure` are daemon by default. +.. c:function:: PyInterpreterRef PyInterpreterRef_Dup(PyInterpreterRef ref) - If the thread state is non-daemon, then the current interpreter will wait - for this thread to finish before shutting down. See also - :attr:`threading.Thread.daemon`. + Duplicate a strong reference to an interpreter. - Return zero on success, non-zero *without* an exception set on failure. + This function cannot fail, and the caller doesn't need to hold an + :term:`attached thread state`. -Interpreter reference counting ------------------------------- +.. c:function:: void PyInterpreterRef_Close(PyInterpreterRef ref) -Internally, an interpreter will have to keep track of the number of -non-daemon native threads, which will determine when an interpreter can -finalize. This is done to prevent use-after-free crashes in -:c:func:`PyThreadState_Ensure` for interpreters with short lifetimes, and -to remove needless layers of synchronization between the calling thread and -the started thread. + Release a strong reference to an interpreter, allowing it to shut down + if there are no references left. -An interpreter state returned by :c:func:`Py_NewInterpreter` (or really, -:c:func:`PyInterpreterState_New`) will start with a native thread countdown. -For simplicity's sake, this will be referred to as a reference count. -A non-zero reference count prevents the interpreter from finalizing. 
+ This function cannot fail, and the caller doesn't need to hold an + :term:`attached thread state`. -.. c:function:: PyInterpreterState *PyInterpreterState_Hold(void) +Weak Interpreter References +*************************** - Similar to :c:func:`PyInterpreterState_Get`, but returns a strong - reference to the interpreter (meaning, it has its reference count - incremented by one, allowing the returned interpreter state to be safely - accessed by another thread, because it will be prevented from finalizing). +.. c:type:: PyInterpreterWeakRef + + An opaque, weak reference to an interpreter. + The interpreter will *not* wait for the reference to be + released before shutting down. + +.. c:function:: PyInterpreterWeakRef PyInterpreterWeakRef_Get(void) + + Acquire a weak reference to the current interpreter. + + This function is generally meant to be used in tandem with + :c:func:`PyInterpreterWeakRef_AsStrong`, and cannot fail. + + The caller must hold an :term:`attached thread state`. + +.. c:function:: PyInterpreterWeakRef PyInterpreterWeakRef_Dup(PyInterpreterWeakRef wref) + + Duplicate a weak reference to *wref*. This function is generally meant to be used in tandem with - :c:func:`PyThreadState_Ensure`. + :c:func:`PyInterpreterWeakRef_AsStrong`. - The caller must have an :term:`attached thread state`. This function - cannot return ``NULL``. Failures are always a fatal error. + This function cannot fail, and the caller doesn't need to hold an + :term:`attached thread state`. -.. c:function:: PyInterpreterState *PyInterpreterState_Lookup(int64_t interp_id) +.. c:function:: int PyInterpreterWeakRef_AsStrong(PyInterpreterWeakRef wref, PyInterpreterRef *ref_ptr) - Similar to :c:func:`PyInterpreterState_Hold`, but looks up an interpreter - based on an ID (see :c:func:`PyInterpreterState_GetID`). This has the - benefit of allowing the interpreter to finalize in cases where the thread - might not start, such as inside of an asynchronous callback. 
+ Acquire a strong reference to an interpreter through a weak reference. - This function will return ``NULL`` without an exception set on failure. - If the return value is non-``NULL``, then the returned interpreter will be - prevented from finalizing until the reference is released by - :c:func:`PyThreadState_Release` or :c:func:`PyInterpreterState_Release`. + On success, this function returns ``0`` and sets *ref_ptr* to a strong + reference to the interpreter denoted by *wref*. - Returning ``NULL`` typically means that the interpreter is at a point - where threads cannot start, or no longer exists. + If the interpreter no longer exists or has already finished waiting + for its reference count to reach zero, then this function returns ``-1``. - The caller does not need to have an :term:`attached thread state`. + This function is not safe to call in a re-entrant signal handler. -.. c:function:: void PyInterpreterState_Release(PyInterpreterState *interp) + The caller does not need to hold an :term:`attached thread state`. - Decrement the reference count of the interpreter, as was incremented by - :c:func:`PyInterpreterState_Hold` or :c:func:`PyInterpreterState_Lookup`. +.. c:function:: void PyInterpreterWeakRef_Close(PyInterpreterWeakRef wref) - This function cannot fail, other than with a fatal error. The caller does - not need to have an :term:`attached thread state` for *interp*. + Release a weak reference, possibly deallocating it. -Ensuring and releasing thread states + This function cannot fail, and the caller doesn't need to hold an + :term:`attached thread state`. + +Ensuring and Releasing Thread States ------------------------------------ This proposal includes two new high-level threading APIs that intend to replace :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release`. -.. 
c:function:: int PyThreadState_Ensure(PyInterpreterState *interp) - - Ensure that the thread has an :term:`attached thread state` for *interp*, - and thus can safely invoke that interpreter. It is OK to call this - function if the thread already has an attached thread state, as long as - there is a subsequent call to :c:func:`PyThreadState_Release` that matches - this one. +.. c:function:: int PyThreadState_Ensure(PyInterpreterRef ref) - The reference to the interpreter *interp* is stolen by this function. - As such, *interp* should have been acquired by - :c:func:`PyInterpreterState_Hold`. + Ensure that the thread has an :term:`attached thread state` for the + interpreter denoted by *ref*, and thus can safely invoke that + interpreter. It is OK to call this function if the thread already has an + attached thread state, as long as there is a subsequent call to + :c:func:`PyThreadState_Release` that matches this one. - Thread states created by this function are non-daemon by default. See - :c:func:`PyThreadState_SetDaemon`. If the calling thread already has an - attached thread state that matches *interp*, then this function - will mark the existing thread state as non-daemon and return. It will - be restored to its prior daemon status upon the next - :c:func:`PyThreadState_Release` call. + Nested calls to this function will only sometimes create a new + :term:`thread state`. If there is no attached thread state, + then this function will check for the most recent attached thread + state used by this thread. If none exists or it doesn't match *ref*, + a new thread state is created. If it does match *ref*, it is reattached. + If there is an attached thread state, then a similar check occurs; + if the interpreter matches *ref*, it is attached, and otherwise a new + thread state is created. - Return zero on success, and non-zero with the old attached thread state - restored (which may have been ``NULL``). + Return zero on success, and non-zero on failure. .. 
c:function:: void PyThreadState_Release() - Release the :term:`attached thread state` set by - :c:func:`PyThreadState_Ensure`. Any thread state that was set prior - to the original call to :c:func:`PyThreadState_Ensure` will be restored. + Release a :c:func:`PyThreadState_Ensure` call. - This function cannot fail, but may hang the thread if the - attached thread state prior to the original :c:func:`!PyThreadState_Ensure` - was daemon and the interpreter was finalized. + The :term:`attached thread state` prior to the corresponding + :c:func:`PyThreadState_Ensure` call is guaranteed to be restored upon + returning. The cached thread state as used by :c:func:`PyThreadState_Ensure` + and :c:func:`PyGILState_Ensure` will also be restored. -Deprecation of ``PyGILState`` APIs ----------------------------------- + This function cannot fail. + +Deprecation of GIL-state APIs +----------------------------- This PEP deprecates all of the existing ``PyGILState`` APIs in favor of the -new ``PyThreadState`` APIs for the reasons given in the Motivation_. Namely: +existing and new ``PyThreadState`` APIs. Namely: - :c:func:`PyGILState_Ensure`: use :c:func:`PyThreadState_Ensure` instead. - :c:func:`PyGILState_Release`: use :c:func:`PyThreadState_Release` instead. @@ -509,8 +602,53 @@ Examples These examples are here to help understand the APIs described in this PEP. Ideally, they could be reused in the documentation. -Single-threaded example -*********************** +Example: A Library Interface +**************************** + +Imagine that you're developing a C library for logging. +You might want to provide an API that allows users to log to a Python file +object. + +With this PEP, you'd implement it like this: + +.. 
code-block:: c + + int + LogToPyFile(PyInterpreterWeakRef wref, + PyObject *file, + const char *text) + { + PyInterpreterRef ref; + if (PyInterpreterWeakRef_AsStrong(wref, &ref) < 0) { + // Python interpreter has shut down + return -1; + } + + if (PyThreadState_Ensure(ref) < 0) { + PyInterpreterRef_Close(ref); + fputs("Out of memory.\n", stderr); + return -1; + } + + char *to_write = do_some_text_mutation(text); + int res = PyFile_WriteString(to_write, file); + free(to_write); + if (res < 0) { + PyErr_Print(); + } + + PyThreadState_Release(); + PyInterpreterRef_Close(ref); + return res; + } + +If you were to use :c:func:`PyGILState_Ensure` for this case, then your +thread would hang if the interpreter were to be finalizing at that time! + +Additionally, the API supports subinterpreters. If you were to assume that +the main interpreter created the file object, then your library wouldn't be safe to use +with file objects created by a subinterpreter. + +Example: A Single-threaded Ensure +********************************* This example shows acquiring a lock in a Python method. @@ -524,12 +662,12 @@ held. Any future finalizer that wanted to acquire the lock would be deadlocked! my_critical_operation(PyObject *self, PyObject *unused) { assert(PyThreadState_GetUnchecked() != NULL); - PyInterpreterState *interp = PyInterpreterState_Hold(); - /* Temporarily make this thread non-daemon to ensure that the + PyInterpreterRef ref = PyInterpreterRef_Get(); + /* Temporarily hold a strong reference to ensure that the lock is released. */ - if (PyThreadState_Ensure(interp) < 0) { - PyErr_SetString(PyExc_PythonFinalizationError, - "interpreter is shutting down"); + if (PyThreadState_Ensure(ref) < 0) { + PyErr_NoMemory(); + PyInterpreterRef_Close(ref); return NULL; } @@ -537,18 +675,20 @@ held. Any future finalizer that wanted to acquire the lock would be deadlocked! acquire_some_lock(); Py_END_ALLOW_THREADS; - /* Do something while holding the lock */ + /* Do something while holding the lock.
+ The interpreter won't finalize during this period. */ // ... release_some_lock(); PyThreadState_Release(); + PyInterpreterRef_Close(ref); Py_RETURN_NONE; } -Transitioning from old functions -******************************** +Example: Transitioning From the Legacy Functions +************************************************ -The following code uses the old ``PyGILState`` APIs: +The following code uses the ``PyGILState`` APIs: .. code-block:: c @@ -582,22 +722,23 @@ The following code uses the old ``PyGILState`` APIs: Py_RETURN_NONE; } -This is the same code, updated to use the new functions: +This is the same code, rewritten to use the new functions: .. code-block:: c static int thread_func(void *arg) { - PyInterpreterState *interp = (PyInterpreterState *)arg; + PyInterpreterRef interp = (PyInterpreterRef)arg; if (PyThreadState_Ensure(interp) < 0) { - fputs("Cannot talk to Python", stderr); + PyInterpreterRef_Close(interp); return -1; } if (PyRun_SimpleString("print(42)") < 0) { PyErr_Print(); } PyThreadState_Release(); + PyInterpreterRef_Close(interp); return 0; } @@ -607,9 +748,9 @@ This is the same code, updated to use the new functions: PyThread_handle_t handle; PyThread_ident_t ident; - PyInterpreterState *interp = PyInterpreterState_Hold(); - if (PyThread_start_joinable_thread(thread_func, interp, &ident, &handle) < 0) { - PyInterpreterState_Release(interp); + PyInterpreterRef ref = PyInterpreterRef_Get(); + if (PyThread_start_joinable_thread(thread_func, (void *)ref, &ident, &handle) < 0) { + PyInterpreterRef_Close(ref); return NULL; } Py_BEGIN_ALLOW_THREADS @@ -619,8 +760,8 @@ This is the same code, updated to use the new functions: } -Daemon thread example -********************* +Example: A Daemon Thread +************************ Native daemon threads are still a use-case, and as such, they can still be used with this API: @@ -630,12 +771,14 @@ they can still be used with this API: static int thread_func(void *arg) { - PyInterpreterState *interp =
(PyInterpreterState *)arg; - if (PyThreadState_Ensure(interp) < 0) { - fputs("Cannot talk to Python", stderr); + PyInterpreterRef ref = (PyInterpreterRef)arg; + if (PyThreadState_Ensure(ref) < 0) { + PyInterpreterRef_Close(ref); return -1; } - (void)PyThreadState_SetDaemon(1); + /* Release the interpreter reference, allowing it to + finalize. This means that print(42) can hang this thread. */ + PyInterpreterRef_Close(ref); if (PyRun_SimpleString("print(42)") < 0) { PyErr_Print(); } @@ -649,100 +792,171 @@ they can still be used with this API: PyThread_handle_t handle; PyThread_ident_t ident; - PyInterpreterState *interp = PyInterpreterState_Hold(); - if (PyThread_start_joinable_thread(thread_func, interp, &ident, &handle) < 0) { - PyInterpreterState_Release(interp); + PyInterpreterRef ref = PyInterpreterRef_Get(); + if (PyThread_start_joinable_thread(thread_func, (void *)ref, &ident, &handle) < 0) { + PyInterpreterRef_Close(ref); return NULL; } Py_RETURN_NONE; } -Asynchronous callback example -***************************** - -As stated in the Motivation_, there are many cases where it's desirable -to call Python in an asynchronous callback. In such cases, it's not safe to -call :c:func:`PyInterpreterState_Hold`, because it's not guaranteed that -:c:func:`PyThreadState_Ensure` will ever be called. -If not, finalization becomes deadlocked. +Example: An Asynchronous Callback +********************************* -This scenario requires using :c:func:`PyInterpreterState_Lookup` instead, -which only prevents finalization once the lookup has been made. - -For example: +In some cases, the thread might not ever start, such as in a callback. +We can't use a strong reference here, because a strong reference would +deadlock the interpreter if it's not released. ..
code-block:: c typedef struct { - int64_t interp_id; - } pyrun_t; + PyInterpreterWeakRef wref; + } ThreadData; static int async_callback(void *arg) { - pyrun_t *data = (pyrun_t *)arg; - PyInterpreterState *interp = PyInterpreterState_Lookup(data->interp_id); - PyMem_RawFree(data); - if (interp == NULL) { - fputs("Python has shut down", stderr); + ThreadData *data = (ThreadData *)arg; + PyInterpreterWeakRef wref = data->wref; + PyInterpreterRef ref; + if (PyInterpreterWeakRef_AsStrong(wref, &ref) < 0) { + fputs("Python has shut down!\n", stderr); return -1; } - if (PyThreadState_Ensure(interp) < 0) { - fputs("Cannot talk to Python", stderr); + + if (PyThreadState_Ensure(ref) < 0) { + PyInterpreterRef_Close(ref); return -1; } if (PyRun_SimpleString("print(42)") < 0) { PyErr_Print(); } PyThreadState_Release(); + PyInterpreterRef_Close(ref); return 0; } static PyObject * setup_callback(PyObject *self, PyObject *unused) { - PyThread_handle_t handle; - PyThead_indent_t indent; - - pyrun_t *data = PyMem_RawMalloc(sizeof(pyrun_t)); - if (data == NULL) { - return PyErr_NoMemory(); - } // Weak reference to the interpreter. It won't wait on the callback // to finalize. - data->interp_id = PyInterpreterState_GetID(PyInterpreterState_Get()); - register_callback(async_callback, data); + ThreadData *tdata = PyMem_Malloc(sizeof(ThreadData)); + if (tdata == NULL) { + PyErr_NoMemory(); + return NULL; + } + PyInterpreterWeakRef wref = PyInterpreterWeakRef_Get(); + tdata->wref = wref; + register_callback(async_callback, tdata); Py_RETURN_NONE; } +Example: Calling Python Without a Callback Parameter +**************************************************** + +There are a few cases where callback functions don't take a callback parameter +(``void *arg``), so it's impossible to acquire a reference to any specific +interpreter. The solution to this problem is to acquire a reference to the main +interpreter through :c:func:`PyInterpreterState_AsStrong`. 
+ +But wait, won't that break with subinterpreters, per +:ref:`pep-788-subinterpreters-gilstate`? Fortunately, since the callback has +no callback parameter, it's not possible for the caller to pass any objects or +interpreter-specific data, so it's completely safe to choose the main +interpreter here. + +.. code-block:: c + + static void + call_python(void) + { + PyInterpreterRef ref; + if (PyInterpreterState_AsStrong(PyInterpreterState_Main(), &ref) < 0) { + fputs("Python has shut down!\n", stderr); + return; + } + + if (PyThreadState_Ensure(ref) < 0) { + PyInterpreterRef_Close(ref); + return; + } + if (PyRun_SimpleString("print(42)") < 0) { + PyErr_Print(); + } + PyThreadState_Release(); + PyInterpreterRef_Close(ref); + } + Reference Implementation ======================== A reference implementation of this PEP can be found -`here `_. +at `python/cpython#133110 `_. Rejected Ideas ============== -Using an interpreter ID instead of a interpreter state for ``PyThreadState_Ensure`` ------------------------------------------------------------------------------------ +Non-daemon Thread States +------------------------ + +In prior iterations of this PEP, interpreter references were a property of +a thread state rather than a property of an interpreter. This meant that +:c:func:`PyThreadState_Ensure` stole a strong interpreter reference, and +it was released upon calling :c:func:`PyThreadState_Release`. A thread state +that held a reference to an interpreter was known as a "non-daemon thread +state." At first, this seemed like an improvement, because it shifted management +of a reference's lifetime to the thread instead of the user, which eliminated +some boilerplate. + +However, this ended up making the proposal significantly more complex and +hurt the proposal's goals: + +- Most importantly, non-daemon thread states put too much emphasis on daemon + threads as the problem, which hurt the clarity of the PEP.
Additionally, the + phrase "non-daemon" added extra confusion, because non-daemon Python threads + are explicitly joined, whereas a non-daemon C thread is only waited on + until it releases its reference. +- In many cases, an interpreter reference should outlive a singular thread + state. Stealing the interpreter reference in :c:func:`PyThreadState_Ensure` + was particularly troublesome for these cases. If :c:func:`PyThreadState_Ensure` + didn't steal a reference with non-daemon thread states, it would muddy the + ownership story of the interpreter reference, leading to a more confusing API. + +Retrofiting the Existing Structures with Reference Counts +--------------------------------------------------------- + +Interpreter-State Pointers for Reference Counting +************************************************* + +Originally, this PEP specified :c:func:`!PyInterpreterState_Hold` +and :c:func:`!PyInterpreterState_Release` for managing strong references +to an interpreter, alongside :c:func:`!PyInterpreterState_Lookup` which +converted interpreter IDs (weak references) to strong references. + +In the end, this was rejected, primarily because it was needlessly +confusing. Interpreter states hadn't ever had a reference count prior, so +there was a lack of intuition about when and where something was a strong +reference. The :c:type:`PyInterpreterRef` and :c:type:`PyInterpreterWeakRef` +types seem a lot clearer. + +Interpreter IDs for Reference Counting +************************************** Some iterations of this API took an ``int64_t interp_id`` parameter instead of ``PyInterpreterState *interp``, because interpreter IDs cannot be concurrently -deleted and cause use-after-free violations. :c:func:`PyInterpreterState_Hold` -fixes this issue anyway, but an interpreter ID does have the benefit of -requiring less magic in the implementation, but has several downsides: +deleted and cause use-after-free violations. 
The reference counting APIs in +this PEP sidestep this issue anyway. An interpreter ID does have the advantage +of requiring less magic, but it has several downsides: - Nearly all existing interpreter APIs already return a :c:type:`PyInterpreterState` pointer, not an interpreter ID. Functions like :c:func:`PyThreadState_GetInterpreter` would have to be accompanied by - frustrating calls to :c:func:`PyInterpreterState_GetID`. There's also - no existing way to go from an ``int64_t`` back to a - :c:expr:`PyInterpreterState *`, and providing such an API would come - with its own set of design problems. + frustrating calls to :c:func:`PyInterpreterState_GetID`. - Threads typically take a ``void *arg`` parameter, not an ``int64_t arg``. - As such, passing an interpreter pointer requires much less boilerplate + As such, passing a reference requires much less boilerplate for the user, because an additional structure definition or heap allocation would be needed to store the interpreter ID. This is especially an issue on 32-bit systems, where ``void *`` is too small for an ``int64_t``. @@ -751,11 +965,9 @@ requiring less magic in the implementation, but has several downsides: the native thread gets a chance to attach. The problem with using an interpreter ID is that the reference count has to be "invisible"; it must be tracked elsewhere in the interpreter, likely being *more* - complex than :c:func:`PyInterpreterState_Hold`. There's also a lack + complex than :c:func:`PyInterpreterRef_Get`. There's also a lack of intuition that a standalone integer could have such a thing as - a reference count. :c:func:`PyInterpreterState_Lookup` sidesteps this - problem because the reference count is always associated with the returned - interpreter state, not the integer ID. + a reference count. ..
_pep-788-activate-deactivate-instead: @@ -780,7 +992,7 @@ This was ultimately rejected for two reasons: for code-generators like Cython to use, as there isn't any additional complexity with tracking :c:type:`PyThreadState` pointers around. -Using ``PyStatus`` for the return value of ``PyThreadState_Ensure`` +Using ``PyStatus`` for the Return Value of ``PyThreadState_Ensure`` ------------------------------------------------------------------- In prior iterations of this API, :c:func:`PyThreadState_Ensure` returned a @@ -803,8 +1015,8 @@ Open Issues .. _pep-788-deprecation: -When should the legacy APIs be removed? ---------------------------------------- +When Should the GIL-state APIs be Removed? +------------------------------------------ :c:func:`PyGILState_Ensure` and :c:func:`PyGILState_Release` have been around for over two decades, and it's expected that the migration will be difficult.