
Commit bc613e4

pythongh-112175: Add eval_breaker to PyThreadState
This change adds an `eval_breaker` field to `PyThreadState`, renaming the
existing `eval_breaker` to `interp_eval_breaker` (its uses are explained
further down). The primary motivation is performance in free-threaded builds:
with thread-local eval breakers, we can stop a specific thread (e.g., for an
async exception) without interrupting other threads.

There are still two situations where we want the first available thread to
handle a request:

- Running a garbage collection: In normal builds, we set
  `_PY_GC_SCHEDULED_BIT` on the current thread. In case a thread suspends
  before handling the collection, the bit is copied to and from
  `interp_eval_breaker` on thread suspend and resume, respectively. In a
  free-threaded build, we simply iterate over all threads and set the bit. The
  first thread to check its eval breaker runs the collection, unsetting the
  bit on all threads.

- Free-threaded builds could have multiple threads attempt a GC from one
  trigger if we get very unlucky with thread scheduling. I didn't put any
  protections against this in place because a) the consequences of it
  happening are just that one or more threads will check the GC thresholds
  right after a collection finishes, which won't affect correctness, and
  b) it's incredibly, vanishingly unlikely.

- Pending calls not limited to the main thread (possible since
  python/cpython@757b402ea1c2). This is a little trickier, since the callback
  can be added from any thread, with or without the GIL held. If the targeted
  interpreter's GIL is locked, we signal the holding thread. When a thread is
  resumed, its `_PY_CALLS_TO_DO` bit is derived from the source of truth for
  pending calls (one of two `_pending_calls` structs). This handles situations
  where no thread held the GIL when the call was first added, or where the
  active thread did not handle the call before releasing the GIL. In a
  free-threaded build, all threads are signaled, similar to scheduling a GC.

The source of truth for the global instrumentation version is still in
`interp_eval_breaker`, in both normal and free-threaded builds. Threads
usually read the version from their local `eval_breaker`, where it continues
to be colocated with the eval breaker bits, and the method for keeping it up
to date depends on the build type. All builds first update the version in
`interp_eval_breaker`, and then:

- Normal builds update the version in the current thread's `eval_breaker`.
  When a thread takes the GIL, it copies the current version from
  `interp_eval_breaker` as part of the same operation that copies
  `_PY_GC_SCHEDULED_BIT`.

- Free-threaded builds again iterate over all threads in the current
  interpreter, updating the version on each one.

Instrumentation (and the specializing interpreter more generally) will need
more work to be compatible with free-threaded builds, so these changes are
just intended to maintain the status quo in normal builds for now.

Other notable changes are:

- The `_PY_*_BIT` macros now expand to the actual bit being set, rather than
  the bit's index. I think this is simpler overall. I also moved their
  definitions from `pycore_ceval.h` to `pycore_pystate.h`, since their main
  usage is on `PyThreadState`s now.

- Most manipulations of `eval_breaker` are done with a new pair of functions:
  `_PyThreadState_Signal()` and `_PyThreadState_Unsignal()`. Having two
  separate functions to set/unset a bit, rather than one function that takes
  the bit value to use, lets us use a single atomic `or`/`and` instead of a
  loop around an atomic compare/exchange like the old
  `_Py_set_eval_breaker_bit` function (a standalone sketch contrasting the two
  shapes follows below).

Existing tests provide pretty good coverage for most of this functionality.
The one new test I added makes sure a GC still happens if a thread schedules
it and then drops the GIL before the GC runs. I don't love how complicated
this test ended up, so I'm open to other ideas for how to test this (or other
things to test in general).
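For illustration only (not code from this commit): a minimal standalone C
sketch of the difference the last bullet describes, using C11 atomics in place
of CPython's internal `_Py_atomic_*` helpers; all `*_sketch` names are
hypothetical.

#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define GC_SCHEDULED_BIT_SKETCH ((uintptr_t)1 << 4)

/* Old shape: one entry point takes the desired value, so it needs a
 * read-modify-write loop around compare/exchange. */
static void
set_bit_cas_sketch(_Atomic uintptr_t *breaker, uintptr_t bit, int set)
{
    uintptr_t old = atomic_load(breaker);
    uintptr_t new;
    do {
        new = set ? (old | bit) : (old & ~bit);
    } while (!atomic_compare_exchange_weak(breaker, &old, new));
}

/* New shape: dedicated signal/unsignal entry points map directly onto a
 * single atomic fetch_or / fetch_and. */
static void
signal_bit_sketch(_Atomic uintptr_t *breaker, uintptr_t bit)
{
    atomic_fetch_or(breaker, bit);
}

static void
unsignal_bit_sketch(_Atomic uintptr_t *breaker, uintptr_t bit)
{
    atomic_fetch_and(breaker, ~bit);
}

int
main(void)
{
    _Atomic uintptr_t breaker = 0;
    set_bit_cas_sketch(&breaker, GC_SCHEDULED_BIT_SKETCH, 1);
    assert(atomic_load(&breaker) == GC_SCHEDULED_BIT_SKETCH);
    unsignal_bit_sketch(&breaker, GC_SCHEDULED_BIT_SKETCH);
    signal_bit_sketch(&breaker, GC_SCHEDULED_BIT_SKETCH);
    assert(atomic_load(&breaker) == GC_SCHEDULED_BIT_SKETCH);
    return 0;
}

With dedicated set/unset entry points there is no failure path to retry, so a
single atomic `or`/`and` suffices where the old helper needed a CAS loop.
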
1 parent 5914a21 commit bc613e4

20 files changed (+441, -169 lines)

Include/cpython/pystate.h

+5

@@ -68,6 +68,11 @@ struct _ts {
     PyThreadState *next;
     PyInterpreterState *interp;
 
+    /* The global instrumentation version in high bits, plus flags indicating
+       when to break out of the interpreter loop in lower bits. See details in
+       pycore_pystate.h. */
+    uintptr_t eval_breaker;
+
     struct {
         /* Has been initialized to a safe state.

Include/internal/pycore_ceval.h

+4 -38

@@ -42,7 +42,7 @@ PyAPI_FUNC(int) _PyEval_MakePendingCalls(PyThreadState *);
 
 extern void _Py_FinishPendingCalls(PyThreadState *tstate);
 extern void _PyEval_InitState(PyInterpreterState *);
-extern void _PyEval_SignalReceived(PyInterpreterState *interp);
+extern void _PyEval_SignalReceived(void);
 
 // bitwise flags:
 #define _Py_PENDING_MAINTHREADONLY 1
@@ -55,7 +55,6 @@ PyAPI_FUNC(int) _PyEval_AddPendingCall(
     void *arg,
     int flags);
 
-extern void _PyEval_SignalAsyncExc(PyInterpreterState *interp);
 #ifdef HAVE_FORK
 extern PyStatus _PyEval_ReInitThreads(PyThreadState *tstate);
 #endif
@@ -181,8 +180,9 @@ extern struct _PyInterpreterFrame* _PyEval_GetFrame(void);
 extern PyObject* _Py_MakeCoro(PyFunctionObject *func);
 
 /* Handle signals, pending calls, GIL drop request
-   and asynchronous exception */
-extern int _Py_HandlePending(PyThreadState *tstate);
+   and asynchronous exception.
+   Export for '_testinternalcapi' shared extension. */
+PyAPI_FUNC(int) _Py_HandlePending(PyThreadState *tstate);
 
 extern PyObject * _PyEval_GetFrameLocals(void);
 
@@ -200,40 +200,6 @@ int _PyEval_UnpackIterable(PyThreadState *tstate, PyObject *v, int argcnt, int a
 void _PyEval_FrameClearAndPop(PyThreadState *tstate, _PyInterpreterFrame *frame);
 
 
-#define _PY_GIL_DROP_REQUEST_BIT 0
-#define _PY_SIGNALS_PENDING_BIT 1
-#define _PY_CALLS_TO_DO_BIT 2
-#define _PY_ASYNC_EXCEPTION_BIT 3
-#define _PY_GC_SCHEDULED_BIT 4
-#define _PY_EVAL_PLEASE_STOP_BIT 5
-
-/* Reserve a few bits for future use */
-#define _PY_EVAL_EVENTS_BITS 8
-#define _PY_EVAL_EVENTS_MASK ((1 << _PY_EVAL_EVENTS_BITS)-1)
-
-static inline void
-_Py_set_eval_breaker_bit(PyInterpreterState *interp, uint32_t bit, uint32_t set)
-{
-    assert(set == 0 || set == 1);
-    uintptr_t to_set = set << bit;
-    uintptr_t mask = ((uintptr_t)1) << bit;
-    uintptr_t old = _Py_atomic_load_uintptr(&interp->ceval.eval_breaker);
-    if ((old & mask) == to_set) {
-        return;
-    }
-    uintptr_t new;
-    do {
-        new = (old & ~mask) | to_set;
-    } while (!_Py_atomic_compare_exchange_uintptr(&interp->ceval.eval_breaker, &old, new));
-}
-
-static inline bool
-_Py_eval_breaker_bit_is_set(PyInterpreterState *interp, int32_t bit)
-{
-    return _Py_atomic_load_uintptr_relaxed(&interp->ceval.eval_breaker) & (((uintptr_t)1) << bit);
-}
-
-
 #ifdef __cplusplus
 }
 #endif

Include/internal/pycore_ceval_state.h

+8 -5

@@ -78,11 +78,14 @@ struct _ceval_runtime_state {
 
 
 struct _ceval_state {
-    /* This single variable consolidates all requests to break out of
-     * the fast path in the eval loop.
-     * It is by far the hottest field in this struct and
-     * should be placed at the beginning. */
-    uintptr_t eval_breaker;
+    /* This single variable holds the global instrumentation version and some
+     * interpreter-global requests to break out of the fast path in the eval
+     * loop. PyThreadState also contains an eval_breaker, which is the source
+     * of truth when a thread is running.
+     *
+     * It is by far the hottest field in this struct and should be placed at
+     * the beginning. */
+    uintptr_t interp_eval_breaker;
     /* Avoid false sharing */
     int64_t padding[7];
     int recursion_limit;

Include/internal/pycore_gc.h

+2 -1

@@ -287,7 +287,8 @@ extern void _PySlice_ClearCache(_PyFreeListState *state);
 extern void _PyDict_ClearFreeList(_PyFreeListState *state, int is_finalization);
 extern void _PyAsyncGen_ClearFreeLists(_PyFreeListState *state, int is_finalization);
 extern void _PyContext_ClearFreeList(_PyFreeListState *state, int is_finalization);
-extern void _Py_ScheduleGC(PyInterpreterState *interp);
+// Export for '_testinternalcapi' shared extension.
+PyAPI_FUNC(void) _Py_ScheduleGC(PyThreadState *tstate);
 extern void _Py_RunGC(PyThreadState *tstate);
 
 #ifdef __cplusplus

Include/internal/pycore_pystate.h

+36

@@ -282,6 +282,42 @@ static inline _PyFreeListState* _PyFreeListState_GET(void)
 #endif
 }
 
+/* Bits that can be set in PyThreadState.eval_breaker */
+#define _PY_GIL_DROP_REQUEST_BIT (1U << 0)
+#define _PY_SIGNALS_PENDING_BIT (1U << 1)
+#define _PY_CALLS_TO_DO_BIT (1U << 2)
+#define _PY_ASYNC_EXCEPTION_BIT (1U << 3)
+#define _PY_GC_SCHEDULED_BIT (1U << 4)
+#define _PY_EVAL_PLEASE_STOP_BIT (1U << 5)
+
+/* Reserve a few bits for future use */
+#define _PY_EVAL_EVENTS_BITS 8
+#define _PY_EVAL_EVENTS_MASK ((1U << _PY_EVAL_EVENTS_BITS)-1)
+
+static inline void
+_PyThreadState_Signal(PyThreadState *tstate, uintptr_t bit)
+{
+    _Py_atomic_or_uintptr(&tstate->eval_breaker, bit);
+}
+
+static inline void
+_PyThreadState_Unsignal(PyThreadState *tstate, uintptr_t bit)
+{
+    _Py_atomic_and_uintptr(&tstate->eval_breaker, ~bit);
+}
+
+static inline int
+_PyThreadState_IsSignalled(PyThreadState *tstate, uintptr_t bit)
+{
+    uintptr_t b = _Py_atomic_load_uintptr_relaxed(&tstate->eval_breaker);
+    return (b & bit) != 0;
+}
+
+// Free-threaded builds use these functions to set or unset a bit on all
+// threads in the given interpreter.
+void _PyInterpreterState_SignalAll(PyInterpreterState *interp, uintptr_t bit);
+void _PyInterpreterState_UnsignalAll(PyInterpreterState *interp, uintptr_t bit);
+
 #ifdef __cplusplus
 }
 #endif
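
The `_PyInterpreterState_SignalAll()` / `_PyInterpreterState_UnsignalAll()`
declarations above are defined elsewhere in this commit. As a rough,
hypothetical sketch of the "set the bit on every thread" pattern the commit
message describes for free-threaded builds, the loop could look like the
following (assuming "Python.h" and the internal "pycore_pystate.h" are
included; the real implementation also has to synchronize with the runtime's
thread-list lock):

// Hypothetical sketch only; not the function body from this commit.
static void
signal_all_threads_sketch(PyInterpreterState *interp, uintptr_t bit)
{
    // Walk every PyThreadState belonging to the interpreter and set the bit
    // in its thread-local eval_breaker with a single atomic OR.
    for (PyThreadState *ts = PyInterpreterState_ThreadHead(interp);
         ts != NULL;
         ts = PyThreadState_Next(ts)) {
        _PyThreadState_Signal(ts, bit);
    }
}

In normal (GIL) builds only the current thread or the GIL holder needs to be
signaled, so a loop of this shape is only needed in the free-threaded case.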

Include/internal/pycore_runtime.h

+3

@@ -191,7 +191,10 @@ typedef struct pyruntimestate {
         int64_t next_id;
     } interpreters;
 
+    /* Platform-specific identifier and PyThreadState, respectively, for the
+       main thread in the main interpreter. */
     unsigned long main_thread;
+    PyThreadState *main_tstate;
 
     /* ---------- IMPORTANT ---------------------------
        The fields above this line are declared as early as

Lib/test/test_gc.py

+41

@@ -6,6 +6,10 @@
 from test.support.os_helper import temp_dir, TESTFN, unlink
 from test.support.script_helper import assert_python_ok, make_script
 from test.support import threading_helper
+try:
+    import _testinternalcapi
+except ImportError:
+    _testinternalcapi = None
 
 import gc
 import sys
@@ -1418,6 +1422,43 @@ def test_ast_fini(self):
         assert_python_ok("-c", code)
 
 
+class GCSchedulingTests(unittest.TestCase):
+    @unittest.skipIf(_testinternalcapi is None,
+                     "Requires functions from _testinternalcapi")
+    @threading_helper.requires_working_threading()
+    def test_gc_schedule_before_thread_switch(self):
+        # Ensure that a scheduled collection is not lost due to thread
+        # switching. Most of the work happens in helper functions in
+        # _testinternalcapi.
+
+        class Cycle:
+            def __init__(self):
+                self._self = self
+
+        thresholds = gc.get_threshold()
+        gc.enable()
+
+        try:
+            state = _testinternalcapi.schedule_gc_new_state()
+
+            def thread1():
+                _testinternalcapi.schedule_gc_do_schedule(state)
+
+            gc.set_threshold(1)
+            threads = [threading.Thread(target=thread1)]
+            with threading_helper.start_threads(threads):
+                r = weakref.ref(Cycle())
+                _testinternalcapi.schedule_gc_do_wait(state)
+
+            # Ensure that at least one GC has happened
+            for i in range(5):
+                self.assertEqual(1, 1)
+            self.assertIsNone(r())
+        finally:
+            gc.disable()
+            gc.set_threshold(*thresholds)
+
+
 def setUpModule():
     global enabled, debug
     enabled = gc.isenabled()

Modules/_testinternalcapi.c

+114

@@ -1650,6 +1650,117 @@ get_rare_event_counters(PyObject *self, PyObject *type)
     );
 }
 
+// The schedule_gc_* functions work together to test GC timing and the eval
+// breaker, when used by
+// test_gc.py:GCSchedulingTests.test_gc_schedule_before_thread_switch().
+//
+// The expected sequence of events is:
+// - thread 2 waits for thread 1 to be ready
+// - thread 1 waits for thread 2 to be ready
+// (both threads are now at known locations in their respective C functions)
+// - thread 1 clears out pending eval breaker flags
+// - thread 2 checks that a GC is not scheduled
+// - thread 1 schedules a GC and releases the GIL without checking its eval breaker
+// - thread 2 checks that a GC is scheduled and returns
+// - thread 1 sees that thread 2 is done and returns, allowing Python code to run again
+typedef enum {
+    SCHEDULE_GC_INIT,
+    SCHEDULE_GC_THREAD1_READY,
+    SCHEDULE_GC_THREAD2_READY,
+    SCHEDULE_GC_THREAD1_CLEARED,
+    SCHEDULE_GC_THREAD2_VERIFIED,
+    SCHEDULE_GC_THREAD1_SCHEDULED,
+    SCHEDULE_GC_THREAD2_DONE,
+
+    SCHEDULE_GC_STOP,
+} schedule_gc_state;
+
+static void
+schedule_gc_state_destructor(PyObject *capsule)
+{
+    void *state = PyCapsule_GetPointer(capsule, NULL);
+    assert(state != NULL);
+    free(state);
+}
+
+static PyObject *
+schedule_gc_new_state(PyObject *self, PyObject *Py_UNUSED(ignored))
+{
+    schedule_gc_state *state = malloc(sizeof(schedule_gc_state));
+    if (state == NULL) {
+        PyErr_SetString(PyExc_RuntimeError, "Failed to allocate state");
+        return NULL;
+    }
+    *state = SCHEDULE_GC_INIT;
+    return PyCapsule_New(state, NULL, schedule_gc_state_destructor);
+}
+
+// Repeatedly release the GIL until the desired state appears in *state.
+#define SCHEDULE_GC_WAIT_FOR(desired) \
+    do { \
+        while (*state != desired) { \
+            if (*state == SCHEDULE_GC_STOP) { \
+                Py_RETURN_NONE; \
+            } \
+            PyEval_RestoreThread(PyEval_SaveThread()); \
+        } \
+    } while (0)
+
+static PyObject *
+schedule_gc_do_schedule(PyObject *self, PyObject *capsule)
+{
+    PyThreadState *tstate = PyThreadState_Get();
+    schedule_gc_state *state = PyCapsule_GetPointer(capsule, NULL);
+    assert(state != NULL);
+
+    *state = SCHEDULE_GC_THREAD1_READY;
+    SCHEDULE_GC_WAIT_FOR(SCHEDULE_GC_THREAD2_READY);
+
+    if (_Py_HandlePending(tstate) < 0) {
+        *state = SCHEDULE_GC_STOP;
+        return NULL;
+    }
+    *state = SCHEDULE_GC_THREAD1_CLEARED;
+    SCHEDULE_GC_WAIT_FOR(SCHEDULE_GC_THREAD2_VERIFIED);
+
+    _Py_ScheduleGC(tstate);
+    *state = SCHEDULE_GC_THREAD1_SCHEDULED;
+    SCHEDULE_GC_WAIT_FOR(SCHEDULE_GC_THREAD2_DONE);
+
+    Py_RETURN_NONE;
+}
+
+static PyObject *
+schedule_gc_do_wait(PyObject *self, PyObject *capsule)
+{
+    PyThreadState *tstate = PyThreadState_Get();
+    schedule_gc_state *state = PyCapsule_GetPointer(capsule, NULL);
+    assert(state != NULL);
+
+    SCHEDULE_GC_WAIT_FOR(SCHEDULE_GC_THREAD1_READY);
+
+    *state = SCHEDULE_GC_THREAD2_READY;
+    SCHEDULE_GC_WAIT_FOR(SCHEDULE_GC_THREAD1_CLEARED);
+
+    if (_PyThreadState_IsSignalled(tstate, _PY_GC_SCHEDULED_BIT)) {
+        PyErr_SetString(PyExc_AssertionError,
+                        "GC_SCHEDULED_BIT unexpectedly set");
+        return NULL;
+    }
+    *state = SCHEDULE_GC_THREAD2_VERIFIED;
+    SCHEDULE_GC_WAIT_FOR(SCHEDULE_GC_THREAD1_SCHEDULED);
+
+    if (!_PyThreadState_IsSignalled(tstate, _PY_GC_SCHEDULED_BIT)) {
+        PyErr_SetString(PyExc_AssertionError,
+                        "GC_SCHEDULED_BIT not carried over from thread 1");
+        return NULL;
+    }
+    *state = SCHEDULE_GC_THREAD2_DONE;
+    // Let the GC run naturally once we've returned to Python.
+
+    Py_RETURN_NONE;
+}
+
 
 #ifdef Py_GIL_DISABLED
 static PyObject *
@@ -1727,6 +1838,9 @@ static PyMethodDef module_functions[] = {
     _TESTINTERNALCAPI_TEST_LONG_NUMBITS_METHODDEF
    {"get_type_module_name", get_type_module_name, METH_O},
    {"get_rare_event_counters", get_rare_event_counters, METH_NOARGS},
+    {"schedule_gc_new_state", schedule_gc_new_state, METH_NOARGS},
+    {"schedule_gc_do_schedule", schedule_gc_do_schedule, METH_O},
+    {"schedule_gc_do_wait", schedule_gc_do_wait, METH_O},
 #ifdef Py_GIL_DISABLED
     {"py_thread_id", get_py_thread_id, METH_NOARGS},
 #endif
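
A note on the `SCHEDULE_GC_WAIT_FOR` macro above: `PyEval_RestoreThread(PyEval_SaveThread())`
is the standard C-API idiom for briefly releasing the GIL so another thread can
make progress. Spelled out as a hypothetical helper (the function name is made
up for illustration):

// Illustration of the GIL "yield" idiom that SCHEDULE_GC_WAIT_FOR builds on.
static void
yield_gil_once_sketch(void)
{
    PyThreadState *saved = PyEval_SaveThread();  // release the GIL
    // Another thread can run here and advance the shared *state value.
    PyEval_RestoreThread(saved);                 // block until the GIL is reacquired
}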

Modules/signalmodule.c

+3 -7

@@ -276,11 +276,7 @@ trip_signal(int sig_num)
        cleared in PyErr_CheckSignals() before .tripped. */
    _Py_atomic_store_int(&is_tripped, 1);
 
-    /* Signals are always handled by the main interpreter */
-    PyInterpreterState *interp = _PyInterpreterState_Main();
-
-    /* Notify ceval.c */
-    _PyEval_SignalReceived(interp);
+    _PyEval_SignalReceived();
 
    /* And then write to the wakeup fd *after* setting all the globals and
       doing the _PyEval_SignalReceived. We used to write to the wakeup fd
@@ -303,6 +299,7 @@
 
    int fd = wakeup.fd;
    if (fd != INVALID_FD) {
+        PyInterpreterState *interp = _PyInterpreterState_Main();
        unsigned char byte = (unsigned char)sig_num;
 #ifdef MS_WINDOWS
        if (wakeup.use_send) {
@@ -1770,8 +1767,7 @@ PyErr_CheckSignals(void)
        Python code to ensure signals are handled. Checking for the GC here
       allows long running native code to clean cycles created using the C-API
       even if it doesn't run the evaluation loop */
-    if (_Py_eval_breaker_bit_is_set(tstate->interp, _PY_GC_SCHEDULED_BIT)) {
-        _Py_set_eval_breaker_bit(tstate->interp, _PY_GC_SCHEDULED_BIT, 0);
+    if (_PyThreadState_IsSignalled(tstate, _PY_GC_SCHEDULED_BIT)) {
        _Py_RunGC(tstate);
    }
 
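
The `PyErr_CheckSignals()` hunk above keeps the documented behavior that
long-running native code can service a scheduled GC without re-entering the
eval loop. A hypothetical extension-module loop using that pattern (only the
standard `PyErr_CheckSignals()` call is real; the function name and constants
are made up for illustration):

// Calling PyErr_CheckSignals() periodically lets long-running native code
// both handle signals and run a GC that was scheduled for this thread.
static int
crunch_sketch(long iterations)
{
    for (long i = 0; i < iterations; i++) {
        /* ... native work that may create reference cycles via the C-API ... */
        if ((i & 0xFFF) == 0 && PyErr_CheckSignals() < 0) {
            return -1;  // a signal handler raised an exception; propagate it
        }
    }
    return 0;
}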
