gh-116818: Make `sys.settrace`, `sys.setprofile`, and monitoring thread-safe #116775

DinoV · 2024-03-14T02:05:29Z

Stops the world when we're doing a monitoring event that impacts all threads.

Updates the initialization of the monitoring data structures so that modifications generally lock the code object.

There are various side data structures, depending upon the tracing modes that are enabled. These are all allocated and published once. This is necessary because _Py_Instrument can be called on code objects pretty freely and we probably don't want locks on the readers. Races on the actual values should be benign and just result in some delay in seeing events being enabled or disabled.

Also adds multi-threaded test cases which are passing with PYTHON_GIL=0.

Issue: Make sys.settrace, sys.setprofile, and sys.monitoring thread safe in --disable-gil builds #116818

colesbury

It's not clear to me that the strategy of either locking the code object or a stop-the-world pause is safe.

Is locking code objects sufficient? What if other threads are executing those code objects?
We need to be careful about what we do during a stop-the-world pause. It's not safe to call Py_DECREF either directly or indirectly, for example.

Python/instrumentation.c

Python/legacy_tracing.c

Python/instrumentation.c

markshannon

I think fine grained locking is the wrong approach here.
PEP 669 specifically states that setting events is expensive.

Stopping the world for setting events and locking code objects to instrument seems like the right approach to me.

markshannon · 2024-03-19T09:14:20Z

Include/internal/pycore_pyatomic_ft_wrappers.h

@@ -23,18 +23,25 @@ extern "C" {
 #define FT_ATOMIC_LOAD_SSIZE(value) _Py_atomic_load_ssize(&value)
 #define FT_ATOMIC_LOAD_SSIZE_RELAXED(value) \
 _Py_atomic_load_ssize_relaxed(&value)
+#define FT_ATOMIC_LOAD_PTR_ACQUIRE(value) \


Wrapping things in macros often obfuscates the intent.
If I see _Py_atomic_load_ptr_acquire(&value) I know what it does (assuming I know how atomics work)
If I see FT_ATOMIC_LOAD_PTR_ACQUIRE(value) I need to scan the source code to find the macro definition.

Can we come up with a name that describes the semantics, not the context in which it should be used?
All these operations are thread safe on all platforms. Either the GIL makes them safe or we use hardware atomics.
So, maybe choose a name based on the semantics not the context, and drop the FT_ prefix?

Out of interest, do we have any evidence that relaxed/release/acquire gains us performance over sequential consistency, which is a lot easier to reason about and thus safer?
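For readers unfamiliar with the wrapper pattern being debated, the following is a minimal sketch of how one macro name can compile to an acquire load in one build and a plain read in another, which is the "either the GIL makes them safe or we use hardware atomics" point above. Every `DEMO_`/`demo_` name is hypothetical, and the sketch uses C11 `<stdatomic.h>` rather than CPython's `_Py_atomic_*` functions, so it should not be read as the contents of `pycore_pyatomic_ft_wrappers.h`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical build switch standing in for Py_GIL_DISABLED. */
#define DEMO_GIL_DISABLED 1

#if DEMO_GIL_DISABLED
/* Free-threaded build: the field is atomic and reads are acquire loads. */
#  define DEMO_ATOMIC_PTR(type) _Atomic(type *)
#  define DEMO_LOAD_PTR_ACQUIRE(field) \
       atomic_load_explicit(&(field), memory_order_acquire)
#  define DEMO_STORE_PTR_RELEASE(field, value) \
       atomic_store_explicit(&(field), (value), memory_order_release)
#else
/* Build with a global lock: plain accesses, because the lock serializes them. */
#  define DEMO_ATOMIC_PTR(type) type *
#  define DEMO_LOAD_PTR_ACQUIRE(field) (field)
#  define DEMO_STORE_PTR_RELEASE(field, value) ((field) = (value))
#endif

/* Hypothetical stand-in for a structure holding one published side table. */
typedef struct {
    DEMO_ATOMIC_PTR(unsigned char) per_instruction_opcodes;
} demo_monitoring;

/* Call sites look identical in both builds; only the macro body differs. */
static unsigned char *
demo_get_opcodes(demo_monitoring *data)
{
    return DEMO_LOAD_PTR_ACQUIRE(data->per_instruction_opcodes);
}

static void
demo_set_opcodes(demo_monitoring *data, unsigned char *opcodes)
{
    DEMO_STORE_PTR_RELEASE(data->per_instruction_opcodes, opcodes);
}
```

The trade-off raised in the review is visible here: the call site is shorter and build-independent, but a reader has to look up the macro to learn which memory ordering it applies.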

markshannon · 2024-03-19T09:36:13Z

Python/instrumentation.c

 }

 static void
 remove_tools(PyCodeObject * code, int offset, int event, int tools)
 {
+ Py_BEGIN_CRITICAL_SECTION(code);


I think it would be better to assert that this is locked here and move the critical section to the DISABLE section of call_instrumentation_vector. _Py_Instrument should already be locked (with stop the world)

DinoV · 2024-03-19

Do we really want _Py_Instrument to stop the world? I didn't want it to, because instrumentation can happen outside of calls that enable tracing, e.g. in RESUME, and stopping the world there seems like it would be a massive slowdown.

markshannon · 2024-03-19T09:41:18Z

Python/instrumentation.c

- assert(instrumentation_data->per_instruction_opcodes);
- int next_opcode = instrumentation_data->per_instruction_opcodes[offset];
+ _PyCoMonitoringData *instr_data = get_monitoring(code);
+ uint8_t *per_instr_opcodes = FT_ATOMIC_LOAD_PTR_ACQUIRE(instr_data->per_instruction_opcodes);


No need for atomic operations here. per_instruction_opcodes is updated in _Py_Instrument which should only be called with the world stopped, so it cannot change while threads are running.

markshannon · 2024-03-19T09:41:31Z

Python/instrumentation.c

 if (tstate->tracing) {
 return next_opcode;
 }
 PyInterpreterState *interp = tstate->interp;
- uint8_t tools = instrumentation_data->per_instruction_tools != NULL ?
- instrumentation_data->per_instruction_tools[offset] :
+ uint8_t *per_instr_tools = FT_ATOMIC_LOAD_PTR_ACQUIRE(instr_data->per_instruction_tools);


markshannon · 2024-03-19T09:43:28Z

Python/instrumentation.c

@@ -1320,15 +1375,23 @@ _PyMonitoring_RegisterCallback(int tool_id, int event_id, PyObject *obj)
 PyInterpreterState *is = _PyInterpreterState_GET();
 assert(0 <= tool_id && tool_id < PY_MONITORING_TOOL_IDS);
 assert(0 <= event_id && event_id < _PY_MONITORING_EVENTS);
+#ifdef Py_GIL_DISABLED


Maybe drop the #ifdef Py_GIL_DISABLED and use (FT_)ATOMIC_EXCHANGE_POINTER_PTR?

markshannon · 2024-03-19T09:45:37Z

Python/instrumentation.c

@@ -1382,9 +1445,10 @@ initialize_tools(PyCodeObject *code)
 #define NO_LINE -128

 static void
-initialize_lines(PyCodeObject *code)
+initialize_lines(PyCodeObject *code, _PyCoLineInstrumentationData *line_data)


Why add the extra argument?
(Same for initialize_line_tools)

DinoV · 2024-03-19

This was so that we could initialize the data and then publish it, so that any reader would see either no line data or fully initialized line data.

markshannon · 2024-03-19T09:46:54Z

Python/instrumentation.c

+ monitoring->line_tools = NULL;
+ monitoring->per_instruction_opcodes = NULL;
+ monitoring->per_instruction_tools = NULL;
+ FT_ATOMIC_STORE_PTR_RELEASE(code->_co_monitoring, monitoring);


Why the atomic operation here?
Won't this leak memory in case of a race?
I think it would be better to lock the code object when doing this initialization.

DinoV · 2024-03-19

We actually do lock the code object on initialization, but the current implementation was trying to avoid locking the code object on reads. The readers of _co_monitoring all use acquire semantics, so if there is a value there, they'll also pick up all of the writes to the data it points to. The alternative is either to lock the code object on reads as well, or to stop the world on _Py_Instrument calls, which seem to happen disproportionately often compared to calls that actually enable monitoring.
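The initialize-then-publish scheme described in this reply can be sketched as follows. This is an illustration under stated assumptions, not the code in `Python/instrumentation.c`: `demo_code`, `demo_monitoring`, and both functions are hypothetical, C11 atomics stand in for the `FT_ATOMIC_*` macros, and the writer is assumed to be called with the code object's lock already held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-code-object monitoring data. */
typedef struct {
    int line_count;
    unsigned char *line_tools;
} demo_monitoring;

/* Hypothetical stand-in for a code object with a lazily created side table. */
typedef struct {
    _Atomic(demo_monitoring *) monitoring;
} demo_code;

/* Writer. Assumption: the caller holds the code object's lock, so at most
   one thread allocates. Every field is filled in before the pointer is
   published. Returns 0 on success, -1 on allocation failure. */
static int
demo_publish_monitoring(demo_code *code, int line_count)
{
    if (atomic_load_explicit(&code->monitoring, memory_order_relaxed) != NULL) {
        return 0;   /* already published; allocate and publish only once */
    }
    demo_monitoring *m = malloc(sizeof(*m));
    if (m == NULL) {
        return -1;
    }
    m->line_count = line_count;
    m->line_tools = NULL;
    /* Release store: the field writes above become visible to any reader
       whose acquire load observes this pointer. */
    atomic_store_explicit(&code->monitoring, m, memory_order_release);
    return 0;
}

/* Reader: takes no lock. It sees either NULL or a fully initialized
   structure, never a partially written one. Returns -1 if unpublished. */
static int
demo_line_count(demo_code *code)
{
    demo_monitoring *m =
        atomic_load_explicit(&code->monitoring, memory_order_acquire);
    return m == NULL ? -1 : m->line_count;
}
```

Because the writer side is serialized by the lock, the second call to `demo_publish_monitoring` finds the pointer already set and allocates nothing, which is how the leak asked about in the review would be avoided in this sketch.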

markshannon · 2024-03-19T09:49:13Z

Python/instrumentation.c

@@ -1746,6 +1833,10 @@ _Py_Instrument(PyCodeObject *code, PyInterpreterState *interp)

 static int
 instrument_all_executing_code_objects(PyInterpreterState *interp) {
+#ifdef Py_GIL_DISABLED


Use ASSERT_WORLD_STOPPED(), much like we use ASSERT_WORLD_STOPPED_OR_LOCKED() instead of the #ifdef ...?

markshannon · 2024-03-19T09:51:21Z

Python/instrumentation.c

 uint32_t existing_events = get_events(&interp->monitors, tool_id);
 if (existing_events == events) {
+ _PyEval_StartTheWorld(interp);


Rather than three _PyEval_StartTheWorld(interp); calls, could you use our standard cleanup idiom of jumping to a done: label, and calling _PyEval_StartTheWorld(interp) there?
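A minimal sketch of the cleanup idiom being requested, with a counter standing in for the interpreter's stop-the-world state. `demo_stop_the_world`, `demo_start_the_world`, `demo_set_events`, and the invalid-event check are all hypothetical; only the control-flow shape (every exit path funnels through one `done:` label) is the point.

```c
#include <assert.h>

/* Hypothetical counter standing in for the stop-the-world state. */
static int demo_world_stopped = 0;

static void demo_stop_the_world(void)  { demo_world_stopped++; }
static void demo_start_the_world(void) { demo_world_stopped--; }

/* Returns 0 on success (including the "nothing changed" case), -1 on error. */
static int
demo_set_events(unsigned *current, unsigned events)
{
    int res = 0;
    demo_stop_the_world();
    if (*current == events) {
        goto done;          /* nothing to do */
    }
    if (events & 0x80000000u) {
        res = -1;           /* hypothetical invalid-event check */
        goto done;
    }
    *current = events;
done:
    /* Single exit point: the world is restarted on every path. */
    demo_start_the_world();
    return res;
}
```

With one restart call at the label, adding another early exit later cannot accidentally leave the world stopped.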

Python/instrumentation.c

markshannon · 2024-03-19T10:07:27Z

Note that it is not possible for a code object that needs to be instrumented to execute without hitting a RESUME (or specialized variant) instruction, which will perform the instrumentation.
Provided that _Py_Instrument locks the code object, then executing code objects can safely read the instrumentation data structures without locks or atomics.

DinoV · 2024-03-19T19:29:52Z

> Note that it is not possible for a code object that needs to be instrumented to execute without hitting a RESUME (or specialized variant) instruction, which will perform the instrumentation. Provided that _Py_Instrument locks the code object, then executing code objects can safely read the instrumentation data structures without locks or atomics.

I don't think locking alone is sufficient, unless you're referring to stopping the world to lock. If a code object starts running on one thread and hits RESUME, and another thread then enables tracing and calls that function, the second thread is going to start mutating the code object while the first is still executing it. I'm just concerned that stopping the world to instrument every new code object is going to be extremely expensive, but if that seems like an okay performance cost to you then I'm happy to do that; it is much simpler :)

gvanrossum · 2024-03-27T03:00:58Z

@DinoV Did you actually request a review from me or did I just get that message because I'm a code owner?

colesbury · 2024-03-27T19:42:32Z

Lib/test/test_free_threading/test_monitoring.py

+ """Runs once after the test is done"""
+ pass
+
+ def test_instrumention(self):


Suggested change

- def test_instrumention(self):
+ def test_instrumentation(self):

DinoV · 2024-03-27T20:36:34Z

> @DinoV Did you actually request a review from me or did I just get that message because I'm a code owner?

I think you just got it because you're a code owner.

brandtbucher · 2024-03-29T19:55:49Z

Ah, interesting. Thanks for digging that up.

Easy solution would be to move the CHECK_EVAL_BREAKER(); after the Py_INCREF(executor); at the end with a comment. Arguably more "correct" solution would be to check the opcode after the CHECK_EVAL_BREAKER(); and re-run the instruction if it's anything other than ENTER_EXECUTOR.

I'm slightly leaning towards the former, since the tier two code should bail immediately anyways upon seeing it's been invalidated, and it keeps the redundant check for the opcode out of the happy path. To the user, it would appear as if the jump happened just before tracing was enabled, rather than just after. I think that's okay.

brandtbucher · 2024-03-29T20:01:43Z

Hm, actually, I think that could leak the incref'ed executor if the eval breaker check raises (I forgot there's a goto error hidden in that macro).

Python/bytecodes.c

colesbury

A few random things I noticed while reading the PR

Include/cpython/pyatomic_msc.h

Python/instrumentation.c

Python/legacy_tracing.c

DinoV · 2024-04-02T20:55:10Z

> Hm, actually, I think that could leak the incref'ed executor if the eval breaker check raises (I forgot there's a goto error hidden in that macro).

Okay, I've gone with the approach of re-checking the opcode. It looks like, in addition to checking the opcode, we also have to check the oparg, as we can seemingly end up updating just the oparg.
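The re-check described here can be sketched in isolation. This is not the code in `Python/bytecodes.c`: the instruction layout, the `demo_breaker` callback standing in for the eval-breaker check, and all `demo_`/`DEMO_` names are hypothetical, and the sketch is single-threaded. It only shows why both fields are compared after a point where instrumentation may have rewritten the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-byte instruction: an opcode and its argument. */
typedef struct {
    uint8_t opcode;
    uint8_t oparg;
} demo_instr;

enum { DEMO_ENTER_EXECUTOR = 1, DEMO_OTHER = 2 };

/* Hypothetical hook standing in for the eval-breaker check, which may run
   code that re-instruments the instruction stream. */
typedef void (*demo_breaker)(demo_instr *instr);

/* Returns 1 if the instruction is unchanged after the check, so the
   executor selected by the originally decoded opcode/oparg may be entered.
   Returns 0 if the instruction must be re-dispatched. */
static int
demo_can_enter_executor(demo_instr *instr, demo_breaker check)
{
    uint8_t opcode = instr->opcode;
    uint8_t oparg = instr->oparg;
    check(instr);
    /* Compare both fields: the oparg can change even when the opcode
       stays the same. */
    return instr->opcode == opcode && instr->oparg == oparg;
}

/* Two sample checks: one leaves the instruction alone, one edits the oparg. */
static void demo_noop(demo_instr *instr) { (void)instr; }
static void demo_change_oparg(demo_instr *instr) { instr->oparg++; }
```

Doing the comparison before taking a reference to the executor also sidesteps the leak mentioned in the quoted comment, since nothing has been incref'ed yet if the check fails.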

gvanrossum

(You're not waiting for me are you?)

DinoV · 2024-04-04T06:25:18Z

> (You're not waiting for me are you?)

Nope, I'm more hoping to get a review from @markshannon and/or @colesbury unless you feel particularly familiar with this area :)

colesbury

LGTM!

Python/instrumentation.c

DinoV · 2024-04-15T17:15:32Z

@markshannon Do you want to take a look at this or are you happy with where it's at?

DinoV added skip news topic-free-threading labels Mar 14, 2024

DinoV requested a review from colesbury March 14, 2024 02:05

DinoV force-pushed the nogil_settrace branch 3 times, most recently from f65320d to 1b5ea7d Compare March 14, 2024 14:01

DinoV changed the title ~~tracing thread safety~~ gh-116818: tracing thread safety Mar 14, 2024

bedevere-app bot mentioned this pull request Mar 14, 2024

Make sys.settrace, sys.setprofile, and sys.monitoring thread safe in --disable-gil builds #116818

Closed

DinoV force-pushed the nogil_settrace branch 3 times, most recently from 40ef815 to 3bcc7f0 Compare March 14, 2024 17:13

DinoV mentioned this pull request Mar 14, 2024

gh-116868: Avoid locking in PyType_IsSubtype #116829

Merged

DinoV force-pushed the nogil_settrace branch from 3bcc7f0 to d5a6b2e Compare March 14, 2024 19:12

DinoV marked this pull request as ready for review March 14, 2024 19:55

DinoV requested a review from ericsnowcurrently as a code owner March 14, 2024 19:55

bedevere-app bot added the awaiting core review label Mar 14, 2024

DinoV requested a review from markshannon March 14, 2024 19:55

DinoV force-pushed the nogil_settrace branch from d5a6b2e to d8ea6bb Compare March 15, 2024 14:42

DinoV changed the title ~~gh-116818: tracing thread safety~~ gh-116818: Make sys.settrace, sys.setprofile, and monitoring thread-safe Mar 15, 2024

colesbury mentioned this pull request Mar 15, 2024

Make the Python test suite pass with the GIL disabled #116749

Closed

colesbury reviewed Mar 18, 2024

Python/instrumentation.c (outdated, resolved)

Python/legacy_tracing.c (outdated, resolved)

Python/instrumentation.c (outdated, resolved)

Python/instrumentation.c (resolved)

markshannon reviewed Mar 19, 2024

DinoV force-pushed the nogil_settrace branch from d8ea6bb to b973188 Compare March 25, 2024 20:04

DinoV requested a review from gvanrossum as a code owner March 25, 2024 20:04

colesbury reviewed Mar 27, 2024

DinoV force-pushed the nogil_settrace branch from b973188 to ac91d8f Compare March 27, 2024 20:39

DinoV force-pushed the nogil_settrace branch from 88f8634 to 773e016 Compare March 29, 2024 20:03

gvanrossum reviewed Mar 29, 2024

Python/bytecodes.c (outdated, resolved)

colesbury reviewed Apr 2, 2024

Include/cpython/pyatomic_msc.h (outdated, resolved)

Include/cpython/pyatomic_msc.h (outdated, resolved)

Python/instrumentation.c (outdated, resolved)

Python/legacy_tracing.c (outdated, resolved)

DinoV force-pushed the nogil_settrace branch from 0b1ec19 to ab0ede8 Compare April 2, 2024 19:00

DinoV requested a review from brandtbucher as a code owner April 2, 2024 19:00

DinoV force-pushed the nogil_settrace branch from ab0ede8 to 1cec6a4 Compare April 2, 2024 20:07

DinoV force-pushed the nogil_settrace branch from 1cec6a4 to 9f2d6b9 Compare April 2, 2024 22:57

gvanrossum reviewed Apr 4, 2024

colesbury approved these changes Apr 4, 2024

Python/instrumentation.c (outdated, resolved)

Python/instrumentation.c (resolved)

bedevere-app bot added awaiting merge and removed awaiting core review labels Apr 4, 2024

DinoV force-pushed the nogil_settrace branch from 90192e6 to b8c10f5 Compare April 4, 2024 23:18

DinoV force-pushed the nogil_settrace branch from b8c10f5 to c39f9e8 Compare April 15, 2024 17:15

DinoV added 7 commits April 19, 2024 09:50

tracing thread safety (2f6ca58)

Lock on allocation, use version as sync point (fe19d59)

Lock in non-debug builds (210b802)

Move CHECK_EVAL_BREAKER in ENTER_EXECUTOR (597f40d)

Don't potentially leak executor (8393bfe)

Just use _Py_atomic_exchange_ptr instead of FT macro (65ff505)

Use FT_ATOMIC_LOAD_UINTPTR_ACQUIRE for INSTRUMENTED_RESUME too (6e19601)

DinoV force-pushed the nogil_settrace branch from 3bface0 to 6e19601 Compare April 19, 2024 16:51

DinoV mentioned this pull request Apr 19, 2024

gh-117657: Fix small issues with instrumentation and TSAN #118064

Merged

DinoV merged commit 07525c9 into python:main Apr 19, 2024
55 of 59 checks passed

bedevere-app bot removed the awaiting merge label Apr 19, 2024

DinoV deleted the nogil_settrace branch May 31, 2024 18:22
