Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-116818: Make sys.settrace, sys.setprofile, and monitoring thread-safe #116775

Merged
merged 7 commits into from
Apr 19, 2024

Conversation

DinoV
Copy link
Contributor

@DinoV DinoV commented Mar 14, 2024

Stops the world when we're doing a monitoring event that impacts all threads.

Updates the initialization of the monitoring data structures so that modifications generally lock the code object.

There's various side data structures depending upon the tracing modes that are enabled. These are all allocated and published once. This is necessary because _Py_Instrument can be called on code objects pretty freely and we probably don't want locks on the readers. Races with the actual values should be benign and just result in some delay in seeing the events being enabled/disabled.

Also adds multi-threaded test cases which are passing with PYTHON_GIL=0.

@DinoV DinoV requested a review from colesbury March 14, 2024 02:05
@DinoV DinoV force-pushed the nogil_settrace branch 3 times, most recently from f65320d to 1b5ea7d Compare March 14, 2024 14:01
@DinoV DinoV changed the title tracing thread safety gh-116818: tracing thread safety Mar 14, 2024
@DinoV DinoV force-pushed the nogil_settrace branch 3 times, most recently from 40ef815 to 3bcc7f0 Compare March 14, 2024 17:13
@DinoV DinoV marked this pull request as ready for review March 14, 2024 19:55
@DinoV DinoV requested a review from markshannon March 14, 2024 19:55
@DinoV DinoV changed the title gh-116818: tracing thread safety gh-116818: Make sys.settrace, sys.setprofile, and monitoring thread-safe Mar 15, 2024
Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me that the strategy of either locking the code object or a stop-the-world pause is safe.

  • Is locking code objects sufficient? What if other threads are executing those code objects?
  • We need to be careful about what we do doing a stop-the-world pause. It's not safe to call Py_DECREF either directly or indirectly, for example.

Python/instrumentation.c Outdated Show resolved Hide resolved
Python/legacy_tracing.c Outdated Show resolved Hide resolved
Python/instrumentation.c Outdated Show resolved Hide resolved
Python/instrumentation.c Show resolved Hide resolved
Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fine grained locking is the wrong approach here.
PEP 667 specifically states that setting events is expensive.

Stopping the world for setting events and locking code objects to instrument seems like the right approach to me.

@@ -23,18 +23,25 @@ extern "C" {
#define FT_ATOMIC_LOAD_SSIZE(value) _Py_atomic_load_ssize(&value)
#define FT_ATOMIC_LOAD_SSIZE_RELAXED(value) \
_Py_atomic_load_ssize_relaxed(&value)
#define FT_ATOMIC_LOAD_PTR_ACQUIRE(value) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapping things in macros often obfuscates the intent.
If I see _Py_atomic_load_ptr_acquire(&value) I know what it does (assuming I know how atomics work)
If I see FT_ATOMIC_LOAD_PTR_ACQUIRE(value) I need to scan the source code to find the macro definition.

Can we come up with a name that describes the semantics, not the context in which it should be used?
All these operations are thread safe on all platforms. Either the GIL makes them safe or we use hardware atomics.
So, maybe choose a name based on the semantics not the context, and drop the FT_ prefix?

OOI, do we have any evidence that relaxed/release/acquire gains us performance over sequential consistency which is a lot easier to reason about and thus safer.

}

static void
remove_tools(PyCodeObject * code, int offset, int event, int tools)
{
Py_BEGIN_CRITICAL_SECTION(code);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to assert that this is locked here and move the critical section to the DISABLE section of call_instrumentation_vector. _Py_Instrument should already be locked (with stop the world)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want _Py_Instrument to stop the world? The reason why I didn't want to have that stop the world is because it can happen outside of calls to enable tracing, e.g. in RESUME which seems like it would be a massive slow down.

assert(instrumentation_data->per_instruction_opcodes);
int next_opcode = instrumentation_data->per_instruction_opcodes[offset];
_PyCoMonitoringData *instr_data = get_monitoring(code);
uint8_t *per_instr_opcodes = FT_ATOMIC_LOAD_PTR_ACQUIRE(instr_data->per_instruction_opcodes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for atomic operations here. per_instruction_opcodes is updated in _Py_Instrument which should only be called with the world stopped, so it cannot change while threads are running.

if (tstate->tracing) {
return next_opcode;
}
PyInterpreterState *interp = tstate->interp;
uint8_t tools = instrumentation_data->per_instruction_tools != NULL ?
instrumentation_data->per_instruction_tools[offset] :
uint8_t *per_instr_tools = FT_ATOMIC_LOAD_PTR_ACQUIRE(instr_data->per_instruction_tools);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise

@@ -1320,15 +1375,23 @@ _PyMonitoring_RegisterCallback(int tool_id, int event_id, PyObject *obj)
PyInterpreterState *is = _PyInterpreterState_GET();
assert(0 <= tool_id && tool_id < PY_MONITORING_TOOL_IDS);
assert(0 <= event_id && event_id < _PY_MONITORING_EVENTS);
#ifdef Py_GIL_DISABLED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe drop the #ifdef Py_GIL_DISABLED and use (FT_)ATOMIC_EXCHANGE_POINTER_PTR?

@@ -1382,9 +1445,10 @@ initialize_tools(PyCodeObject *code)
#define NO_LINE -128

static void
initialize_lines(PyCodeObject *code)
initialize_lines(PyCodeObject *code, _PyCoLineInstrumentationData *line_data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add the extra argument?
(Same for initialize_line_tools)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was so that we could initialize the data and then publish it so any readers would either see no line data or would see the initialized line data.

monitoring->line_tools = NULL;
monitoring->per_instruction_opcodes = NULL;
monitoring->per_instruction_tools = NULL;
FT_ATOMIC_STORE_PTR_RELEASE(code->_co_monitoring, monitoring);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the atomic operation here?
Won't this leak memory in case of a race?
I think it would be better to lock the code object when doing this initialization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually do lock the code object on initialization, but the current implementation was trying to avoid locking the code object on reads. The readers of _co_monitoring are all using acquire semantics so if there is a value there then they'll also pick all of the writes to the data which it is pointing to as well. The alternative here is to either lock the code object on reads as well or to stop the world on _Py_Instrument calls which seems to have disproportionately compared to calls to actually enable monitoring.

@@ -1746,6 +1833,10 @@ _Py_Instrument(PyCodeObject *code, PyInterpreterState *interp)

static int
instrument_all_executing_code_objects(PyInterpreterState *interp) {
#ifdef Py_GIL_DISABLED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ASSERT_WORLD_STOPPED(), much like we use ASSERT_WORLD_STOPPED_OR_LOCKED() instead of the #ifdef ...?

uint32_t existing_events = get_events(&interp->monitors, tool_id);
if (existing_events == events) {
_PyEval_StartTheWorld(interp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than three _PyEval_StartTheWorld(interp); calls, could you use our standard cleanup idiom of jumping to a done: label, and calling _PyEval_StartTheWorld(interp) there?

Python/instrumentation.c Show resolved Hide resolved
@markshannon
Copy link
Member

markshannon commented Mar 19, 2024

Note that it is not possible for a code object that needs to be instrumented to execute without hitting a RESUME (or specialized variant) instruction, which will perform the instrumentation.
Provided that _Py_Instrument locks the code object, then executing code objects can safely read the instrumentation data structures without locks or atomics.

@DinoV
Copy link
Contributor Author

DinoV commented Mar 19, 2024

Note that it is not possible for a code object that needs to be instrumented to execute without hitting a RESUME (or specialized variant) instruction, which will perform the instrumentation. Provided that _Py_Instrument locks the code object, then executing code objects can safely read the instrumentation data structures without locks or atomics.

I don't think locking alone is sufficient, unless you're referring to stopping the world to lock. If a code object starts running on one thread, hits resume, and another thread enables tracing then calls that function it's going to start mutating the code object. I'm just concerned that stopping the world to instrument every new code object is going to be extremely expensive, but if that seems like an okay performance cost to you then I'm happy to do that, it is much simpler :)

@gvanrossum
Copy link
Member

@DinoV Did you actually request a review from me or did I just get that message because I'm a code owner?

"""Runs once after the test is done"""
pass

def test_instrumention(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_instrumention(self):
def test_instrumentation(self):

@DinoV
Copy link
Contributor Author

DinoV commented Mar 27, 2024

@DinoV Did you actually request a review from me or did I just get that message because I'm a code owner?

I think you just got it because you're a code owner.

@brandtbucher
Copy link
Member

brandtbucher commented Mar 29, 2024

Ah, interesting. Thanks for digging that up.

Easy solution would be to move the CHECK_EVAL_BREAKER(); after the Py_INCREF(executor); at the end with a comment. Arguably more "correct" solution would be to check the opcode after the CHECK_EVAL_BREAKER(); and re-run the instruction if it's anything other than ENTER_EXECUTOR.

I'm slightly leaning towards the former, since the tier two code should bail immediately anyways upon seeing it's been invalidated, and it keeps the redundant check for the opcode out of the happy path. To the user, it would appear as if the jump happened just before tracing was enabled, rather than just after. I think that's okay.

@brandtbucher
Copy link
Member

Hm, actually, I think that could leak the incref'ed executor if the eval breaker check raises (I forgot there's a goto error hidden in that macro).

Python/bytecodes.c Outdated Show resolved Hide resolved
Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few random things I noticed while reading the PR

Include/cpython/pyatomic_msc.h Outdated Show resolved Hide resolved
Include/cpython/pyatomic_msc.h Outdated Show resolved Hide resolved
Python/instrumentation.c Outdated Show resolved Hide resolved
Python/legacy_tracing.c Outdated Show resolved Hide resolved
@DinoV
Copy link
Contributor Author

DinoV commented Apr 2, 2024

Hm, actually, I think that could leak the incref'ed executor if the eval breaker check raises (I forgot there's a goto error hidden in that macro).

Okay, i've gone with re-checking the opcode approach. It looks like in addition to checking the opcode we've also got to check the oparg as we can seemingly end up just updating the oparg.

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(You're not waiting for me are you?)

@DinoV
Copy link
Contributor Author

DinoV commented Apr 4, 2024

(You're not waiting for me are you?)

Nope, I'm more hoping to get a review from @markshannon and/or @colesbury unless you feel particularly familiar with this area :)

Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Python/instrumentation.c Outdated Show resolved Hide resolved
Python/instrumentation.c Show resolved Hide resolved
@DinoV
Copy link
Contributor Author

DinoV commented Apr 15, 2024

@markshannon Do you want to take a look at this or are you happy with where it's at?

@DinoV DinoV merged commit 07525c9 into python:main Apr 19, 2024
55 of 59 checks passed
@DinoV DinoV deleted the nogil_settrace branch May 31, 2024 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants