Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-111997: C-API for signalling monitoring events #116413

Merged
merged 102 commits into from
May 4, 2024

Conversation

iritkatriel
Copy link
Member

@iritkatriel iritkatriel commented Mar 6, 2024

@iritkatriel iritkatriel requested a review from scoder March 7, 2024 15:39
Copy link
Contributor

@scoder scoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand correctly that the CodeLike type is expected to be provided by user code? How do we guarantee its struct layout then? Why not define a public object struct for it that users can use C-struct inheritance with? (I.e. if they need more fields on their side, they can put the public struct into the first field of their own struct.)

Python/instrumentation.c Show resolved Hide resolved
Include/internal/pycore_instruments.h Outdated Show resolved Hide resolved
@iritkatriel
Copy link
Member Author

Do I understand correctly that the CodeLike type is expected to be provided by user code?

I think we should provide CodeLike in some form, but for now I put it in the text file so we can talk about how and where we want it to be. The offset to location mapping should also be part of this.

I don't see why this public API should live in an internal header file. And why the underscore names? We only use(d) them for non-public functions.

@markshannon did you intend for these to be private?

@markshannon
Copy link
Member

@markshannon did you intend for these to be private?

My thinking was that it is better to keep any function private unless we need it to be public. It sounds like they need to be public.

How about this?

  • The declaration of the implementation function, _PyMonitoring_FirePyStartEvent(), goes in the semi-private cpython/ folder.
  • We expose an inline function calling for the unstable API and a non-inline one for the stable API in the public header.

Something like this:

#ifndef Py_LIMITED_API
static inline int
PyMonitoring_FirePyStartEvent(PyMonitoringState *state, PyObject *codelike, uint32_t offset)
{
    if (state->active)
        return _PyMonitoring_FirePyStartEvent(state, codelike, offset);
    else
        return 0;
}
#else
extern int
PyMonitoring_FirePyStartEvent(PyMonitoringState *state, PyObject *codelike, uint32_t offset);
#endif

@markshannon
Copy link
Member

The "code like" object is entirely opaque to the monitoring machinery. As far as it is concerned, a location is a pair of an object and an integer index.
Debuggers, profilers, etc need to present that object, int as a meaningful location. If the object is a code object, they can already do that using co.co_filename and co.co_positions().
The point of a "code like" object is to provide the necessary interface for tools (debuggers, profilers, etc) to turn object, int into a meaningful source location.

So, no C struct, but a Python abstract base class.

@markshannon markshannon reopened this Mar 12, 2024
Include/monitoring.h Outdated Show resolved Hide resolved
Include/monitoring.h Outdated Show resolved Hide resolved
Python/instrumentation.c Outdated Show resolved Hide resolved
@encukou
Copy link
Member

encukou commented May 1, 2024

So, the only events that should happen with the current exception set are RAISE, RERAISE, STOP_ITERATION, C_RAISE and UNWIND.

That leaves EXCEPTION_HANDLED and PY_THROW.

@markshannon
Copy link
Member

Looking through the code for the interpeter implementation of firing events about exceptions, it looks like the exception is usually passed explicitly.

And considering that the concept of there being one (and only one) "current exception" doesn't always make sense, I think all API functions for events where the callback receives the exception should have an exception parameter.

@iritkatriel
Copy link
Member Author

Looking through the code for the interpeter implementation of firing events about exceptions, it looks like the exception is usually passed explicitly.

What do you mean? I see all those functions call do_monitor_exc without an exception, and it calls PyErr_GetRaisedException to get the exception.

@markshannon
Copy link
Member

What do you mean?

Good question. I mean that the exception is passed explicitly to any callback function.

From PEP 669:

func(code: CodeType, instruction_offset: int, exception: BaseException) -> DISABLE | Any

do_monitor_exc may not take an exception as an argument, but it calls _Py_call_instrumentation_arg() which does.

Passing the exception as an argument might require the user of the API to do a bit more work, but it removes a lot of uncertainty about the "current exception".

For example, consider a C extension that is about to raise an exception. To raise an exception, it must have a reference to that exception. Since it has a reference to the exception is straightforward to pass it as an argument:

PyObject *exception; /* Strong reference to the exception to be raised */
if (PyMonitoring_FireRaiseEvent(state, codelike, offset, exception) < 0) {
    /* Would could chain the exceptions, but I'm lazy, so I'm just going to discard it */
    Py_DECREF(exception);
    return NULL;
}
PyErr_SetRaisedException(exception);
return NULL;

@iritkatriel
Copy link
Member Author

The c api also passes the exception to the callbacks. It's the Fire functions that we removed it from.

@scoder
Copy link
Contributor

scoder commented May 2, 2024 via email

@iritkatriel
Copy link
Member Author

We need to merge this pretty much now, so last call for strong objections. Otherwise we can iterate via bugfixes.

@markshannon
Copy link
Member

No strong objections.

@scoder
Copy link
Contributor

scoder commented May 3, 2024

the only events that should happen with the current exception set are RAISE, RERAISE, STOP_ITERATION, C_RAISE and UNWIND.

Does STOP_ITERATION really belong into this group? Isn't the idea rather that generators do not need to create a StopIteration in order to save time on termination?

Does CPython really create a StopIteration instance here just for monitoring it?

EDIT: it seems that it does:

cpython/Python/bytecodes.c

Lines 289 to 317 in 5248596

macro(END_FOR) = POP_TOP;
tier1 inst(INSTRUMENTED_END_FOR, (receiver, value -- receiver)) {
/* Need to create a fake StopIteration error here,
* to conform to PEP 380 */
if (PyGen_Check(receiver)) {
PyErr_SetObject(PyExc_StopIteration, value);
if (monitor_stop_iteration(tstate, frame, this_instr)) {
ERROR_NO_POP();
}
PyErr_SetRaisedException(NULL);
}
DECREF_INPUTS();
}
pure inst(END_SEND, (receiver, value -- value)) {
Py_DECREF(receiver);
}
tier1 inst(INSTRUMENTED_END_SEND, (receiver, value -- value)) {
if (PyGen_Check(receiver) || PyCoro_CheckExact(receiver)) {
PyErr_SetObject(PyExc_StopIteration, value);
if (monitor_stop_iteration(tstate, frame, this_instr)) {
ERROR_NO_POP();
}
PyErr_SetRaisedException(NULL);
}
Py_DECREF(receiver);
}

@scoder
Copy link
Contributor

scoder commented May 3, 2024

That said, this is a detail that can still be decided during the beta phase, so feel free to merge this PR as it is if there's remaining need for discussion.

@markshannon
Copy link
Member

markshannon commented May 3, 2024

Does CPython really create a StopIteration instance here just for monitoring it?

Yes, but only if you ask for it (it has zero cost if you don't ask for it, thanks to the adaptive interpreter)

https://github.com/python/cpython/blob/main/Python/bytecodes.c#L291

@markshannon
Copy link
Member

markshannon commented May 3, 2024

It seems that you already found that 🙂

@scoder
Copy link
Contributor

scoder commented May 3, 2024

Does CPython really create a StopIteration instance here just for monitoring it?

Yes, but only if you ask for it (it has zero cost if you don't ask for it, thanks to the adaptive interpreter)

Ok … then I think it would be best if CPython's monitoring infrastructure created a StopIteration instance automatically when it detects that it needs to notify about it without having one. It's in the best position to detect whether it really needs an exception instance or not. Event sources shouldn't be forced to a) figure that out themselves or b) create an instance just for it to be thrown away.

Basically, move the existing code from the byte code handlers all the way to the other side into the notification mechanism.

@scoder
Copy link
Contributor

scoder commented May 3, 2024

Basically, move the existing code from the byte code handlers all the way to the other side into the notification mechanism.

That might suggest that the signature for the StopIteration event should not receive an exception or use the current one, but should better receive the StopIteration value directly (with or without a live exception).

@iritkatriel
Copy link
Member Author

I'll merge, please create an issue to iron out the StopIteration business.

@iritkatriel iritkatriel enabled auto-merge (squash) May 4, 2024 08:01
@iritkatriel
Copy link
Member Author

I'll merge, please create an issue to iron out the StopIteration business.

Or discuss it on #111997.

@iritkatriel iritkatriel merged commit 85af789 into python:main May 4, 2024
36 checks passed
@scoder
Copy link
Contributor

scoder commented May 7, 2024

In order to avoid any confusion about blocker tickets during the beta release, I created a new ticket to discuss the StopIteration behaviour/functions here: #118692

SonicField pushed a commit to SonicField/cpython that referenced this pull request May 8, 2024
scoder referenced this pull request May 19, 2024
…on (GH-103083)

* The majority of the monitoring code is in instrumentation.c

* The new instrumentation bytecodes are in bytecodes.c

* legacy_tracing.c adapts the new API to the old sys.setrace and sys.setprofile APIs
}
assert(args[2] == NULL);
args[2] = offset_obj;
Py_ssize_t nargsf = nargs | PY_VECTORCALL_ARGUMENTS_OFFSET;
Copy link
Contributor

@scoder scoder May 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is incorrect. We call call_one_instrument() below, which passes nargsf on into _PyObject_VectorcallTstate() as a Py_ssize_t, but that actually wants a size_t, thus receiving a huge positive value:

#8  0x00005555558b0e7f in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775811, args=0x7fffffffd048, callable=0x7ffff650d380, tstate=0x555555c2a050 <_PyRuntime+294032>) at ./Include/internal/pycore_call.h:168
#9  call_one_instrument (event=5, tool=<optimized out>, nargsf=-9223372036854775805, args=0x7fffffffd048, tstate=0x555555c2a050 <_PyRuntime+294032>, interp=<optimized out>) at Python/instrumentation.c:907
#10 capi_call_instrumentation (state=state@entry=0x7fffffffd0fa, codelike=codelike@entry=0x7ffff744cba0, offset=offset@entry=5, args=args@entry=0x7fffffffd040, nargs=<optimized out>, nargs@entry=3, event=event@entry=5) at Python/instrumentation.c:2457
#11 0x00005555558b6893 in _PyMonitoring_FireLineEvent (state=state@entry=0x7fffffffd0fa, codelike=codelike@entry=0x7ffff744cba0, offset=offset@entry=5, lineno=lineno@entry=5) at Python/instrumentation.c:2569
#12 0x00007ffff65f152c in PyMonitoring_FireLineEvent (lineno=5, offset=5, codelike=0x7ffff744cba0, state=0x7fffffffd0fa) at /opt/python3.13/include/python3.13d/cpython/monitoring.h:162

I think we should be using a size_t here directly and cast nargs before or-ing it.

That problem already seems to exist in the original sys.monitoring implementation. I've now reported it in 411b1692811b#r142178099

Edit: I now figured out that it's a minor issue, though. It gives the correct results. It still seems error prone to pass a Py_ssize_t around here because it no longer has a signed value from the moment where we or it with the argument flag.

Comment on lines +2569 to +2571
PyObject *args[4] = { NULL, NULL, NULL, lno };
int res= capi_call_instrumentation(state, codelike, offset, args, 3,
PY_MONITORING_EVENT_LINE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the documentation, the LINE event listeners only receive two arguments, not three.
https://docs.python.org/3.13/library/sys.monitoring.html#callback-function-arguments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants