-
-
Notifications
You must be signed in to change notification settings - Fork 31.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-45256: Avoid C calls for more Python to Python calls. #28937
Conversation
Did anyone discover what caused the slight slowdown in the initial PR's benchmark? |
…onsume the argument references regardless of whether they succeed or fail.
Maybe the additional tests for consuming references? |
🤖 New build scheduled with the buildbot fleet by @markshannon for commit 7c0f498 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again. |
🤖 New build scheduled with the buildbot fleet by @pablogsal for commit 03e7ad9 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again. |
@@ -5480,7 +5459,7 @@ initialize_locals(PyThreadState *tstate, PyFrameConstructor *con, | |||
_PyErr_Format(tstate, PyExc_TypeError, | |||
"%U() keywords must be strings", | |||
con->fc_qualname); | |||
goto fail; | |||
goto kw_fail; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this a bit confusing. How this is this cleaning the positional arguments in the case where some positional args have been copied and we fail to copy some of the keywords?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the positional arguments have been copied at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah exactly, so if the call fails you need to increment those references so the cleanup succeeds, no? This is because if the references have been stolen, we own the references back to the positional arguments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If steal_args
is true
, then we always consume the references. #28937 (comment)
By this point the references to all positional arguments have been consumed, as have all references of kwargs up to i
(exclusive).
kw_fail
consumes the references to kwargs from i
(inclusive) to kwcount
(exclusive) so it can them jump to fail_late
as all references will have been consumed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, wait a minute, I misread what we are doing! We are consuming references now, while before we were increasing them so the stack cleans them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, we are avoiding the double free by shrinking the stack here, no?
STACK_SHRINK(stackadj);
// The frame has stolen all the arguments from the stack,
// so there is no need to clean them up.
Py_XDECREF(kwnames);
Py_DECREF(function);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -1702,6 +1702,11 @@ _PyEval_EvalFrameDefault(PyThreadState *tstate, InterpreterFrame *frame, int thr | |||
switch (opcode) { | |||
#endif | |||
|
|||
/* Variables used for making calls */ | |||
PyObject *kwnames; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we put this into a struct instead of having more locals? It makes it a bit more clean to read and contextualize and it will have the same performance as long as the struct is stack allocated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to rely on all compilers being able to do perfect escape analysis here?
I'd rather leave these as local variables so the compiler can easily allocate them to registers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no way that given the size of this function this is going to end in registers. Indeed: I checked with GCC, clang, ICC and xlc in all different optimization level and not a single one places these locals on registers.
Up to you anyway, I don't feel super strongly about it but I think it helps with clarity and organization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you determine which of these were in registers? Once in SSA form, there are 15 (I think) variables here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gdb/dbx + breakpoint for this function + disas
+ info registers
. I checked if any of the values in these locals are stored or partially stored in registers.
I only checked x86-64 thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some minor comments, otherwise LGTM.
Great work! I have not checked for refleaks, will do this afterwards but I also scheduled a buildbot run
I ran the buildbots earlier. All good except the Gentoo ones failing for tkinter stuff. |
Unfortunately, seems that this commit has broken the AMD64 FreeBSD Shared 3.x buildbot: https://buildbot.python.org/all/#/builders/483/builds/1003/steps/5/logs/stdio The buildbot was green until we merged this |
The BSD entry in https://devguide.python.org/experts/ is empty. Not only that, but the gdb tests are decidedly uninformative when they fail. I'll see what I can come up with. |
I am taking a look as well, seems that the problem is that it cannot find
|
This is one of the errors:
|
So seems that it was not able to find the previous python frame for some reason |
Something is going on, I logged into the buildbot (you need to ask koobs for access by writting to koobs@freebsd.org) and indeed many commands are broken:
while on my Linux system:
|
@markshannon I know the problem. The problem is this code: cpython/Tools/gdb/libpython.py Lines 1804 to 1813 in d02ffd1
This is not correct anymore when the frames are inlined as that frame is the top frame of all of them. The problem is that in the buildbot, the frame is optimized so it goes into this fallback. |
I will try to prepare a PR for this. |
Extends the approach used for
CALL_FUNCTION
toCALL_FUNCTION_KW
,CALL_METHOD
andCALL_METHOD_KW
.Also modifies
initialize_locals
and_PyTuple_FromArraySteal
to have the same behavior w.r.t. reference counting regardless of whether they succeed or fail.https://bugs.python.org/issue45256