-
-
Notifications
You must be signed in to change notification settings - Fork 31.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-104584: Remove ip_offset
and simplify EXIT_TRACE
#108961
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the chatty review -- I have two angles, one about still being able to make sense of the debug output, another about whether _PUSH_FRAME
is the right place to set prev_instr
from the new code object. And some mumblings about wanting @markshannon to review this too (I'm not sure I 100% follow his reasoning for the LOAD
/STORE
macros he recently added to ceval_macros.h.)
@@ -668,7 +667,7 @@ translate_bytecode_to_trace( | |||
oparg = orig_oparg & 0xF; | |||
break; | |||
case OPARG_SAVE_IP: // op==SAVE_IP; oparg=next instr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix the comment.
@@ -503,7 +502,7 @@ translate_bytecode_to_trace( | |||
top: // Jump here after _PUSH_FRAME | |||
for (;;) { | |||
RESERVE_RAW(2, "epilogue"); // Always need space for SAVE_IP and EXIT_TRACE | |||
ADD_TO_TRACE(SAVE_IP, INSTR_IP(instr, code), 0); | |||
ADD_TO_TRACE(SAVE_IP, 0, (uintptr_t)instr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I ask to keep (redundantly) the INSTR_IP(...)
argument in all cases? It's helpful for debugging -- the 64-bit hex operand is useless, but the code-relative offset can be related directly to the output from dis
, and I regularly do that.
If you worry about the cost of computing INSTR_IP()
when no debugging is selected, maybe define the macro differently (or define another macro)?
Also, I think the operand may have to be printed in hex (both here and in executor.c) -- debug output like 106858786520180
is not very helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we will ultimately want to change from instruction pointer back to the offset (as we had in 3.9 and earlier), so let's keep the redundant (for now) macro.
The reason for wanting the offset, not the pointer is that we expect writes to frame->instr_ptr
to be rare, thanks to optimization, most reads actually want the offset, and it means we only need a 32 bit operand (less copying and cache pressure).
@@ -707,15 +706,13 @@ translate_bytecode_to_trace( | |||
PyUnicode_AsUTF8(new_code->co_qualname), | |||
PyUnicode_AsUTF8(new_code->co_filename), | |||
new_code->co_firstlineno); | |||
ADD_TO_TRACE(SAVE_IP, 0, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Because you added an assignment to frame->prev_instr
to _PUSH_FRAME
, you don't need this dummy SAVE_IP
any more. Same when bailing on version mismatch below and ditto if func == NULL
. Right?
@@ -731,7 +728,6 @@ translate_bytecode_to_trace( | |||
2 * INSTR_IP(instr, code)); | |||
goto top; | |||
} | |||
ADD_TO_TRACE(SAVE_IP, 0, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to look twice before understanding how we even get here. :-) I think we could use a comment here explaining that here func == NULL
. (And/or a DPRINT()
call?)
#define LOAD_IP() \ | ||
do { ip_offset = (_Py_CODEUNIT *)_PyFrame_GetCode(frame)->co_code_adaptive; } while (0) | ||
do {} while (0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be curious what @markshannon thinks of this change. It makes me think that the "abstractions" embodied by these LOAD/STORE
macros are not as solid as we thought. And I wonder if the line that sets prev_instr
should also be a macro. (Note that the _SP
macros don't actually differ by tier.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems good to me. LOAD_IP
should be a no-op in tier 2 execution. The IP should be virtual.
(In this context "virtual" means tracked by the region selector (translate_bytecode_to_trace
) not at runtime)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _SP
may not differ by tier, but the copy-and-patch compiler will need to treat them differently from the tier 2 interpreter.
_PyFrame_SetStackPointer(frame, stack_pointer); | ||
Py_DECREF(self); | ||
return frame; | ||
goto deoptimize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In uops debug mode (PYTHONUOPSDEBUG=2
or higher) this would (confusingly, IMO) print DEOPT: [Opcode ...]
. Doesn't that bother you? Or do you consider EXIT_TRACE
as just another form of deoptimization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It definitely is annoying. Actual deoptimizations are rare, but exiting a trace is common. So I get a lot of log spam containing endless repetitions of
DEOPT: [Opcode 300, operand 0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tier 2 only, so it doesn't matter much what the implementation is. I can see it changing quite a few times in the near future.
Whatever it is, it shouldn't have the context dependent frame->prev_instr--; // Back up to just before destination
@@ -3006,6 +3006,7 @@ dummy_func( | |||
tstate->py_recursion_remaining--; | |||
LOAD_SP(); | |||
LOAD_IP(); | |||
frame->prev_instr = _PyCode_CODE(_PyFrame_GetCode(frame)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment in ceval_macros.c -- perhaps this should also be a macro?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unnecessary, frame->prev_instr
is set during frame creation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if we want to reuse this code for SEND
we don't want to be messing with frame->prev_instr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unnecessary,
frame->prev_instr
is set during frame creation.
It's set to an index of -1 (since the frame hasn't started yet), which is invalid. So we either need to set it like this, or at least do an increment.
We don't need to keep
ip_offset
in a local, since it's known statically during trace construction. The only time that we don't know it is when pushing a frame for an unknown code object, so modify_PUSH_FRAME
to set theprev_instr
of the frame being pushed (this doesn't have any overhead, since it was always being followed by aSAVE_IP
anyways). Otherwise, we just set the operand ofSAVE_IP
to the actual pointer that should be stored.Also,
EXIT_TRACE
behaves the same as an unconditionalgoto deoptimize
. Changing it to just do that instead makes things simpler, since all returns from tier two happen at labels (rather than tied up in the uop handlers).