Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-104584: Remove ip_offset and simplify EXIT_TRACE #108961

Closed
wants to merge 1 commit into from

Conversation

brandtbucher
Copy link
Member

@brandtbucher brandtbucher commented Sep 5, 2023

We don't need to keep ip_offset in a local, since it's known statically during trace construction. The only time that we don't know it is when pushing a frame for an unknown code object, so modify _PUSH_FRAME to set the prev_instr of the frame being pushed (this doesn't have any overhead, since it was always being followed by a SAVE_IP anyways). Otherwise, we just set the operand of SAVE_IP to the actual pointer that should be stored.

Also, EXIT_TRACE behaves the same as an unconditional goto deoptimize. Changing it to just do that instead makes things simpler, since all returns from tier two happen at labels (rather than tied up in the uop handlers).

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the chatty review -- I have two angles, one about still being able to make sense of the debug output, another about whether _PUSH_FRAME is the right place to set prev_instr from the new code object. And some mumblings about wanting @markshannon to review this too (I'm not sure I 100% follow his reasoning for the LOAD/STORE macros he recently added to ceval_macros.h.)

@@ -668,7 +667,7 @@ translate_bytecode_to_trace(
oparg = orig_oparg & 0xF;
break;
case OPARG_SAVE_IP: // op==SAVE_IP; oparg=next instr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the comment.

@@ -503,7 +502,7 @@ translate_bytecode_to_trace(
top: // Jump here after _PUSH_FRAME
for (;;) {
RESERVE_RAW(2, "epilogue"); // Always need space for SAVE_IP and EXIT_TRACE
ADD_TO_TRACE(SAVE_IP, INSTR_IP(instr, code), 0);
ADD_TO_TRACE(SAVE_IP, 0, (uintptr_t)instr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask to keep (redundantly) the INSTR_IP(...) argument in all cases? It's helpful for debugging -- the 64-bit hex operand is useless, but the code-relative offset can be related directly to the output from dis, and I regularly do that.

If you worry about the cost of computing INSTR_IP() when no debugging is selected, maybe define the macro differently (or define another macro)?

Also, I think the operand may have to be printed in hex (both here and in executor.c) -- debug output like 106858786520180 is not very helpful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will ultimately want to change from instruction pointer back to the offset (as we had in 3.9 and earlier), so let's keep the redundant (for now) macro.

The reason for wanting the offset, not the pointer is that we expect writes to frame->instr_ptr to be rare, thanks to optimization, most reads actually want the offset, and it means we only need a 32 bit operand (less copying and cache pressure).

@@ -707,15 +706,13 @@ translate_bytecode_to_trace(
PyUnicode_AsUTF8(new_code->co_qualname),
PyUnicode_AsUTF8(new_code->co_filename),
new_code->co_firstlineno);
ADD_TO_TRACE(SAVE_IP, 0, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Because you added an assignment to frame->prev_instr to _PUSH_FRAME, you don't need this dummy SAVE_IP any more. Same when bailing on version mismatch below and ditto if func == NULL. Right?

@@ -731,7 +728,6 @@ translate_bytecode_to_trace(
2 * INSTR_IP(instr, code));
goto top;
}
ADD_TO_TRACE(SAVE_IP, 0, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to look twice before understanding how we even get here. :-) I think we could use a comment here explaining that here func == NULL. (And/or a DPRINT() call?)

Comment on lines 398 to +399
#define LOAD_IP() \
do { ip_offset = (_Py_CODEUNIT *)_PyFrame_GetCode(frame)->co_code_adaptive; } while (0)
do {} while (0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be curious what @markshannon thinks of this change. It makes me think that the "abstractions" embodied by these LOAD/STORE macros are not as solid as we thought. And I wonder if the line that sets prev_instr should also be a macro. (Note that the _SP macros don't actually differ by tier.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems good to me. LOAD_IP should be a no-op in tier 2 execution. The IP should be virtual.

(In this context "virtual" means tracked by the region selector (translate_bytecode_to_trace) not at runtime)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _SP may not differ by tier, but the copy-and-patch compiler will need to treat them differently from the tier 2 interpreter.

_PyFrame_SetStackPointer(frame, stack_pointer);
Py_DECREF(self);
return frame;
goto deoptimize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In uops debug mode (PYTHONUOPSDEBUG=2 or higher) this would (confusingly, IMO) print DEOPT: [Opcode ...]. Doesn't that bother you? Or do you consider EXIT_TRACE as just another form of deoptimization?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely is annoying. Actual deoptimizations are rare, but exiting a trace is common. So I get a lot of log spam containing endless repetitions of

DEOPT: [Opcode 300, operand 0]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tier 2 only, so it doesn't matter much what the implementation is. I can see it changing quite a few times in the near future.
Whatever it is, it shouldn't have the context dependent frame->prev_instr--; // Back up to just before destination

@@ -3006,6 +3006,7 @@ dummy_func(
tstate->py_recursion_remaining--;
LOAD_SP();
LOAD_IP();
frame->prev_instr = _PyCode_CODE(_PyFrame_GetCode(frame));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment in ceval_macros.c -- perhaps this should also be a macro?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessary, frame->prev_instr is set during frame creation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if we want to reuse this code for SEND we don't want to be messing with frame->prev_instr

Copy link
Member Author

@brandtbucher brandtbucher Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessary, frame->prev_instr is set during frame creation.

It's set to an index of -1 (since the frame hasn't started yet), which is invalid. So we either need to set it like this, or at least do an increment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting core review interpreter-core (Objects, Python, Grammar, and Parser dirs) skip news
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants