Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRAFT: limited call inlining #116257

Closed
wants to merge 12 commits into from
Closed

Conversation

gvanrossum
Copy link
Member

Hey @Fidget-Spinner, I had a rainy day, so here's a very crude implementation of my idea for call inlining without frame reconstruction (only for some very limited cases). I learned a bunch of stuff -- our call sequence is rather verbose, and there are rare cases where self_of_null may unexpectedly be non-NULL.

I hit an example of the latter, which crashed before I added _GUARD_NOT_METHOD as a hack, but something there feels shady. (I think this is the case where the abstract interpreter the check if (sym_is_null(self_or_null) || sym_is_not_null(self_or_null)) in the code block for _INIT_CALL_PY_EXACT_ARGS doesn't succeed. Maybe I should move the inlining code into the abstract interpreter, so it can use this same check?)

Anyway, what do you think?

@gvanrossum gvanrossum marked this pull request as draft March 3, 2024 01:09
@gvanrossum gvanrossum marked this pull request as ready for review March 3, 2024 01:19
@gvanrossum gvanrossum requested review from CAM-Gerlach and removed request for CAM-Gerlach March 3, 2024 01:19
@gvanrossum gvanrossum marked this pull request as draft March 3, 2024 01:19
@gvanrossum
Copy link
Member Author

So here's one case. We have a trace that initially contains this sequence:

  47 ADD_TO_TRACE: _GUARD_TYPE_VERSION (23, target=164, operand=2022a)
  48 ADD_TO_TRACE: _GUARD_DORV_VALUES_INST_ATTR_FROM_DICT (23, target=164, operand=0)
  49 ADD_TO_TRACE: _GUARD_KEYS_VERSION (23, target=164, operand=9c)
  50 ADD_TO_TRACE: _LOAD_ATTR_METHOD_WITH_VALUES (23, target=164, operand=101c7f590)
  51 ADD_TO_TRACE: _CHECK_VALIDITY_AND_SET_IP (0, target=174, operand=135fec404)
  52 ADD_TO_TRACE: _LOAD_FAST (0, target=174, operand=0)
  53 ADD_TO_TRACE: _CHECK_VALIDITY_AND_SET_IP (0, target=175, operand=135fec406)
  54 ADD_TO_TRACE: _LOAD_FAST (3, target=175, operand=0)
  55 ADD_TO_TRACE: _CHECK_VALIDITY_AND_SET_IP (0, target=176, operand=135fec408)
  56 ADD_TO_TRACE: _LOAD_FAST (5, target=176, operand=0)
  57 ADD_TO_TRACE: _LOAD_FAST (6, target=176, operand=0)
  58 ADD_TO_TRACE: _CHECK_VALIDITY_AND_SET_IP (0, target=177, operand=135fec40a)
  59 ADD_TO_TRACE: _CHECK_PEP_523 (4, target=177, operand=0)
  60 ADD_TO_TRACE: _CHECK_FUNCTION_EXACT_ARGS (4, target=177, operand=13c7)
  61 ADD_TO_TRACE: _CHECK_STACK_SPACE (4, target=177, operand=0)
  62 ADD_TO_TRACE: _INIT_CALL_PY_EXACT_ARGS (4, target=177, operand=0)
  63 ADD_TO_TRACE: _SAVE_RETURN_OFFSET (4, target=177, operand=0)
  64 ADD_TO_TRACE: _PUSH_FRAME (4, target=177, operand=0)
Continuing in Interpolation.before_read (/Users/guido/cpython/Lib/configparser.py:340) at byte offset 0
  65 ADD_TO_TRACE: _CHECK_VALIDITY_AND_SET_IP (0, target=0, operand=101c721d8)
  66 ADD_TO_TRACE: _RESUME_CHECK (0, target=0, operand=0)
  67 ADD_TO_TRACE: _CHECK_VALIDITY_AND_SET_IP (0, target=1, operand=101c721da)
  68 ADD_TO_TRACE: _LOAD_FAST (4, target=1, operand=0)
  69 ADD_TO_TRACE: _CHECK_VALIDITY_AND_SET_IP (0, target=2, operand=101c721dc)
  70 ADD_TO_TRACE: _POP_FRAME (0, target=2, operand=0)
Returning to RawConfigParser._join_multiline_values (/Users/guido/cpython/Lib/configparser.py:1048) at byte offset 362

Note the _LOAD_ATTR_METHOD_WITH_VALUES which loads a bound method and unpacks it into (callable, self). (But this is not the only way we can end up in this state.)

After abstract interpretation it looks like this (I think that's unchanged -- still 47-70):

  47 INL: _GUARD_TYPE_VERSION (23, target=164, operand=2022a)
  48 INL: _GUARD_DORV_VALUES_INST_ATTR_FROM_DICT (23, target=164, operand=0)
  49 INL: _GUARD_KEYS_VERSION (23, target=164, operand=9c)
  50 INL: _LOAD_ATTR_METHOD_WITH_VALUES (23, target=164, operand=101c7f590)
  51 INL: _CHECK_VALIDITY_AND_SET_IP (0, target=174, operand=135fec404)
  52 INL: _LOAD_FAST (0, target=174, operand=0)
  53 INL: _CHECK_VALIDITY_AND_SET_IP (0, target=175, operand=135fec406)
  54 INL: _LOAD_FAST (3, target=175, operand=0)
  55 INL: _CHECK_VALIDITY_AND_SET_IP (0, target=176, operand=135fec408)
  56 INL: _LOAD_FAST (5, target=176, operand=0)
  57 INL: _LOAD_FAST (6, target=176, operand=0)
  58 INL: _CHECK_VALIDITY_AND_SET_IP (0, target=177, operand=135fec40a)
  59 INL: _NOP (4, target=177, operand=0)
  60 INL: _CHECK_FUNCTION_EXACT_ARGS (4, target=177, operand=13c7)
  61 INL: _CHECK_STACK_SPACE (4, target=177, operand=0)
  62 INL: _INIT_CALL_PY_EXACT_ARGS (4, target=177, operand=0)
  63 INL: _SAVE_RETURN_OFFSET (4, target=177, operand=0)
  64 INL: _PUSH_FRAME (4, target=177, operand=101c7f590)
  65 INL: _CHECK_VALIDITY_AND_SET_IP (0, target=0, operand=101c721d8)
  66 INL: _RESUME_CHECK (0, target=0, operand=0)
  67 INL: _CHECK_VALIDITY_AND_SET_IP (0, target=1, operand=101c721da)
  68 INL: _LOAD_FAST (4, target=1, operand=0)
  69 INL: _CHECK_VALIDITY_AND_SET_IP (0, target=2, operand=101c721dc)
  70 INL: _POP_FRAME (0, target=2, operand=101ccd550)

Then my call inliner goes to work, and then the peepholer (last). The final optimized code now has this sequence:

  39 OPTIMIZED: _GUARD_TYPE_VERSION (23, target=5, operand=2022a)
  40 OPTIMIZED: _GUARD_DORV_VALUES_INST_ATTR_FROM_DICT (23, target=164, operand=0)
  41 OPTIMIZED: _GUARD_KEYS_VERSION (23, target=164, operand=9c)
  42 OPTIMIZED: _LOAD_ATTR_METHOD_WITH_VALUES (23, target=164, operand=101c7f590)
  43 OPTIMIZED: _CHECK_VALIDITY (0, target=174, operand=135fec404)
  44 OPTIMIZED: _LOAD_FAST_0 (0, target=174, operand=0)
  45 OPTIMIZED: _LOAD_FAST_3 (3, target=175, operand=0)
  46 OPTIMIZED: _LOAD_FAST_5 (5, target=176, operand=0)
  47 OPTIMIZED: _LOAD_FAST_6 (6, target=176, operand=0)
  48 OPTIMIZED: _CHECK_FUNCTION_EXACT_ARGS (4, target=177, operand=13c7)
  49 OPTIMIZED: _CHECK_STACK_SPACE (4, target=177, operand=0)
  50 OPTIMIZED: _GUARD_NOT_METHOD (4, target=177, operand=0)
  51 OPTIMIZED: _COPY (0, target=1, operand=0)
  52 OPTIMIZED: _ADJUST_STUFF (4, target=2, operand=101ccd550)

Note that at 51 we have _COPY (0), which triggers an assert. The _GUARD_NOT_METHOD at 50 prevents that from happening by deoptimizing. Always, in this case, because the _LOAD_ATTR_METHOD_WITH_VALUES makes this deterministic, but I can't look that far back -- if I were doing this as part of the abstract interpreter, I could. There are also cases where it is not deterministic -- if I could detect those, I could just not inline the call.

All in all it looks like I should integrate this with the abstract interpreter, but I don't think I'll have time this weekend.

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Mar 3, 2024

In my call inlining design, I have one pass before the abstract interpreter that marks the leaf frames as inlineable, (see _PUSH_FRAME_INLINEABLE's occurences and function_decide_inlineable, then the call inlining is just the instruction _PUSH_FRAME_INLINEABLE in the abstract interpreter. In your case, we could also add a requirement that the call body must consist solely of non-escaping instructions and data-only instructions.

Also, how do you ensure I know technically there won't be segfault because you still have _CHECK_STACK_SPACE, but that feels kinda iffy.

Branch: main...Fidget-Spinner:cpython:tier2_inliner

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Mar 3, 2024

Do you mind if I take your stuff and merge it into mine, then send both our work combined as a PR?

Some stuff that the other branch has that this PR doesn't have yet:

  1. COPY might not be the best because it is dependent on the current stack level, which requires calculation throughout the entire lifetime of the function for all its LOAD_FAST.
  2. Handles STORE_FAST.
  3. Handles growing the stack.

Your PR has more tests, and is closer to what we want for 3.13. So I can merge both approaches and get something that works for more cases than just cast.

@gvanrossum
Copy link
Member Author

Sure, go ahead.

@gvanrossum
Copy link
Member Author

Closing in favor of gh-116290.

@gvanrossum gvanrossum closed this Mar 4, 2024
@gvanrossum gvanrossum deleted the inline-calls branch March 7, 2024 21:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants