Skip to content

Don't inline slow path functions in the interpreter loop #132336

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
mpage opened this issue Apr 9, 2025 · 5 comments
Open

Don't inline slow path functions in the interpreter loop #132336

mpage opened this issue Apr 9, 2025 · 5 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage type-feature A feature request or enhancement

Comments

@mpage
Copy link
Contributor

mpage commented Apr 9, 2025

Feature or enhancement

Proposal:

I was looking at the IPA inlining dump as part of investigating the root cause of gh-132295 and noticed that we're inlining a number of slow path functions related to specialization, instrumentation, and error handling in the interpreter loop when performing LTO (full list here):

_Py_Specialize_CallKw
_Py_Specialize_CompareOp
_Py_Specialize_ContainsOp
_Py_Specialize_ForIter
_Py_Specialize_LoadGlobal
_Py_Specialize_LoadSuperAttr
_Py_Specialize_Send
_Py_Specialize_StoreAttr
_Py_Specialize_StoreSubscr
_Py_Specialize_ToBool
_Py_Specialize_UnpackSequence
_Py_call_instrumentation
_Py_call_instrumentation_instruction
_Py_call_instrumentation_line
get_exception_handler

I suspect these are being inlined because we use unit tests for collecting PGO data. These should be executed relatively infrequently for "normal" Python code. We should mark them as noinline.

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Linked PRs

@mpage mpage added type-feature A feature request or enhancement interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Apr 9, 2025
Yhg1s pushed a commit that referenced this issue Apr 10, 2025
…oop as noinline (#132337)

Mark a few functions used by the interpreter loop as noinline

These are all the slow path and should not be inlined into the interpreter
loop. Unfortunately, they end up being inlined with LTO and the current PGO
task.
@thesamesam
Copy link
Contributor

The cold attribute may be a fit here.

@Yhg1s
Copy link
Member

Yhg1s commented Apr 10, 2025

As I understand it, cold is ignored when using PGO, at least in GCC, so I don't think that would have the desired effect.

@thesamesam
Copy link
Contributor

Ah, you're right:

When profile feedback is available, via -fprofile-use, cold functions are automatically detected and this attribute is ignored.

@markshannon
Copy link
Member

What we do or do not want to inline depends on whether it is a tail-calling build or not.

For the tailcalling, many of the above function are on the hot path of the instruction they are part of.
For example, get_exception_handler is only called in one place, it is on the hot/only path of exception_unwind. The instrumentation functions in the INSTRUMENTED instruction, likewise.

For the normal build, the size of PyEval_EvalDefault is the main concern and maybe we should be choosing functions based on size, as much as whether they are cold.

Maybe using better profiles is the way to do this?

@picnixz picnixz added the performance Performance or resource usage label Apr 11, 2025
@mpage
Copy link
Contributor Author

mpage commented Apr 11, 2025

What we do or do not want to inline depends on whether it is a tail-calling build or not.

This is a good point.

Maybe using better profiles is the way to do this?

I think we should do both. For the non-tail-calling interpreter we should mark the specialization functions as noinline, since that's more reliable than trying to coax a black box into doing what we want. I also think we should continue marking the instrumentation and exception handling functions as noinline since we know that those should be rare / exceptional paths.

seehwan pushed a commit to seehwan/cpython that referenced this issue Apr 16, 2025
…eter loop as noinline (python#132337)

Mark a few functions used by the interpreter loop as noinline

These are all the slow path and should not be inlined into the interpreter
loop. Unfortunately, they end up being inlined with LTO and the current PGO
task.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

5 participants