Improve performance for switch in ceval.c when using MSVC #91719
Apparently a switch on an 8-bit quantity where all cases are present generates a more efficient jump (doing only one indexed memory load instead of two). So we make opcode and use_tracing uint8_t, and generate a macro full of extra `case NNN:` lines for all unused opcodes. See faster-cpython/ideas#321 (comment)
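As a rough illustration of the idea above, here is a minimal sketch using a hypothetical four-opcode interpreter. The opcode names, the `run` function, and the `CASES_*` helper macros are assumptions made for this example; in CPython the `EXTRA_CASES` macro is generated rather than written by hand. The only point is that every one of the 256 possible `uint8_t` values gets a `case` label, so the switch needs no `default` branch.

```c
#include <stdint.h>

/* Hypothetical toy opcodes, not CPython's real opcode numbers. */
enum { OP_NOP = 0, OP_INC = 1, OP_DOUBLE = 2, OP_HALT = 3 };

/* Helper macros (assumed for this sketch) that expand to runs of case labels. */
#define CASES_4(n)  case (n): case (n) + 1: case (n) + 2: case (n) + 3:
#define CASES_16(n) CASES_4(n) CASES_4((n) + 4) CASES_4((n) + 8) CASES_4((n) + 12)
#define CASES_64(n) CASES_16(n) CASES_16((n) + 16) CASES_16((n) + 32) CASES_16((n) + 48)

/* Stand-in for the generated macro: labels for every unused value, 4..255. */
#define EXTRA_CASES \
    CASES_4(4) CASES_4(8) CASES_4(12) \
    CASES_16(16) CASES_16(32) CASES_16(48) \
    CASES_64(64) CASES_64(128) CASES_64(192)

/* Runs a byte-coded program; returns the accumulator, or -1 on an unused opcode. */
static int run(const uint8_t *code, int acc)
{
    for (;;) {
        uint8_t opcode = *code++;   /* 8-bit switch operand */
        switch (opcode) {
        case OP_NOP:    break;
        case OP_INC:    acc += 1; break;
        case OP_DOUBLE: acc *= 2; break;
        case OP_HALT:   return acc;
        EXTRA_CASES                 /* all remaining values share one body */
            return -1;
        }
    }
}
```

Calling `run` on the byte sequence `OP_INC, OP_DOUBLE, OP_HALT` with a starting accumulator of 1 walks through the three handlers in order. Whether a given compiler actually emits a single indexed load for such a switch has to be checked in the generated assembly.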
Fixed by #91718.
Move the following API from Include/opcode.h (public C API) to a new Include/internal/pycore_opcode.h header file (internal C API):

* EXTRA_CASES
* _PyOpcode_Caches
* _PyOpcode_Deopt
* _PyOpcode_Jump
* _PyOpcode_OpName
* _PyOpcode_RelativeJump
[3.12] REPORT: Efficient Details of the switch in e48ac9c (release build)
Example changes of
As for the performance advantage over loading memory twice, the examples above make little difference for me. Each of the following was faster with PGO:
Thanks for the detailed and clear report. It sounds like in the long term we would be okay without depending on the
I'm not sure how come any of those would make a difference given that we never execute the default case. Or am I not following you?
Current jump behaviors using

* Equivalent to if-else with 6 cases
* Single memory load with 7 to 63 cases
* Double memory load with 64 to 181 cases
* Single memory load with 182 or more cases
Another behavior without MSVC seems to check whether the existing cases are contiguous or not, rather than their count. The following shows an extra memory load:
Do you want to reopen this, or create a new issue?
The faster-cpython site would be better for more information.
… the dispatching in ceval.c (pythonGH-94364) (cherry picked from commit ea39b77) Co-authored-by: neonene <53406459+neonene@users.noreply.github.com>
… the dispatching in ceval.c (python#94364)
We've received reports of MSVC not generating optimal code, e.g. gh-89279.
One possible improvement would be to get the big switch statement in ceval.c to generate better code. It's been rumored that MSVC will generate essentially a computed goto if all cases are filled.
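For readers unfamiliar with the term, a "computed goto" dispatch can be sketched as below. This is only an illustration under stated assumptions: the opcode names and the `run_goto` function are hypothetical, and the `&&label` / `goto *ptr` syntax is the GCC and Clang "labels as values" extension, which MSVC does not provide. The fallback branch uses an ordinary switch so the sketch still compiles elsewhere.

```c
#include <stdint.h>

/* Hypothetical toy opcodes for this sketch only. */
enum { G_INC = 0, G_DOUBLE = 1, G_HALT = 2 };

/* Runs a program made only of the opcodes above and ending in G_HALT. */
static int run_goto(const uint8_t *code, int acc)
{
#if defined(__GNUC__)
    /* One table lookup per instruction, then an indirect jump.
       Opcodes outside 0..2 would index past the table, so the
       caller must pass valid programs only. */
    static void *const targets[] = { &&do_inc, &&do_double, &&do_halt };
    #define DISPATCH() goto *targets[*code++]

    DISPATCH();
do_inc:
    acc += 1;
    DISPATCH();
do_double:
    acc *= 2;
    DISPATCH();
do_halt:
    return acc;

    #undef DISPATCH
#else
    /* Portable fallback: a plain switch inside a loop. */
    for (;;) {
        switch (*code++) {
        case G_INC:    acc += 1; break;
        case G_DOUBLE: acc *= 2; break;
        case G_HALT:   return acc;
        default:       return -1;
        }
    }
#endif
}
```

The hope expressed in this issue is that a switch with every case filled compiles to something close to the first branch, a single table load followed by an indirect jump, without a range check.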
See my investigations at faster-cpython/ideas#321 (comment)