bpo-46939: Specialize calls to Python classes #31707

Fidget-Spinner · 2022-03-06T13:51:27Z

CALL_X supporting __init__ inlining:

CALL_PY_EXACT_ARGS
CALL_PY_WITH_DEFAULTS
CALL

Micro benchmark using pyperf (Windows, PGO):

-m pyperf timeit -s "
class A:
    def __init__(self, a):
        self.a = a
" "A(1)" -o ..\pyperf-output\spec_py_class.json

Mean +- std dev: [spec_py_class_main] 159 ns +- 5 ns -> [spec_py_class] 106 ns +- 3 ns: 1.51x faster
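For readers without pyperf installed, a rough stand-in for the same measurement can be sketched with the standard-library `timeit` module. This is only an approximation: pyperf runs multiple worker processes and calibrates loop counts, which this sketch does not, so absolute numbers will differ from the ones quoted above. `time_instantiation` is a hypothetical helper name, not something from the PR.

```python
import timeit

# Same class as in the pyperf setup string above.
SETUP = """
class A:
    def __init__(self, a):
        self.a = a
"""


def time_instantiation(number=100_000, repeat=5):
    """Return the best observed per-call time, in nanoseconds, for A(1)."""
    timings = timeit.repeat("A(1)", setup=SETUP, number=number, repeat=repeat)
    # min() is the conventional choice: it is the run least disturbed by noise.
    return min(timings) / number * 1e9


if __name__ == "__main__":
    print(f"A(1): {time_instantiation():.0f} ns per call")
```

Running it under a build of main and a build of this branch gives two numbers that can be compared by hand.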

https://bugs.python.org/issue46939

Fidget-Spinner · 2022-03-06T13:53:20Z

I nearly went insane writing and debugging this. The general idea follows Mark's comments in faster-cpython/ideas#267 (comment). However, since we're specializing __init__, we need to return self instead of the None that __init__ normally returns. This required a significant amount of hackery.
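To make the "return self instead of None" point concrete, here is a pure-Python approximation of what calling a plain class does. It is a sketch of the language-level behaviour only, not of the C code in this PR, and `call_class` is a hypothetical name.

```python
def call_class(cls, *args, **kwargs):
    """Approximate the effect of evaluating cls(*args, **kwargs)."""
    # Step 1: allocate the instance.
    self = cls.__new__(cls, *args, **kwargs)
    # Step 2: run __init__ only if __new__ produced an instance of cls.
    if isinstance(self, cls):
        result = type(self).__init__(self, *args, **kwargs)
        # __init__ must return None; its return value is discarded.
        if result is not None:
            raise TypeError(
                f"__init__() should return None, not '{type(result).__name__}'"
            )
    # Step 3: the value of the call expression is the instance, not None.
    return self


class A:
    def __init__(self, a):
        self.a = a


obj = call_class(A, 1)
print(type(obj).__name__, obj.a)
```

When `__init__` is inlined as an ordinary Python frame, step 3 is the awkward part: the frame's own return value is `None`, so something else has to remember `self` and hand it back to the caller.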

Python/ceval.c

Fidget-Spinner

I've added comments on the cases I tested against and had to debug, for reviewers in case some parts aren't clear.

Fidget-Spinner · 2022-03-06T16:26:16Z

Python/ceval.c

@@ -5617,6 +5688,7 @@ MISS_WITH_OPARG_COUNTER(STORE_SUBSCR)

 error:
        call_shape.kwnames = NULL;
+        call_shape.init_pass_self = false;


Note to reviewers: We don't set frame->self = NULL here because that means exceptions will destroy self. E.g. consider this:

class A:
    def __init__(self):
        try:
            A.a  # Kaboom!
        except AttributeError:
            pass

for _ in range(10):
    print(A())
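A related case worth checking is an exception that escapes `__init__` entirely, rather than being caught inside it. The sketch below is not from the PR; `Flaky` and `build_all` are hypothetical names, and the assumption is simply that a failed construction must not corrupt the constructions that follow it.

```python
class Flaky:
    """Hypothetical class whose __init__ raises for odd inputs."""

    def __init__(self, n):
        if n % 2:
            raise ValueError(n)
        self.n = n


def build_all(count=20):
    """Alternate failing and succeeding constructions at one call site."""
    built, failed = [], 0
    for i in range(count):
        try:
            built.append(Flaky(i))
        except ValueError:
            failed += 1
    return built, failed


if __name__ == "__main__":
    built, failed = build_all()
    print(len(built), "built,", failed, "failed")
```

Every surviving object should be a fully initialised `Flaky`, regardless of how many calls at the same site raised.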

Fidget-Spinner · 2022-03-06T16:28:04Z

Include/internal/pycore_frame.h

@@ -118,6 +119,7 @@ _PyFrame_InitializeSpecials(
    frame->f_state = FRAME_CREATED;
    frame->is_entry = false;
    frame->is_generator = false;
+    frame->self = NULL;


Note to reviewers: tied to frame state instead of some cache/call_shape so that subsequent nested calls don't destroy self (and we can identify which frame the self belongs to). Consider the following code:

class Tokenizer:
    def __init__(self):
        self.__next()  # Kaboom!

    def __next(self):
        pass

for _ in range(10):
    print(Tokenizer())
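The same concern extends to a class call nested inside `__init__`, where two in-flight constructions each need their own `self`. The following check is a sketch along those lines; `Inner` and `Outer` are hypothetical names and this case is not taken from the PR's test suite.

```python
class Inner:
    def __init__(self, value):
        self.value = value


class Outer:
    def __init__(self, value):
        self.__setup()             # nested method call, as in Tokenizer
        self.inner = Inner(value)  # nested class call inside __init__
        self.value = value

    def __setup(self):
        self.ready = True


def build_outers(count=10):
    """Construct several Outer objects from the same call site."""
    return [Outer(i) for i in range(count)]


if __name__ == "__main__":
    for outer in build_outers():
        print(type(outer).__name__, type(outer.inner).__name__, outer.value)
```

If `self` were tracked in one shared slot rather than per frame, the nested `Inner(value)` call would be the likely place for the outer `self` to be lost.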

brandtbucher · 2022-03-06T20:15:31Z

Nice! Haven't had a chance to review this yet (I can probably get to it tomorrow, though).

Unfortunately, I have another, larger PR open that heavily conflicts with this one. Is it okay if this waits until #31709 is merged? The caching in this PR will need to be reworked a bit to use the new inline caching mechanism, but it shouldn't be too difficult.

markshannon · 2022-03-07T11:44:00Z

Python/ceval.c

+            DEOPT_IF(cls_t->tp_new != PyBaseObject_Type.tp_new, PRECALL);
+            STAT_INC(PRECALL, hit);
+
+            PyObject *self = _PyObject_New_Vector(cls_t, &PEEK(original_oparg),


If the specializer only specialized for classes that don't override __new__, you can avoid calling __new__ and just construct the object directly.

_PyObject_New_Vector does some argument checking. https://github.com/python/cpython/pull/31707/files#diff-1decebeef15f4e0b0ce106c665751ec55068d4d1d1825847925ad4f528b5b872R4525

Come to think of it, if we verify that the argcount is right at specialization time, do we need to re-verify at runtime? Would it be safe to call tp_alloc directly? It seems that the only thing that could change is the kwds dict, but even then that's only used for argument count checking again.
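The guards under discussion can be mimicked in pure Python to show what "verify at specialization time" might cover. This is a loose analogy only: `can_specialize` is a hypothetical function, and the real checks live in C and operate on `tp_new` and the code object directly.

```python
import inspect


def can_specialize(cls, nargs):
    """Hypothetical specialization-time check for a call cls(<nargs positionals>)."""
    # Analogue of the tp_new guard: the class must not override __new__.
    if cls.__new__ is not object.__new__:
        return False
    # __init__ must be a plain Python function so that it has a code object.
    init = getattr(cls, "__init__", None)
    if not inspect.isfunction(init):
        return False
    code = init.__code__
    # Keep the sketch simple: reject *args, **kwargs and keyword-only parameters.
    if code.co_flags & (inspect.CO_VARARGS | inspect.CO_VARKEYWORDS):
        return False
    if code.co_kwonlyargcount:
        return False
    # Exact-args case: the caller's positionals plus the implicit self.
    return code.co_argcount == nargs + 1


class Plain:
    def __init__(self, a):
        self.a = a


class CustomNew:
    def __new__(cls, a):
        return super().__new__(cls)

    def __init__(self, a):
        self.a = a


print(can_specialize(Plain, 1), can_specialize(Plain, 2), can_specialize(CustomNew, 1))
```

If the argument count is fixed by the call site and checked once like this, the open question in the comment above is whether anything else can change between specialization and execution.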

markshannon · 2022-03-07T12:01:54Z

This feels a bit fragile to me, but it is an interesting alternative to my approach of pushing an additional cleanup frame (https://github.com/python/cpython/compare/main...faster-cpython:specialize-calls-to-normal-python-classes?expand=1)

This PR should be faster for calls to Python classes, but the extra frame field will have a cost for calls to Python functions.

Fidget-Spinner · 2022-03-07T13:23:07Z

> This feels a bit fragile to me

Indeed, which is why I nearly went insane :). One very easy way to trip up is that every new CALL_PY_* opcode must also handle this special case, as the existing ones do. I've added a CALL_PY_FRAME_PASS_SELF() macro to make that more obvious. Most of our test suite will crash (I've tried this) if you forget to add it. Python class creation is an integral part of Python, after all.

P.S. Do your benchmarks say anything? I'm really sorry but right now I can't benchmark.

P.P.S. That's an interesting approach that I can't currently wrap my head around. Sorry if I created any duplicate work.

Fidget-Spinner · 2022-03-08T15:57:26Z

I've updated the first comment with a micro benchmark using pyperf on Windows with PGO. The results show class creation running about 1.5x faster than main (159 ns -> 106 ns).

Fidget-Spinner · 2022-03-09T15:39:41Z

Pyperformance shows a speedup in object-heavy workloads. 1% speedup on average:

Slower (10):
- fannkuch: 603 ms +- 6 ms -> 621 ms +- 4 ms: 1.03x slower
- sqlite_synth: 3.39 us +- 0.06 us -> 3.49 us +- 0.13 us: 1.03x slower
- unpickle_list: 7.42 us +- 0.15 us -> 7.63 us +- 0.15 us: 1.03x slower
- mako: 17.7 ms +- 0.1 ms -> 18.1 ms +- 0.1 ms: 1.02x slower
- logging_simple: 8.91 us +- 0.12 us -> 9.09 us +- 0.14 us: 1.02x slower
- django_template: 55.7 ms +- 0.7 ms -> 56.8 ms +- 2.0 ms: 1.02x slower
- pickle: 15.9 us +- 0.5 us -> 16.1 us +- 0.2 us: 1.02x slower
- pidigits: 297 ms +- 1 ms -> 302 ms +- 1 ms: 1.01x slower
- logging_format: 10.1 us +- 0.2 us -> 10.2 us +- 0.2 us: 1.01x slower
- unpickle: 19.9 us +- 0.3 us -> 20.1 us +- 0.2 us: 1.01x slower

Faster (20):
- raytrace: 486 ms +- 4 ms -> 448 ms +- 3 ms: 1.08x faster
- chaos: 113 ms +- 1 ms -> 104 ms +- 1 ms: 1.08x faster
- float: 126 ms +- 1 ms -> 119 ms +- 2 ms: 1.06x faster
- scimark_lu: 193 ms +- 6 ms -> 184 ms +- 1 ms: 1.05x faster
- scimark_sor: 191 ms +- 2 ms -> 184 ms +- 2 ms: 1.04x faster
- chameleon: 11.3 ms +- 0.2 ms -> 10.9 ms +- 0.1 ms: 1.03x faster
- deltablue: 6.57 ms +- 0.07 ms -> 6.38 ms +- 0.06 ms: 1.03x faster
- telco: 10.4 ms +- 0.4 ms -> 10.1 ms +- 0.2 ms: 1.03x faster
- dulwich_log: 123 ms +- 1 ms -> 119 ms +- 1 ms: 1.03x faster
- scimark_fft: 538 ms +- 12 ms -> 523 ms +- 3 ms: 1.03x faster
- json_dumps: 19.6 ms +- 0.3 ms -> 19.2 ms +- 0.2 ms: 1.02x faster
- pathlib: 30.9 ms +- 0.4 ms -> 30.1 ms +- 0.5 ms: 1.02x faster
- nbody: 150 ms +- 2 ms -> 147 ms +- 2 ms: 1.02x faster
- tornado_http: 207 ms +- 4 ms -> 203 ms +- 3 ms: 1.02x faster
- go: 232 ms +- 2 ms -> 228 ms +- 2 ms: 1.02x faster
- unpack_sequence: 75.2 ns +- 0.9 ns -> 74.0 ns +- 0.7 ns: 1.02x faster
- pyflate: 711 ms +- 9 ms -> 700 ms +- 6 ms: 1.02x faster
- regex_effbot: 5.32 ms +- 0.06 ms -> 5.24 ms +- 0.06 ms: 1.02x faster
- regex_compile: 224 ms +- 1 ms -> 221 ms +- 1 ms: 1.01x faster
- pickle_list: 7.18 us +- 0.07 us -> 7.10 us +- 0.08 us: 1.01x faster

Benchmark hidden because not significant (27): 2to3, crypto_pyaes, hexiom, html5lib, json_loads, logging_silent, meteor_contest, nqueens, pickle_dict, pickle_pure_python, python_startup, python_startup_no_site, regex_dna, regex_v8, richards, scimark_monte_carlo, scimark_sparse_mat_mult, spectral_norm, sympy_expand, sympy_integrate, sympy_sum, sympy_str, unpickle_pure_python, xml_etree_parse, xml_etree_iterparse, xml_etree_generate, xml_etree_process

Geometric mean: 1.01x faster
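As a sanity check on the summary line, the geometric mean can be approximated from the rounded ratios listed above. This is a sketch under two assumptions: the printed ratios are used as-is, so rounding error carries over, and each hidden "not significant" benchmark is counted as exactly 1.00.

```python
from statistics import geometric_mean

# Speedup factors (old time / new time) copied from the rounded ratios above.
FASTER = [1.08, 1.08, 1.06, 1.05, 1.04] + [1.03] * 5 + [1.02] * 8 + [1.01] * 2
SLOWER = [1.03] * 3 + [1.02] * 4 + [1.01] * 3
HIDDEN_COUNT = 27  # assumption: each hidden benchmark contributes a factor of 1.00


def overall_speedup():
    """Geometric mean of per-benchmark speedups; a slowdown s counts as 1/s."""
    factors = FASTER + [1 / s for s in SLOWER] + [1.0] * HIDDEN_COUNT
    return geometric_mean(factors)


if __name__ == "__main__":
    print(f"approximate geometric mean: {overall_speedup():.2f}x faster")
```

The geometric mean is used rather than the arithmetic mean because the inputs are ratios: a 1.03x slowdown and a 1.03x speedup should cancel out exactly.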

brandtbucher · 2022-11-18T21:16:48Z

Should this be closed in favor of #99331?

Fidget-Spinner · 2022-11-19T06:40:03Z

I'll close it when that one's merged.

Specialize calls to Python classes

43112e0

Fidget-Spinner requested a review from markshannon as a code owner March 6, 2022 13:51

bedevere-bot added the awaiting core review label Mar 6, 2022

the-knights-who-say-ni added the CLA signed label Mar 6, 2022

Add news

09a7180

Fidget-Spinner commented Mar 6, 2022

Fix segfaults, refleaks, readjust sys tests size

d74d294

Fidget-Spinner commented Mar 6, 2022

Fidget-Spinner added 2 commits March 7, 2022 01:09

fix refleak on frame exit due to exception

0583f6b

Use non-function pointer since it's faster

bb78e9c

JelleZijlstra reviewed Mar 6, 2022

brandtbucher self-requested a review March 7, 2022 00:49

Address Jelle's review (use vectorcall for new)

b570e92

markshannon reviewed Mar 7, 2022

markshannon reviewed Mar 7, 2022

Fidget-Spinner added 2 commits March 7, 2022 21:15

Address Mark's reviews (remove func version check)

efad70f

Use a macro for passing self

4d6a06b

Fidget-Spinner added 4 commits March 8, 2022 17:18

Merge remote-tracking branch 'upstream/main' into specialize_py_class_calls

5b526af

Use inline caching

de3a406

Regenerate frozenmain

30a0659

Fix test_dis

db09eef

Fidget-Spinner added 2 commits March 9, 2022 00:02

Merge remote-tracking branch 'upstream/main' into specialize_py_class_calls

0d7f59e

Drop generators from specialization

28f9dbb

markshannon mentioned this pull request Mar 9, 2022

Better specialization of calls, post introduction of PRECALL. faster-cpython/ideas#267

Closed

Fidget-Spinner added 2 commits March 12, 2022 11:30

Merge branch 'main' of https://github.com/python/cpython into specialize_py_class_calls

d11362f

Use _Py_SET_OPCODE

c04f5da

ezio-melotti removed the CLA signed label Jul 13, 2022

Fidget-Spinner mentioned this pull request Apr 10, 2022

Specialize calls for Python classes #91095

Closed

brandtbucher removed their request for review November 18, 2022 21:17

markshannon closed this Jun 22, 2023
