Discussed in #428

Originally posted by mdboom July 15, 2022
I'm starting to wonder what it would take to extend adaptive quickening to a deeply-polymorphic data structure library such as Numpy. Some of this started in discussions with @markshannon so I won't take all the credit/blame :)
In some sense, this is a further generalization of the original idea proposed in Allow custom `BINARY_OP`.
Unlike that proposal, which is still based on Python types, a general "dispatch to specialized function calls based on the Python types of the arguments" wouldn't be very helpful for Numpy. Numpy mostly uses a single Python type, `ndarray`, and most function calls dispatch to a specific C implementation based on the `ndarray`'s `dtype` (a C-like numeric type, but it could also be a struct), stride, and shape. This mapping is very specific, and something only Numpy can really address, but it has similar properties: at a given call site it would be typical to see the same `dtype`/stride/shape triple appearing over and over again, so recalculating how to perform the computation on every call is redundant work.
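To make that "same triple at the same call site" observation concrete, here's a toy pure-Python sketch of caching the dispatch decision per call site. This is purely illustrative: `resolve_loop` and `CallSiteCache` are invented names, and this bears no relation to NumPy's actual ufunc dispatch machinery.

```python
import numpy as np

def resolve_loop(x, y):
    # Stand-in for the (relatively expensive) dispatch step: pick an
    # "inner loop" based on dtype / shape / strides. Real NumPy walks
    # ufunc type-resolution tables; here we just return np.multiply.
    return np.multiply

class CallSiteCache:
    """Toy model of caching the dispatch decision at one call site."""
    def __init__(self):
        self.key = None
        self.loop = None

    def multiply(self, x, y):
        key = (x.dtype, x.shape, x.strides, y.dtype, y.shape, y.strides)
        if key != self.key:            # first sight (or the triple changed)
            self.key = key
            self.loop = resolve_loop(x, y)
        return self.loop(x, y)         # fast path: reuse the cached loop

site = CallSiteCache()
x = np.ones(10) * 2
y = np.ones(10) * 4
print(site.multiply(x, y))  # dispatch resolved once, reused on later calls
```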
It's probably not worth discussing implementation details yet, but here are my hand-wavy assumptions about what might be possible:
- We introduce a new `PyMethodDef` flag called `METH_SPECIALIZABLE`, which has extra parameters to handle a new quickening API. There would also need to be some metadata specifying how many cache entries the method needs, so space could be allocated in the bytecode.
- When the `CALL_ADAPTIVE` bytecode specializes and finds a `METH_SPECIALIZABLE` callable, it passes an argument to the method telling it to also specialize itself based on the current arguments. The method could fill in a buffer of cache entries, one of which, in most cases, would be a function pointer to a more specialized call.
- Subsequent calls to the method would go directly to the more specialized path. There would also be an API to unquicken if the type assumptions no longer hold. (A toy model of this flow is sketched right after this list.)
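Here is a pure-Python toy model of how that interaction could work. Everything here is made up for illustration — the names (`AdaptiveCallSite`, `SpecializableMethod`), the warm-up count, and the guard/unquicken mechanics — since the real thing would live in the C interpreter loop with the cache entries stored inline in the bytecode.

```python
class SpecializableMethod:
    """Toy stand-in for a METH_SPECIALIZABLE callable: it can hand the
    interpreter a version of itself specialized for the arguments it saw."""
    def __call__(self, x, y):
        return x * y  # generic path: figure everything out on each call

    def specialize(self, x, y):
        expected = (type(x), type(y))
        def specialized(a, b):
            if (type(a), type(b)) != expected:
                raise TypeError("guard failed")  # signal: assumptions broke
            return a * b                         # skip the generic dispatch
        return specialized

class AdaptiveCallSite:
    """Toy model of an adaptive call site: warm up, then cache the
    specialized function "pointer" returned by the callable."""
    WARMUP = 8

    def __init__(self, func):
        self.func = func
        self.calls = 0
        self.cached = None  # stand-in for a cache entry in the bytecode

    def call(self, x, y):
        if self.cached is not None:
            try:
                return self.cached(x, y)   # quickened fast path
            except TypeError:
                self.cached = None         # unquicken: assumptions no longer hold
        self.calls += 1
        if self.calls >= self.WARMUP and hasattr(self.func, "specialize"):
            self.cached = self.func.specialize(x, y)
        return self.func(x, y)

site = AdaptiveCallSite(SpecializableMethod())
for _ in range(10):
    site.call(3, 4)        # after 8 calls the site takes the specialized path
print(site.call(3.0, 4.0)) # guard fails, site unquickens and re-specializes
```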
**Opportunity sizing:**
I've run one simple experiment to see if this would be worth the effort. I hacked Numpy so that the results of all of the delegation within the `np.multiply` function are "cached" to a static variable after 8 calls, and from then on, only the core computation is performed each time (see my `HACK-cache` branch). Then I timed how long it takes to multiply two arrays of different sizes together:
`timeit.timeit("np.multiply(x, y)", f"import numpy as np; x = np.ones({size}) * 2; y = np.ones({size}) * 4", number=int(1_000))`
This gives us a sort of "best case scenario" if the scheme outlined above is even possible. This obviously ignores the overhead of the specialization / unspecialization checks.
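For reference, a self-contained version of that timing sweep might look like the following (the sizes are the ones in the table below; the "caching fork" column of course requires building the `HACK-cache` branch of Numpy):

```python
import timeit

# Array sizes matching the table below; number=1_000 as in the snippet above.
for size in (1, 10, 100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000):
    elapsed = timeit.timeit(
        "np.multiply(x, y)",
        setup=f"import numpy as np; x = np.ones({size}) * 2; y = np.ones({size}) * 4",
        number=1_000,
    )
    print(f"{size:>10,}  {elapsed:.8f} s")
```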
| Array size | Numpy upstream (s) | Caching fork (s) | Speedup |
| ---------: | -----------------: | ---------------: | ------: |
| 1 | 0.00064420 | 0.00026666 | 2.42x |
| 10 | 0.00064755 | 0.00027479 | 2.36x |
| 100 | 0.00073137 | 0.00034967 | 2.09x |
| 1000 | 0.00147700 | 0.00102378 | 1.44x |
| 10000 | 0.00908844 | 0.00855760 | 1.06x |
| 100000 | 0.10119196 | 0.09983327 | 1.01x |
| 1000000 | 1.16631011 | 1.34759164 | 0.87x |
| 10000000 | 22.09005169 | 15.67308645 | 1.41x |
This seems really promising for small arrays where the dispatch calculation dominates. These are an important class of operations -- "scalar" arrays are really common in real-world code. It certainly would be reasonable for Numpy to not quicken when the arrays are larger than a certain threshold.
The next experiment I hope to run is to take some real-world benchmarks and see how often call sites are non-polymorphic as a way to estimate how often we could expect specific call sites to be quickened.