
Add benchmarks for C function call overhead of various alternatives #370

Open
mdboom opened this issue May 5, 2022 · 8 comments
Labels: benchmarking (Anything related to measurement: Adding new benchmarks, benchmarking infrastructure etc.)

@mdboom
Contributor

mdboom commented May 5, 2022

It might be useful to benchmark the overhead of making C extension function calls using some of the popular approaches.

There is already a set of benchmarks available, with details in this blog post. It checks the function call overhead of the following:

  • extension: Python/C API with PyArg_ParseTuple
  • extension_manunpack: Python/C API with PyArg_UnpackTuple
  • cython
  • swig
  • boost
  • ctypes

I hacked around with this during the PyCon sprints, and made my own fork of these benchmarks that:

  • Adds error checking to extension_manunpack (seems like an unfair advantage without it).
  • Adds a benchmark for the METH_FASTCALL API.
  • Uses the current Cython git main to have something that works with py3.11.
  • Uses more iterations and all of pyperf's advice for stable benchmarking.

With that, there are some interesting results:

[image: benchmark results comparing the approaches on 3.10 and 3.11]

  1. The benchmarks just call a C extension function 5,000,000 times in a Python loop, so for many of these, the speedup is most likely just due to improved loop performance. Maybe we want to control for that.
  2. SWIG adds a Python wrapper around everything, so it's not surprising that it sees the most significant speedup in 3.11.
  3. There seems to be a regression in ctypes, which is maybe concerning...?
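One simple way to control for point 1 is to time an equivalent empty loop and subtract it. A sketch of that idea, using `len` as a stand-in for a C extension function (this is not how the linked suite works, just an illustration of the control):

```python
# Sketch: separating call overhead from loop overhead by subtracting
# the time of an empty loop of the same length. `len` stands in for
# an arbitrary C extension function.
import time

N = 1_000_000

def timed(body):
    start = time.perf_counter()
    body()
    return time.perf_counter() - start

def empty_loop():
    for _ in range(N):
        pass

def call_loop():
    f = len  # stand-in for the C function under test
    for _ in range(N):
        f("")

baseline = timed(empty_loop)
total = timed(call_loop)
print(f"per-call overhead ~ {(total - baseline) / N * 1e9:.1f} ns")
```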
@mdboom mdboom moved this from Todo to In Progress in Fancy CPython Board May 5, 2022
@mdboom mdboom self-assigned this May 5, 2022
@FreddieWitherden

Interesting results! One thing to note on the ctypes side is that, at least in my experience, explicitly setting the argument/return types carries a very real degree of overhead compared to letting ctypes deduce everything on the fly. It might be interesting to include this in the benchmark as well.

@mdboom
Contributor Author

mdboom commented May 5, 2022

Great callout, @FreddieWitherden. In this case, this is letting ctypes deduce everything on the fly, but I agree, the manual typing approach should be benchmarked as well.
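For readers unfamiliar with the distinction: the two ctypes modes look like this. A sketch assuming a POSIX system with libm loadable; note that indexing a CDLL with `lib["name"]` returns a fresh (uncached) function object, which keeps the two variants independent.

```python
# The two ctypes modes under discussion: deducing conversions on the
# fly per call vs. declaring them once up front.
# Assumes a POSIX system where libm (or CDLL(None)) provides pow.
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m") or None)

# "On the fly": with no restype set, ctypes assumes a C int return,
# so the double result of pow() is misread as garbage.
raw = libm["pow"]
print(raw(ctypes.c_double(2.0), ctypes.c_double(10.0)))

# Declared up front: conversions are fixed once; results are correct.
typed = libm["pow"]
typed.restype = ctypes.c_double
typed.argtypes = [ctypes.c_double, ctypes.c_double]
print(typed(2.0, 10.0))  # 1024.0
```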

@mdboom
Contributor Author

mdboom commented May 6, 2022

With the ctypes performance regression in python/cpython#92356 fixed, and a new benchmark added for "ctypes with specified argument types" the results are now:

[image: updated benchmark results, including "ctypes with specified argument types"]

@FreddieWitherden

Great to see things with ctypes are now faster with 3.11 than 3.10!

In terms of the argument spec, my understanding is that the overhead comes from the fact that we end up calling one function (usually a Python function) per argument to perform the coercion. Overhead therefore scales with ~nargs. Given that many arguments are 'simple' (integers, floats, pointers to structures, etc.), it should be possible to inline and special-case most of the standard types (https://github.com/python/cpython/blob/main/Modules/_ctypes/callproc.c#L1191 is the source of the overhead).
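The per-argument cost described above can be observed from Python by timing calls that differ only in argument count. A sketch using libc's variadic snprintf (assumes a POSIX libc is loadable; absolute numbers will vary, but the gap tracks the extra conversions):

```python
# Sketch: per-argument conversion overhead in ctypes, visible by
# comparing calls that differ only in argument count.
# Assumes a POSIX system where libc provides snprintf.
import ctypes
import ctypes.util
import timeit

libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
buf = ctypes.create_string_buffer(64)
size = ctypes.c_size_t(len(buf))

one = lambda: libc.snprintf(buf, size, b"%d", 1)
six = lambda: libc.snprintf(buf, size, b"%d %d %d %d %d %d", 1, 2, 3, 4, 5, 6)

n = 50_000
t1 = timeit.timeit(one, number=n)
t6 = timeit.timeit(six, number=n)
print(f"1 int arg: {t1 / n * 1e9:.0f} ns, 6 int args: {t6 / n * 1e9:.0f} ns")

six()
print(buf.value)
```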

@wjakob

wjakob commented May 6, 2022

Nanobind also contains a few benchmarks of this type: https://github.com/wjakob/nanobind (it uses vectorcall internally, which is unfortunately not part of the stable API).

@gvanrossum
Collaborator

> In terms of the argument spec, my understanding is that the overhead comes from the fact we end up calling one function (usually a Python function) per argument to perform the coercion. Overhead therefore scales with ~ nargs.

Can you create a new issue to discuss speeding up ctypes?

@Fidget-Spinner
Collaborator

Fidget-Spinner commented May 8, 2022

@mdboom mixing some different argument/call forms into the benchmark might be interesting too. Looking at your benchmarks, they all use fixed argument counts. IIRC, the 3.11 specializer doesn't optimize for a few call types, so C function calls made with *args/**kwargs may have suffered slightly. How common those call signatures are, I have no clue.
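The call forms being contrasted above can be compared directly from Python. A sketch using `len` as a stand-in C function (results vary by interpreter version; the point is only that the fixed-arity, `*args`, and `*args/**kwargs` forms go through different call paths):

```python
# Sketch: timing different call forms for the same C function.
# `len` stands in for an arbitrary C extension function; whether a
# form is specialized in 3.11 affects its relative cost.
import timeit

args = ("abc",)
kwargs = {}

fixed = lambda: len("abc")           # fixed argument count
star = lambda: len(*args)            # *args call form
starstar = lambda: len(*args, **kwargs)  # *args/**kwargs call form

n = 200_000
for name, fn in [("fixed", fixed), ("*args", star), ("*args/**kwargs", starstar)]:
    t = timeit.timeit(fn, number=n)
    print(f"{name}: {t / n * 1e9:.0f} ns/call")
```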

@markshannon
Member

@wjakob Vectorcall is explicitly designed to be usable by third-party extensions, so it can be considered part of the stable API (and ABI): https://peps.python.org/pep-0590

@mdboom mdboom added the benchmarking Anything related to measurement: Adding new benchmarks, benchmarking infrastructure etc. label Aug 2, 2022
@mdboom mdboom moved this from In Progress to Todo in Fancy CPython Board Aug 16, 2022