
Add benchmarks for C function call overhead of various alternatives #370

Open
mdboom opened this issue May 5, 2022 · 8 comments
Labels: benchmarking (Anything related to measurement: Adding new benchmarks, benchmarking infrastructure etc.)

@mdboom
Contributor

mdboom commented May 5, 2022

It might be useful to benchmark the overhead of making C extension function calls using some of the popular approaches.

There is already a set of benchmarks available, with details in this blog post. It checks the function call overhead of the following:

  • extension: Python/C API with PyArg_ParseTuple
  • extension_manunpack: Python/C API with PyArg_UnpackTuple
  • cython
  • swig
  • boost
  • ctypes

I hacked around with this during the PyCon sprints, and made my own fork of these benchmarks that:

  • Adds error checking to extension_manunpack (seems like an unfair advantage without it).
  • Adds a benchmark for the METH_FASTCALL API.
  • Uses the current Cython git main to have something that works with py3.11.
  • Uses more iterations and all of pyperf's advice for stable benchmarking.

With that, there are some interesting results:

[image: benchmark results comparing the approaches on 3.10 and 3.11]

  1. The benchmarks just call a C extension function 5,000,000 times in a Python loop, so for many of these, the speedup is most likely just due to improved loop performance. Maybe we want to control for that.
  2. SWIG adds a Python wrapper around everything, so it's not surprising that it sees the most significant speedup in 3.11.
  3. There seems to be a regression in ctypes, which is maybe concerning...?
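One simple way to control for point 1 is to time an equivalent empty loop and subtract it. A sketch of that idea, using `len` as a stand-in for a C extension function (this is not how the linked suite works, just an illustration of the control):

```python
# Sketch: separating call overhead from loop overhead by subtracting
# the time of an empty loop of the same length. `len` stands in for
# an arbitrary C extension function.
import time

N = 1_000_000

def timed(body):
    start = time.perf_counter()
    body()
    return time.perf_counter() - start

def empty_loop():
    for _ in range(N):
        pass

def call_loop():
    f = len  # stand-in for the C function under test
    for _ in range(N):
        f("")

baseline = timed(empty_loop)
total = timed(call_loop)
print(f"per-call overhead ~ {(total - baseline) / N * 1e9:.1f} ns")
```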
@mdboom mdboom moved this from Todo to In Progress in Fancy CPython Board May 5, 2022
@mdboom mdboom self-assigned this May 5, 2022
@FreddieWitherden

Interesting results! One thing to note on the ctypes side is that, at least in my experience, explicitly setting the argument/return types carries a very real degree of overhead compared to letting ctypes deduce everything on the fly. It might be interesting to include this in the benchmark as well.

@mdboom
Contributor Author

mdboom commented May 5, 2022

Great callout, @FreddieWitherden. In this case, this is letting ctypes deduce everything on the fly, but I agree, the manual typing approach should be benchmarked as well.
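For readers unfamiliar with the distinction: the two ctypes modes look like this. A sketch assuming a POSIX system with libm loadable; note that indexing a CDLL with `lib["name"]` returns a fresh (uncached) function object, which keeps the two variants independent.

```python
# The two ctypes modes under discussion: deducing conversions on the
# fly per call vs. declaring them once up front.
# Assumes a POSIX system where libm (or CDLL(None)) provides pow.
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m") or None)

# "On the fly": with no restype set, ctypes assumes a C int return,
# so the double result of pow() is misread as garbage.
raw = libm["pow"]
print(raw(ctypes.c_double(2.0), ctypes.c_double(10.0)))

# Declared up front: conversions are fixed once; results are correct.
typed = libm["pow"]
typed.restype = ctypes.c_double
typed.argtypes = [ctypes.c_double, ctypes.c_double]
print(typed(2.0, 10.0))  # 1024.0
```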

@mdboom
Contributor Author

mdboom commented May 6, 2022

With the ctypes performance regression in python/cpython#92356 fixed, and a new benchmark added for "ctypes with specified argument types" the results are now:

[image: updated benchmark results, including "ctypes with specified argument types"]

@FreddieWitherden

Great to see things with ctypes are now faster with 3.11 than 3.10!

In terms of the argument spec, my understanding is that the overhead comes from the fact that we end up calling one function (usually a Python function) per argument to perform the coercion. Overhead therefore scales with ~nargs. Given that many arguments are 'simple' (integers, floats, pointers to structures, etc.), it should be possible to inline and special-case most of the standard types (https://github.com/python/cpython/blob/main/Modules/_ctypes/callproc.c#L1191 is the source of the overhead).
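The per-argument cost described above can be observed from Python by timing calls that differ only in argument count. A sketch using libc's variadic snprintf (assumes a POSIX libc is loadable; absolute numbers will vary, but the gap tracks the extra conversions):

```python
# Sketch: per-argument conversion overhead in ctypes, visible by
# comparing calls that differ only in argument count.
# Assumes a POSIX system where libc provides snprintf.
import ctypes
import ctypes.util
import timeit

libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
buf = ctypes.create_string_buffer(64)
size = ctypes.c_size_t(len(buf))

one = lambda: libc.snprintf(buf, size, b"%d", 1)
six = lambda: libc.snprintf(buf, size, b"%d %d %d %d %d %d", 1, 2, 3, 4, 5, 6)

n = 50_000
t1 = timeit.timeit(one, number=n)
t6 = timeit.timeit(six, number=n)
print(f"1 int arg: {t1 / n * 1e9:.0f} ns, 6 int args: {t6 / n * 1e9:.0f} ns")

six()
print(buf.value)
```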

@wjakob

wjakob commented May 6, 2022

Nanobind also contains a few benchmarks of this type: https://github.com/wjakob/nanobind (it uses vectorcall internally, which is unfortunately not part of the stable API).

@gvanrossum
Collaborator

> In terms of the argument spec, my understanding is that the overhead comes from the fact we end up calling one function (usually a Python function) per argument to perform the coercion. Overhead therefore scales with ~ nargs.

Can you create a new issue to discuss speeding up ctypes?

@Fidget-Spinner
Collaborator

Fidget-Spinner commented May 8, 2022

@mdboom mixing some different argument/call forms into the benchmark might be interesting too. Looking at your benchmarks, they all use fixed argument counts. IIRC, the 3.11 specializer doesn't optimize for a few call types, so C function calls made with *args/**kwargs may have suffered slightly. How common those call signatures are, I have no clue.
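The call forms being contrasted above can be compared directly from Python. A sketch using `len` as a stand-in C function (results vary by interpreter version; the point is only that the fixed-arity, `*args`, and `*args/**kwargs` forms go through different call paths):

```python
# Sketch: timing different call forms for the same C function.
# `len` stands in for an arbitrary C extension function; whether a
# form is specialized in 3.11 affects its relative cost.
import timeit

args = ("abc",)
kwargs = {}

fixed = lambda: len("abc")           # fixed argument count
star = lambda: len(*args)            # *args call form
starstar = lambda: len(*args, **kwargs)  # *args/**kwargs call form

n = 200_000
for name, fn in [("fixed", fixed), ("*args", star), ("*args/**kwargs", starstar)]:
    t = timeit.timeit(fn, number=n)
    print(f"{name}: {t / n * 1e9:.0f} ns/call")
```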

@markshannon
Member

@wjakob Vectorcall is explicitly designed to be usable by third-party extensions, so it can be considered part of the stable API (and ABI): https://peps.python.org/pep-0590

@mdboom mdboom added the benchmarking Anything related to measurement: Adding new benchmarks, benchmarking infrastructure etc. label Aug 2, 2022
@mdboom mdboom moved this from In Progress to Todo in Fancy CPython Board Aug 16, 2022