Add benchmarks for C function call overhead of various alternatives #370
Comments
Interesting results! One thing to note on the …
Great callout, @FreddieWitherden. In this case, the benchmark is letting ctypes deduce everything on the fly, but I agree, the manual typing approach should be benchmarked as well.
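For reference, a minimal sketch of the two approaches (not the benchmark's actual code; it uses libm's `pow()` as a stand-in C function and assumes a Unix-like system):

```python
import ctypes
import ctypes.util

# Load libm; the fallback name assumes a typical Linux system.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Indexing (rather than attribute access) returns a fresh function object,
# so the two variants below don't share type information.
pow_deduced = libm["pow"]   # argument types deduced on every call; restype defaults to c_int
pow_typed = libm["pow"]     # manual typing: conversions are fixed once, up front
pow_typed.argtypes = [ctypes.c_double, ctypes.c_double]
pow_typed.restype = ctypes.c_double

print(pow_typed(2.0, 10.0))  # 1024.0
```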
With the ctypes performance regression in python/cpython#92356 fixed, and a new benchmark added for "ctypes with specified argument types", the results are now:
Great to see that ctypes is now faster on 3.11 than on 3.10! In terms of the argument spec, my understanding is that the overhead comes from the fact that we end up calling one function (usually a Python function) per argument to perform the coercion. Overhead therefore scales with ~nargs. Given that many arguments are 'simple' (integers, floats, pointers to structures, etc.), it should be possible to inline and special-case most of the standard types (https://github.com/python/cpython/blob/main/Modules/_ctypes/callproc.c#L1191 is the source of the overhead).
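A rough way to see that scaling from Python (a hedged sketch, not part of the benchmark suite; it assumes libm is loadable as on a typical Linux system) is to time C functions of increasing arity through ctypes:

```python
import ctypes
import ctypes.util
import timeit

libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# libm functions taking one, two and three double arguments; the C work
# itself is tiny, so the timings are dominated by call/coercion overhead.
funcs = {
    "fabs (1 arg)":  (libm["fabs"], 1),
    "pow  (2 args)": (libm["pow"], 2),
    "fma  (3 args)": (libm["fma"], 3),
}
for fn, nargs in funcs.values():
    fn.argtypes = [ctypes.c_double] * nargs
    fn.restype = ctypes.c_double

for name, (fn, nargs) in funcs.items():
    args = (1.5,) * nargs
    t = timeit.timeit(lambda: fn(*args), number=200_000)
    print(f"{name}: {t:.3f} s per 200k calls")
```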
Nanobind also contains a few benchmarks of this type: https://github.com/wjakob/nanobind (it uses vectorcall internally, which is unfortunately not part of the stable API).
Can you create a new issue to discuss speeding up ctypes?
@mdboom, mixing some different argument/call forms into the bench might be interesting too. Looking at your benchmarks, they're all fixed argument counts. IIRC, the 3.11 specializer doesn't optimize for a few call types. So C function calls with …
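For example (a hedged sketch, not from the issue; `math.pow` stands in for an arbitrary C-implemented callable), the same call can be timed in a couple of shapes:

```python
import timeit
from math import pow as cpow

args = (2.0, 10.0)

# The same C-implemented callable invoked positionally and via *args
# unpacking; the bytecode and calling-convention paths differ.
cases = {
    "positional":  lambda: cpow(2.0, 10.0),
    "star-unpack": lambda: cpow(*args),
}

for name, call in cases.items():
    print(name, timeit.timeit(call, number=1_000_000))
```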
@wjakob Vectorcall is explicitly designed to be usable by third-party extensions, so it can be considered part of the stable API (and ABI): https://peps.python.org/pep-0590
It might be useful to benchmark the overhead of making C extension function calls using some of the popular approaches.
There is already a set of benchmarks available, with details in this blog post. It checks the function call overhead of the following (a rough sketch of the common harness shape follows the list):

- `extension`: Python/C API with `PyArg_ParseTuple`
- `extension_manunpack`: Python/C API with `PyArg_UnpackTuple`
- `cython`
- `swig`
- `boost`
- `ctypes`
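As a concrete picture of the harness shape (a hedged sketch, not the blog post's code), each approach boils down to calling a trivial C-implemented function many times so that per-call binding overhead dominates; here libc's `abs()` via ctypes stands in for one of the rows:

```python
import ctypes
import ctypes.util
import time

# Assumes a Unix-like system where libc can be located.
libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6")
c_abs = libc["abs"]
c_abs.argtypes = [ctypes.c_int]
c_abs.restype = ctypes.c_int

def bench(fn, n=1_000_000):
    """Time n calls of fn with a fixed, trivial argument."""
    start = time.perf_counter()
    for _ in range(n):
        fn(-42)
    return time.perf_counter() - start

print("ctypes abs():  ", bench(c_abs))
print("builtin abs(): ", bench(abs))  # baseline: a builtin C function
```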
I hacked around with this during the PyCon sprints, and made my own fork of these benchmarks that:

- … `extension_manunpack` (seems like an unfair advantage without it)
- … the `METH_FASTCALL` API.

With that, there are some interesting results: