[FEAT] Should include more comprehensive benchmarking (primarily performance) #2760


Description

@EricCousineau-TRI

Motivation

At present (b7dfe5c), pybind11 proper only benchmarks compile time and artifact size for a single test setup (which exercises arguments and simple inheritance, but not much else, I think); the results can be seen here:
https://pybind11.readthedocs.io/en/stable/benchmark.html
https://github.com/pybind/pybind11/blob/v2.6.1/docs/benchmark.rst

However, it may be difficult to objectively and concretely judge the performance impact of a PR and weigh that against the value of the feature / issue resolution. Generally, benchmarking is done on an ad-hoc basis (totes works, but that may make it difficult for less creative people like myself ;)

Primary motivating issues / PRs:

Secondary:

Fuzzy Scoping + Steps

  1. Establish a baseline benchmarking setup that touches on the core "hotpath" features for performance (see the sketch after this list). Only track metrics for a given version of pybind11 (out of scope: other binding approaches)
  • Initialization time (e.g. dlopen-ish stuff, pybind11 internals upstart, binding registration, ...)
  • Run time (function calls, type conversions, casting, etc.)
  • Comparison "axes": later CPython versions, pybind11 versions / PR branches / forks
  • OS: For now, just Linux ('cause that's all I use ;)
  2. Add additional metrics (e.g. memory leaks / usage, do a redux on compile time + size)
  3. Ideally, provide guidance on what pybind11 finds the most important (how to weigh compile time, size, speed, memory, etc.)
  4. (Stretch) Possibly compare against other approaches (Boost.Python, Cython, SWIG, cppyy / Julia / LLVM-esque stuff, etc.)
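
As a rough illustration of the run-time half of item 1, here's a minimal microbenchmark sketch using pyperf (one plausible tooling candidate, not a settled choice). The module `bench_module` and its members `add` / `Widget` are hypothetical placeholders for whatever bindings the suite ends up compiling:

```python
# Minimal run-time microbenchmark sketch using pyperf.
# `bench_module`, `add`, and `Widget` are hypothetical placeholders for a
# pybind11 extension compiled as part of the benchmark suite.
import pyperf

import bench_module  # hypothetical compiled pybind11 module


def call_overhead():
    # Hotpath: function-call dispatch + int argument conversion.
    bench_module.add(1, 2)


def construct_and_cast():
    # Hotpath: bound-type construction + C++ -> Python return casting.
    w = bench_module.Widget()
    return w.name()


runner = pyperf.Runner()
runner.bench_func("call_overhead", call_overhead)
runner.bench_func("construct_and_cast", construct_and_cast)
```

pyperf spawns calibrated worker processes and can write results to JSON (`-o results.json`); its `compare_to` subcommand can then diff runs across pybind11 branches or CPython versions on the same machine, which fits the comparison "axes" above.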

Given that performance benchmarks can be a P.I.T.A. (e.g. how to handle OS noise + interrupts, hardware capacity / abstractions, blah blah), decisions should ideally be based on relative performance measured on the same machine. Ideally, we should also publish some metrics for a given config to give people a "feel" for the performance, as was done for compile time.

Suggested Solution Artifacts

  • Identify code repository
    • Perhaps github.com/pybind/pybind-benchmarks ?
    • Alt.: In-tree (which may make it hard to compare across versions...)
  • Identify good tooling for performance benchmarking (see the sketch after this list)
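
On the tooling question: for the initialization-time metric specifically, even timing cold imports in fresh subprocesses might do as a starting point. A sketch (again with a hypothetical `bench_module`; the baseline run subtracts bare interpreter startup):

```python
# Sketch: measure "initialization time" (dlopen + internals upstart + binding
# registration) by timing a cold import in a fresh interpreter and subtracting
# bare interpreter startup. `bench_module` is a hypothetical placeholder.
import statistics
import subprocess
import sys
import time


def timed_run(code, repeats=20):
    # Median wall-clock time of running `code` in a fresh interpreter.
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    baseline = timed_run("pass")                    # interpreter startup only
    with_import = timed_run("import bench_module")  # startup + module init
    print(f"module init: ~{(with_import - baseline) * 1e3:.2f} ms (median)")
```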

@wjakob @rwgk @rhaschke @YannickJadoul @bstaletic @henryiii @ax3l
Can I ask what y'all think? Is this redundant w.r.t. what we already have?
