Callback interface speedup #1497

- Added the `benchmarks` fixture. This runs a set of benchmark tests and uses `criterion` to analyze the results. Added some benchmarks for calling functions and callback interfaces. - Added `run_script()` function, which is a more generic version of `run_test()`. - uniffi_testing: Don't add `--test` when running `cargo build` to find the dylibs. It's not needed for the fixture/example tests and it breaks the benchmark tests.

The main goal here is to make the logic live in regular, non-templated, code. This makes it easier to understand what's going on and hopefully easier to change. One issue with this approach is that we need 3 functions depending on the method return/throws types. I think it's worth it to have clearer code and I also hope to merge these into 1 function. We could use the same move as in PR#1469: define a new `FfiConverter` method like `callback_interface_method_return()` that handles the specifics then have a default implementation, with a specialized implementation for `Result<>` and `()`.

The goal here is to lower the overhead of callback interface calls, specifically targeting things like logging calls which can be called many times in a short timespan. Since this is an ABI change, it's not backwards compatible and therefore requires a breaking version bump before it can be the default. The initial ABI change was fairly minor: the argument RustBuffer is now passed by reference instead of by value. This means the foreign code doesn't need to use an FFI call to destroy it. For Kotlin, and maybe other languages, it also seems to be slightly faster to use a reference because that's less bits that need to be pushed to/popped from the stack. This brought some speedups to python and kotlin, and a minor speedup to Swift that only passes the noise thresholdin the no-args-void-return case: callbacks/python-callbacks-basic time: [37.277 µs 37.299 µs 37.320 µs] change: [-10.692% -10.532% -10.379%] (p = 0.00 < 0.05) Performance has improved. callbacks/kotlin-callbacks-basic time: [15.533 µs 15.801 µs 16.069 µs] change: [-35.654% -31.481% -27.245%] (p = 0.00 < 0.05) Performance has improved. callbacks/swift-callbacks-no-args-void-return time: [454.96 ns 455.64 ns 456.71 ns] change: [-35.872% -35.792% -35.710%] (p = 0.00 < 0.05) Performance has improved.

If a callback interface method has a void return, there's no need to overwrite the value in the output buffer. Rust has already initialized an empty buffer. Also, there's no need to construct a RustBuffer in the first place. Refactored the callback interface code to make this possible without too much spaghettification. This gives us a decent speedup on Python/Kotlin: Benchmarking callbacks/python-callbacks-void-return Benchmarking callbacks/python-callbacks-void-return: Warming up for 3.0000 s Benchmarking callbacks/python-callbacks-void-return: Collecting 100 samples in estimated 10.050 s (813k iterations) Benchmarking callbacks/python-callbacks-void-return: Analyzing callbacks/python-callbacks-void-return time: [12.184 µs 12.219 µs 12.266 µs] change: [-26.774% -26.526% -26.286%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe Benchmarking callbacks/kotlin-callbacks-void-return Benchmarking callbacks/kotlin-callbacks-void-return: Warming up for 3.0000 s Benchmarking callbacks/kotlin-callbacks-void-return: Collecting 100 samples in estimated 10.007 s (1.9M iterations) Benchmarking callbacks/kotlin-callbacks-void-return: Analyzing callbacks/kotlin-callbacks-void-return time: [5.1736 µs 5.2086 µs 5.2430 µs] change: [-32.627% -26.167% -20.263%] (p = 0.00 < 0.05) Performance has improved. Found 13 outliers among 100 measurements (13.00%) 3 (3.00%) low severe 6 (6.00%) low mild 1 (1.00%) high mild 3 (3.00%) high severe

This changes things so we send the arguments as a data/len component rather than a RustBuffer struct directly. For some reason this is faster on Python/Kotlin, although it's not clear why. Here's some possibilities: - cytpes and JNA take a performance hit when dealing with structs vs primitive values. - RustBuffer has a third component: capacity, that we don't need. By not sending this component, it reduces the stack size. In any case, this really does speed up Python/Kotlin. Swift doesn't seem to be affected. Benchmarking callbacks/python-callbacks-void-return Benchmarking callbacks/python-callbacks-void-return: Warming up for 3.0000 s Benchmarking callbacks/python-callbacks-void-return: Collecting 100 samples in estimated 10.050 s (813k iterations) Benchmarking callbacks/python-callbacks-void-return: Analyzing callbacks/python-callbacks-void-return time: [12.184 µs 12.219 µs 12.266 µs] change: [-26.774% -26.526% -26.286%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe Benchmarking callbacks/kotlin-callbacks-void-return Benchmarking callbacks/kotlin-callbacks-void-return: Warming up for 3.0000 s Benchmarking callbacks/kotlin-callbacks-void-return: Collecting 100 samples in estimated 10.007 s (1.9M iterations) Benchmarking callbacks/kotlin-callbacks-void-return: Analyzing callbacks/kotlin-callbacks-void-return time: [5.1736 µs 5.2086 µs 5.2430 µs] change: [-32.627% -26.167% -20.263%] (p = 0.00 < 0.05) Performance has improved. Found 13 outliers among 100 measurements (13.00%) 3 (3.00%) low severe 6 (6.00%) low mild 1 (1.00%) high mild 3 (3.00%) high severe

Most of these are doc fixes. The one behavior change is that the callback return values were updated. If we're breaking the ABI we might as well make these be a bit more logical.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Callback interface speedup #1497

Callback interface speedup #1497

Commits on Mar 23, 2023

Commits on Mar 24, 2023