Description
The performance of array reductions in nanops/bottleneck can be significantly improved for large data using numba. The improvements come from two factors:
- single-pass algorithms when null values are present, avoiding any intermediate copies.
- multi-threading over chunks of the array, or over an axis in a single-axis reduction.
This screenshot demonstrates a potential 4x improvement on a DataFrame of 10 million rows and 5 columns of various types.
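A frame like the one benchmarked in the screenshot can be built as below (shrunk here so the snippet runs quickly; the column names and dtype mix are my guesses, not taken from the branch).

```python
import numpy as np
import pandas as pd

# 5 columns of various dtypes, including a float column with nulls,
# mirroring the benchmark setup described above (10M rows there).
n = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "f64": rng.normal(size=n),
    "f32": rng.normal(size=n).astype("float32"),
    "i64": rng.integers(0, 100, size=n),
    "flag": rng.integers(0, 2, size=n).astype(bool),
    "f64_nan": np.where(rng.random(n) < 0.1, np.nan, rng.normal(size=n)),
})

# Reductions like this go through the nanops codepaths being compared.
result = df.mean()
```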

I am running the code on a feature branch, and all unit tests for the feature branch are passing locally.
https://github.com/eoincondron/pandas/tree/nanops-numba-implementation
The hardware is a new MacBook Pro with 8 cores.
The performance is still slightly better at 1 million rows, and the gains grow at larger magnitudes (8x at 100 million rows).
The caveat is that these timings assume all JIT compilation has already completed.
I have carried out a more comprehensive performance comparison and these results hold up.
As with bottleneck, these codepaths can be toggled on and off.
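For reference, pandas already exposes option-based toggles for its accelerated codepaths, so a numba-backed path would presumably follow the same pattern. The `compute.use_numba` option name below is hypothetical; only the bottleneck/numexpr options are real.

```python
import pandas as pd

# Existing toggles for accelerated reduction/expression codepaths:
pd.set_option("compute.use_bottleneck", False)
pd.set_option("compute.use_numexpr", False)

# A numba-backed path could be switched the same way.
# NOTE: "compute.use_numba" is a hypothetical option name, not a
# real pandas option.
# pd.set_option("compute.use_numba", True)
```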