Description
The performance of array reductions in nanops/bottleneck can be significantly improved for large data using numba. The improvements come from two factors:
- single-pass algorithms when null values are present, avoiding any intermediate copies.
- multi-threading over chunks of the array, or over an axis in a single-axis reduction.
This screenshot demonstrates a potential 4x improvement on a DataFrame of 10 million rows and 5 columns of various types.
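A frame like the one benchmarked in the screenshot can be built as below (shrunk here so the snippet runs quickly; the column names and dtype mix are my guesses, not taken from the branch).

```python
import numpy as np
import pandas as pd

# 5 columns of various dtypes, including a float column with nulls,
# mirroring the benchmark setup described above (10M rows there).
n = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "f64": rng.normal(size=n),
    "f32": rng.normal(size=n).astype("float32"),
    "i64": rng.integers(0, 100, size=n),
    "flag": rng.integers(0, 2, size=n).astype(bool),
    "f64_nan": np.where(rng.random(n) < 0.1, np.nan, rng.normal(size=n)),
})

# Reductions like this go through the nanops codepaths being compared.
result = df.mean()
```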

I am running the code on a feature branch, and all unit tests for the feature branch are passing locally.
https://github.com/eoincondron/pandas/tree/nanops-numba-implementation
The hardware is a new MacBook Pro with 8 cores.
The performance is still slightly better at 1 million rows, and the gains grow at larger magnitudes (8x at 100 million rows).
The caveat is that these timings assume all JIT compilation has already completed.
I have carried out a more comprehensive performance comparison and these results hold up.
As with bottleneck, these codepaths can be toggled on and off.
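For reference, pandas already exposes option-based toggles for its accelerated codepaths, so a numba-backed path would presumably follow the same pattern. The `compute.use_numba` option name below is hypothetical; only the bottleneck/numexpr options are real.

```python
import pandas as pd

# Existing toggles for accelerated reduction/expression codepaths:
pd.set_option("compute.use_bottleneck", False)
pd.set_option("compute.use_numexpr", False)

# A numba-backed path could be switched the same way.
# NOTE: "compute.use_numba" is a hypothetical option name, not a
# real pandas option.
# pd.set_option("compute.use_numba", True)
```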