Closed
Labels
Enhancement
NA - MaskedArrays: Related to pd.NA and nullable extension arrays
Numeric Operations: Arithmetic, Comparison, and Logical operations
Reduction Operations: sum, mean, min, max, etc.
Description
We now have masked implementations of sum, prod, min and max for the nullable integer array (first added in PR #30982, now living at https://github.com/pandas-dev/pandas/blob/master/pandas/core/array_algos/masked_reductions.py). We can add one for the mean reduction as well.
A very rough check gives a nice speed-up (roughly 3.5x in this run):
In [27]: arr = pd.array(np.random.randint(0, 1000, 1_000_000), dtype="Int64")
In [28]: arr[np.random.randint(0, 1_000_000, 1000)] = pd.NA
In [30]: arr._reduce("mean")
Out[30]: 499.27095868772903
In [31]: %timeit arr._reduce("mean")
7.26 ms ± 335 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [32]: arr._data.sum(where=~arr._mask, dtype="float64") / (~arr._mask).sum()
Out[32]: 499.27095868772903
In [33]: %timeit arr._data.sum(where=~arr._mask, dtype="float64") / (~arr._mask).sum()
2.08 ms ± 6.89 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
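The one-liner above could be wrapped into a small helper along the lines of the existing masked reductions. The following is only a sketch under stated assumptions: the name `masked_mean`, its signature, and the `na_value` parameter are hypothetical, it uses plain NumPy arrays rather than pandas internals, and `np.nan` stands in for `pd.NA` to keep the example free of a pandas dependency.

```python
import numpy as np


def masked_mean(values, mask, skipna=True, na_value=np.nan):
    """Mean of `values`, ignoring positions where `mask` is True.

    Hypothetical helper, not the pandas implementation.
    `na_value` is a stand-in for pd.NA.
    """
    values = np.asarray(values)
    mask = np.asarray(mask, dtype=bool)

    # With skipna=False, any missing value makes the result missing.
    if not skipna and mask.any():
        return na_value

    valid = ~mask
    count = valid.sum()
    # No valid values (all masked, or empty input): the mean is undefined.
    if count == 0:
        return na_value

    # Sum only the unmasked entries, accumulating in float64 as in the check above.
    total = values.sum(where=valid, dtype="float64")
    return total / count


values = np.array([1, 2, 3, 4], dtype="int64")
mask = np.array([False, True, False, False])
result = masked_mean(values, mask)  # mean of 1, 3 and 4
```

Passing the mask through `where=` avoids converting the integer data to a float array with NaN first, which is presumably where part of the speed-up comes from.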
The nanmean version lives here: https://github.com/pandas-dev/pandas/blob/master/pandas/core/nanops.py#L517
For reference, NumPy is also adding a version that accepts a mask: numpy/numpy#15852 (which could be used in the future, and can serve as inspiration for the implementation now).