Backport PR #42838 on branch 1.3.x (REGR: Series.nlargest with masked arrays) #42975

Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.3.2.rst
@@ -23,6 +23,7 @@ Fixed regressions
- Fixed regression where :meth:`pandas.read_csv` raised a ``ValueError`` when parameters ``names`` and ``prefix`` were both set to None (:issue:`42387`)
- Fixed regression in comparisons between :class:`Timestamp` object and ``datetime64`` objects outside the implementation bounds for nanosecond ``datetime64`` (:issue:`42794`)
- Fixed regression in :meth:`.Styler.highlight_min` and :meth:`.Styler.highlight_max` where ``pandas.NA`` was not successfully ignored (:issue:`42650`)
- Regression in :meth:`Series.nlargest` and :meth:`Series.nsmallest` with nullable integer or float dtype (:issue:`41816`)
- Fixed regression in :meth:`pandas.Series.quantile` with :class:`pandas.Int64Dtype` (:issue:`42626`)

.. ---------------------------------------------------------------------------
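For context, a minimal sketch of the user-facing behaviour the new release note describes, assuming a pandas version that includes this fix. The sample values are made up for illustration and do not come from the linked issue.

```python
import pandas as pd

# A nullable-integer Series with one missing value (illustrative data).
ser = pd.Series([3, 1, pd.NA, 7, 5], dtype="Int64")

# Missing values are skipped, and the result keeps the nullable dtype.
top = ser.nlargest(2)
bottom = ser.nsmallest(2)

print(top)
print(bottom)
```

The point of interest is that both results keep the original index labels and the ``Int64`` dtype.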
16 changes: 16 additions & 0 deletions pandas/core/algorithms.py
@@ -27,6 +27,7 @@
    DtypeObj,
    FrameOrSeriesUnion,
    Scalar,
    final,
)
from pandas.util._decorators import doc

@@ -1215,12 +1216,15 @@ def __init__(self, obj, n: int, keep: str):
    def compute(self, method: str) -> FrameOrSeriesUnion:
        raise NotImplementedError

    @final
    def nlargest(self):
        return self.compute("nlargest")

    @final
    def nsmallest(self):
        return self.compute("nsmallest")

    @final
    @staticmethod
    def is_valid_dtype_n_method(dtype: DtypeObj) -> bool:
        """
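A note on the ``@final`` markers in this hunk: the sketch below assumes that ``final`` from ``pandas._typing`` behaves like the standard library's ``typing.final``, that is, a hint for static type checkers with no enforcement at runtime. The class names are hypothetical stand-ins, not the pandas classes.

```python
from typing import final


class SelectNSketch:
    """Hypothetical stand-in for the selection base class in the diff."""

    def __init__(self, values, n):
        self.values = list(values)
        self.n = n

    def compute(self, method):
        # Subclasses supply the actual selection logic.
        raise NotImplementedError

    @final
    def nlargest(self):
        # Marked final: subclasses customise compute(), not this wrapper.
        return self.compute("nlargest")

    @final
    def nsmallest(self):
        return self.compute("nsmallest")


class SortedSelectN(SelectNSketch):
    def compute(self, method):
        # Sort descending for nlargest, ascending for nsmallest.
        ordered = sorted(self.values, reverse=(method == "nlargest"))
        return ordered[: self.n]


selector = SortedSelectN([4, 1, 9, 6], n=2)
print(selector.nlargest())   # [9, 6]
print(selector.nsmallest())  # [1, 4]
```

A type checker such as mypy would flag a subclass that redefines ``nlargest`` or ``nsmallest``, which keeps ``compute`` as the single extension point.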
@@ -1259,6 +1263,18 @@ def compute(self, method: str) -> Series:

        dropped = self.obj.dropna()

        if is_extension_array_dtype(dropped.dtype):
            # GH#41816 bc we have dropped NAs above, MaskedArrays can use the
            # numpy logic.
            from pandas.core.arrays import BaseMaskedArray

            arr = dropped._values
            if isinstance(arr, BaseMaskedArray):
                ser = type(dropped)(arr._data, index=dropped.index, name=dropped.name)

                result = type(self)(ser, n=self.n, keep=self.keep).compute(method)
                return result.astype(arr.dtype)

        # slow method
        if n >= len(self.obj):
            ascending = method == "nsmallest"
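The idea in this hunk (drop the NAs, run the NumPy-backed path on the underlying data, then cast back to the nullable dtype) can be sketched with public API only. This is an illustration under assumptions, not the code in the PR: ``nlargest_via_numpy`` is a hypothetical helper, and it relies on the input having a masked nullable dtype that exposes ``numpy_dtype``.

```python
import pandas as pd


def nlargest_via_numpy(ser: pd.Series, n: int) -> pd.Series:
    """Hypothetical helper mirroring the unwrap / compute / recast idea."""
    dropped = ser.dropna()
    # With the missing values gone, the data fits a plain NumPy dtype.
    # Assumption: ser has a masked nullable dtype (e.g. Int64, Float64).
    plain = dropped.astype(dropped.dtype.numpy_dtype)
    result = plain.nlargest(n)
    # Cast back so callers still see the nullable dtype they passed in.
    return result.astype(ser.dtype)


ints = pd.Series([3, pd.NA, 8, 5], dtype="Int64")
floats = pd.Series([1.5, pd.NA, 0.5], dtype="Float64")

print(nlargest_via_numpy(ints, 2))
print(nlargest_via_numpy(floats, 1))
```

The diff reaches the same effect through the private ``_data`` attribute of the masked array, which avoids the extra ``astype`` on the way in.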
16 changes: 16 additions & 0 deletions pandas/tests/series/methods/test_nlargest.py
@@ -211,3 +211,19 @@ def test_nlargest_boolean(self, data, expected):
        result = ser.nlargest(1)
        expected = Series(expected)
        tm.assert_series_equal(result, expected)

    def test_nlargest_nullable(self, any_nullable_numeric_dtype):
        # GH#42816
        dtype = any_nullable_numeric_dtype
        arr = np.random.randn(10).astype(dtype.lower(), copy=False)

        ser = Series(arr.copy(), dtype=dtype)
        ser[1] = pd.NA
        result = ser.nlargest(5)

        expected = (
            Series(np.delete(arr, 1), index=ser.index.delete(1))
            .nlargest(5)
            .astype(dtype)
        )
        tm.assert_series_equal(result, expected)
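For anyone wanting to try the same check outside the pandas test suite, here is a standalone sketch of the test's logic that avoids the ``any_nullable_numeric_dtype`` fixture. The dtype list, the seed, and the ``check_nlargest_nullable`` name are assumptions made for illustration. It uses a permutation rather than ``randn`` so the values are distinct and tie-breaking does not affect the comparison.

```python
import numpy as np
import pandas as pd


def check_nlargest_nullable(dtype: str) -> None:
    """Hypothetical standalone version of the nullable nlargest test."""
    rng = np.random.default_rng(0)
    # Distinct values 0..9 in the matching NumPy dtype ("Int64" -> "int64").
    arr = rng.permutation(10).astype(dtype.lower())

    ser = pd.Series(arr.copy(), dtype=dtype)
    ser[1] = pd.NA
    result = ser.nlargest(5)

    # Reference answer: drop position 1 by hand, use the NumPy-backed
    # Series, then cast to the nullable dtype for comparison.
    expected = (
        pd.Series(np.delete(arr, 1), index=ser.index.delete(1))
        .nlargest(5)
        .astype(dtype)
    )
    pd.testing.assert_series_equal(result, expected)


for name in ("Int32", "Int64", "Float64"):
    check_nlargest_nullable(name)
```

``pd.testing.assert_series_equal`` raises ``AssertionError`` on any mismatch in values, index, or dtype, so a silent run means the nullable path agrees with the NumPy path.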