BUG: Fix min_count issue for groupby.sum (#32914)
* Add test

* Check for null

* Release note

* Update and comment

* Update test

* Hack

* Try a different casting

* No pd

* Only for add

* Undo

* Release note

* Fix

* Space

* maybe_cast_result

* float -> Int

* Less if

Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
dsaxton and simonjayhawkins authored May 9, 2020
1 parent c4eb840 commit 6f5614b
Showing 3 changed files with 22 additions and 11 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -752,6 +752,7 @@ Groupby/resample/rolling
 - Bug in :meth:`DataFrameGroupby.transform` produces incorrect result with transformation functions (:issue:`30918`)
 - Bug in :meth:`GroupBy.count` causes segmentation fault when grouped-by column contains NaNs (:issue:`32841`)
 - Bug in :meth:`DataFrame.groupby` and :meth:`Series.groupby` produces inconsistent type when aggregating Boolean series (:issue:`32894`)
+- Bug in :meth:`DataFrameGroupBy.sum` and :meth:`SeriesGroupBy.sum` where a large negative number would be returned when the number of non-null values was below ``min_count`` for nullable integer dtypes (:issue:`32861`)
 - Bug in :meth:`SeriesGroupBy.quantile` raising on nullable integers (:issue:`33136`)
 - Bug in :meth:`SeriesGroupBy.first`, :meth:`SeriesGroupBy.last`, :meth:`SeriesGroupBy.min`, and :meth:`SeriesGroupBy.max` returning floats when applied to nullable Booleans (:issue:`33071`)
 - Bug in :meth:`DataFrameGroupBy.agg` with dictionary input losing ``ExtensionArray`` dtypes (:issue:`32194`)
15 changes: 4 additions & 11 deletions pandas/core/groupby/ops.py
@@ -18,6 +18,7 @@
 from pandas.errors import AbstractMethodError
 from pandas.util._decorators import cache_readonly

+from pandas.core.dtypes.cast import maybe_cast_result
 from pandas.core.dtypes.common import (
     ensure_float64,
     ensure_int64,
@@ -548,17 +549,6 @@ def _cython_operation(
             if mask.any():
                 result = result.astype("float64")
                 result[mask] = np.nan
-            elif (
-                how == "add"
-                and is_integer_dtype(orig_values.dtype)
-                and is_extension_array_dtype(orig_values.dtype)
-            ):
-                # We need this to ensure that Series[Int64Dtype].resample().sum()
-                # remains int64 dtype.
-                # Two options for avoiding this special case
-                # 1. mask-aware ops and avoid casting to float with NaN above
-                # 2. specify the result dtype when calling this method
-                result = result.astype("int64")

         if kind == "aggregate" and self._filter_empty_groups and not counts.all():
             assert result.ndim != 2
@@ -582,6 +572,9 @@ def _cython_operation(
         elif is_datetimelike and kind == "aggregate":
             result = result.astype(orig_values.dtype)

+        if is_extension_array_dtype(orig_values.dtype):
+            result = maybe_cast_result(result=result, obj=orig_values, how=how)
+
         return result, names

     def aggregate(
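
The hunks above drop the special-cased `astype("int64")` and instead pass results for extension-array inputs through `maybe_cast_result`. The release note describes the old symptom as a large negative number appearing where a missing value was expected. The snippet below is only a minimal pure-Python sketch of the underlying idea, not pandas' implementation: the helper name `to_nullable_ints` is hypothetical, and `None` stands in for `pd.NA`.

```python
import math
from typing import List, Optional


def to_nullable_ints(values: List[float]) -> List[Optional[int]]:
    """Convert float results to integers, keeping NaN as a missing marker.

    Hypothetical helper for illustration only; None plays the role of pd.NA.
    """
    out: List[Optional[int]] = []
    for v in values:
        if math.isnan(v):
            # A missing result must stay missing rather than be forced
            # into an integer value.
            out.append(None)
        else:
            out.append(int(v))
    return out


print(to_nullable_ints([1.0, float("nan"), 3.0]))
```

The point of the sketch is the branch on `math.isnan`: the cast back to an integer type has to know which positions are missing, which a plain integer cast does not.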
17 changes: 17 additions & 0 deletions pandas/tests/groupby/test_function.py
@@ -1014,3 +1014,20 @@ def test_apply_to_nullable_integer_returns_float(values, function):
     result = groups.agg([function])
     expected.columns = MultiIndex.from_tuples([("b", function)])
     tm.assert_frame_equal(result, expected)
+
+
+def test_groupby_sum_below_mincount_nullable_integer():
+    # https://github.com/pandas-dev/pandas/issues/32861
+    df = pd.DataFrame({"a": [0, 1, 2], "b": [0, 1, 2], "c": [0, 1, 2]}, dtype="Int64")
+    grouped = df.groupby("a")
+    idx = pd.Index([0, 1, 2], dtype=object, name="a")
+
+    result = grouped["b"].sum(min_count=2)
+    expected = pd.Series([pd.NA] * 3, dtype="Int64", index=idx, name="b")
+    tm.assert_series_equal(result, expected)
+
+    result = grouped.sum(min_count=2)
+    expected = pd.DataFrame(
+        {"b": [pd.NA] * 3, "c": [pd.NA] * 3}, dtype="Int64", index=idx
+    )
+    tm.assert_frame_equal(result, expected)
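
The test above relies on the `min_count` semantics: each group holds a single value, so with `min_count=2` every group sum is expected to be missing. A rough pure-Python sketch of that rule follows, under the assumption that `None` represents a missing value. The function `group_sum` is hypothetical and only mirrors the behavior the test expects; it is not how pandas computes the result.

```python
from typing import Dict, Hashable, Iterable, Optional


def group_sum(
    keys: Iterable[Hashable],
    values: Iterable[Optional[int]],
    min_count: int = 0,
) -> Dict[Hashable, Optional[int]]:
    """Sum values per key; a group with fewer than min_count non-missing
    values yields None instead of a number. Illustrative helper only."""
    totals: Dict[Hashable, int] = {}
    counts: Dict[Hashable, int] = {}
    for key, value in zip(keys, values):
        # Register the group even if all of its values are missing.
        totals.setdefault(key, 0)
        counts.setdefault(key, 0)
        if value is not None:
            totals[key] += value
            counts[key] += 1
    return {
        key: (totals[key] if counts[key] >= min_count else None)
        for key in totals
    }


# Same shape as the test data: three groups with one value each.
print(group_sum([0, 1, 2], [0, 1, 2], min_count=2))
```

With the default `min_count=0` the same call returns the plain per-group sums, which matches the case where the threshold is not in play.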
