Commit 332f0df

Zrahay authored
BUG: Validate numeric_only parameter in groupby aggregations (#62842)
Co-authored-by: Zrahay <shubhangsinha4@gmail.com>
1 parent 9a5f846 commit 332f0df

3 files changed, +22 -3 lines changed

doc/source/whatsnew/v3.0.0.rst

Lines changed: 1 addition & 0 deletions
@@ -1152,6 +1152,7 @@ Groupby/resample/rolling
 - Bug in :meth:`.DataFrameGroupBy.groups` and :meth:`.SeriesGroupby.groups` that would not respect groupby argument ``dropna`` (:issue:`55919`)
 - Bug in :meth:`.DataFrameGroupBy.median` where nat values gave an incorrect result. (:issue:`57926`)
 - Bug in :meth:`.DataFrameGroupBy.quantile` when ``interpolation="nearest"`` is inconsistent with :meth:`DataFrame.quantile` (:issue:`47942`)
+- Bug in :meth:`.DataFrameGroupBy` reductions where non-Boolean values were allowed for the ``numeric_only`` argument; passing a non-Boolean value will now raise (:issue:`62778`)
 - Bug in :meth:`.Resampler.interpolate` on a :class:`DataFrame` with non-uniform sampling and/or indices not aligning with the resulting resampled index would result in wrong interpolation (:issue:`21351`)
 - Bug in :meth:`.Series.rolling` when used with a :class:`.BaseIndexer` subclass and computing min/max (:issue:`46726`)
 - Bug in :meth:`DataFrame.ewm` and :meth:`Series.ewm` when passed ``times`` and aggregation functions other than mean (:issue:`51695`)

pandas/core/groupby/groupby.py

Lines changed: 5 additions & 3 deletions
@@ -76,6 +76,7 @@ class providing the base-class of operations.
     ensure_dtype_can_hold_na,
 )
 from pandas.core.dtypes.common import (
+    is_bool,
     is_bool_dtype,
     is_float_dtype,
     is_hashable,
@@ -109,9 +110,7 @@ class providing the base-class of operations.
     SparseArray,
 )
 from pandas.core.arrays.string_ import StringDtype
-from pandas.core.arrays.string_arrow import (
-    ArrowStringArray,
-)
+from pandas.core.arrays.string_arrow import ArrowStringArray
 from pandas.core.base import (
     PandasObject,
     SelectionMixin,
@@ -1756,6 +1755,9 @@ def _cython_agg_general(
         # Note: we never get here with how="ohlc" for DataFrameGroupBy;
         # that goes through SeriesGroupBy
 
+        if not is_bool(numeric_only):
+            raise ValueError("numeric_only accepts only Boolean values")
+
         data = self._get_data_to_aggregate(numeric_only=numeric_only, name=how)
 
         def array_func(values: ArrayLike) -> ArrayLike:
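
The guard added in this hunk can be tried in isolation. The following is a minimal sketch, not the pandas implementation: `validate_numeric_only` is a hypothetical helper name introduced only for illustration, and it uses the public `pandas.api.types.is_bool` rather than the internal import shown in the diff. The error message is copied from the diff.

```python
from pandas.api.types import is_bool


def validate_numeric_only(numeric_only):
    """Hypothetical standalone version of the guard added to _cython_agg_general."""
    # Reject anything that is not an actual Boolean, even if it is truthy or falsy.
    if not is_bool(numeric_only):
        raise ValueError("numeric_only accepts only Boolean values")
    return numeric_only


# Real Booleans pass through unchanged.
print(validate_numeric_only(True))
print(validate_numeric_only(False))

# Values that merely look Boolean are rejected.
for bad in (1, "True", ["B"], None):
    try:
        validate_numeric_only(bad)
    except ValueError as err:
        print(f"{bad!r}: {err}")
```

Checking the type rather than the truthiness is the point of the change: `1`, `"True"`, and `["B"]` are all truthy, so a plain `if numeric_only:` would accept them silently.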

pandas/tests/groupby/test_reductions.py

Lines changed: 16 additions & 0 deletions
@@ -1520,3 +1520,19 @@ def test_groupby_std_datetimelike():
     exp_ser = Series([td1 * 2, td1, td1, td1, td4], index=np.arange(5))
     expected = DataFrame({"A": exp_ser, "B": exp_ser, "C": exp_ser})
     tm.assert_frame_equal(result, expected)
+
+
+def test_mean_numeric_only_validates_bool():
+    # GH#62778
+
+    df = DataFrame({"A": range(5), "B": range(5)})
+
+    msg = "numeric_only accepts only Boolean values"
+    with pytest.raises(ValueError, match=msg):
+        df.groupby(["A"]).mean(["B"])
+
+    with pytest.raises(ValueError, match=msg):
+        df.groupby(["A"]).mean(numeric_only="True")
+
+    with pytest.raises(ValueError, match=msg):
+        df.groupby(["A"]).mean(numeric_only=1)
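
The first test case passes `["B"]` positionally, which only exercises the new check if `numeric_only` is the first parameter of `mean`. The sketch below illustrates that binding with a stand-in function. Its signature is an assumption made for illustration and is not copied from pandas. It also shows that such values are truthy, which is presumably why they went unnoticed before this change.

```python
import inspect


def mean(numeric_only=False, skipna=True):
    """Stand-in for a groupby reduction; the signature is an assumption."""
    return numeric_only


# A positional argument binds to the first parameter, here numeric_only.
bound = inspect.signature(mean).bind(["B"])
print(dict(bound.arguments))

# Every rejected value from the test is truthy, and so is the string "False".
for value in (["B"], "True", 1, "False"):
    print(repr(value), "->", bool(value))
```

A call such as `mean(["B"])` reads like a column selection but is interpreted as a flag, so raising early is more helpful than silently treating the list as `True`.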
