[BUG] Sum of grouped bool has inconsistent dtype #32894

rhshadrach · 2020-03-21T19:22:49Z

- closes #7001 (Sum of grouped bool column has inconsistent type)
- tests added / passed
- passes black pandas
- passes git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry

Would appreciate any feedback on this attempt.

The strategy is to adjust the result dtype during casting, after the aggregation has been computed, in specific cases. For this to work, the cast functions need to know how the data was aggregated, so I've added an optional "how" argument to maybe_downcast_numeric and _try_cast. Because this dtype change is needed in two places, I've added the function groupby_result_dtype to dtypes/common.py to hold the logic.

I wasn't sure where the mapping information needed by groupby_result_dtype should be stored. Currently it is in the function itself, but maybe there is a better place for it.
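
To make the idea above concrete, here is a minimal, dependency-free sketch of a lookup from (input dtype, aggregation) to result dtype. It is not the code in this PR: dtypes are represented as plain strings, the table name _RESULT_DTYPES is hypothetical, and the single bool/sum entry is the only case taken from the description.

```python
from typing import Dict, Tuple

# Hypothetical lookup table: (input dtype name, aggregation name) -> result dtype name.
# Only the bool/sum case comes from the PR description; other entries would be
# added if the approach were extended to more aggregations or dtypes.
_RESULT_DTYPES: Dict[Tuple[str, str], str] = {
    ("bool", "sum"): "int64",
}


def groupby_result_dtype(dtype: str, how: str) -> str:
    """Return the dtype name an aggregated result should be cast to.

    Falls back to the input dtype when no special case is registered.
    """
    return _RESULT_DTYPES.get((dtype, how), dtype)


if __name__ == "__main__":
    print(groupby_result_dtype("bool", "sum"))     # special case applies
    print(groupby_result_dtype("bool", "min"))     # unchanged
    print(groupby_result_dtype("float64", "sum"))  # unchanged
```

Keeping the table at module level rather than inside the function is one possible answer to the question of where the mapping should live; it makes the special cases easy to extend and to test in isolation.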

If this is a good approach, it could potentially be expanded for other aggregations and datatypes. One thought is that perhaps groupby(-).mean() should always return a float for numeric types.

WillAyd

Thanks for taking a stab at this. The code paths here are tricky, but as general feedback on your approach, I think this should all be self-contained within groupby, i.e. not leak into the dtype / inference modules.

jreback · 2020-03-24T19:57:09Z

@rhshadrach if you can update

rhshadrach · 2020-03-24T23:54:36Z

@jreback Failure is unrelated, ready for another review:

##[error]./pandas/tests/indexes/timedeltas/test_constructors.py:173:String has a space at the beginning instead of the end of the previous string.

jreback

looks pretty good. small comments, ping on green.

cc @TomAugspurger @jorisvandenbossche @WillAyd @jbrockmendel if any comments

WillAyd

lgtm - certainly a more logical placement of things

rhshadrach · 2020-03-25T15:06:49Z

@jreback ping

jreback

lgtm. small follow request.

jreback

actually need a whatsnew note for this. if you can fix the out-of-context comment and change the name of the try_cast_to_ea as suggested. ping on green.

rhshadrach · 2020-03-26T21:50:35Z

@jreback ping

jreback

changes look good. one more item. ping on green and can get this in.

rhshadrach · 2020-03-26T23:35:06Z

@jreback ping

jreback · 2020-03-26T23:46:08Z

very nice @rhshadrach keep em coming!

rhshadrach · 2020-03-28T22:53:41Z

Thanks for the feedback @jbrockmendel. I'll clean these type hints up in a subsequent PR.

simonjayhawkins · 2020-04-04T12:51:57Z

> I'll clean these type hints up in a subsequent PR.

FYI mypy error: pandas\core\dtypes\cast.py:333:18: error: "type" has no attribute "_from_sequence"

so maybe worth adding type hints to maybe_cast_to_extension_array and add a subclass check to the instance check
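
As a rough illustration of that suggestion, the sketch below annotates the class argument and checks both that it is a type and that it is a subclass of the expected base before calling _from_sequence. The names ArrayLike, IntArrayLike and maybe_cast_sketch are hypothetical stand-ins, not pandas classes or functions, and the real maybe_cast_to_extension_array may differ.

```python
from typing import Any, List, Sequence, Type, Union


class ArrayLike:
    """Hypothetical stand-in for an extension-array base class."""

    def __init__(self, values: List[Any]) -> None:
        self.values = values

    @classmethod
    def _from_sequence(cls, values: Sequence[Any]) -> "ArrayLike":
        return cls(list(values))


class IntArrayLike(ArrayLike):
    """Hypothetical subclass whose constructor can fail on bad input."""

    @classmethod
    def _from_sequence(cls, values: Sequence[Any]) -> "IntArrayLike":
        # int() raises ValueError for strings such as "a".
        return cls([int(v) for v in values])


def maybe_cast_sketch(
    cls: Type[ArrayLike], values: Sequence[Any]
) -> Union[ArrayLike, Sequence[Any]]:
    """Try to build ``cls`` from ``values``; return ``values`` unchanged on failure."""
    # Instance check: the caller must pass a class, not an instance.
    if not isinstance(cls, type):
        raise TypeError(f"must pass a type: {cls!r}")
    # Subclass check: narrows ``cls`` so that ``_from_sequence`` is known to exist.
    if not issubclass(cls, ArrayLike):
        raise TypeError(f"must pass a subclass of ArrayLike: {cls!r}")
    try:
        return cls._from_sequence(values)
    except Exception:
        # Fall back to the original values if construction fails.
        return values
```

With the Type[ArrayLike] annotation a type checker can resolve _from_sequence on cls, and the runtime checks reject arguments such as int that would otherwise fail with an attribute error.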

WillAyd requested changes Mar 21, 2020

pandas/tests/groupby/test_groupby.py (outdated, resolved)

WillAyd added the Dtype Conversions and Groupby labels Mar 21, 2020

jreback requested changes Mar 22, 2020

pandas/core/dtypes/common.py (outdated, resolved)

pandas/core/dtypes/cast.py (outdated, resolved)

pandas/core/groupby/generic.py (outdated, resolved)

rhshadrach force-pushed the groupby_bool branch from a6d9162 to bd34d30 March 22, 2020 15:40

jreback requested changes Mar 22, 2020

pandas/core/groupby/groupby.py (outdated, resolved)

pandas/core/groupby/groupby.py (outdated, resolved)

pandas/tests/groupby/test_groupby.py (outdated, resolved)

pandas/core/groupby/groupby.py (outdated, resolved)

jreback mentioned this pull request Mar 24, 2020 in BUG: Fix min_count issue for groupby.sum #32914 (merged)

rhshadrach force-pushed the groupby_bool branch from bd34d30 to 004e2dc March 24, 2020 23:08

jreback requested changes Mar 25, 2020

pandas/core/arrays/__init__.py (outdated, resolved)

pandas/core/dtypes/cast.py (outdated, resolved)

pandas/core/dtypes/cast.py (outdated, resolved)

jreback added this to the 1.1 milestone Mar 25, 2020

WillAyd approved these changes Mar 25, 2020

rhshadrach added 4 commits March 25, 2020 08:06

[BUG] Aggregated bool has inconsistent dtype (465a69b)
Addresses: GH7001

Moved function groupby_result_dtype to _GroupBy._result_dtype (3bc35f8)
- Reverted maybe_downcast_numeric to its original state
- Parameterized tests

Moved cast functions to core.dtypes.cast, tests to test_aggregate (e6f2408)

Added/improved docstrings and type hints (8902484)
- Removed unnecessary import of try_cast_to_ea from arrays.__init__

rhshadrach force-pushed the groupby_bool branch from 004e2dc to 8902484 March 25, 2020 12:07

Removed references to aggregation in docstrings. (15028eb)

jreback approved these changes Mar 26, 2020

pandas/core/dtypes/cast.py (outdated, resolved)

pandas/core/dtypes/cast.py (outdated, resolved)

jreback requested changes Mar 26, 2020

Renamed try_cast_to_ea, updated comment, whatsnew (3c484f3)

jreback requested changes Mar 26, 2020

pandas/core/arrays/categorical.py (outdated, resolved)

pandas/core/dtypes/cast.py (outdated, resolved)

pandas/core/dtypes/cast.py (resolved)

Fixed import from refactor, restricted type in maybe_cast_to_extension_array (9b32d0e)

jreback approved these changes Mar 26, 2020

jreback merged commit 1a57596 into pandas-dev:master Mar 26, 2020

jbrockmendel left four reviews Mar 28, 2020, each on pandas/core/dtypes/cast.py (resolved)

rhshadrach mentioned this pull request Apr 4, 2020 in CLN: Add/refine type hints to some functions in core.dtypes.cast #33286 (merged)

rhshadrach deleted the groupby_bool branch July 11, 2020 16:02
