-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
[BUG] Sum of grouped bool has inconsistent dtype #32894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a stab at this - the code paths here are tricky but for general feedback on your approach I think this should all be self-contained within groupby, i.e. not leaking into the dtype / inference modules
a6d9162
to
bd34d30
Compare
@rhshadrach if you can update |
bd34d30
to
004e2dc
Compare
@jreback Failure is unrelated, ready for another review:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good. small comments, ping on green.
cc @TomAugspurger @jorisvandenbossche @WillAyd @jbrockmendel if any comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm - certainly a more logical placement of things
Addresses: GH7001
- Reverted maybe_downcast_numeric to its original state - Parameterized tests
- Removed unnecessary import of try_cast_to_ea from arrays.__init__
004e2dc
to
8902484
Compare
@jreback ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. small follow request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually need a whatsnew note for this. if you can fix the out-of-context comment and change the name of the try_cast_to_ea as suggested. ping on green.
@jreback ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes look good. one more item. ping on green and can get this in.
@jreback ping |
very nice @rhshadrach keep em coming! |
Thanks for the feedback @jbrockmendel. I'll clean these type hitns up in a subsequent PR. |
FYI mypy error: so maybe worth adding type hints to |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Would appreciate any feedback on this attempt.
The strategy is to modify the dtype after the aggregation is computed in certain cases when casting. In order for this to work, the cast functions need to be made aware of how the data was aggregated. I've added an optional "how" argument to maybe_downcast_numeric and _try_cast. Because this dtype change is needed in two places, I've added the function groupby_result_dtype to dtypes/common.py to handle the logic.
I wasn't sure where the mapping information needed by groupby_result_dtype should be stored. Currently it is in the function itself, but maybe there is a better place for it.
If this is a good approach, it could potentially be expanded for other aggregations and datatypes. One thought is that perhaps groupby(-).mean() should always return a float for numeric types.