BUG: Don't cast nullable Boolean to float in groupby #33089
if is_extension_array_dtype(dtype) and dtype.kind != "M":
    # The result may be of any type, cast back to original
    # type if it's compatible.
    if len(result) and isinstance(result[0], dtype.type):
This check also blocks things like the conversion of float array to nullable integer which was a problem here as well: #32914
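To make the point about the blocked conversion concrete, here is a minimal sketch. It assumes a float array stands in for an aggregation result and that the original column dtype was nullable `Int64`; the variable names are illustrative and not taken from the PR.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for an aggregation result on an Int64 column.
result = np.array([1.0, 2.0])
dtype = pd.Int64Dtype()

# The guard from the diff: cast back only when the first element is
# already an instance of the dtype's scalar type (np.int64 for Int64).
guard_passes = bool(len(result)) and isinstance(result[0], dtype.type)

# Casting directly shows the conversion itself is possible here,
# because every value is a whole number.
cast_back = pd.array(result, dtype=dtype)

print(guard_passes)
print(cast_back.dtype)
```

The guard evaluates to false because `result[0]` is a NumPy float, so the cast back to the nullable integer dtype is never attempted even though it would succeed.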
Thanks @dsaxton for the PR. This also fixes the regression in #32194. Can you add a test for that case and also a test for IntDtype, see #33071 (comment)
I've marked this 1.0.4 for now.
@simonjayhawkins Added some new tests. Should I move the release note to 1.0.4 (and add something for the |
thanks
not sure if a 1.0.4 is planned. @TomAugspurger ?
I’m not sure either.
… On Mar 28, 2020, at 09:44, Simon Hawkins ***@***.***> wrote:
Added some new tests.
thanks
Should I move the release note to 1.0.4
not sure if a 1.0.4 is planned. @TomAugspurger ?
will look soon
if len(result) and isinstance(result[0], dtype.type):
    cls = dtype.construct_array_type()
    result = maybe_cast_to_extension_array(cls, result, dtype=dtype)
if (
Again, instead of expanding this check, I would completely remove it; it should be encompassed in maybe_cast_result_type, which is the point of that function.
I'm not sure if we can completely move this logic into maybe_cast_result_dtype; e.g., for something like agg(pd.Series.nunique) performed on a categorical, we end up with integer counts with an original dtype of categorical, but we don't actually know "how" we got there, so we can't say that the dtype should still be integer based on the kind of operation that was performed.
The datetime check seems to be another issue with not being able to introspect a user-provided function like agg(lambda g: g.iloc[0].year) and know that the output should still be an int and not a datetime.
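The two cases described above can be reproduced with a small sketch. The frame, column names, and values are made up for illustration; only `agg(pd.Series.nunique)` and the `g.iloc[0].year` lambda come from the comment.

```python
import pandas as pd

# Hypothetical data: one categorical column and one datetime column.
df = pd.DataFrame(
    {
        "key": ["x", "x", "y"],
        "cat": pd.Categorical(["a", "b", "a"]),
        "ts": pd.to_datetime(["2020-01-01", "2020-06-01", "2021-03-01"]),
    }
)
grouped = df.groupby("key")

# Counts of distinct categories per group: integers, not categories.
counts = grouped["cat"].agg(pd.Series.nunique)

# Year of the first timestamp per group: integers, not datetimes.
years = grouped["ts"].agg(lambda g: g.iloc[0].year)

print(counts)
print(years)
```

In both calls groupby only sees an opaque callable, so the name of the operation gives it no hint about the result dtype; the input dtype (categorical or datetime) is not a safe target to cast back to.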
this is tagged as 1.0.4 at the moment just in case we do another patch release. If so, the changes in this PR should be kept to a minimum?
we are not tagging for 1.0.4
@dsaxton this looks good, can you merge master and ping on green.
Thanks @dsaxton
…e Boolean to float in groupby)
…to float in groupby) (#34023) Co-authored-by: Daniel Saxton <2658661+dsaxton@users.noreply.github.com>
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff