Skip to content

BUG: Don't cast nullable Boolean to float in groupby #33089

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Apr 7, 2020
Merged

BUG: Don't cast nullable Boolean to float in groupby #33089

merged 18 commits into from
Apr 7, 2020

Conversation

dsaxton
Copy link
Member

@dsaxton dsaxton commented Mar 28, 2020

if is_extension_array_dtype(dtype) and dtype.kind != "M":
# The result may be of any type, cast back to original
# type if it's compatible.
if len(result) and isinstance(result[0], dtype.type):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check also blocks things like the conversion of float array to nullable integer which was a problem here as well: #32914

@simonjayhawkins
Copy link
Member

Thanks @dsaxton for the PR. This also fixes the regression in #32194. can you add test for that case and also test for IntDtype, see #33071 (comment)

@simonjayhawkins simonjayhawkins added Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. Groupby Regression Functionality that used to work in a prior pandas version labels Mar 28, 2020
@simonjayhawkins simonjayhawkins added this to the 1.0.4 milestone Mar 28, 2020
@simonjayhawkins
Copy link
Member

I've marked this 1.0.4 for now.

@dsaxton
Copy link
Member Author

dsaxton commented Mar 28, 2020

Thanks @dsaxton for the PR. This also fixes the regression in #32194. can you add test for that case and also test for IntDtype, see #33071 (comment)

@simonjayhawkins Added some new tests. Should I move the release note to 1.0.4 (and add something for the agg(dict) case)?

@simonjayhawkins
Copy link
Member

Added some new tests.

thanks

Should I move the release note to 1.0.4

not sure if a 1.0.4 is planned. @TomAugspurger ?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Mar 28, 2020 via email

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will look soon

if len(result) and isinstance(result[0], dtype.type):
cls = dtype.construct_array_type()
result = maybe_cast_to_extension_array(cls, result, dtype=dtype)
if (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again instead of expanding this check, i would completely remove it; it be encompassed in maybe_cast_result_type, which is the point of that function

Copy link
Member Author

@dsaxton dsaxton Mar 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we can completely move this logic into maybe_cast_result_dtype; e.g., for something like agg(pd.Series.nunique) performed on a categorical, we end up with integer counts with an original dtype of categorical, but we don't actually know "how" we got there, so we can't say that the dtype should still be integer based on the kind of operation that was performed.

The datetime check seems to be another issue with not being able to introspect a user-provided function like agg(lambda g: g.iloc[0].year) and know that the output should still be an int and not a datetime.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is tagged as 1.0.4 at the moment just in case we do another patch release. If so, the changes in this PR should be kept to a minimum?

@jreback
Copy link
Contributor

jreback commented Mar 30, 2020

we are not tagging for 1.04
it’s would take quite a bit to backport

@jreback jreback modified the milestones: 1.0.4, 1.1 Mar 30, 2020
@jreback
Copy link
Contributor

jreback commented Apr 7, 2020

@dsaxton this looks good, can you merge master and ping on green.

@simonjayhawkins simonjayhawkins merged commit 6ae710a into pandas-dev:master Apr 7, 2020
@simonjayhawkins
Copy link
Member

Thanks @dsaxton

@dsaxton dsaxton deleted the result-casting branch April 7, 2020 13:26
@simonjayhawkins simonjayhawkins modified the milestones: 1.1, 1.0.4 May 6, 2020
simonjayhawkins pushed a commit to simonjayhawkins/pandas that referenced this pull request May 6, 2020
simonjayhawkins added a commit that referenced this pull request May 7, 2020
…to float in groupby) (#34023)

Co-authored-by: Daniel Saxton <2658661+dsaxton@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. Groupby Regression Functionality that used to work in a prior pandas version
Projects
None yet
4 participants