-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Use correct ExtensionArray reductions in DataFrame reductions #35254
Conversation
def func(values): | ||
if is_extension_array_dtype(values.dtype): | ||
return extract_array(values)._reduce(name, skipna=skipna, **kwds) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we not use the same exact path for non extension arrays
these if/then for extension arrays are terrible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'd have to put the analogous check in each of the nanops.nanfoo
functions, which I know @jorisvandenbossche wouldn't like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is func
now effectively the same as blk_func
on L8575?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, doing it here is IMO much cleaner than in each nanops function.
is func now effectively the same as blk_func on L8575?
Not exactly, the axis handling is different (also blk_func
knows it gets an array, so the extract_array stuff is not needed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is ok as a short term check, but seeing these constant if is_extension_array do one thing, else do something else basically means the api is missing lots of things. I would much rather fix it than keep doing this kind of thing. This makes code extremly hard to read and understand.
I remember being very excited when we removed the is_ndarray check here and this because much simpler, now we are going backwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this might be a path to a short-term compromise ;)
One problem is that frame_apply
path is only taken for if not self._is_homogeneous_type and filter_type is None: ...
.
So that means eg a DataFrame with only columns of the same EAdtype will still take the incorrect path. So maybe we can edit that condition to include something like or self._any_extension_type
.
I am also not sure why the filter_type="bool" functions (any/all) cannot take this path. That might give problems for the "boolean" dtype.
using |
Yah, the longer-term solution I have in mind is #34714 that all cases will go through. Is tinkering with the is_homogeneous_type check necessary for this PR, or can we punt on that until after 1.1? |
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
@jbrockmendel hope you don't mind, but I cherry-picked some commits with tests from me and Simon from the other PR onto this branch. Depending on whether we do the any_extension_type thing, we might need to xfail them, though
I would personally prefer to do this here as well, as otherwise you get a rather surprising inconsistency which underlying operation is actually ran for DataFrames with EAs depending on wether other dtypes are present or not. I pushed the tiny change for that as well in the last commit, to see what CI says, but we can easily undo that again. |
The one failure on the py37_np18 build seems to be unrelated .. Have we seen that elsewhere as well? |
yep, although AFAIK we don't have an issue for this. see #34906 (comment) and #35086 (comment) if i see this failure on a PR, I normally just re-run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks.
@simonjayhawkins I think the general whatsnew note added by Brock already covers that as well (can maybe just add the issue number) |
Merging in a few hours if no objections (cc @jreback, I think you're the only one without an explicit +1) |
@jorisvandenbossche your commits look good, thanks for the heads up |
thanks @jbrockmendel |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff