Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: fix DataFrame reduction with EA columns and numeric_only=True #33761

Merged

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Apr 24, 2020

Closes #33256

For the block-wise path in DataFrame._reduce, we ensure to use the EA reduction for blocks holding EAs, instead of passing the EA to the op function (which is typically the nanops nan function).

This needs #33758 to be merged, since that PR fixes the test that is failing here.

@jorisvandenbossche jorisvandenbossche added Regression Functionality that used to work in a prior pandas version Numeric Operations Arithmetic, Comparison, and Logical operations ExtensionArray Extending pandas with custom dtypes or arrays. labels Apr 24, 2020
return op(values, axis=0, skipna=skipna, **kwds)
return op(values, axis=1, skipna=skipna, **kwds)
if isinstance(values, ExtensionArray):
return values._reduce(name, skipna=skipna, **kwds)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is _reduce part of the interface?

if we're avoiding nanops, can/should we make those functions either a) not handle EA cases or b) dispatch to _reduce?

@jorisvandenbossche
Copy link
Member Author

is _reduce part of the interface?

yes (and the default is to raise an error saying the reduction is not supported)

if we're avoiding nanops, can/should we make those functions either a) not handle EA cases or b) dispatch to _reduce?

Since _reduce is the interface that Series calls, and thus IMO also DataFrame should call for EA columsn (which I am doing here for the block wise path, and trying to do in general in #32867 (also for numeric_only=None), then in principle, I think we can/should always call that for EAs, and should not need to call a nanops function for an EA. Which means that in principle nanops would not need to handle EAs.

Of course, we could also decide on always calling nanops, and putting the logic to call _reduce for EAs in nanops. That's a choice we can make.
(but in either way, I think the core algorithmic code of nanops should not need to handle EAs, as the EA._reduce can pass the correct ndarray to nanops and reinterpret the result)

@jorisvandenbossche
Copy link
Member Author

More comments on this? (apart from that I need to rebase + fix the tests (update error message match))

@jreback
Copy link
Contributor

jreback commented Apr 30, 2020

seems ok to me

@jbrockmendel
Copy link
Member

ill take another look once rebased+green, but tentatively looks good

@jorisvandenbossche
Copy link
Member Author

Updated on master

@jorisvandenbossche
Copy link
Member Author

(travis failure is an unrelated URLError connection error in test_read_from_http_url)

@jbrockmendel
Copy link
Member

LGTM

@jorisvandenbossche jorisvandenbossche merged commit 32dd55c into pandas-dev:master May 1, 2020
@jorisvandenbossche jorisvandenbossche deleted the gh-33256-numeric_only branch May 1, 2020 18:19
@jorisvandenbossche jorisvandenbossche added this to the 1.1 milestone May 1, 2020
@simonjayhawkins
Copy link
Member

xref #33300 to track.

simonjayhawkins pushed a commit to simonjayhawkins/pandas that referenced this pull request May 5, 2020
…uction with EA columns and numeric_only=True)
simonjayhawkins added a commit that referenced this pull request May 5, 2020
…h EA columns and numeric_only=True) (#34000)

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@simonjayhawkins simonjayhawkins modified the milestones: 1.1, 1.0.4 May 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Numeric Operations Arithmetic, Comparison, and Logical operations Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

REGR: DataFrame.mean(numeric_only=True) raises AttributeError on v1.0.3
4 participants