Skip to content

A unified method dispatch protocol for ExtensionArrays #26935

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ghost opened this issue Jun 19, 2019 · 5 comments
Closed

A unified method dispatch protocol for ExtensionArrays #26935

ghost opened this issue Jun 19, 2019 · 5 comments

Comments

@ghost
Copy link

ghost commented Jun 19, 2019

What I'd like to have is:

  1. a single, unified interface for dispatching Series methods
  2. which supports EA
  3. for all methods
  4. and allows every method to be overridden by EA, as-needed
  5. supports NA
  6. doesn't require EA authors to implement some methods to signature-compatible with numpy (round), and signature compatible with a competing pandas version in other (sum via _reduce)
  7. Clearly namespaces pandas functions, which also indicates cooperation/opt-in (perhaps via duck typing), so that
    7a. its clear which methods implements pandas interface
    7b. there are no methods which are named like a numpy method, but with a pandas signature, or vice versa
    7c. when reading an EA's code, you don't have to guess or figure out which code is
    a pandas method and which is an internal method for the EA (like the poorly named _reduce)
  8. provides good error messages for unimplemented methods.
@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jun 19, 2019

(sidenote: it is great that you are exploring all those things, really! That is very welcome, and also needed, as you will have noticed many areas here are still not fully explored (the current EAs we ourselves worked with mostly don't need those numpy funcs). But I would recommend slowing down a bit the number of discussion issues you are opening (bug reports is something else of course), and try to focus our energy to the current open discussions. We all have limited time, and having yet another new issue before we had time to respond to the previous one will not help in that, and it will also frustrate you of having to wait more. This is not always an easy aspect of open source, but a reality we have to cope with!)

But let's use this issue now to have the overview discussion related to methods dispatch of Series to EA.

@jorisvandenbossche
Copy link
Member

Some comments on the analysis side first:

_reduce for pandas. which duplicates many but not all numpy ufuncs, Extends them, but has incompatible signatures with them.

_reduce does not cover ufuncs, but it covers a bunch of reductions (so they rather overlap with numpy's method-name-based dispatch and now __array_function__.
They indeed have an incompatible signature (we have eg skipna), and for that reason I think we cannot rely on a numpy dispatch method here.

Other methods are neither pure wrappers of numpy, nor dispatch to an overridable method (like _reduce), so in the current (lack of) "model" they cannot be overriden by EA authors at all. For example Series.cumprod(skipna=True) (accum_func is np.prod.accumulate, which uses numpy's dispatch logic)

I don't think np.prod.accumulate exists? (prod is not a ufunc) You mean np.mutliply.accumulate ?
But to the actual issue: this is certainly something we didn't yet really think about.

@jorisvandenbossche
Copy link
Member

The point holds though, it's the way skipna=True is handled via _make_cum_func in the snippet, which only works with ndarrays, because of the use of np.putmask which required ndarrays.

I suppose np.putmask would also fall under __array_function__ protocol, so could be made to work on an EA?

@ghost
Copy link
Author

ghost commented Jun 19, 2019

I just checked, yes, np.putmask triggers __array_function__. Which is surprising becauseI checked before adding it to the list, and method-dispatch override doesn't seem to work for it. I've asked numpy about it: numpy/numpy#13806.

@ghost
Copy link
Author

ghost commented Jun 20, 2019

I get the feeling that ExtensionArray were a great opportunity to reduce the coupling between pandas and numpyand, instead, with the trajectory dispatch is taking, more and more of numpy's conventions and legacy is being baked in to a whole new generation of pandas extensions. :(

@ghost ghost closed this as completed Jul 8, 2019
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant