Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExtensionArray.argsort and nargsort #27218

Closed
TomAugspurger opened this issue Jul 3, 2019 · 7 comments
Closed

ExtensionArray.argsort and nargsort #27218

TomAugspurger opened this issue Jul 3, 2019 · 7 comments
Labels
Closing Candidate May be closeable, needs more eyeballs Enhancement ExtensionArray Extending pandas with custom dtypes or arrays.

Comments

@TomAugspurger
Copy link
Contributor

47ffcd6 changes how we handle sorting EAs. We've made nargsort compatible with extension arrays. It now calls ExtensionArray._values_for_argsort and then applies pandas nargsort sorting algorithm to the values.

This may have broken the (implied) ability for ExtensionArray's to "bring their own" sorting algorithm. For example, you can imagine a GPU-backed EA that has a very fast sorting algorithm available. While ExtensionArray.argsort can use that custom sorting algo, pandas may not. Anywhere that uses nargsort(obj) will now essentially do nargsort(thing._values_for_argsort()) for EAs.

@jorisvandenbossche
Copy link
Member

My current thinking is that nargsort should dispatch to EA.argsort() instead of using np.argsort if an EA is passed in. And then nargsort can further deal with ascending / descending and position of NaNs (something that EA.argsort does not handle).

Of course, that means that EA.argsort will call nargsort on its "values to be sorted".
So something like nargsort(EA) -> EA.argsort() -> nargsort(EA._values_for_argsort) with default settings -> the result of this is further processed in nargsort for final result of nargsort(EA)

So we should maybe see how this works for the missing values (eg for the EA, only for the non-missing values EA.argsort is called? or do we skip the separation of the values in missing / non-missing as we do for np.argsort for EAs?)

@jreback jreback modified the milestones: 0.25.0, 1.0 Jul 17, 2019
@jbrockmendel jbrockmendel added the ExtensionArray Extending pandas with custom dtypes or arrays. label Jul 23, 2019
@TomAugspurger
Copy link
Contributor Author

Is this still relevant now that NumPy has the array_function protocol? If an array implements that get to use their own sorting algo.

@TomAugspurger TomAugspurger modified the milestones: 1.0, Contributions Welcome Jan 2, 2020
@TomAugspurger
Copy link
Contributor Author

Pushing off 1.0, though this issue is a bit unclear to me now.

@jbrockmendel
Copy link
Member

This may have broken the (implied) ability for ExtensionArray's to "bring their own" sorting algorithm

So something like nargsort(EA) -> EA.argsort() -> nargsort(EA._values_for_argsort) with default settings -> the result of this is further processed in nargsort for final result of nargsort(EA)

Agreed on both counts. A recent attempt to change nargsort/EA.argsort to work this way broke the world. Have either of you tried doing this?

@jbrockmendel
Copy link
Member

@jorisvandenbossche have you looked at this recently? IIUC this is a blocker for #43840. You seem to have the strongest opinions w/r/t to argsort (xref #33941, #33942, #37815) and i don't fully grok them, would probably be made clearer with an implementation.

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@jbrockmendel
Copy link
Member

nargsort now dispatches to an EA's argsort method, so I think this is closable?

@jbrockmendel jbrockmendel added the Closing Candidate May be closeable, needs more eyeballs label Jul 17, 2023
@mroeschke
Copy link
Member

Yeah it appears this is covered in the EA interface so closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Closing Candidate May be closeable, needs more eyeballs Enhancement ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

No branches or pull requests

5 participants