ENH: Improve performance for arrow dtypes in monotonic join #51365
no objection here, but eventually we ought to find a way to do this dispatch without special-casing inside the index code (i.e. implement something at the EA level). How is perf affected on multi-chunk pyarrow objs?
Arrays have 2 million entries; initial performance was 380 ms. On this PR:
2 chunks
@jbrockmendel ok to merge?
pandas/core/indexes/base.py
    elif isinstance(self.values, ArrowExtensionArray):
        import pyarrow as pa

        return type(self.values)(pa.array(result))
_from_sequence?
good point, changed
LGTM
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.