-
-
Notifications
You must be signed in to change notification settings - Fork 19.2k
TST: simplify pyarrow tests, make mode work with temporal dtypes #50688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| ) | ||
| def test_getitem_scalar(self, data): | ||
| super().test_getitem_scalar(data) | ||
| # In the base class we expect data.dtype.type; but this (intentionally) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke might it make sense to reconsider this? im not sure what context the current PyArrow.lib.DataType is really useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, yeah I am not exactly sure how ExtensionDtype.type is used internally, but I think it would make sense if it returns the native python type. Would be nice if there was a nice way to derive that from numpy_dtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the main use case that comes to mind is if i want to do e.g. dtype.type(0) to get an object of the correct type
pandas/core/arrays/arrow/array.py
Outdated
| if pa.types.is_temporal(pa_type): | ||
| nbits = pa_type.bit_width | ||
| dtype = f"int{nbits}[pyarrow]" | ||
| obj = cast(ArrowExtensionArrayT, self.astype(dtype, copy=False)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just self._data.cast to int type, continue the logic below, and then most_common.cast back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could. i preferred this way since it was self-contained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect going through pa.array.cast may be more performant than astype. Mind checking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated + green
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like _mode here is still using astype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, in some cases this needs an int32 and in some cases an int64, which makes the .cast version extra-complicated
|
If I am understanding apache/arrow#33685, then in pyarrow 11 we won't have to do the the int64 workaround? |
I don't know if that will affect mode directly. |
|
updated to use .cast instead of .astype |
|
Thanks @jbrockmendel |
doc/source/whatsnew/vX.X.X.rstfile if fixing a bug or adding a new feature.