-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: de-duplicate Block.should_store and related #33028
CLN: de-duplicate Block.should_store and related #33028
Conversation
wow, cleans it up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments on changes to the array values methods
return np.asarray(self.values.astype("datetime64[ns]", copy=False)) | ||
|
||
def array_values(self) -> ExtensionArray: | ||
return DatetimeArray._simple_new(self.values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no the same as self._holder(self.values)
as you did above, this was calling _simple_new
on purpose for efficience
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. My thought was that since the DTA constructor is relatively strict, the overhead wouldn't be too bad. Can revert if you have a strong opinion. (and/or we could cache_readonly array_values)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would need to look back or time it again to be sure about the exact impact, but as far as I remember, I did this change when optimizing extract_array
, and this was done for a reason
values = self.values.ravel() | ||
result = self._holder(values).astype(object) | ||
return result.reshape(self.values.shape) | ||
return self._holder(self.values).astype(object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment here on why _holder
can handle 2D?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
if self.ndim == 2: | ||
# Ensure that our shape is correct for DataFrame. | ||
# ExtensionArrays are always 1-D, even in a DataFrame when | ||
# the analogous NumPy-backed column would be a 2-D ndarray. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you keep this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
@jreback thoughts on using the new default should_store implementation (return is_dtype_equal(self.dtype, other.dtype)` for the remaining places that special case it (ExtensionBlock, ComplexBlock)? For ExtensionBlock im pretty sure what we have now is just wrong, while for ComplexBlock it becomes equivalent to the new default if we were to add an itemsize check, which seems reasonable |
yes i think you can |
OK next pass |
Thanks! |
FooBlock.should_store is equivalent to
is_dtype_equal(self.dtype, other.dtype)
for almost all of our Block subclasses. This makes that the implementation for the base class, so we define it in fewer places. xref #32878 on remaining cases.