-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: call _block_shape from EABlock.make_block #33308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
# block.shape is incorrect for "2D" ExtensionArrays | ||
# We can't, and don't need to, reshape. | ||
values = values.reshape(tuple((1,) + shape)) | ||
values = values.reshape(tuple((1,) + shape)) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think OK to use cast after using is_* to avoid the #type: ignore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we now using cast
instead of type: ignore
? i went with the latter bc its easier to grep for and out on a list of Things To Fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://pandas.pydata.org/docs/development/contributing.html#style-guidelines specifically mentions the is_* case and suggests a refactor is the preferred route.
i went with the latter bc its easier to grep for and out on a list of Things To Fix
for me, this is OK. but I believe Jeff and Will oppose type:ignore as a reason to revist later
. In the past, i've been in the minority and struggled to get PRs accepted. If you can see the benefits of a fix later strategy with type hints, now could be a good time to resurface this discussion.
pyre accepts this is a reasonable need for gradual typing, see https://pyre-check.org/docs/error-suppression.html#suppression-comment-types. it has pyre-fixme
and pyre-ignore
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel #32785 was opened for this discussion. see #32730 (comment) cc @jorisvandenbossche
if not is_extension_array_dtype(values): | ||
# TODO: https://github.com/pandas-dev/pandas/issues/23023 | ||
shape = values.shape | ||
if not is_extension_array_dtype(values.dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this needed? (as you are calling the check in the block constructor).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we get a bunch of test failures without this check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, i think we DO want to remove this from inside block_shape though, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the whole point of block_shape, im not clear on what you're suggesting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you have a check IN the Block constructor AND in the _block_shape function itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should have 1 or the other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but we dont have a reshape call in the constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
obviously, i think we should have 2D EAs and not have to hassle with any of this
@@ -244,6 +244,8 @@ def make_block(self, values, placement=None) -> "Block": | |||
""" | |||
if placement is None: | |||
placement = self.mgr_locs | |||
if self.is_extension: | |||
values = _block_shape(values, ndim=self.ndim) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel is this not a conditional check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a conditional check, but self.is_extension
doesnt tell us anything about whether values
is an EA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we are calling .make_block from an EA block, don't we by definition know that we have extension block values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no. e.g. if we call pd.isna(frame)
that will end up with something like:
res_values = pd.isna(block.values)
return block.make_block(res_values)
which will be ndarray[bool]
@@ -244,6 +244,8 @@ def make_block(self, values, placement=None) -> "Block": | |||
""" | |||
if placement is None: | |||
placement = self.mgr_locs | |||
if self.is_extension: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my point is that is this necessary this (if), or are you saying that this is a perf issue? (e.g. we don't want to always call _block_shape
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ATM we are calling _block_shape in a bunch of places before calling ExtensionBlock.make_block
. This is just moving all of those calls to one place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't want to always call
_block_shape
With some corner-case-caveats, we only want to call it if self.is_extension, and we pretty much always have to call it if self.is_extension (otherwise we could use make_block_same_class
thanks |
No description provided.