Skip to content

REF: call _block_shape from EABlock.make_block #33308

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Apr 10, 2020

Conversation

jbrockmendel
Copy link
Member

No description provided.

# block.shape is incorrect for "2D" ExtensionArrays
# We can't, and don't need to, reshape.
values = values.reshape(tuple((1,) + shape))
values = values.reshape(tuple((1,) + shape)) # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think OK to use cast after using is_* to avoid the #type: ignore

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we now using cast instead of type: ignore? i went with the latter bc its easier to grep for and out on a list of Things To Fix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://pandas.pydata.org/docs/development/contributing.html#style-guidelines specifically mentions the is_* case and suggests a refactor is the preferred route.

i went with the latter bc its easier to grep for and out on a list of Things To Fix

for me, this is OK. but I believe Jeff and Will oppose type:ignore as a reason to revist later. In the past, i've been in the minority and struggled to get PRs accepted. If you can see the benefits of a fix later strategy with type hints, now could be a good time to resurface this discussion.

pyre accepts this is a reasonable need for gradual typing, see https://pyre-check.org/docs/error-suppression.html#suppression-comment-types. it has pyre-fixme and pyre-ignore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel #32785 was opened for this discussion. see #32730 (comment) cc @jorisvandenbossche

@simonjayhawkins simonjayhawkins added the Refactor Internal refactoring of code label Apr 6, 2020
@simonjayhawkins simonjayhawkins added this to the 1.1 milestone Apr 6, 2020
if not is_extension_array_dtype(values):
# TODO: https://github.com/pandas-dev/pandas/issues/23023
shape = values.shape
if not is_extension_array_dtype(values.dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this needed? (as you are calling the check in the block constructor).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we get a bunch of test failures without this check

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, i think we DO want to remove this from inside block_shape though, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the whole point of block_shape, im not clear on what you're suggesting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have a check IN the Block constructor AND in the _block_shape function itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should have 1 or the other

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we dont have a reshape call in the constructor

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obviously, i think we should have 2D EAs and not have to hassle with any of this

@@ -244,6 +244,8 @@ def make_block(self, values, placement=None) -> "Block":
"""
if placement is None:
placement = self.mgr_locs
if self.is_extension:
values = _block_shape(values, ndim=self.ndim)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel is this not a conditional check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a conditional check, but self.is_extension doesnt tell us anything about whether values is an EA.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are calling .make_block from an EA block, don't we by definition know that we have extension block values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no. e.g. if we call pd.isna(frame) that will end up with something like:

res_values = pd.isna(block.values)
return block.make_block(res_values)

which will be ndarray[bool]

@@ -244,6 +244,8 @@ def make_block(self, values, placement=None) -> "Block":
"""
if placement is None:
placement = self.mgr_locs
if self.is_extension:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my point is that is this necessary this (if), or are you saying that this is a perf issue? (e.g. we don't want to always call _block_shape

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ATM we are calling _block_shape in a bunch of places before calling ExtensionBlock.make_block. This is just moving all of those calls to one place

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't want to always call _block_shape

With some corner-case-caveats, we only want to call it if self.is_extension, and we pretty much always have to call it if self.is_extension (otherwise we could use make_block_same_class

@jreback jreback merged commit 5f2cdf8 into pandas-dev:master Apr 10, 2020
@jreback
Copy link
Contributor

jreback commented Apr 10, 2020

thanks

@jbrockmendel jbrockmendel deleted the cln-block_shape branch April 10, 2020 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants