-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EA: implement+test EA.view #27633
EA: implement+test EA.view #27633
Conversation
pandas/core/arrays/base.py
Outdated
------- | ||
ExtensionArray | ||
|
||
Notes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been inconsistent about this, be in general I've tried to keep docstrings user-facing. For implementation notes I've just used regular comments.
What are the consequences of .view
returning self?
Do we have any restrictions on this being zero-copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any restrictions on this being zero-copy?
The test added in this PR will fail if we don't get an actual view.
What are the consequences of
.view
returning self?
view
is going to be used by the default implementation of reshape
, so returning self would cause all kinds of trouble.
The default implementation should Just Work as long as self[:]
returns a view, which should be the case anyway (JSONArray ATM returns a copy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convert this to normal comments as Tom mentioned?
SGTM.
…On Mon, Jul 29, 2019 at 2:12 PM jbrockmendel ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pandas/core/arrays/base.py
<#27633 (comment)>:
> @@ -867,6 +867,24 @@ def copy(self) -> ABCExtensionArray:
"""
raise AbstractMethodError(self)
+ def view(self, dtype=None) -> ABCExtensionArray:
+ """
+ Return a view on the array.
+
+ Returns
+ -------
+ ExtensionArray
+
+ Notes
Do we have any restrictions on this being zero-copy?
The test added in this PR will fail if we don't get an actual view.
What are the consequences of .view returning self?
view is going to be used by the default implementation of reshape, so
returning self would cause all kinds of trouble.
The default implementation should Just Work as long as self[:] returns a
view, which should be the case anyway (JSONArray ATM returns a *copy*)
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#27633?email_source=notifications&email_token=AAKAOIQSXEZTI25OC3NMILDQB46K5A5CNFSM4IHOGMHKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOB74RX5Y#discussion_r308392321>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOISH6L6AHLGM6ZNIXLLQB46K5ANCNFSM4IHOGMHA>
.
|
Can you be a bit more specific about this "all kings of trouble" ? |
@@ -1773,9 +1766,10 @@ def view(self): | |||
Returns | |||
------- | |||
view : Categorical | |||
Returns `self`! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update the doc-string (or just inherit it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment above, dtype needs to be added (but best to just remove the doc-string completely and inherit it)
|
||
def test_view(self, data): | ||
# view with no dtype should return a shallow copy, *not* the same | ||
# object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need to also test with a dtype != None? (e.g. that this raises NIE)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess to make sure the kwarg is accepted, sure
would fail |
Why would that fail if view would return self instead of a new object with a view on the same data? |
I could have been clearer: the default reshape is going to look like:
So returning |
@@ -1773,9 +1766,10 @@ def view(self): | |||
Returns | |||
------- | |||
view : Categorical | |||
Returns `self`! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment above, dtype needs to be added (but best to just remove the doc-string completely and inherit it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm ex the comment on the test. ping on green.
test simplified, green |
pandas/core/arrays/base.py
Outdated
@@ -354,7 +355,7 @@ def ndim(self) -> int: | |||
""" | |||
Extension Arrays are only allowed to be 1-dimensional. | |||
""" | |||
return 1 | |||
return len(self.shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this change will help. There can be EAs out there that define this themselves and override this with a hardcoded 1, so we will still need to define a wrapper I think?
(so therefore I would maybe rather leave it as is, to ensure we cover that use case)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im not sure I understand the problem here. is there a case in which this wont be correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I supposed you made this change because for the other PR you are patching self.shape
to return (N, 1) or (1, N), and so with this change, ndim automatically follows that.
But in general, you can't rely on the fact that self.ndim already is correctly following self.shape, so you will always have to patch ndim as well.
(exact terminology of "patching" might not be fully reflect to other PR on 2D EAs, need to update myself on that, but hope to give the idea)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will revert so we'll handle it in the next pass
pandas/core/arrays/base.py
Outdated
------- | ||
ExtensionArray | ||
|
||
Notes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Convert this to normal comments as Tom mentioned?
def view(self, dtype=None): | ||
if dtype is not None: | ||
raise NotImplementedError(dtype) | ||
return self._constructor(values=self._codes, dtype=self.dtype, fastpath=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the default implementation does not work here? (or is this more efficient?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more efficient, yes (note the fastpath kwarg)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a note for that? (eg "override base implementation to use fastpath")
With the specified `dtype`. | ||
""" | ||
if dtype is None or dtype is self.dtype: | ||
return type(self)(self._data, dtype=self.dtype) | ||
return self._data.view(dtype=dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not returning an EA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct, the current implementation is only used to return an ndarray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But so that is "violating" the spec? (it should return a new EA (not self), but not an ndarray)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but its already in place and used extensively. I guess we could alter the spec to allow returning ndarray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could alter the spec to allow returning ndarray
Sorry to further bother on this PR, but I would not alter the spec. For the interface, it should just be a new EA of the same type, no?
Shouldn't we (ideally, at some point) change our own implementation to return an EA as well for consistency?
lgtm. ping on resolution of @jorisvandenbossche comments. |
@jorisvandenbossche i think ive addressed your comments. let me know if i missed anything |
pandas/core/arrays/base.py
Outdated
|
||
Returns | ||
------- | ||
ExtensionArray or np.ndarray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not document this. For implementors, it is simply an EA that it should be. I think our own array not doing can be seen as an historical artifact that ideally will be fixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted
With the specified `dtype`. | ||
""" | ||
if dtype is None or dtype is self.dtype: | ||
return type(self)(self._data, dtype=self.dtype) | ||
return self._data.view(dtype=dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could alter the spec to allow returning ndarray
Sorry to further bother on this PR, but I would not alter the spec. For the interface, it should just be a new EA of the same type, no?
Shouldn't we (ideally, at some point) change our own implementation to return an EA as well for consistency?
@jbrockmendel can you also answer to my comment / questions? (and not just follow it; I would like to have discussion about this) |
The remaining question/comment I see is about DTA/TDA/PA view sometimes returning ndarray and the possibility of making it conform by always returning EA. I like that idea eventually, but we're a ways away from that in terms of having PandasArray support throughout the codebase. In a number of places where we currently do DTA.view, we expect to get an ndarray back. |
@@ -862,6 +863,27 @@ def copy(self) -> ABCExtensionArray: | |||
""" | |||
raise AbstractMethodError(self) | |||
|
|||
def view(self, dtype=None) -> Union[ABCExtensionArray, np.ndarray]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would this return a np.ndarray OR an EA for an EA? (is this @jorisvandenbossche question)?
when / why would this be the case? this is pretty confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the same as Joris's question. The answer is that it is probably better to return EA-only, but ATM DTA/TDA/PA have existing implementations that return ndarray
But I don't think we need to wait on all-EA to have EA.view to consistently return an EA? For those cases where you expect an array, you can explicitly access the numpy array and take a view of that (or in the different order)? Eg one case where it is called is pandas/pandas/core/arrays/datetimes.py Line 222 in 9c37226
We can use |
@jorisvandenbossche I'll try this locally and see what it would take to make make it EA-only. |
Yah this causes 230 test failures. I'm on board with the idea of fixing these, bout would like to do so in a separate PR(s) so as to keep momentum here. |
It might be that they are all coming from a rather limited number of call sites (the number of failures does not always indicate the number of lines to change to fix it ;))
That's fine for me (although I would prefer to see it done before the next release, as this is kind-of public API). Do you open an issue for it? |
Skimming through the test output, all I can confidently state is that the number of relevant call sites is less than 230. |
@jorisvandenbossche opened #27831 to change DTA.view signature. Anything else? |
Nope, all good! |
Broken off from #27142, plus some type annotations