Rework DataArray internals #648
@shoyer - I read this mainly trying to get a better idea of the internal DataArray data model. The code itself looks great. My main two comments on the refactor are:
All in all, impressive work.
Indeed, I wonder if it would make sense to decouple DataArray from Dataset by storing the state on two (protected) attributes:
The main downside is that we add a bit more redundant code (e.g., to loop over all variables in
As a newbie, 👍. I took some time to figure out why a
Low confidence, but you could have a common ancestor (
@@ -233,11 +242,13 @@ def subset(dim, label):
         del coords[dim]
         return Dataset(variables, coords, self.attrs)

-    def _to_dataset_whole(self, name):
+    def _to_dataset_whole(self, name=None):
+        if name is None:
Very minor, but `name = name or self.name` might be clearer / more pythonic than the repeated `if name is None:`
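One caveat worth weighing against this suggestion, sketched below in plain Python: `name or self.name` falls back for any falsy argument (an empty string or `0`, for instance), while the `is None` check falls back only when no name was passed. The `Named` class and its two method names are hypothetical and exist only for this illustration; they are not part of xray.

```python
class Named:
    """Hypothetical stand-in for an object that carries a default name."""

    def __init__(self, name):
        self.name = name

    def resolve_with_or(self, name=None):
        # Falls back to self.name for ANY falsy argument ('' and 0 included).
        return name or self.name

    def resolve_with_is_none(self, name=None):
        # Falls back to self.name only when the argument is None.
        if name is None:
            name = self.name
        return name


obj = Named("temperature")
print(repr(obj.resolve_with_or("")))       # the empty string is replaced
print(repr(obj.resolve_with_is_none("")))  # the empty string is kept
```

Whether the difference matters depends on whether falsy names are considered valid, which the thread does not settle.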
@shoyer - do you have a feel for how difficult it would be to go the
Hmm. Might not be so bad now that I've already gone through the trouble of thinking what these new tests should look like. I'll give it a shot tonight and see how it goes...
I realize now that changing the internal representation for DataArray doesn't mean we need to rewrite how every routine works. We can still convert DataArrays to a dataset when convenient -- it just means we'll need to use a method to do so instead of modifying

def copy(self, deep=True):
    ds = self._dataset.copy(deep=deep)
    return self._with_replaced_dataset(ds)

and instead we could simply write:

def copy(self, deep=True):
    ds = self._to_temp_dataset().copy(deep=deep)
    return self._new_from_temp_dataset(ds)

However, going forward it will give us more flexibility for how to write DataArray methods. For example, it might actually be clearer to write:

def copy(self, deep=True):
    variable = self.variable.copy(deep=deep)
    coords = OrderedDict((k, v.copy(deep=deep))
                         for k, v in self._coords.items())
    return type(self)(variable, coords, name=self.name, fastpath=True)
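The round trip through a temporary dataset can be sketched with toy classes. This is only an illustration under stated assumptions: the "dataset" is a plain `OrderedDict` of lists, `ToyDataArray` is hypothetical, and the `_THIS_ARRAY` placeholder key is an assumption about how the array's own data might be kept apart from its coordinates. Only the method names `_to_temp_dataset` and `_new_from_temp_dataset` come from the comment above.

```python
from collections import OrderedDict

# Assumed placeholder key for the array's own data inside the temporary dataset.
_THIS_ARRAY = "<this-array>"


class ToyDataArray:
    """Toy model: state lives in ._variable and ._coords, not in a dataset."""

    def __init__(self, variable, coords=None, name=None):
        self._variable = variable
        self._coords = OrderedDict(coords or {})
        self.name = name

    def _to_temp_dataset(self):
        # Build a throwaway mapping; the placeholder key cannot clash
        # with a coordinate that happens to share the array's name.
        ds = OrderedDict(self._coords)
        ds[_THIS_ARRAY] = self._variable
        return ds

    def _new_from_temp_dataset(self, ds):
        # Split the mapping back into the array's data and its coordinates.
        ds = OrderedDict(ds)
        variable = ds.pop(_THIS_ARRAY)
        return type(self)(variable, ds, name=self.name)

    def copy(self):
        # Copy every list in the temporary dataset, then rebuild the array.
        copied = OrderedDict(
            (k, list(v)) for k, v in self._to_temp_dataset().items()
        )
        return self._new_from_temp_dataset(copied)


arr = ToyDataArray([1, 2, 3], {"x": [10, 20, 30]}, name="x")
dup = arr.copy()
dup._variable[0] = 99  # mutating the copy leaves the original untouched
print(arr._variable, dup._variable)
```

Note that the toy array is named `x` and also has an `x` coordinate; the two stay separate through the round trip because the data never sits under the array's real name.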
OK, latest commit changes DataArray's internals to rely on
This is ready for review if anyone wants to take another look.
dims = OrderedDict(zip(self.dims, self.shape))
return self._dataset._to_dataframe(dims)
unique_name = '__unique_name_identifier_z98xfz98xugfg73ho__'
what's going on here?
Good catch -- needed explanation. Let me know if the comments I added help.
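A plausible reading of the `unique_name` line in the diff, sketched with plain dictionaries rather than real xray or pandas objects: while data and coordinates share one flat namespace, the data is stored under a name that cannot collide with a coordinate, and the real name is restored once index and columns are separate. The `to_frame` function and its return shape are hypothetical; only the unique-name string is taken from the diff.

```python
UNIQUE_NAME = "__unique_name_identifier_z98xfz98xugfg73ho__"  # string from the diff


def to_frame(values, coords, name):
    """Toy conversion returning (index, columns) as two dicts of lists."""
    # Flat namespace: coordinates and data share one set of keys here, so
    # storing the data under `name` would overwrite a same-named coordinate.
    dataset = dict(coords)
    dataset[UNIQUE_NAME] = values

    # Once index and columns live in separate mappings, the real name is safe.
    index = {key: dataset[key] for key in coords}
    columns = {name: dataset[UNIQUE_NAME]}
    return index, columns


index, columns = to_frame([1.5, 2.5], {"x": [0, 1]}, name="x")
print(index, columns)
```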
@shoyer - I don't have any more inline comments. There is one failing test and there are merge conflicts; once those are addressed, I'll take one more brief look.
Fixes GH367
Fixes GH634

The internal data model used by :py:class:`~xray.DataArray` has been rewritten to fix several outstanding issues (:issue:`367`, :issue:`634`, `this stackoverflow report`_). Internally, ``DataArray`` is now implemented in terms of ``._variable`` and ``._coords`` attributes instead of holding variables in a ``Dataset`` object.
Rebased and tests are passing.
lgtm, go ahead and merge.
👏
Fixes #367
Fixes #634
Fixes #649
The internal data model used by `DataArray` has been rewritten to fix several outstanding issues (#367, #634 and this stackoverflow report). Namely, if a DataArray has the same name as one of its coordinates, the array and the coordinate no longer share the same data.

This means that creating a DataArray with the same `name` as one of its dimensions no longer automatically uses that array to label the corresponding coordinate. You will now need to provide coordinate labels explicitly. Here's the old behavior:

and the new behavior (compare the values of the `x` coordinate):

It's also no longer possible to convert a DataArray to a Dataset with `DataArray.to_dataset` if it is unnamed. This will now raise `ValueError`. If the array is unnamed, you need to supply the `name` argument.
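The `to_dataset` rule described above can be illustrated with a toy class. This is a sketch, not xray's implementation: `ToyDataArray`, the dictionary-based "dataset", and the error message wording are all assumptions made for the example.

```python
class ToyDataArray:
    """Toy model of an array with optional name and labeled coordinates."""

    def __init__(self, values, coords=None, name=None):
        self._variable = list(values)
        self._coords = {k: list(v) for k, v in (coords or {}).items()}
        self.name = name

    def to_dataset(self, name=None):
        # Prefer an explicitly supplied name, else the array's own name.
        if name is None:
            name = self.name
        if name is None:
            # Assumed message; the description only says ValueError is raised.
            raise ValueError("cannot convert an unnamed array; pass name=...")
        dataset = dict(self._coords)
        dataset[name] = self._variable
        return dataset


unnamed = ToyDataArray([1, 2, 3], coords={"x": [0, 1, 2]})

try:
    unnamed.to_dataset()
except ValueError as err:
    print("ValueError:", err)

# Supplying the name explicitly makes the conversion possible.
print(unnamed.to_dataset(name="foo"))
```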