-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix DataTree.coords.__setitem__
by adding DataTreeCoordinates
class
#9451
Changes from 33 commits
704db79
417e3e9
9562e92
0e7de82
839858f
9370b9b
9b50567
c466f8d
0397eca
85bb221
7802c63
1bf5082
e8620cf
1108504
0a7201b
51e11bc
7ecdd16
dfcdb6d
3278153
f672c5e
7fb1622
897b7c4
b5a56f4
fdae5bc
9dc845a
6595fe9
ed87554
c155bc1
217cb84
540bb0f
7126efa
8486227
12f24df
8f09c93
d23105f
978e05e
bd47575
8ef94df
10b8a78
b9ede22
540a825
b30d5e0
639ad07
0a9a328
80bc0bd
a366bf6
4d352bd
4626fa8
ea8a195
af94af4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,6 +36,7 @@ | |
from xarray.core.common import DataWithCoords | ||
from xarray.core.dataarray import DataArray | ||
from xarray.core.dataset import Dataset | ||
from xarray.core.datatree import DataTree | ||
|
||
# Used as the key corresponding to a DataArray's variable when converting | ||
# arbitrary DataArray objects to datasets | ||
|
@@ -197,12 +198,12 @@ class Coordinates(AbstractCoordinates): | |
|
||
Coordinates are either: | ||
|
||
- returned via the :py:attr:`Dataset.coords` and :py:attr:`DataArray.coords` | ||
properties | ||
- returned via the :py:attr:`Dataset.coords`, :py:attr:`DataArray.coords`, | ||
and :py:attr:`DataTree.coords` properties, | ||
- built from Pandas or other index objects | ||
(e.g., :py:meth:`Coordinates.from_pandas_multiindex`) | ||
(e.g., :py:meth:`Coordinates.from_pandas_multiindex`), | ||
- built directly from coordinate data and Xarray ``Index`` objects (beware that | ||
no consistency check is done on those inputs) | ||
no consistency check is done on those inputs), | ||
|
||
Parameters | ||
---------- | ||
|
@@ -771,14 +772,6 @@ def _drop_coords(self, coord_names): | |
del self._data._indexes[name] | ||
self._data._coord_names.difference_update(coord_names) | ||
|
||
def _drop_indexed_coords(self, coords_to_drop: set[Hashable]) -> None: | ||
assert self._data.xindexes is not None | ||
new_coords = drop_indexed_coords(coords_to_drop, self) | ||
for name in self._data._coord_names - new_coords._names: | ||
del self._data._variables[name] | ||
self._data._indexes = dict(new_coords.xindexes) | ||
self._data._coord_names.intersection_update(new_coords._names) | ||
|
||
def __delitem__(self, key: Hashable) -> None: | ||
if key in self: | ||
del self._data[key] | ||
|
@@ -796,6 +789,114 @@ def _ipython_key_completions_(self): | |
] | ||
|
||
|
||
class DataTreeCoordinates(Coordinates): | ||
""" | ||
Dictionary like container for coordinates of a DataTree node (variables + indexes). | ||
|
||
This collection can be passed directly to the :py:class:`~xarray.Dataset` | ||
and :py:class:`~xarray.DataArray` constructors via their `coords` argument. | ||
This will add both the coordinates variables and their index. | ||
""" | ||
|
||
# TODO: This only needs to be a separate class from `DatasetCoordinates` because DataTree nodes store their variables differently | ||
# internally than how Datasets do, see https://github.com/pydata/xarray/issues/9203. | ||
|
||
# TODO should inherited coordinates be here? It would be very hard to allow updating them... | ||
# But actually maybe the ChainMap approach would make this work okay?? | ||
|
||
_data: DataTree | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Inheriting from
Mostly we have got away with these discrepancies so far but inheritance here is probably going to mean I need to either cc @headtr1ck There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. actually this wasn't very difficult to work around, it mostly just required adding a |
||
|
||
__slots__ = ("_data",) | ||
|
||
def __init__(self, datatree: DataTree): | ||
self._data = datatree | ||
|
||
@property | ||
def _names(self) -> set[Hashable]: | ||
return set(self._data._coord_variables) | ||
|
||
@property | ||
def dims(self) -> Frozen[Hashable, int]: | ||
# TODO is there a potential bug here? What happens if a dim is only present on data variables? | ||
TomNicholas marked this conversation as resolved.
Show resolved
Hide resolved
|
||
return self._data.dims | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this should be fine. Dimensions on There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Really? This (on In [1]: import xarray as xr
In [2]: ds = xr.Dataset({'a': ('x', [0, 1])})
In [3]: ds
Out[3]:
<xarray.Dataset> Size: 16B
Dimensions: (x: 2)
Dimensions without coordinates: x
Data variables:
a (x) int64 16B 0 1
In [4]: ds.coords
Out[4]:
Coordinates:
*empty*
In [5]: ds.coords.dims
Out[5]: FrozenMappingWarningOnValuesAccess({'x': 2}) I mean the fact no-one has raised this before means it probably isn't of much consequence, but it does seem incorrect / misleading. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Raised #9466 to track this |
||
|
||
@property | ||
def dtypes(self) -> Frozen[Hashable, np.dtype]: | ||
"""Mapping from coordinate names to dtypes. | ||
|
||
Cannot be modified directly, but is updated when adding new variables. | ||
|
||
See Also | ||
-------- | ||
Dataset.dtypes | ||
""" | ||
return Frozen({n: v.dtype for n, v in self._data._coord_variables.items()}) | ||
|
||
@property | ||
def variables(self) -> Mapping[Hashable, Variable]: | ||
return Frozen(self._data._coord_variables) | ||
|
||
def __getitem__(self, key: Hashable) -> DataArray: | ||
if key not in self._data._coord_variables: | ||
raise KeyError(key) | ||
item = self._data[key] # type: ignore[index] # see https://github.com/pydata/xarray/issues/8836 | ||
|
||
# TODO perhaps instead use an internal `DataTree` getter method which always returns DataArrays here? | ||
return cast(T_DataArray, item) | ||
|
||
def to_dataset(self) -> Dataset: | ||
"""Convert these coordinates into a new Dataset""" | ||
return self._data.to_dataset()._copy_listed(self._names) | ||
TomNicholas marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
def _update_coords( | ||
self, coords: dict[Hashable, Variable], indexes: Mapping[Any, Index] | ||
) -> None: | ||
shoyer marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
# create updated node (`.to_dataset` makes a copy so this doesn't modify in-place) | ||
node_ds = self._data.to_dataset(inherited=False) | ||
node_ds.coords._update_coords(coords, indexes) | ||
|
||
from xarray.core.datatree import _check_alignment | ||
TomNicholas marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
# check consistency *before* modifying anything in-place | ||
# TODO can we clean up the signature of _check_alignment to make this less awkward? | ||
if self._data.parent is not None: | ||
parent_ds = self._data.parent._to_dataset_view(rebuild_dims=False) | ||
else: | ||
parent_ds = None | ||
_check_alignment(self._data.path, node_ds, parent_ds, self._data.children) | ||
|
||
# assign updated attributes | ||
coord_variables = { | ||
k: v for k, v in node_ds.variables.items() if k in node_ds._coord_names | ||
} | ||
TomNicholas marked this conversation as resolved.
Show resolved
Hide resolved
|
||
self._data._node_coord_variables = coord_variables | ||
self._data._node_dims = node_ds._dims | ||
self._data._node_indexes = node_ds._indexes | ||
|
||
def _drop_coords(self, coord_names): | ||
# should drop indexed coordinates only | ||
for name in coord_names: | ||
del self._data._node_coord_variables[name] | ||
del self._data._node_indexes[name] | ||
|
||
def __delitem__(self, key: Hashable) -> None: | ||
if key in self: | ||
del self._data[key] # type: ignore[arg-type] # see https://github.com/pydata/xarray/issues/8836 | ||
else: | ||
raise KeyError( | ||
f"{key!r} is not in coordinate variables {tuple(self.keys())}" | ||
) | ||
TomNicholas marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
def _ipython_key_completions_(self): | ||
"""Provide method for the key-autocompletions in IPython.""" | ||
return [ | ||
key | ||
for key in self._data._ipython_key_completions_() | ||
if key in self._data._coord_variables | ||
] | ||
|
||
|
||
class DataArrayCoordinates(Coordinates, Generic[T_DataArray]): | ||
"""Dictionary like container for DataArray coordinates (variables + indexes). | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see an easy way to make assignment of coordinates to parent nodes work. I think it would be better just to have
dt.coords
constructed from inherited coords, but not allow setting coordinates further up the tree. I believe that's consistent with the behaviour we currently have fordt.__setitem__
anyway.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed!