How to treat name of root node? #81
Description
In #76 I refactored the tree structure to use a path-like syntax. This includes referring to the root of a tree as `"/"`, same as in `cd /` in a unix-like filesystem.
This makes accessing nodes and variables of nodes quite neat, because you can reference nodes via absolute or relative paths:
In [23]: from datatree.tests.test_datatree import create_test_datatree
In [24]: dt = create_test_datatree()
In [25]: dt['set2/a']
Out[25]:
<xarray.DataArray 'a' (x: 2)>
array([2, 3])
Dimensions without coordinates: x
In [26]: dt['/set2/a']
Out[26]:
<xarray.DataArray 'a' (x: 2)>
array([2, 3])
Dimensions without coordinates: x
In [27]: dt['./set2/a']
Out[27]:
<xarray.DataArray 'a' (x: 2)>
array([2, 3])
Dimensions without coordinates: x
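To make it concrete why the three spellings above land on the same node, here is a minimal sketch using only the standard library. It is not how datatree implements lookup: the nested-dict tree and the resolve helper are hypothetical stand-ins, and the sketch always resolves relative to the root, so it does not handle "..".

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for a tree: nested dicts, with plain values as leaves.
tree = {"set2": {"a": [2, 3]}}


def resolve(root, path):
    """Follow a unix-like path through nested dicts, starting at the root."""
    node = root
    # PurePosixPath drops "." components and keeps a leading "/" as its own part.
    for part in PurePosixPath(path).parts:
        if part == "/":
            node = root  # absolute path: start again from the root
            continue
        node = node[part]
    return node


relative = resolve(tree, "set2/a")
absolute = resolve(tree, "/set2/a")
dotted = resolve(tree, "./set2/a")
```

All three calls walk the same keys ("set2", then "a"), which mirrors the identical outputs in the session above.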
This refactor also made the name of a DataTree object optional, whereas before every node was required to have one. (They still have a `.name` attribute, it just can be `None`.)
In [28]: dt.name
Normally this doesn't matter, because when assigned a `.parent`, a node's `.name` property will just point to the key under which it is stored as a child. This echoes the way an unnamed `DataArray` can be stored in a `Dataset`.
In [29]: import xarray as xr
In [30]: ds = xr.Dataset()
In [31]: da = xr.DataArray(0)
In [32]: ds['foo'] = da
In [33]: ds['foo'].name
Out[33]: 'foo'
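The same idea can be sketched for tree nodes. This is only an illustration of a name that is derived from the parent's key, not the actual DataTree code: the Node class, its add_child method and the _name attribute are all hypothetical.

```python
class Node:
    """Hypothetical tree node whose name is derived from its parent's key."""

    def __init__(self, name=None):
        self._name = name  # only consulted while the node has no parent
        self.parent = None
        self.children = {}

    @property
    def name(self):
        if self.parent is None:
            return self._name  # may be None for an unnamed root
        # Report the key under which the parent stores this node.
        for key, child in self.parent.children.items():
            if child is self:
                return key
        return None

    def add_child(self, key, child):
        child.parent = self
        self.children[key] = child


root = Node()            # unnamed root
child = Node()           # unnamed until it is attached
root.add_child("foo", child)
```

After attaching, child.name is "foo" while root.name stays None, which is the situation the rest of this issue is about.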
However this means that the root node of a tree is no longer required to have a name in general.
This is good because
- As a user you normally don't care about the name of the root when manipulating the tree, only the names of the nodes,
- It makes the `__init__` signature simpler as `name` is no longer a required arg,
- It most closely echoes how filepaths work (the filesystem root `"/"` doesn't have another name),
- Roundtripping from Zarr/netCDF files still seems to work (see `test_io.py`),
- Roundtripping from dictionaries still works if the root node is unnamed
In [35]: d = {node.path: node.ds for node in dt.subtree}
In [36]: roundtrip = DataTree.from_dict(d)
In [37]: roundtrip
Out[37]:
DataTree('None', parent=None)
│   Dimensions:  (y: 3, x: 2)
│   Dimensions without coordinates: y, x
│   Data variables:
│       a        (y) int64 6 7 8
│       set0     (x) int64 9 10
├── DataTree('set1')
│   │   Dimensions:  ()
│   │   Data variables:
│   │       a        int64 0
│   │       b        int64 1
│   ├── DataTree('set1')
│   └── DataTree('set2')
├── DataTree('set2')
│   │   Dimensions:  (x: 2)
│   │   Dimensions without coordinates: x
│   │   Data variables:
│   │       a        (x) int64 2 3
│   │       b        (x) float64 0.1 0.2
│   └── DataTree('set1')
└── DataTree('set3')
In [38]: dt.equals(roundtrip)
Out[38]: True
But it's bad because
- Roundtripping from dictionaries doesn't work anymore if the root node is named
In [39]: dt2 = dt
In [40]: dt2.name = "root"
In [41]: d2 = {node.path: node.ds for node in dt2.subtree}
In [42]: roundtrip2 = DataTree.from_dict(d2)
In [43]: roundtrip2
Out[43]:
DataTree('None', parent=None)
│   Dimensions:  (y: 3, x: 2)
│   Dimensions without coordinates: y, x
│   Data variables:
│       a        (y) int64 6 7 8
│       set0     (x) int64 9 10
├── DataTree('set1')
│   │   Dimensions:  ()
│   │   Data variables:
│   │       a        int64 0
│   │       b        int64 1
│   ├── DataTree('set1')
│   └── DataTree('set2')
├── DataTree('set2')
│   │   Dimensions:  (x: 2)
│   │   Dimensions without coordinates: x
│   │   Data variables:
│   │       a        (x) int64 2 3
│   │       b        (x) float64 0.1 0.2
│   └── DataTree('set1')
└── DataTree('set3')
In [44]: dt2.equals(roundtrip2)
Out[44]: False
- The signature of `DataTree.from_dict` becomes a bit weird, because if you want to name the root node the only way to do it is to pass a separate `name` argument, i.e.
In [45]: dt3 = DataTree.from_dict(d, name='root')
In [46]: dt3
Out[46]:
DataTree('root', parent=None)
├── DataTree('set1')
│   │   Dimensions:  ()
│   │   Data variables:
│   │       a        int64 0
│   │       b        int64 1
│   ├── DataTree('set1')
│   └── DataTree('set2')
├── DataTree('set2')
│   │   Dimensions:  (x: 2)
│   │   Dimensions without coordinates: x
│   │   Data variables:
│   │       a        (x) int64 2 3
│   │       b        (x) float64 0.1 0.2
│   └── DataTree('set1')
└── DataTree('set3')
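The asymmetry in the two lists above can be reproduced in a few lines. The following is a rough sketch with a hypothetical Node class (set_child, the data attribute and this from_dict are illustrative stand-ins, not datatree's implementation), assuming only what the sessions show: the root's path is always "/" and from_dict takes an optional name.

```python
class Node:
    """Hypothetical stand-in for DataTree, just enough to show the roundtrip."""

    def __init__(self, data=None, name=None):
        self.data = data
        self.name = name
        self.parent = None
        self.children = {}

    @property
    def path(self):
        # The root is always "/", whatever its name is.
        if self.parent is None:
            return "/"
        return self.parent.path.rstrip("/") + "/" + self.name

    @property
    def subtree(self):
        # Yield this node, then every descendant, depth first.
        yield self
        for child in self.children.values():
            yield from child.subtree

    def set_child(self, key, child):
        child.name = key
        child.parent = self
        self.children[key] = child

    @classmethod
    def from_dict(cls, d, name=None):
        # The root's name can only come from the separate argument.
        root = cls(data=d.get("/"), name=name)
        for path, data in d.items():
            if path == "/":
                continue
            node = root
            for part in path.strip("/").split("/"):
                if part not in node.children:
                    node.set_child(part, cls())
                node = node.children[part]
            node.data = data
        return root


root = Node(data=0, name="root")
root.set_child("set1", Node(data=1))
d = {node.path: node.data for node in root.subtree}

roundtrip = Node.from_dict(d)            # root name is lost
named = Node.from_dict(d, name="root")   # root name restored by hand
```

Because the key for the root is always "/", the dictionary has nowhere to record the root's name, so in this sketch it survives the roundtrip only when passed back in explicitly.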
What do we think about this behaviour? Does this seem like a good design, or annoyingly finicky?
@jhamman I notice that in the code you wrote for the io you put a note about not being able to specify a root group for the tree. Is that related to this question? Do you have any other thoughts on this?