map over multiple subtrees #29
Description
I realised that part of the reason arithmetic (#24) and ufuncs (#25) don't work yet is that the map_over_subtree decorator currently maps over only a single subtree.
This works fine for mapping unary functions such as .isel, because they accept only one tree-like argument (i.e. self for the .isel method). However, for any binary function such as add(dt1, dt2), pairs of respective nodes in each tree need to be operated on together, as result_ds = add(dt1[node].ds, dt2[node].ds), before the output tree is built up from the results.
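A minimal sketch of the difference, assuming a toy representation in which each tree is a plain dict mapping node paths to that node's data (floats standing in for Datasets). The names `map_unary` and `map_binary` are hypothetical and not part of datatree.

```python
from typing import Callable, Dict

# Hypothetical stand-in: a "tree" is a dict mapping node paths to that
# node's data (plain floats here instead of xarray Datasets).
Tree = Dict[str, float]


def map_unary(func: Callable[[float], float], tree: Tree) -> Tree:
    """Apply a one-argument function to every node of a single tree."""
    return {path: func(data) for path, data in tree.items()}


def map_binary(
    func: Callable[[float, float], float], tree1: Tree, tree2: Tree
) -> Tree:
    """Apply a two-argument function to respective pairs of nodes."""
    if tree1.keys() != tree2.keys():
        raise ValueError("trees do not have the same node paths")
    # Per-node equivalent of result_ds = add(dt1[node].ds, dt2[node].ds)
    return {path: func(tree1[path], tree2[path]) for path in tree1}
```

With `dt1 = {"/": 1.0, "/bar": 2.0}` and `dt2 = {"/": 10.0, "/bar": 20.0}`, `map_binary(lambda a, b: a + b, dt1, dt2)` pairs the nodes by path and gives `{"/": 11.0, "/bar": 22.0}`.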
In the most general case, we need to be able to map functions like
def func(*args, **kwargs):
    # do stuff involving multiple Dataset objects
    return output_trees
where any number of the args and kwargs could be DataTrees, and output_trees could be a list of any number of DataTrees.
To implement this, the map_over_subtree decorator has to become a lot more general. It needs to:
- Identify which of args and kwargs are DataTree objects,
- Check that all of those trees are isomorphic to one another (EDIT: this was implemented in Check isomorphism #31),
- Walk along the nodes of all N trees simultaneously,
- Pass the respective N nodes from that position in each tree to func, as Datasets, without losing their positions in *args and **kwargs,
- Use the M output Datasets from func to rebuild M DataTree objects (which all have the same structure as the input trees), and return them.
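The five steps above can be sketched roughly as follows. This is a toy illustration, not the datatree implementation: `TreeNode`, `_walk`, `_shape` and `_rebuild` are hypothetical helpers, each node's `ds` holds an arbitrary value standing in for a Dataset, the isomorphism check compares structure only (the lenient definition discussed below), and a func that returns a tuple is assumed to mean M separate outputs.

```python
import functools
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class TreeNode:
    """Hypothetical minimal stand-in for a DataTree node (not the datatree API)."""
    name: str
    ds: Any = None  # stands in for the node's xarray Dataset
    children: List["TreeNode"] = field(default_factory=list)


def _walk(node, path=()):
    """Yield (path, node) depth-first; path is a tuple of child indices."""
    yield path, node
    for i, child in enumerate(node.children):
        yield from _walk(child, path + (i,))


def _shape(node):
    """Preorder list of child counts, which uniquely encodes an ordered tree shape."""
    return [len(n.children) for _, n in _walk(node)]


def _rebuild(template, results):
    """Copy template's structure and names, taking each node's data from results[path]."""
    def build(node, path):
        kids = [build(c, path + (i,)) for i, c in enumerate(node.children)]
        return TreeNode(node.name, results[path], kids)
    return build(template, ())


def map_over_subtree(func: Callable) -> Callable:
    """Sketch of a generalised decorator mapping func over N isomorphic trees."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # 1. Identify which args and kwargs are trees
        tree_args = [i for i, a in enumerate(args) if isinstance(a, TreeNode)]
        tree_kwargs = [k for k, v in kwargs.items() if isinstance(v, TreeNode)]
        trees = [args[i] for i in tree_args] + [kwargs[k] for k in tree_kwargs]
        if not trees:
            return func(*args, **kwargs)
        # 2. Check that all trees are isomorphic (structure only)
        first = trees[0]
        if any(_shape(t) != _shape(first) for t in trees[1:]):
            raise ValueError("trees are not isomorphic")
        results = {}
        # 3. Walk all N trees simultaneously (paths align because shapes match)
        for nodes in zip(*(_walk(t) for t in trees)):
            path = nodes[0][0]
            # 4. Swap each tree for its node's data, keeping its position
            node_args, node_kwargs = list(args), dict(kwargs)
            data = iter(n.ds for _, n in nodes)
            for i in tree_args:
                node_args[i] = next(data)
            for k in tree_kwargs:
                node_kwargs[k] = next(data)
            out = func(*node_args, **node_kwargs)
            results[path] = out if isinstance(out, tuple) else (out,)
        # 5. Rebuild M output trees shaped (and named) like the first input tree
        n_out = len(results[()])
        outputs = tuple(
            _rebuild(first, {p: r[j] for p, r in results.items()})
            for j in range(n_out)
        )
        return outputs[0] if n_out == 1 else outputs
    return wrapper
```

For example, decorating `def add(a, b): return a + b` and calling `add(dt1, b=dt2)` returns a tree with the structure and node names of `dt1` whose node values are the pairwise sums. Non-tree arguments (such as a scalar) are passed through unchanged to every call of func.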
We therefore have to decide what we mean by "isomorphic". The strictest definition would be that all node names are the same, so that
dt_1:
DataNode('foo')
| Data A
+---DataNode('bar')
+ Data B
could be mapped alongside
dt_2:
DataNode('foo')
| Data C
+---DataNode('bar')
+ Data D
but not alongside
dt_3:
DataNode('baz')
| Data C
+---DataNode('woz')
+ Data D
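The strict definition could be checked recursively. A sketch, assuming a hypothetical minimal `TreeNode` with a `name` and a list of `children` (data is omitted because it plays no part in the check). Children are matched by name here, so their order does not matter under this definition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    """Hypothetical minimal stand-in for a DataTree node (names only)."""
    name: str
    children: List["TreeNode"] = field(default_factory=list)


def strictly_isomorphic(a: TreeNode, b: TreeNode) -> bool:
    """True if both trees have the same node names at every level."""
    if a.name != b.name:
        return False
    a_kids = {c.name: c for c in a.children}
    b_kids = {c.name: c for c in b.children}
    if a_kids.keys() != b_kids.keys():
        return False
    # Recurse into counterpart children, paired by name
    return all(strictly_isomorphic(a_kids[k], b_kids[k]) for k in a_kids)
```

With the trees above, `strictly_isomorphic(dt_1, dt_2)` is `True` and `strictly_isomorphic(dt_1, dt_3)` is `False`.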
A more lenient definition would be that each node must have the same number of children as its counterpart in the other tree, with children compared in order. (In other words, the tree structure must be the same, but the node names need not be. This requires the children to be ordered, to avoid ambiguity about which child corresponds to which.) This definition would allow dt_3 to be mapped over alongside dt_1 or dt_2 (or both simultaneously, for a func that accepts 3 Dataset arguments).
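The lenient definition ignores names and compares child counts position by position. A sketch, assuming a hypothetical minimal `TreeNode`; `structurally_isomorphic` and `all_isomorphic` are illustrative names, not an existing API.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class TreeNode:
    """Hypothetical minimal stand-in for a DataTree node (names only)."""
    name: str
    children: List["TreeNode"] = field(default_factory=list)


def structurally_isomorphic(a: TreeNode, b: TreeNode) -> bool:
    """True if the trees have the same shape; node names are ignored."""
    if len(a.children) != len(b.children):
        return False
    # Counterpart children are paired by position, so order matters
    return all(structurally_isomorphic(x, y) for x, y in zip(a.children, b.children))


def all_isomorphic(trees: Sequence[TreeNode]) -> bool:
    """Check that N trees can be mapped over together."""
    return all(structurally_isomorphic(trees[0], t) for t in trees[1:])
```

Because children are paired by position, a root with children `[b (with one child), d]` does not match a root with children `[d, b (with one child)]`, even though both trees contain the same subtrees.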