This repository has been archived by the owner on Oct 24, 2024. It is now read-only.

map over multiple subtrees #29

Closed
TomNicholas opened this issue Aug 26, 2021 · 1 comment · Fixed by #32
Labels
enhancement New feature or request

Comments

TomNicholas commented Aug 26, 2021

I realised that part of the reason arithmetic (#24) and ufuncs (#25) don't yet work is that the map_over_subtree decorator currently only maps over a single subtree.

This works fine for mapping unary functions such as .isel, because they only accept one tree-like argument (i.e. self for the .isel method). However, for any binary function such as add(dt1, dt2), pairs of corresponding nodes from each tree need to be operated on together, as result_ds = add(dt1[node].ds, dt2[node].ds), before the output tree is built up from the results.
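As a rough illustration of that pairing, here is a minimal sketch that uses plain dicts keyed by node path as stand-ins for trees, and integers as stand-ins for Datasets. Both stand-ins, and the helper name map_binary, are assumptions made for illustration; none of this is the datatree API.

```python
import operator

# Hypothetical stand-ins: each "tree" is a dict mapping node path -> data.
dt1 = {"foo": 1, "foo/bar": 2}
dt2 = {"foo": 10, "foo/bar": 20}


def map_binary(func, tree_a, tree_b):
    """Apply func to the data found at each matching node path of two trees."""
    if tree_a.keys() != tree_b.keys():
        raise ValueError("trees do not have the same node paths")
    # Operate on the pair of nodes at each path, then rebuild the output tree.
    return {path: func(tree_a[path], tree_b[path]) for path in tree_a}


result = map_binary(operator.add, dt1, dt2)
```

The output has the same node paths as the inputs, with each node holding the result of func applied to the corresponding pair.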

In the most general case we need to be able to map functions like

def func(*args, **kwargs):
    # do stuff involving multiple Dataset objects
    return output_trees

where any number of the args and kwargs could be DataTrees, and output_trees could be a list of any number of DataTrees.

To implement this, the map_over_subtree decorator has to become a lot more general. It needs to:

  1. Identify which of args and kwargs are DataTree objects,
  2. Check that all of those trees are isomorphic to one another, (EDIT: this was implemented in Check isomorphism #31)
  3. Walk along the nodes of all N trees simultaneously,
  4. Pass the respective N nodes from that position in each tree to func, as Datasets, without losing their position in *args, **kwargs,
  5. Use the M output Datasets from func to rebuild M DataTree objects (which all have the same structure as the input trees), and return them.
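The five steps above could be sketched roughly as follows. This is only a sketch under stated assumptions, not the implementation that went into #32: Tree is a hypothetical stand-in for DataTree (a name, a data payload, and ordered children), map_over_trees is a hypothetical name, the structure check is the lenient one (same number of children at every position), output nodes take their names from the first tree argument, and func is assumed to return either a single result or a tuple of M results at every node.

```python
import functools
from dataclasses import dataclass, field


@dataclass
class Tree:
    """Hypothetical stand-in for DataTree, not the real class."""
    name: str
    data: object = None
    children: list = field(default_factory=list)


def _same_shape(a, b):
    """Lenient check: same number of children at every position."""
    return len(a.children) == len(b.children) and all(
        _same_shape(x, y) for x, y in zip(a.children, b.children)
    )


def _data_of(x):
    return x.data if isinstance(x, Tree) else x


def _child_of(x, i):
    return x.children[i] if isinstance(x, Tree) else x


def _map_nodes(func, args, kwargs):
    """Call func at one tree position, then recurse into the children."""
    # Step 4: swap each tree for its data, keeping its argument position.
    out = func(
        *(_data_of(a) for a in args),
        **{k: _data_of(v) for k, v in kwargs.items()},
    )
    outs = out if isinstance(out, tuple) else (out,)
    template = next(a for a in (*args, *kwargs.values()) if isinstance(a, Tree))
    # Step 5: start one output node per result at this position.
    nodes = [Tree(template.name, o) for o in outs]
    # Step 3: walk the i-th child of every input tree simultaneously.
    for i in range(len(template.children)):
        child_args = [_child_of(a, i) for a in args]
        child_kwargs = {k: _child_of(v, i) for k, v in kwargs.items()}
        for node, child in zip(nodes, _map_nodes(func, child_args, child_kwargs)):
            node.children.append(child)
    return nodes


def map_over_trees(func):
    """Sketch of a decorator that maps func over N tree arguments at once."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Step 1: identify which of args and kwargs are trees.
        trees = [a for a in (*args, *kwargs.values()) if isinstance(a, Tree)]
        if not trees:
            raise TypeError("expected at least one Tree argument")
        # Step 2: check that all trees have the same structure.
        if not all(_same_shape(trees[0], t) for t in trees[1:]):
            raise ValueError("trees are not isomorphic")
        result = _map_nodes(func, args, kwargs)
        return result[0] if len(result) == 1 else tuple(result)

    return wrapper


@map_over_trees
def add(a, b):
    return a + b


@map_over_trees
def sum_and_diff(a, b, scale=1):
    return (a + b) * scale, (a - b) * scale


dt1 = Tree("foo", 1, [Tree("bar", 2)])
dt2 = Tree("foo", 10, [Tree("bar", 20)])
total = add(dt1, dt2)
sums, diffs = sum_and_diff(dt1, b=dt2, scale=2)
```

Non-tree arguments such as scale are passed through unchanged at every node, so only the tree-like arguments are unpacked.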

We therefore have to decide what we mean by "isomorphic". The strictest definition would be that all node names are the same, so that

dt_1:
DataNode('foo')
|   Data A
+---DataNode('bar')
    +   Data B

could be mapped alongside

dt_2:
DataNode('foo')
|   Data C
+---DataNode('bar')
    +   Data D

but not alongside

dt_3:
DataNode('baz')
|   Data C
+---DataNode('woz')
    +   Data D

A more lenient definition would be that each node must have the same number of children as its counterpart in the other tree, with children compared in order. (In other words, the tree structure must be the same, but the node names need not be. This requires the children to be ordered to avoid ambiguities.) This definition would allow dt_3 to be mapped over alongside dt_1 or dt_2 (or both simultaneously, for a func that accepts 3 Dataset arguments).
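To make the two definitions concrete, here is a minimal sketch of a check that supports both. Node, is_isomorphic and the require_names_equal flag are hypothetical names chosen for illustration; this is not necessarily how the check in #31 works. In strict mode the sketch also compares children in order, which is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree node with ordered children."""
    name: str
    children: list = field(default_factory=list)


def is_isomorphic(a, b, require_names_equal=False):
    """Return True if trees a and b match under the chosen definition."""
    # Strict definition: node names must agree at every position.
    if require_names_equal and a.name != b.name:
        return False
    # Both definitions: each node needs as many children as its counterpart.
    if len(a.children) != len(b.children):
        return False
    return all(
        is_isomorphic(x, y, require_names_equal)
        for x, y in zip(a.children, b.children)
    )


dt_1 = Node("foo", [Node("bar")])
dt_2 = Node("foo", [Node("bar")])
dt_3 = Node("baz", [Node("woz")])

strict_1_2 = is_isomorphic(dt_1, dt_2, require_names_equal=True)
strict_1_3 = is_isomorphic(dt_1, dt_3, require_names_equal=True)
lenient_1_3 = is_isomorphic(dt_1, dt_3)
```

With these trees, dt_1 and dt_2 match under both definitions, while dt_1 and dt_3 match only under the lenient one.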

@TomNicholas TomNicholas added the enhancement New feature or request label Aug 26, 2021
This was referenced Aug 27, 2021
@TomNicholas

Closed via #32
