Tree "broadcasting" #199
Motivation: When using datatree objects for analysis of hierarchical data, one sometimes wants to apply operations to only parts of a tree, to subtrees, or to all trees matching a condition. These operations can be achieved in a granular fashion using …. @jbusecke's work using datatree to analyse CMIP6 models provides multiple examples of use cases, for example calculating a climate anomaly by subtracting a model-specific historical bias from an ensemble of many models, scenarios, and parameters.

Current behaviour: Currently I have made tree-tree operations raise an error in all but a couple of cases.
When you multiply a tree by a dataset (I'll use multiplication as a stand-in for any binary operation), the dataset is currently used to multiply every single non-empty node in the tree. The same thing also happens with a scalar or a single array, which I think is intuitive. (Note that whilst this operation should be commutative, until #146 is fixed then …)
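A minimal sketch of that tree-by-value behaviour, under a deliberately simplified and hypothetical model: a tree is a plain dict mapping node paths to a float (standing in for a Dataset), with None marking an empty node. The names apply_to_nonempty and tree_times_value are illustrative only and are not part of the datatree API.

```python
from typing import Callable, Dict, Optional

# Hypothetical toy model, NOT the datatree API: a "tree" is a dict mapping
# node paths to node data, with a float standing in for a Dataset and None
# marking an empty node.
Tree = Dict[str, Optional[float]]


def apply_to_nonempty(tree: Tree, func: Callable[[float], float]) -> Tree:
    """Return a new tree with func applied to every non-empty node."""
    return {path: (None if data is None else func(data)) for path, data in tree.items()}


def tree_times_value(tree: Tree, value: float) -> Tree:
    # The single non-tree operand (scalar, array or dataset) is reused
    # against the data in every non-empty node; empty nodes stay empty.
    return apply_to_nonempty(tree, lambda data: data * value)


tree = {"/": None, "/a": 2.0, "/a/b": 3.0}
result = tree_times_value(tree, 10.0)
```

Here result is {"/": None, "/a": 20.0, "/a/b": 30.0}: the root node holds no data, so it is left untouched, and the input tree is not modified.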
When you multiply a tree by another tree, this currently multiplies the datasets contained in each node node-wise. This only makes sense if the two trees are isomorphic; otherwise there is no obvious one-to-one correspondence between pairs of nodes across the two trees. This generalises to …

Constraints / requirements:
Non-goals:
Some useful concepts:
Ideas: I discussed this algorithm design problem at length with @jbusecke, @cmdupuis3, and my friends Peter (a graph theory math PhD student) and Galen (a sociologist who works with graphs). If anyone else has input, it would be appreciated. This is still a work in progress, but we think we have concluded a few things:
@shoyer suggested that, as any automatic tree-broadcasting method should be composed of well-defined individual steps, it would be wise to expose public API for those steps first. Then we can see (a) whether those are sufficient for people's needs, and (b) whether users generally agree that tree broadcasting should work in one specific way, or whether opinions diverge. A specific example would be making functions for …
I think that's similar to what I was saying, along the lines of separating your graph operations from your data operations. If we're just looking at the primitives (rather than going for "convenience"), we don't have to worry about what kind of algebra the user is expecting. You'd probably want those primitives for a more advanced API anyway. From that POV, a graph union would also be simple: you just combine all the nodes. From there, you could select which data operations you want (copying, etc.).
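To illustrate the separation of graph operations from data operations, here is a minimal sketch using a hypothetical toy model (a dict from node paths to a float, None for an empty node), assuming every tree dict already lists all ancestors of each of its nodes. structural_union is a pure graph step; union_prefer_first is just one possible data step (copying) layered on top of it. Both names are illustrative and are not datatree or xarray functions.

```python
from typing import Dict, Optional, Set

# Hypothetical toy model, NOT the datatree API: node path -> data (None = empty node).
Tree = Dict[str, Optional[float]]


def structural_union(a: Tree, b: Tree) -> Set[str]:
    """Graph step only: every node path that appears in either tree.

    Assuming each input already contains all ancestors of its own nodes,
    the union of the two path sets is again a valid tree structure.
    """
    return set(a) | set(b)


def union_prefer_first(a: Tree, b: Tree) -> Tree:
    """One possible data step on top of the union: copy each node's data
    from a if it has any there, otherwise from b."""
    out: Tree = {}
    for path in sorted(structural_union(a, b)):
        data = a.get(path)
        out[path] = data if data is not None else b.get(path)
    return out


a = {"/": None, "/x": 1.0}
b = {"/": None, "/y": 2.0, "/y/z": 3.0}
merged = union_prefer_first(a, b)
```

Here merged is {"/": None, "/x": 1.0, "/y": 2.0, "/y/z": 3.0}. Swapping in a different data step (e.g. raising on conflicting data, or combining both values) would not require touching structural_union.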
Closing in favour of pydata/xarray#9596 upstream.
Currently you can perform arithmetic with datatrees, e.g. dt + dt. (In fact the current implementation lets you apply arbitrary operations on n trees that return 1 to n new trees; see map_over_subtree.) However, these trees currently must have the same structure of nodes (i.e. be "isomorphic").
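To make the isomorphism requirement concrete, here is a minimal sketch in the spirit of map_over_subtree, using the same hypothetical toy model (a dict from node paths to a float, None for an empty node). The name map_over_trees, its n_outputs parameter, and the rule that a node empty in any input stays empty are illustrative assumptions; this does not reproduce the signature or exact behaviour of datatree's map_over_subtree. The toy isomorphism check compares node paths, so it also requires node names to match, which is stricter than comparing the shape of the trees alone.

```python
from typing import Callable, Dict, Optional, Tuple, Union

# Hypothetical toy model, NOT the datatree API: node path -> data (None = empty node).
Tree = Dict[str, Optional[float]]


def map_over_trees(
    func: Callable[..., object], *trees: Tree, n_outputs: int = 1
) -> Union[Tree, Tuple[Tree, ...]]:
    """Apply func node-wise across n trees, returning n_outputs new trees."""
    first = trees[0]
    # Toy isomorphism check: identical sets of node paths.
    if any(t.keys() != first.keys() for t in trees[1:]):
        raise ValueError("trees are not isomorphic: no one-to-one node correspondence")

    outputs = tuple({} for _ in range(n_outputs))
    for path in first:
        args = [t[path] for t in trees]
        if any(arg is None for arg in args):
            # Assumption: a node that is empty in any input stays empty.
            results = (None,) * n_outputs
        else:
            results = func(*args)
            if n_outputs == 1:
                results = (results,)
            elif len(results) != n_outputs:
                raise ValueError(f"func returned {len(results)} results, expected {n_outputs}")
        for out, value in zip(outputs, results):
            out[path] = value
    return outputs[0] if n_outputs == 1 else outputs


dt1 = {"/": None, "/x": 7.0, "/x/y": 3.0}
dt2 = {"/": None, "/x": 2.0, "/x/y": 5.0}
product = map_over_trees(lambda p, q: p * q, dt1, dt2)
quotients, remainders = map_over_trees(divmod, dt1, dt2, n_outputs=2)
```

The binary case (product) is the toy analogue of dt1 * dt2, giving {"/": None, "/x": 14.0, "/x/y": 15.0}; the divmod call shows one function returning two new trees. Passing trees whose paths differ raises ValueError, which is exactly the restriction tree broadcasting would need to relax.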
It would be useful to generalise tree operations to handle trees of different structure. I'm going to call this "tree broadcasting" (not to be confused with array broadcasting).
I think this is the biggest unsolved design question with datatree.