Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datatree: Bug with arithmetic between datasets and datatrees (e.g. ds * dt) #9365

Closed
owenlittlejohns opened this issue Aug 14, 2024 · 2 comments
Labels
bug topic-DataTree Related to the implementation of a DataTree class

Comments

@owenlittlejohns
Copy link
Contributor

What is your issue?

Originally posted by @TomNicholas in xarray-contrib/datatree#146

Arithmetic involving one DataTree and one Dataset is supposed to be node-wise, i.e. the operation involving the single Dataset is automatically applied to every node of the tree individually, returning a new tree, like this:

In [8]: ds = xr.Dataset({"a": 1})

In [9]: ds
Out[9]: 
<xarray.Dataset>
Dimensions:  ()
Data variables:
    a        int64 1

In [10]: dt = DataTree(data=ds)

In [11]: dt * ds
Out[11]: 
DataTree('None', parent=None)
    Dimensions:  ()
    Data variables:
        a        int64 1

However there is a bad bug:

In [12]: ds * dt
Out[12]: 
<xarray.Dataset>
Dimensions:  ()
Data variables:
    a        int64 1

This didn't return a tree, so arithmetic between Datasets and DataTrees currently doesn't respect commutativity :(

Interestingly this does work fine for python scalars

In [27]: dt * 2
Out[27]: 
DataTree('None', parent=None)
    Dimensions:  ()
    Data variables:
        a        int64 2

In [28]: 2 * dt
Out[28]: 
DataTree('None', parent=None)
    Dimensions:  ()
    Data variables:
        a        int64 2

and for numpy arrays:

In [30]: dt * np.array(2)
Out[30]: 
DataTree('None', parent=None)
    Dimensions:  ()
    Data variables:
        a        int64 2

In [31]: np.array(2) * dt
Out[31]: 
DataTree('None', parent=None)
    Dimensions:  ()
    Data variables:
        a        int64 2

I do have tests for this arithmetic behaviour but it looks like I didn't think to check commutativity in any of those tests!

I haven't looked into this deeply yet but my hunch is that it's something to do with __mul__/__rmul__ on Dataset competing for priority with __mul__/__rmul__ on DataTree. If that is the case then it might only be possible to fix it upstream in xarray by changing the behaviour of Dataset to defer to DataTree...

@owenlittlejohns owenlittlejohns added bug needs triage Issue that has not been reviewed by xarray team member topic-DataTree Related to the implementation of a DataTree class labels Aug 14, 2024
@TomNicholas TomNicholas removed the needs triage Issue that has not been reviewed by xarray team member label Aug 15, 2024
@TomNicholas
Copy link
Member

In the meeting last week @shoyer and I discussed just making this behaviour stricter, so that there was no automatic "broadcasting" of a single-node tree across a multi-node tree.

However, we could still allow DataTree * Dataset, even if we disallow DataTree<many-nodes> * DataTree<single-node>.

TomNicholas added a commit to TomNicholas/xarray that referenced this issue Oct 13, 2024
TomNicholas added a commit that referenced this issue Oct 15, 2024
* test unary op

* implement and generate unary ops

* test for unary op with inherited coordinates

* re-enable arithmetic tests

* implementation for binary ops

* test ds * dt commutativity

* ensure other types defer to DataTree, thus fixing #9365

* test for inplace binary op

* pseudocode implementation of inplace binary op, and xfail test

* remove some unneeded type: ignore comments

* return type should be DataTree

* type datatree ops as accepting dataset-compatible types too

* use same type hinting hack as Dataset does for __eq__ not being same as Mapping

* ignore return type

* add some methods to api docs

* don't try to import DataTree.astype in API docs

* test to check that single-node trees aren't broadcast

* return NotImplemented

* remove pseudocode for inplace binary ops

* map_over_subtree -> map_over_datasets
@TomNicholas
Copy link
Member

Closed by #9619

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug topic-DataTree Related to the implementation of a DataTree class
Projects
None yet
Development

No branches or pull requests

2 participants