Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiscale class not being inherited correctly #80

Closed
giovp opened this issue Aug 16, 2023 · 13 comments
Closed

multiscale class not being inherited correctly #80

giovp opened this issue Aug 16, 2023 · 13 comments

Comments

@giovp
Copy link
Contributor

giovp commented Aug 16, 2023

There are several operations that do not return a MultiscaleSpatialImage class but a datatree.DataTree. e.g.

import numpy as np
from spatial_image import to_spatial_image
from multiscale_spatial_image import to_multiscale
from copy import deepcopy

array = np.random.randint(0, 256, size=(128,128), dtype=np.uint8)
image = to_spatial_image(array)
multiscale = to_multiscale(image, [2, 4])


deepcopy(multiscale)
>>> datatree.DataTree
multiscale.copy()
>>> datatree.DataTree
multiscale.copy(deep=True)
>>> datatree.DataTree
multiscale.mean(dim="x")
>>> datatree.DataTree
multiscale.sel(x=10, method="nearest")
>>> datatree.DataTree

I've been playing around with subclassing differently DataTree without success. I wonder what's the best solution for this? I think it should either always return a MultiscaleSpatialImage or always a DataTree (and MultiscaleSpatialImage would then be a parser class, but IMHO SpatialImage should then be consistent, and it doesn't suffer from such issues).

xref scverse/spatialdata#269

@thewtex
Copy link
Contributor

thewtex commented Oct 4, 2023

@TomNicholas what do you think?

@TomNicholas
Copy link

Hi, thanks for making me aware of what you're doing here!

So DataTree was not intended to be subclassed. In fact, we currently recommend against subclassing xarray objects in general, and this this recommendation really applies to datatree too because it follows similar patterns internally.

It's worth asking why you are subclassing? Looking at the definition of MultiscaleSpatialImage here, it looks like MultiscaleSpatialImage only has one extra method added to DataTree. Adding extra methods is what xarray accessors allow you to do (without subclassing), and datatree does support those accessors.

If you really want to subclass it has been done, and if you want to improve the experience of subclassing xarray objects we would love someone to take that project on upstream.

@thewtex
Copy link
Contributor

thewtex commented Oct 9, 2023

@TomNicholas thank you for the thoughts and pointers!

What we are looking for is:

  1. the type identification (MultiscaleSpatialImage class)
  2. the to_zarr functionality
  3. possible additional functionality, attributes in the future

From what you shared it sounds like an accessor would be better, and appropriate, addressing scverse/spatialdata#269, etc.? Will 1. the type identification, still be available? Through copies, etc.? Then type identification is isinstance(i, DataTree) and hasattr(i, 'msi')? @giovp @LucaMarconato will this work for type identification in spatialdata?

@LucaMarconato
Copy link
Collaborator

Thanks for the discussion. I think this would be a good solution and it would be compatible with the type identification used in spatialdata. In fact in our case we can replace all the MultiscaleSpatialImage with DataTree and eventually all the SpatialImage with DataArray, so as long as we can still call validate() on the objects, everything should work.

One note, I find a bit inconsistent that the scales of a MultiscaleSpatialImage object are DataArray and not SpatialImage. Also a user reported this. I think that the considerations on inheriting xarray classes should also apply to SpatialImage, and therefore I would consider using DataArray with accessors.

@giovp
Copy link
Contributor Author

giovp commented Oct 10, 2023

thanks all for the discussion, agree with @LucaMarconato that type consistency across spatial-image project would be useful, seems like an accessor could be the best solution.

Then type identification is isinstance(i, DataTree) and hasattr(i, 'msi')

yes indeed, I think we might end up refactor the models to infer type based on data type + number/type of dimensions.

@TomNicholas
Copy link

Will 1. the type identification, still be available? Through copies, etc.? Then type identification is isinstance(i, DataTree) and hasattr(i, 'msi')?

@thewtex no I don't think this would work - when the accessor is registered it becomes available on all DataTree objects, so hasattr(i, 'msi') would not distinguish anything useful.

In fact in our case we can replace all the MultiscaleSpatialImage with DataTree and eventually all the SpatialImage with DataArray, so as long as we can still call validate() on the objects, everything should work.

@LucaMarconato this seems to line up with what's currently recommended. 👍

infer type based on data type + number/type of dimensions.

This would of course still work. (You wouldn't be technically inferring "Type", but you could use it to distinguish valid from invalid datasets.)

@thewtex
Copy link
Contributor

thewtex commented Oct 11, 2023

@LucaMarconato @TomNicholas excellent, thank you for the thoughts! Let's migrate to an accessor, then. @giovp already has a draft in #81 -- please take a look. 🎇 🚀 🤘

I think that the considerations on inheriting xarray classes should also apply to SpatialImage, and therefore I would consider using DataArray with accessors.

👍 agreed.

@aeisenbarth
Copy link

This issue occurs currently also when using pytest fixtures.

Pytest passes the return value of a fixture function (1) directly to test functions using the fixture. However, when another fixture (2) depends on fixture (1), it receives a deepcopy, where multiscale images are suddenly datatrees, and fail dispatching in overloaded functions.

In case someone else encounters this, here my workaround that I apply in the test function on the fixture argument:

def workaround_pytest_fixture_breaking_multiscale_spatial_image(sdata: SpatialData) -> SpatialData:
    # When a fixture depends on another fixture which returns a MultiscaleSpatialImage instance,
    # the first fixture receives a DataTree instead. The object looses the ability to be recognized
    # by isinstance(obj, MultiscaleSpatialImage).
    # Maybe this is caused by a copy/deepcopy operation.
    for name, image in sdata.images.items():
        if isinstance(image, DataTree) and not isinstance(image, MultiscaleSpatialImage):
            repaired_image = MultiscaleSpatialImage.from_dict(image.to_dict())
            sdata.images[name] = repaired_image
    for name, labels in sdata.labels.items():
        if isinstance(labels, DataTree) and not isinstance(labels, MultiscaleSpatialImage):
            repaired_labels = MultiscaleSpatialImage.from_dict(labels.to_dict())
            sdata.labels[name] = repaired_labels
    return sdata

@thewtex
Copy link
Contributor

thewtex commented Jan 2, 2024

@aeisenbarth thanks for the report 👍

In your use case, is multiscale-spatial-image 1.0.0 used? It incorporates #81 , which changes the inheritance and instance identification as described at the top of the PR.

Regardless, we should probably also have tests for the copy/deepcopy operation.

@aeisenbarth
Copy link

I just updated to multiscale-spatial-image 1.0.0, but the issue persists with latest spatialdata main branch.
Image2DModel.parse(…, scale_factors=…) calls to_multiscale and passes the DataTree (with new msi attribute) to spatialdata.transformations._utils:_get_transformations, but that can only dispatch on MultiscaleSpatialImage and does not handle DataTree. I believe just the dispatching needs to be updated everywhere in spatialdata.

@giovp
Copy link
Contributor Author

giovp commented Jan 3, 2024

hi @aeisenbarth , indeed we haven't completed the migration to msi 1.0.0 yet (unfortunately). From your message I thought it was a behaviour of the pytest fixture but I haven't investigated further.

@giovp
Copy link
Contributor Author

giovp commented Jul 4, 2024

hi @aeisenbarth I think the latest sptialdata (on pypi) supports this (thanks to scverse/spatialdata#594 ). I will close this for now but feel free to reopen!

@giovp giovp closed this as completed Jul 4, 2024
@LucaMarconato
Copy link
Collaborator

Also @aeisenbarth I have implemented a deepcopy operation that works for all the spatial elements and handle both data and metadata, you can use it with from spatialdata import deepcopy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants