Store axes as `_ARRAY_DIMENSIONS` #166

sbesson · 2022-02-04T09:09:16Z

Noticed while reviewing glencoesoftware/bioformats2raw#121, the ome_zarr.writer API currently does not store the axes metadata as _ARRAY_DIMENSIONS under the .zattrs of each resolution array.

Since this is a requirement of the spec,introduced in 0.3 - see https://ngff.openmicroscopy.org/0.3/#multiscale-md, the relevant writer APIs should be updated before releasing 0.3.0.

The text was updated successfully, but these errors were encountered:

joshmoore · 2022-02-04T09:14:33Z

👍 but if I remember correctly the status of my testing showed that attempting to access the zgroup with xarray led to an exception. The workaround suggested by @aurghs and @alexamici was to use differently named axes per resolution level > 0.

see: zarr-developers/zarr-specs#125
cc: @constantinpape

sbesson · 2022-02-04T09:21:35Z

Cheers, I wasn't aware of the limitation and that's definitely something that should be tested with the writer addition.

If some the axes names needs to be resolution-aware in the _ARRAY_DIMENSIONS (I could imagine an implementation as simple as suffixing them with the resolution index i.e.x_1, x_2), I suspect this is something we should formalize in the specification sooner rather than later given that this metadata is a requirement.

constantinpape · 2022-02-04T10:05:53Z

+1 but if I remember correctly the status of my testing showed that attempting to access the zgroup with xarray led to an exception. The workaround suggested by @aurghs and @alexamici was to use differently named axes per resolution level > 0.

Is there a concrete issue for this? I couldn't find any mention of this problem in the one you linked (zarr-developers/zarr-specs#125).

will-moore · 2022-02-04T10:34:35Z

I'm still not entirely clear from the NGFF spec and the xarray page whether _ARRAY_DIMENSIONS goes on the top .zattrs alongside "multiscales" (and does xarray know how to find the arrays?) or does it go in the .zarray? It's not in any of the sample JSON on the spec page.
Ah - re-reading... does it go in a .zattrs (which we don't create currently) alongside the .zarray?

sbesson · 2022-02-04T10:36:04Z

Ah - re-reading... does it go in a .zattrs (which we don't create currently) alongside the .zarray?

Yes exactly, this metadata is expected to be defined in the array attributes

joshmoore · 2022-02-04T10:37:18Z

Is there a concrete issue for this?

Apologies, @constantinpape, no. This happened to date on the Zarr side with setting up the funding and getting B-Open up to speed.

sbesson · 2022-02-07T12:09:48Z

I was able to reproduce the error with the following local changes

(base) sbesson@ls30630:ome-zarr-py (xarray_investigation) $ git diff  --cached
diff --git a/.isort.cfg b/.isort.cfg
index b3d36cb..367ece7 100644
--- a/.isort.cfg
+++ b/.isort.cfg
@@ -1,5 +1,5 @@
 [settings]
-known_third_party = dask,numpy,pytest,scipy,setuptools,skimage,zarr
+known_third_party = dask,numpy,pytest,scipy,setuptools,skimage,xarray,zarr
 multi_line_output = 3
 include_trailing_comma = True
 force_grid_wrap = 0
diff --git a/ome_zarr/writer.py b/ome_zarr/writer.py
index 83bbe6d..497ed32 100644
--- a/ome_zarr/writer.py
+++ b/ome_zarr/writer.py
@@ -206,8 +206,9 @@ def write_multiscale(
     datasets: List[dict] = []
     for path, data in enumerate(pyramid):
         # TODO: chunks here could be different per layer
-        group.create_dataset(str(path), data=data, chunks=chunks)
+        dataset = group.create_dataset(str(path), data=data, chunks=chunks)
         datasets.append({"path": str(path)})
+        dataset.attrs["_ARRAY_DIMENSIONS"] = [x["name"] for x in axes]
 
     if coordinate_transformations is None:
         shapes = [data.shape for data in pyramid]
diff --git a/requirements/requirements-test.txt b/requirements/requirements-test.txt
index 9cf8484..cc5774c 100644
--- a/requirements/requirements-test.txt
+++ b/requirements/requirements-test.txt
@@ -1,3 +1,4 @@
 pytest
 pytest-cov
 codecov
+xarray
diff --git a/tests/test_xarray.py b/tests/test_xarray.py
new file mode 100644
index 0000000..187681f
--- /dev/null
+++ b/tests/test_xarray.py
@@ -0,0 +1,14 @@
+import pytest
+import xarray
+
+from ome_zarr.data import create_zarr
+
+
+class TestXarray:
+    @pytest.fixture(autouse=True)
+    def initdir(self, tmpdir):
+        self.path = tmpdir.mkdir("data")
+        create_zarr(str(self.path))
+
+    def test_open_zarr(self):
+        xarray.open_zarr(str(self.path))

Running tox -e py38 tests/test_xarray.py fails with ValueError: conflicting sizes for dimension 'y': length 812 on '1' and length 1624 on {'y': '0', 'x': '0'}

The workaround mentioned in #166 (comment) i.e. suffixing the axes name using the resolution level is sufficient to let this simple test pass:

diff --git a/ome_zarr/writer.py b/ome_zarr/writer.py
index 497ed32..c5438a3 100644
--- a/ome_zarr/writer.py
+++ b/ome_zarr/writer.py
@@ -208,7 +208,7 @@ def write_multiscale(
         # TODO: chunks here could be different per layer
         dataset = group.create_dataset(str(path), data=data, chunks=chunks)
         datasets.append({"path": str(path)})
-        dataset.attrs["_ARRAY_DIMENSIONS"] = [x["name"] for x in axes]
+        dataset.attrs["_ARRAY_DIMENSIONS"] = ["%s_%s" %(x["name"], path) for x in axes]
 
     if coordinate_transformations is None:
         shapes = [data.shape for data in pyramid]

Discussing briefly with @joshmoore, I feel like we need to make a decision before we release OME-NGFF 0.4 and implement the feature across libraries. I can see three viable options:

keep the specification as it currently stand i.e. https://github.com/ome/ngff/blob/12acbb69dba91da09f162eb806c59161601dc1c0/latest/index.bs#L267-L269, implement this feature across the board and file this as an xarray bug which needs to be upstream to support this metadata-style
update the specify to mandate an alternate form like the one proposed above for storing xarray-compatible metadata in multiscales OME-NGFF datasets and update the implementations accordingly
keep investigating options and solutions but reduce the the xarray-compatibility requirement from MUST to either SHOULD or even MAY level.

/cc @aurghs @alexamici @constantinpape happy to discuss on the image.sc zulip or get on a call if that would help resolving this outstanding issue.

constantinpape · 2022-02-07T12:16:59Z

@joshmoore @sbesson I am still missing some context here about what the actual problem is. Could you please give me a few line TLDR?

constantinpape · 2022-02-07T12:29:20Z

After some discussions on zulip: let's remove _ARRAY_DIMENSIONS for now and follow up with xarray to see how we can achieve compatability.

sbesson · 2022-02-08T09:08:30Z

Closing for now, the requirement has been removed from the 0.4 specification. Various layout currently allow to use xarray together with OME Zarr datasets. Hopefully the discussion in ome/ngff#48 will converge towards a set of recommendations that can be reintroduced later allowing compatibillity.

alexamici · 2022-02-08T16:15:03Z

Sorry for joining the party late. @aurghs and I are still getting up to speed with the microscopy data formats, we come from geospatial / satellite (GeoTIFF, HDF5, etc) and climate / scientific (netCDF / CF Conventions, GRIB, etc).

As identified in our first call with @joshmoore the main issues with in the interaction between the NGFF spec and Xarray is that Xarray Zarr backend assume a few structural features for the Zarr store that make it similar to a netCDF, and the NGFF appears to break some of those assumptions.

As soon as we feel comfortable with the microscopy data format we will engage the community more.

constantinpape mentioned this issue Feb 7, 2022

Compatibility with xarray ome/ngff#48

Open

sbesson mentioned this issue Feb 7, 2022

Remove requirement for xarray-compatible metadata ome/ngff#96

Merged

sbesson added the invalid This doesn't seem right label Feb 7, 2022

sbesson closed this as completed Feb 8, 2022

This was referenced Oct 11, 2023

Include _ARRAY_DIMENSIONS metadata? fractal-analytics-platform/fractal-tasks-core#563

Closed

Clarification about _ARRAY_DIMENSIONS ome/ngff#217

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Store axes as `_ARRAY_DIMENSIONS` #166

Store axes as `_ARRAY_DIMENSIONS` #166

sbesson commented Feb 4, 2022

joshmoore commented Feb 4, 2022

sbesson commented Feb 4, 2022 •

edited

Loading

constantinpape commented Feb 4, 2022

will-moore commented Feb 4, 2022

sbesson commented Feb 4, 2022

joshmoore commented Feb 4, 2022

sbesson commented Feb 7, 2022

constantinpape commented Feb 7, 2022

constantinpape commented Feb 7, 2022

sbesson commented Feb 8, 2022

alexamici commented Feb 8, 2022

Store axes as _ARRAY_DIMENSIONS #166

Store axes as _ARRAY_DIMENSIONS #166

Comments

sbesson commented Feb 4, 2022

joshmoore commented Feb 4, 2022

sbesson commented Feb 4, 2022 • edited Loading

constantinpape commented Feb 4, 2022

will-moore commented Feb 4, 2022

sbesson commented Feb 4, 2022

joshmoore commented Feb 4, 2022

sbesson commented Feb 7, 2022

constantinpape commented Feb 7, 2022

constantinpape commented Feb 7, 2022

sbesson commented Feb 8, 2022

alexamici commented Feb 8, 2022

Store axes as `_ARRAY_DIMENSIONS` #166

Store axes as `_ARRAY_DIMENSIONS` #166

sbesson commented Feb 4, 2022 •

edited

Loading