bioformats2raw format change #403

manzt · 2021-03-30T22:22:07Z

bioformats2raw is switching to nested chunk storage and moving METADATA.xml.

ngff spec 0.2: default to nested storage glencoesoftware/bioformats2raw#94
Move METADATA.ome.xml under data.zarr glencoesoftware/bioformats2raw#87

We'll need to update our tutorial and change our loadBioformats utility to accommodate these changes. It's noted that the bioformats2raw output is not a stable format, so I don't think we should worry about backward compatibility. Just need to update our LuCa example and tutorial.

EDIT: Dimension order is also forced to TCZYX by default to match OME-Zarr.
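For context on what the TCZYX default changes for a loader: the axis layout no longer has to be derived from the OME-XML DimensionOrder. A minimal sketch, assuming the usual OME convention that the first letter of DimensionOrder varies fastest (so a C-ordered Zarr array lists the axes in reverse) and assuming older output kept the input file's DimensionOrder. Both function names are hypothetical, not part of Viv's API.

```typescript
// The axis order bioformats2raw now writes by default.
const OME_ZARR_ORDER = "TCZYX";

// Hypothetical helper: the Zarr axis order implied by an OME DimensionOrder
// string such as "XYZCT". OME lists the fastest-varying axis first, whereas a
// C-ordered Zarr array lists it last, so the labels are reversed.
function zarrAxesFromDimensionOrder(dimensionOrder: string): string {
  const order = dimensionOrder.toUpperCase();
  const valid =
    order.length === 5 &&
    new Set(order).size === 5 &&
    [...order].every((c) => "XYZCT".includes(c));
  if (!valid) throw new Error(`Invalid DimensionOrder: ${dimensionOrder}`);
  return [...order].reverse().join("");
}

// Assumption: older output followed the input's DimensionOrder, so only some
// files already matched the TCZYX layout that is now the default.
function matchesOmeZarrOrder(dimensionOrder: string): boolean {
  return zarrAxesFromDimensionOrder(dimensionOrder) === OME_ZARR_ORDER;
}
```

For example, an input with DimensionOrder "XYZCT" maps to TCZYX, while "XYCZT" maps to TZCYX and would previously have needed different dimension labels.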


ilan-gold · 2021-03-31T16:07:55Z

cc: @andreasg123 I think this is relevant for you.

andreasg123 · 2021-04-02T00:42:48Z

Do you plan to maintain support for the old format? Otherwise, those images won't be viewable any more. It's not a big deal for me because we haven't started producing a large number of images. If I had many images, converting them would be painful.

ilan-gold · 2021-04-02T00:54:09Z

@andreasg123 I'm not sure. One option would be to start some sort of graveyard repo for the old loaders that the "community" maintains. @manzt thoughts? Out of curiosity, why did you end up going with zarr? Just curious to hear about real-world use-cases.

andreasg123 · 2021-04-02T03:04:28Z

Our images have multiple fields of view and many channels and z-slices. Stitching is fastest when done in parallel. Zarr images are suitable for parallel writes, unlike TIFF images. Proprietary formats such as ND2 are worse (and not supported by Viv). There may be a speed advantage in reading whole AWS S3 objects (Zarr chunks) instead of byte ranges, but we haven't benchmarked that. lz4 compression is definitely faster than LZW, but at the expense of larger size.

As the loader is outside of core Viv, I can see that I could just copy the current loadBioformatsZarr and use it for any old images. It would be great if you could export all relevant utility functions to keep that copy as small as possible.

manzt · 2021-04-02T13:34:53Z

Zarr is a very flexible format, and the output generated from bioformats2raw is not a standard or stable file format; it is part of a pipeline to generate OME-TIFF images. Previously bioformats2raw output a directory with the following structure:

├── METADATA.xml
└── data.zarr/
    ├── .zattrs
    ├── .zgroup
    └── ...

Notice how METADATA.xml lives outside the zarr hierarchy: It is not "zarr". The current implementation changes this layout to the following:

.
└── data.zarr/
    ├── .zattrs
    ├── .zgroup
    ├── ...
    └── METADATA.xml

This moves METADATA.xml within the Zarr hierarchy, but even this is likely to change. Previously we considered the "root" url to be the parent directory. Now, we should expect the root to contain a .zgroup/.zattrs file. It's worth noting that "converting" the previous output to the current layout entails moving METADATA.xml and changing the key separator for chunks from path/to/array/0.0.0.0.0 -> path/to/array/0/0/0/0/0.
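The chunk-key half of that conversion is a pure string rewrite. A minimal sketch, assuming chunk keys are store-relative strings whose last path segment is the dot-joined chunk coordinates; toNestedChunkKey is a hypothetical name, not part of Viv or bioformats2raw. It only computes the new key: it does not move METADATA.xml or copy any data.

```typescript
// Hypothetical helper for migrating older bioformats2raw output: rewrite a
// chunk key that uses "." between chunk coordinates to the nested "/" form.
// Metadata keys (.zarray, .zattrs, .zgroup) and other files are returned unchanged.
function toNestedChunkKey(key: string): string {
  const slash = key.lastIndexOf("/");
  const prefix = key.slice(0, slash + 1); // array path, e.g. "path/to/array/"
  const name = key.slice(slash + 1); // chunk coordinates, e.g. "0.0.0.0.0"
  // Only rewrite names made entirely of integers joined by "."
  if (!/^\d+(\.\d+)*$/.test(name)) return key;
  return prefix + name.split(".").join("/");
}
```

For example, toNestedChunkKey("path/to/array/0.0.0.0.0") returns "path/to/array/0/0/0/0/0", while "path/to/array/.zarray" is left alone.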

The point I want to emphasize is that the outputs of bioformats2raw are not a standard, and also something that we don't control. OME-XML will not be in OME-Zarr, and the outputs generated from bioformats2raw are a hybrid of OME-TIFF and OME-Zarr that is evolving quickly without any sort of versioning or guarantees. Therefore, we should not make any stability guarantees for this ephemeral format either. My suggestion is that we keep loadBioformatsZarr up to date with the latest outputs of bioformats2raw, but if users want stability for Zarr, they should target OME-Zarr, which has a formal specification: https://ngff.openmicroscopy.org/latest/

andreasg123 · 2021-04-02T15:51:47Z

This separator change comes from NGFF 0.2. It can be determined from multiscales in .zattrs, where version 0.1 would always have . as the separator and version 0.2 would default to / but may have other options: ome/ngff#40 (comment)
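Detecting the separator from the multiscales version could look roughly like the sketch below. It assumes the .zattrs JSON has already been parsed, and it only handles the defaults described above (0.1 always ".", 0.2 defaulting to "/"); the non-default 0.2 options are not covered. The interface and function names are hypothetical.

```typescript
// Hypothetical shape of the relevant part of a multiscales group's .zattrs.
interface Multiscale {
  version?: string;
  datasets: { path: string }[];
}
interface RootAttrs {
  multiscales?: Multiscale[];
}

// Pick the chunk-key separator implied by the NGFF multiscales version:
// 0.1 always uses ".", 0.2 defaults to "/". Anything else is rejected here.
function chunkSeparatorFor(attrs: RootAttrs): "." | "/" {
  const version = attrs.multiscales?.[0]?.version;
  if (version === "0.1") return ".";
  if (version === "0.2") return "/";
  throw new Error(`Unsupported or missing multiscales version: ${version}`);
}

// Build the store key for one chunk of the array at arrayPath.
function chunkKey(arrayPath: string, coords: number[], sep: "." | "/"): string {
  return `${arrayPath}/${coords.join(sep)}`;
}
```

With a 0.2 root, chunkKey("0/0", [0, 0, 0, 1, 2], "/") gives "0/0/0/0/0/1/2"; the same chunk in a 0.1 root would be "0/0/0.0.0.1.2".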

It would be great if loadBioformatsZarr could deal with both versions. With respect to converting images, moving METADATA.ome.xml or even putting its content into a .zattrs file is a relatively small operation compared to moving potentially thousands of chunks, especially if stored on S3 that does not support moves, only copies.

manzt · 2021-04-02T16:27:05Z

I should clarify, you are totally right. The separator change will be handled because both the . and / separators are part of NGFF and can be determined by the version of the multiscales metadata. This is a change we need to support in loadOmeZarr as well.

However, there are other rapidly changing parts of bioformats2raw (e.g. where METADATA.xml lives), that are not and will not be a part of the NGFF spec. We just don't have the capacity to try to detect and handle all the possible variants of a liminal output that technically isn't even a format. It's close to NGFF, but not quite, and changing quickly.

ilan-gold · 2021-08-20T02:37:07Z

@manzt I think we should, at the minimum, add a note to our docs that they are out of date and that we can't read bioformats2raw output as of its latest version (i.e., people have to use an older version of the bioformats2raw package). Does this sound reasonable?

andreasg123 · 2021-08-20T03:15:16Z

As I'm just now considering using bioformats2raw as another image source, I would appreciate it if the OME-Zarr loader could look in a few likely places for METADATA.xml. This could be done in parallel with Promise.any. I would be happy to submit a PR if this is something that you would accept.
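A rough sketch of that lookup, assuming the candidate locations are the ones mentioned in this thread (METADATA.ome.xml or METADATA.xml inside the Zarr root, or METADATA.xml next to data.zarr). metadataCandidates, fetchOmeXml, and the Fetcher type are hypothetical; the fetcher is injected so the browser's fetch can be passed in directly.

```typescript
// Minimal response shape needed here; the global fetch's Response satisfies it.
type Fetcher = (url: string) => Promise<{ ok: boolean; text(): Promise<string> }>;

// Hypothetical candidate locations for the OME-XML, relative to the Zarr root.
function metadataCandidates(zarrRoot: string): string[] {
  const base = zarrRoot.endsWith("/") ? zarrRoot : `${zarrRoot}/`;
  return [
    new URL("METADATA.ome.xml", base).href, // inside the Zarr root (newer output)
    new URL("METADATA.xml", base).href, // inside the Zarr root
    new URL("../METADATA.xml", base).href, // next to data.zarr (older output)
  ];
}

// Request all candidates in parallel. Promise.any resolves with the first
// request to succeed (not necessarily the first in the list) and rejects
// with an AggregateError only if every candidate fails.
async function fetchOmeXml(candidates: string[], fetcher: Fetcher): Promise<string> {
  const attempts = candidates.map(async (url) => {
    const res = await fetcher(url);
    if (!res.ok) throw new Error(`Not found: ${url}`);
    return res.text();
  });
  return Promise.any(attempts);
}
```

In a browser this would be called as fetchOmeXml(metadataCandidates(url), fetch). If more than one candidate exists, whichever response arrives first wins, so a real loader might prefer a fixed priority order instead.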

joshmoore · 2022-01-21T12:50:46Z

As mentioned to @manzt elsewhere, this has raised its head for us as well recently (cc: @davehorsfall). For my part, I will be working to pin down the specification of the location so that in the future, changes of this form would also lead to an OME-Zarr version bump (as opposed to just a bioformats2raw bump).

manzt · 2022-01-24T14:41:35Z

changes of this form would also lead to an OME-Zarr version bump (as opposed to just a bioformats2raw bump).

That's great! In that case it will make sense to implement this in Viv. In the meantime, we can discuss looking up the OME-XML metadata in our other tools (Vitessce/Vizarr), but I'd like to avoid exposing that functionality to library users until something more stable is decided.

manzt mentioned this issue Nov 19, 2021: Update tutorial with latest bioformats2raw & raw2ometiff pipeline #526 (Closed)

ilan-gold mentioned this issue Apr 20, 2022: OME Metadata Support ome/ngff#104 (Open)

manzt closed this as completed Nov 15, 2023
