bioformats2raw format change #403

Closed
manzt opened this issue Mar 30, 2021 · 11 comments

@manzt
Member

manzt commented Mar 30, 2021

bioformats2raw is switching to nested chunk storage and moving METADATA.xml.

We'll need to update our tutorial and change our loadBioformats utility to accommodate these changes. As noted, the bioformats2raw output is not a stable format, so I don't think we should worry about backward compatibility. We just need to update our LuCa example and tutorial.

EDIT: Dimension order is also forced to TCZYX by default to reflect OME-Zarr.

@ilan-gold
Collaborator

cc: @andreasg123 I think this is relevant for you.

@andreasg123
Contributor

Do you plan to maintain support for the old format? Otherwise, those images won't be viewable any more. It's not a big deal for me because we haven't started producing a large number of images. If I had many images, converting them would be painful.

@ilan-gold
Collaborator

ilan-gold commented Apr 2, 2021

@andreasg123 I'm not sure. One option would be to start some sort of graveyard repo for the old loaders that the "community" maintains. @manzt thoughts? Out of curiosity, why did you end up going with Zarr? I'd like to hear about real-world use cases.

@andreasg123
Contributor

Our images have multiple fields of view and many channels and z-slices. Stitching is fastest when done in parallel. Zarr images are suitable for parallel writes, unlike TIFF images. Proprietary formats such as ND2 are worse (and not supported by Viv). There may be a speed advantage in reading whole AWS S3 objects (Zarr chunks) instead of byte ranges, but we haven't benchmarked that. LZ4 compression is definitely faster than LZW, but at the expense of larger files.

As the loader is outside of core Viv, I can see that I could just copy the current loadBioformatsZarr and use it for any old images. It would be great if you could export all relevant utility functions to keep that copy as small as possible.

@manzt
Member Author

manzt commented Apr 2, 2021

Zarr is a very flexible format, and the output generated by bioformats2raw is not a standard or stable file format; it is part of a pipeline to generate OME-TIFF images. Previously, bioformats2raw produced a directory with the following structure:

├── METADATA.xml
└── data.zarr/
    ├── .zattrs
    ├── .zgroup
    └── ...

Notice how METADATA.xml lives outside the zarr hierarchy: It is not "zarr". The current implementation changes this layout to the following:

.
└── data.zarr/
    ├── .zattrs
    ├── .zgroup
    ├── ...
    └── METADATA.xml

This moves METADATA.xml into the Zarr hierarchy, but even this is likely to change. Previously we considered the "root" URL to be the parent directory; now we should expect the root itself to contain a .zgroup/.zattrs file. It's worth noting that "converting" the previous output to the current layout entails moving METADATA.xml and changing the key separator for chunks from path/to/array/0.0.0.0.0 to path/to/array/0/0/0/0/0.
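The conversion described above (relocating METADATA.xml and switching the chunk key separator from `.` to `/`) can be sketched as a pure key-mapping function. This is a minimal TypeScript sketch, not part of Viv or bioformats2raw: `convertKey` is a hypothetical name, and it assumes the old output used a top-level `data.zarr/` directory and that chunk keys are the only path segments made of dot-separated integers.

```typescript
// Hypothetical helper (not a Viv or bioformats2raw API).
// Maps a key from the old layout (root = parent directory, METADATA.xml next to
// data.zarr/, "." chunk separator) to the corresponding key in the new layout
// (root = data.zarr/, METADATA.xml inside it, "/" chunk separator).
function convertKey(oldKey: string, zarrDir: string = "data.zarr"): string | null {
  if (oldKey === "METADATA.xml") {
    // METADATA.xml now sits directly under the new root.
    return "METADATA.xml";
  }
  const prefix = `${zarrDir}/`;
  if (!oldKey.startsWith(prefix)) {
    // Anything else is outside the layout this sketch knows about.
    return null;
  }
  const parts = oldKey.slice(prefix.length).split("/");
  const last = parts[parts.length - 1];
  // Assumption: chunk keys are dot-separated integers such as "0.0.0.0.0";
  // metadata keys (.zattrs, .zgroup, .zarray) never match this pattern.
  if (/^\d+(\.\d+)+$/.test(last)) {
    parts.splice(parts.length - 1, 1, ...last.split("."));
  }
  return parts.join("/");
}

// Example: a 5D chunk key in the first resolution level of the first series.
console.log(convertKey("data.zarr/0/0/0.0.0.0.0")); // → 0/0/0/0/0/0/0
```

On S3 this mapping still implies one copy per chunk object, since S3 offers copies but no moves.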

The point I want to emphasize is that the outputs of bioformats2raw are not a standard, and also something that we don't control. OME-XML will not be in OME-Zarr, and the outputs generated by bioformats2raw are some hybrid of OME-TIFF and OME-Zarr that is evolving quickly without any sort of versioning or guarantees. Therefore, we should not make any stability guarantees for this ephemeral format either. My suggestion is that we keep loadBioformatsZarr up to date with the latest outputs of bioformats2raw, but if users want stability for Zarr, they should target OME-Zarr, which has a formal specification: https://ngff.openmicroscopy.org/latest/

@andreasg123
Contributor

This separator change comes from NGFF 0.2. It can be determined from multiscales in .zattrs, where version 0.1 would always have . as the separator and version 0.2 would default to / but may have other options: ome/ngff#40 (comment)
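A loader could pick the separator from the `multiscales` metadata along the lines of the rule above. This TypeScript sketch is illustrative only: `inferSeparator` and `chunkKey` are hypothetical names, and both the handling of an explicit `override` (for example, a separator recorded per array) and the fallback for a missing or unknown version are assumptions, not readings of the NGFF spec.

```typescript
type Separator = "." | "/";

interface Multiscale {
  version?: string;
}

interface GroupAttrs {
  multiscales?: Multiscale[];
}

// Hypothetical helper: choose the chunk key separator from NGFF multiscales metadata.
function inferSeparator(attrs: GroupAttrs, override?: Separator): Separator {
  const first = attrs.multiscales && attrs.multiscales.length > 0 ? attrs.multiscales[0] : undefined;
  const version = first ? first.version : undefined;
  if (version === "0.1") {
    // Version 0.1 always uses ".".
    return ".";
  }
  if (version === "0.2") {
    // Version 0.2 defaults to "/", but an explicitly recorded separator wins (assumption).
    return override !== undefined ? override : "/";
  }
  // Assumption: treat a missing or unknown version like the older flat layout.
  return ".";
}

// Build the key of one chunk from its grid coordinates, e.g. [0, 0, 0, 0, 0].
function chunkKey(arrayPath: string, coords: number[], sep: Separator): string {
  return `${arrayPath}/${coords.join(sep)}`;
}

const sep = inferSeparator({ multiscales: [{ version: "0.2" }] });
console.log(chunkKey("0", [0, 0, 0, 0, 0], sep)); // → 0/0/0/0/0/0
```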

It would be great if loadBioformatsZarr could deal with both versions. With respect to converting images, moving METADATA.ome.xml or even putting its content into a .zattrs file is a relatively small operation compared to moving potentially thousands of chunks, especially if stored on S3 that does not support moves, only copies.

@manzt
Member Author

manzt commented Apr 2, 2021

I should clarify: you are totally right. The separator change will be handled, because both . and / are part of NGFF and can be determined from the version of the multiscales metadata. This is a change we need to support in loadOmeZarr as well.

However, there are other rapidly changing parts of bioformats2raw (e.g. where METADATA.xml lives), that are not and will not be a part of the NGFF spec. We just don't have the capacity to try to detect and handle all the possible variants of a liminal output that technically isn't even a format. It's close to NGFF, but not quite, and changing quickly.

@ilan-gold
Collaborator

@manzt I think we should at a minimum add a note to our docs that they are out of date and that we can't read bioformats2raw output from its latest version (i.e., people have to use an older version of the bioformats2raw package). Does this sound reasonable?

@andreasg123
Contributor

As I'm just now considering using bioformats2raw as another image source, I would appreciate it if the OME-Zarr loader could look in a few likely places for METADATA.xml. This could be done in parallel with Promise.any. I would be happy to submit a PR if this is something that you would accept.
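A lookup along these lines might race the candidate locations with `Promise.any`. This TypeScript sketch is hypothetical: `findOmeXml`, `candidateUrls`, and the injected `fetchText` function are illustrative names, and the two candidates are simply the layouts described earlier in this thread (inside the Zarr root, or beside it in the parent directory). `Promise.any` needs an ES2021 runtime and `lib` setting.

```typescript
// Hypothetical fetch wrapper: resolves with the body text, rejects if the key is missing.
type FetchText = (url: string) => Promise<string>;

// Candidate METADATA.xml locations for a root URL such as ".../data.zarr/".
function candidateUrls(root: string): string[] {
  const base = root.replace(/\/+$/, ""); // drop trailing slashes
  const idx = base.lastIndexOf("/");
  const parent = idx >= 0 ? base.slice(0, idx) : ".";
  return [
    `${base}/METADATA.xml`, // newer layout: inside the Zarr hierarchy
    `${parent}/METADATA.xml`, // older layout: next to data.zarr/
  ];
}

// Request every candidate in parallel. Promise.any fulfils with the first request
// that succeeds and rejects with an AggregateError only if all of them fail.
function findOmeXml(root: string, fetchText: FetchText): Promise<string> {
  return Promise.any(candidateUrls(root).map((url) => fetchText(url)));
}
```

Because the requests race, if both files happen to exist the faster response wins rather than a preferred location. Also note that `fetch` resolves rather than rejects on a 404, so a `fetch`-based `fetchText` would need to check `response.ok` and throw.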

@joshmoore

As mentioned to @manzt elsewhere, this has raised its head for us as well recently (cc: @davehorsfall). For my part, I will be working to pin down the specification of the location so that in the future, changes of this form would also lead to an OME-Zarr version bump (as opposed to just a bioformats2raw bump).

@manzt
Member Author

manzt commented Jan 24, 2022

> changes of this form would also lead to an OME-Zarr version bump (as opposed to just a bioformats2raw bump).

That's great! In that case it will make sense to implement this in Viv. In the meantime, we can discuss looking up the OME-XML metadata in our other tools (Vitessce/Vizarr), but I'd like to avoid exposing that functionality to library users until something more stable is decided.


4 participants