
Single-FOV wells vs. multi-FOV wells #12

Closed
jluethi opened this issue Jun 20, 2022 · 8 comments

jluethi commented Jun 20, 2022

We're building OME-Zarr reader support for multi-FOV wells, where we would save each FOV/site from the microscope as its own nested directory in the OME-Zarr file.

While building this, and in conversations with Kevin Yamauchi, some thoughts came up on the pros and cons of the two strategies. I'll try to summarize them in this high-level issue so we can keep an overview.


Definitions:

Single-FOV wells: Describes a way of saving image data to OME-Zarr in which the data for each well (each unit of a plate) is saved as a single field of view.

Single-FOV schematic, adapted from OME-NGFF:

5966.zarr                 # One plate (id=5966) converted to Zarr
    ├── .zgroup
    ├── .zattrs               # Implements "plate" specification
    ├── A                     # First row of the plate
    │   ├── .zgroup
    │   │
    │   ├── 1                 # First column of row A
    │   │   ├── .zgroup
    │   │   ├── .zattrs       # Implements "well" specification
    │   │   │
    │   │   └── 0             # Single field of view of well A1
    │   │       │
    │   │       ├── .zgroup
    │   │       ├── .zattrs   # Implements "multiscales", "omero"
    │   │       ├── 0
    │   │       │   ...       # Resolution levels
    │   │       ├── n
    │   │       └── labels    # Labels (optional)
    │   ├── ...               # Columns
    │   └── 12
    ├── ...                   # Rows
    └── H

Multi-FOV wells: Describes a way of saving image data to OME-Zarr in which the data for each original image acquisition region (= field of view, or site) is saved to a separate folder under the well (unit of a plate) folder.

5966.zarr                 # One plate (id=5966) converted to Zarr
    ├── .zgroup
    ├── .zattrs               # Implements "plate" specification
    ├── A                     # First row of the plate
    │   ├── .zgroup
    │   │
    │   ├── 1                 # First column of row A
    │   │   ├── .zgroup
    │   │   ├── .zattrs       # Implements "well" specification
    │   │   │
    │   │   ├── 0             # First field of view of well A1
    │   │   │   │
    │   │   │   ├── .zgroup
    │   │   │   ├── .zattrs   # Implements "multiscales", "omero"
    │   │   │   ├── 0
    │   │   │   │   ...       # Resolution levels
    │   │   │   ├── n
    │   │   │   └── labels    # Labels (optional)
    │   │   ├── ...           # Multiple fields of views, saved separately
    │   │   └── m             # Last field of view (e.g. in a well with 6 rows & 7 columns, fov 41)
    │   ├── ...               # Columns
    │   └── 12
    ├── ...                   # Rows
    └── H
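
For concreteness, here is a minimal sketch (not from this issue; paths follow the schematics above) of listing the fields of view of a well by reading its "well" metadata with zarr-python:

import zarr

# open the well group of plate 5966, row A, column 1
well = zarr.open_group("5966.zarr/A/1", mode="r")

# the "well" spec lists one image group per field of view;
# a single-FOV well has exactly one entry here
for img in well.attrs["well"]["images"]:
    fov = well[img["path"]]
    print(img["path"], fov["0"].shape)  # shape of the level-0 array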

Benefits of using multi-FOV approach

  • This follows the OME-NGFF v0.4 (& v0.5-dev) standard
  • Makes for an easy parallelization scheme: the same way we parallelize over plates and wells, we could also parallelize over sites. This makes it easy and robust to run illumination correction per site, run image analysis per site, etc. (see the sketch after this list)
  • Makes it easy to see where original images go => into the folder of their site
  • Allows a user to load a single site or a few sites (and if the metadata is parsed, those sites are placed correctly, which works in our test cases)
  • Easy support for search-first data: we only save the FOVs & their positions, and the reader handles placement
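
As a sketch of the parallelization point above (assumptions: a hypothetical per-FOV task correct_illumination, and the FOV layout from the schematic; this is not Fractal's actual task code), each field of view becomes an independent unit of work:

from concurrent.futures import ProcessPoolExecutor
import zarr

def correct_illumination(fov_path: str) -> None:
    # hypothetical per-site task: open the FOV group read/write,
    # apply e.g. flat-field correction to level 0, write it back
    fov = zarr.open_group(fov_path, mode="r+")
    ...

well = zarr.open_group("5966.zarr/A/1", mode="r")
fov_paths = [f"5966.zarr/A/1/{img['path']}" for img in well.attrs["well"]["images"]]

with ProcessPoolExecutor() as pool:
    list(pool.map(correct_illumination, fov_paths))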

Concerns with multi-FOV approach

  • What will read performance in the napari viewer be like? Given that everything is saved per site, the pyramids are also built per site. That means we can no longer chunk pyramids across sites, and thus have the same number of files per level, with the files just becoming smaller (a back-of-the-envelope count follows after this list). Improving chunking was one of the changes we made in May that improved viewer performance (see here: Generalize pyramid creation fractal-client#32)
  • With more pyramid files, the overall number of files also increases, thus increasing the burden on IT infrastructure if we don't use an object-storage approach or Zarr zipping (see here: https://github.com/fractal-analytics-platform/mwe_fractal/issues/59)
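
A back-of-the-envelope count for the first concern, using the numbers from the 23-well, 9x8-FOV example discussed later in this thread:

n_wells, n_fovs = 23, 72  # 23 wells with 9x8 fields of view each

# multi-FOV: every FOV carries its own pyramid, so each pyramid level
# holds at least one chunk (i.e. one file) per FOV and channel
multi_fov_chunks = n_wells * n_fovs   # 1656

# single-FOV: at coarse pyramid levels, one fused chunk per well suffices
single_fov_chunks = n_wells           # 23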

Benefits of single-FOV approach

  • Already works with the current OME-Zarr napari reader plugin
  • "Fast" / decent visualization speeds
  • Single-FOV setups would be nice for displaying stitched data (though what would we do before stitching? Tile the images next to each other and just have an image of a different size after stitching?)

Concerns with single-FOV approach

  • How do we handle parallelization per site? We could base it on the chunk size at the lowest pyramid level (but on some level, that is something that could always be changed, or that we'd want to optimize for either processing or visualization)
  • Parallelization: if we process in smaller chunks but write to an AnnData object that is saved "per site" (and now there is only 1 site), how do we handle parallel write access to that file? Or how do we work around having to write to it in parallel?
  • How do we handle site-wise label images if all sites are fused? Or do we process per site and then find a way to relabel once the whole well is processed? We wouldn't want multiple objects with label = 1 in the same label image (see the relabeling sketch after this list).
    => This opens a larger question though: how will we handle "region of interest"-based computation, e.g. for an organoid that crosses site boundaries, or for multiple sites that we want to stitch together and then process as one?
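
One possible answer to the relabeling question, as a minimal sketch (an assumption, not a decided design): segment each site separately, then offset every site's labels by a running maximum so they stay unique across the fused well:

import numpy as np

def relabel_sites(site_labels: list[np.ndarray]) -> list[np.ndarray]:
    # offset per-site label images so no two sites share a label value;
    # background (0) stays 0
    relabeled, offset = [], 0
    for labels in site_labels:
        relabeled.append(np.where(labels > 0, labels + offset, 0))
        offset += int(labels.max())
    return relabeled

Note that this only ensures uniqueness; an object crossing a site boundary would still end up as two labels, which is exactly the region-of-interest question above.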

We will test some of the performance concerns with the planned work on the ome-zarr-py plugin. Once we have a better understanding of the performance implications, we should be better able to judge the trade-offs illustrated above.

tcompa commented Jun 21, 2022

Some related discussions:

And an interesting approach: https://github.com/VolkerH/DaskFusion ("This repo contains proof-of-concept code that fuses many image tiles from a microscopy scan, where the position of each tile is known from the microscopy stage metadata, into a large "fused" array.")
EDIT: See the related blogpost https://blog.dask.org/2021/12/01/mosaic-fusion

jluethi commented Jun 21, 2022

Ah, very good links. I remember that discussion about the number of layers, but it actually goes deeper, with relevant ideas. In Volker's implementation, though, the data is in the end written to an OME-Zarr file consisting of a single site, right?
But I wonder whether we could use the logic here to build the dask images on the fly in the plugin:

So what I’ve been doing for the use in our lab is to fuse the tiles using dask-image map_blocks. This is without fine registration based on image content, it is just using stage metadata information to place the tiles.

I wonder whether this DaskFusion approach would work well enough on the fly.
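
As a rough sketch of what "on the fly" could look like for a regular acquisition grid (assumptions: FOV paths as in the schematics above, arrays with y/x as the last two axes; DaskFusion itself handles arbitrary stage positions via map_blocks):

import dask.array as da

# lazily open each FOV's level-0 array and arrange it on the acquisition grid
rows, cols = 6, 7  # hypothetical grid; FOV index = row * cols + col
tiles = [
    [da.from_zarr(f"5966.zarr/A/1/{r * cols + c}/0") for c in range(cols)]
    for r in range(rows)
]
fused = da.block(tiles)  # one lazy, well-sized array to hand to napari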

tcompa referenced this issue in fractal-analytics-platform/fractal-client Jun 23, 2022
jluethi commented Jun 23, 2022

The multi-FOV performance hit seems quite clear; let's see how it scales for larger experiments: fractal-analytics-platform/fractal-client#66 (comment)

Another thing to consider: getting to a full, randomly placed, search-first representation may be harder than expected. Even if we can parse coordinates, we seem to need a dask array that is itself chunked (probably along a classical grid) and that we lazily build for napari. If that holds true, saving things off-grid will be another performance hit, even if we figure out a way to do so...

jluethi commented Jun 23, 2022

One performance benefit of multi-FOV: processing was much faster when a single well was parallelized by sites rather than by well. This couldn't be changed for the initial parsing, but maybe for downstream analysis if we're clever about it.

If the performance of multi-FOV remains a large issue, we should investigate whether we could save the field-of-view positional information either as metadata or as part of the OME-NGFF table spec (a sketch follows below). Also, if it remains such an issue, we should start the conversation with the OME-NGFF group about revising the plate specification, as it likely won't scale for other approaches either...
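
As a sketch of the metadata/table idea (assumptions: the anndata library, hypothetical column names and table path; the OME-NGFF table spec is still under discussion):

import anndata as ad
import pandas as pd

# hypothetical per-well table of FOV positions and sizes, in physical units
df = pd.DataFrame(
    {
        "x_micrometer": [0.0, 416.0],
        "y_micrometer": [0.0, 0.0],
        "len_x_micrometer": [416.0, 416.0],
        "len_y_micrometer": [351.0, 351.0],
    },
    index=["FOV_1", "FOV_2"],
)
table = ad.AnnData(X=df.to_numpy(), var=pd.DataFrame(index=df.columns))
table.obs_names = df.index
table.write_zarr("5966.zarr/A/1/0/tables/FOV_ROI_table")  # hypothetical path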

tcompa referenced this issue in fractal-analytics-platform/fractal-client Jun 24, 2022
tcompa commented Jun 24, 2022

One performance benefit of multi-FOV: processing was much faster when a single well was parallelized by sites rather than by well. This couldn't be changed for the initial parsing, but maybe for downstream analysis if we're clever about it.

A quick check of the timings for a workflow made of yokogawa_to_zarr + MIP, for a single-well 9x8 dataset.
Workflow durations show a 5x improvement when using multi-FOV:

Single-FOV: 1276 s
Multi-FOV: 256 s

Numbers obviously depend on the current parsl config (e.g. I currently ask for 8 cores/node, which never seems to be saturated), but the comparison is still useful.

tcompa commented Jun 24, 2022

Another quick test on running multi-FOV calculations: it looks like parsl has no big problem handling hundreds of tasks (e.g. for 10 wells of 5x5 FOVs there will be 250 yokogawa_to_zarr tasks plus 250 MIP tasks, which in the 23-well 9x8 case will reach a few thousand).

This was expected, but it's good to see it directly.

Here is the monitoring status during the execution of this example (notice that the cluster is quite busy, so only a few of our jobs can run at the same time).

[Screenshot from 2022-06-24 15-02-12: monitoring status during execution]

jluethi commented Jun 27, 2022

While the multi-FOV approach would have nice benefits for easier parallelization and for saving per-FOV label images without having to worry about label uniqueness between FOVs (and others, see above), the viewing performance for plates saved in the multi-FOV setup just does not cut it.

The problem: in the multi-FOV case, every FOV has all the pyramid levels. Thus, every pyramid level consists of at least as many chunks as there are FOVs. In the 23-well case, that means 1656 chunks per channel (23 wells * 72 FOVs), while the single-FOV approach can represent the same data in 23 chunks at the high pyramid levels (each well has a ~50x50 pixel representation fused for the whole well, instead of many 10x10 pixel representations).

Accessing many more small files is significantly slower than accessing a few larger files (~5-10x slower in our test settings for high pyramid levels). The details are documented here: ome/ome-zarr-py#200 (comment), and our work leading up to this conclusion is here: fractal-analytics-platform/fractal-client#66

For the time being, we're pushing ahead with a single-FOV approach. We are having the conversation with the OME-NGFF community about whether their multi-FOV spec makes sense given the scaling issues, and whether we can define a specification that would actually scale to our requirements.

I'm keeping this high-level issue open so we can record decisions taken and further observations that come up regarding parallel processing.

jluethi commented Jul 26, 2022

It looks like parallelization over ROIs works well (#24) and the visualization scales well with this new approach.

The only downside to the multi-FOV approach seems to be handling overlap between images. We have a new issue specifically for this here, so let's close this one.
