Single-FOV wells vs. multi-FOV wells #12
Some related discussions:
And an interesting approach: https://github.com/VolkerH/DaskFusion ("This repo contains proof-of-concept code that fuses many image tiles from a microscopy scan, where the position of each tile is known from the microscopy stage metadata, into a large "fused" array.")
Ah, very good links. I remember that discussion about the number of layers, but it actually goes deeper, with relevant ideas. In Volker's implementation, though, the data is parsed into an OME-Zarr file consisting of a single site in the end, right?
I wonder whether this DaskFusion approach would work well enough on the fly.
…el scheme (ref #61 #74)
The multi-FOV performance hit seems quite clear; let's see how it scales for larger experiments: fractal-analytics-platform/fractal-client#66 (comment)
Another thing to consider: getting to a full representation of randomly placed FOVs may be harder than expected. Even if we can parse coordinates, we seem to require a dask array that is itself chunked, probably along a classical grid, that we lazily build for napari (see the sketch below). If that holds true, saving things off-grid will be another performance hit, even if we figure out a way to do so...
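For concreteness, a minimal sketch of what such a lazily built, grid-chunked array could look like, assuming grid-aligned tiles of equal shape and a hypothetical `read_tile` reader (names, shapes, and file names are illustrative, not DaskFusion's actual API):

```python
import numpy as np
import dask.array as da
from dask import delayed

TILE = (1000, 1000)  # hypothetical tile shape (y, x)

def read_tile(path):
    # placeholder reader; in practice something like tifffile.imread(path)
    return np.zeros(TILE, dtype=np.uint16)

# stage metadata reduced to grid indices per tile (illustrative file names)
tiles = {(0, 0): "t00.tif", (0, 1): "t01.tif",
         (1, 0): "t10.tif", (1, 1): "t11.tif"}

lazy = {
    idx: da.from_delayed(delayed(read_tile)(path), shape=TILE, dtype=np.uint16)
    for idx, path in tiles.items()
}

# assemble the well lazily; each tile stays one chunk of a classical grid,
# which is exactly the chunking napari would want to consume
rows = 1 + max(i for i, _ in tiles)
cols = 1 + max(j for _, j in tiles)
fused = da.block([[lazy[(i, j)] for j in range(cols)] for i in range(rows)])
print(fused.shape, fused.chunksize)   # (2000, 2000) (1000, 1000)
```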
One performance benefit of multi-FOV wells: processing was much faster when a single well was parallelized by sites instead of by well. That couldn't be changed for the initial parsing, but maybe for downstream analysis if we're clever about it.
If the performance of multi-FOV wells remains a large issue, investigate whether we could save the field-of-view positional information either as metadata or as part of the OME-NGFF table spec (a sketch follows below). Also, if it remains such an issue, start the conversation with the OME-NGFF group about revising the plate specification, as it likely won't scale for other approaches either...
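A minimal sketch of the metadata variant, assuming we simply record per-FOV bounding boxes in the image group's attributes; the path, key name, and pixel sizes are hypothetical, not part of any spec:

```python
import zarr

# open (or create) the image group for a hypothetical well B/03
root = zarr.open_group("plate.zarr", mode="a")
img = root.require_group("B/03/0")

# record each FOV's bounding box in level-0 pixel coordinates,
# so downstream tasks could still parallelize per FOV
img.attrs["fov_rois"] = [
    {"name": "FOV_1", "x": 0,    "y": 0, "len_x": 2560, "len_y": 2160},
    {"name": "FOV_2", "x": 2560, "y": 0, "len_x": 2560, "len_y": 2160},
]
```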
Quick check of the timings for a workflow made of yokogawa_to_zarr+MIP, for a single-well 9x8 dataset.
Numbers will obviously depend on the current parsl config (e.g. right now I ask for 8 cores/node, which never seem to be saturated), but the comparison is still useful.
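For reference, a minimal sketch of the kind of parsl config meant above; the executor label, partition name, and block sizes are placeholders, not the actual cluster settings:

```python
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import SlurmProvider

# hypothetical config: one 8-core node per block, as in "8 cores/node" above
config = Config(
    executors=[
        HighThroughputExecutor(
            label="htex",
            provider=SlurmProvider(
                partition="main",   # placeholder partition name
                nodes_per_block=1,
                cores_per_node=8,
                init_blocks=1,
                max_blocks=4,
            ),
        )
    ]
)
```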
Another quick test on running multi-FOV calculations. It looks like parsl has no big problem handling hundreds of tasks (e.g. for 10 5x5 wells there will be 250 yokogawa_to_zarr tasks plus 250 MIP tasks, which in the 23-well 9x8 case reaches a few thousand; see the arithmetic below). This was expected, but it's good to see it directly. Here is the monitoring status during execution of this example (notice that the cluster is quite busy, so only a few of our jobs can run at the same time).
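Spelling out the task counts quoted above:

```python
# 10 wells, 5x5 sites each: one task per site and per task type
wells, sites = 10, 5 * 5
print(wells * sites)       # 250 yokogawa_to_zarr tasks (plus 250 MIP tasks)

# 23 wells, 9x8 sites each
wells, sites = 23, 9 * 8
print(2 * wells * sites)   # 3312 tasks in total, i.e. "a few thousand"
```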
While multi-FOV approaches would have nice benefits for easier parallelization and for saving single-FOV label images without having to worry about label uniqueness between FOVs (and others, see above), the viewing performance for plates saved in a multi-FOV setup just does not cut it.
The problem: in a multi-FOV case, every FOV has all the pyramid levels. Thus, every pyramid level consists of at least as many chunks as there are FOVs. In the 23-well case, that would mean having 1656 chunks per channel (23 wells * 72 FOVs), while the single-FOV approach can represent the same data in 23 chunks at the high pyramid levels (each well has a ~50x50 pixel representation fused for the whole well, instead of many 10x10 pixel representations; see the arithmetic below). Accessing many more small files is significantly slower than accessing a few larger files (~5-10x slower in our test settings for high pyramid levels). The details are documented here: ome/ome-zarr-py#200 (comment), and our work leading up to this conclusion is here: fractal-analytics-platform/fractal-client#66
For the time being, we're pushing ahead with a single-FOV approach. We are having the conversation with the OME-NGFF group on whether their multi-FOV spec makes sense given the scaling issues, and whether we can define a specification that would actually scale to our requirements. I'm keeping this high-level issue open so we can record decisions taken and further observations that come up regarding parallel processing.
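The chunk-count comparison above, spelled out:

```python
wells, fovs_per_well = 23, 72  # 9x8 sites per well

# multi-FOV layout: every pyramid level keeps at least one chunk per FOV
print(wells * fovs_per_well)   # 1656 chunks per channel, at every level

# single-FOV layout: the coarse levels collapse to one chunk per well
print(wells)                   # 23 chunks per channel at the top level
```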
We're building ome-zarr reader support for multi-FOV wells, where we would save each FOV/site from the microscope as its own nested directory in the OME-Zarr file.
While building this and talking to Kevin Yamauchi, some thoughts came up on the pros and cons of the different strategies. I'll try to summarize them in this high-level issue, so that we can keep an overview.
Definitions:
Single-FOV wells: Describes a way of saving image data to OME-Zarr in which data for each well (each unit of a plate) is saved in a single field of view.
Single-FOV schematic adapted from OME-NGFF:
Multi-FOV wells: Describes a way of saving image data to OME-Zarr in which data for each original image acquisition region (= field of view, site) is saved to a separate folder under the well (unit of a plate) folder. See the layout sketch below.
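For concreteness, a schematic of the two on-disk layouts; the plate name, row/column, and site numbering are illustrative, following the OME-NGFF plate layout:

```
# single-FOV wells: one fused image group per well
plate.zarr/B/03/0/      # well B/03, one image with all pyramid levels

# multi-FOV wells: one image group per site under each well
plate.zarr/B/03/0/      # site 1
plate.zarr/B/03/1/      # site 2
...
plate.zarr/B/03/71/     # site 72 (9x8 grid), each with its own pyramid
```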
Benefits of using multi-FOV approach
Concerns with multi-FOV approach
Benefits of single-FOV approach
Concerns with single-FOV approach
=> This opens a larger question though: how will we handle "region of interest"-based computation, e.g. for an organoid that crosses site boundaries, or for multiple sites that we want to stitch together and then process as one? (A sketch of what ROI-based access could look like follows below.)
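A minimal sketch of ROI-based access in the single-FOV layout, assuming the fused well is a single zarr array; the path, axis order, and bounding box are illustrative:

```python
import dask.array as da

# level-0 array of a fused well, hypothetical path and (c, z, y, x) axes
well = da.from_zarr("plate.zarr/B/03/0/0")

# an organoid that crosses former site boundaries is just a slice
roi = well[:, :, 4000:6200, 3500:5800]   # illustrative bounding box
print(roi.mean().compute())
```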
We will test some of the concerns regarding performance with the planned work on the ome-zarr-py plugin. Once we have a better understanding of its performance implications, we should know better how to judge the trade-offs illustrated above.