Check memory usage of yokogawa_to_zarr #72

Closed
tcompa opened this issue Sep 15, 2022 · 5 comments · Fixed by #80
Labels
High Priority Current Priorities & Blocking Issues

Comments

@tcompa
Collaborator

tcompa commented Sep 15, 2022

See fractal-analytics-platform/fractal-server#51

@jluethi
Collaborator

jluethi commented Sep 15, 2022

This looks very much like the issues we had with ROIs early on, where all the computation happened first and the result was only written to the zarr file at the end. I wonder if a similar trick, writing to a defined region of the file for each chunk, would also do the trick here.

Also, which part of the code actually triggers the compute of the canvas dask array? Does only the write_pyramid call at the end trigger compute? Maybe it's the same issue again, where that worked for map_blocks compositions but not when we use indexing on the canvas.
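For reference, a minimal sketch of what region-wise writing could look like; the function name and the entry layout are hypothetical illustrations, not the task's actual API:

```python
import zarr
from imageio import imread  # assumption: site images are plain TIFF/PNG

def write_images_region_by_region(zarr_path, image_entries):
    """Write each image directly into its target region of an on-disk
    zarr array, so peak memory stays at ~one image instead of the whole
    canvas. `image_entries` is a list of
    (filename, z_index, (y0, y1, x0, x1)) tuples (hypothetical layout).
    """
    canvas = zarr.open(zarr_path, mode="a")
    for filename, z, (y0, y1, x0, x1) in image_entries:
        img = imread(filename)           # load one field of view
        canvas[z, y0:y1, x0:x1] = img    # write it, then let `img` be freed
```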

@tcompa
Collaborator Author

tcompa commented Sep 16, 2022

> And because everyone will always have to run parsing to get the data into OME-Zarr, and that can run for a while given the IO, I think it's worth writing it with the somewhat more memory-efficient 2D approach. Let's have the discussion in the other issue. If 2D parsing does get too complex, it's not blocking to do it in 3D. But let's see if we can get this running as a memory-optimized 2D workflow :)

@jluethi, @mfranzon

I thought the issue with processing 2D images was that we would generate tens of thousands of dask operations (one per image), although it's true that each graph should be very light (one image gets read from disk and written to zarr). I don't know whether dask builds 10,000 small graphs or one global graph. Given your information on the dask-overhead issue, do you think it matters here?

We have started preparing the 3D version, but we can switch back to 2D if that's preferred.
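As a rough way to check this, one could build a scaled-down version of the per-image construction and count the tasks in the resulting graph (a sketch with stand-in reads and made-up image shapes, not the actual task code):

```python
import dask
import dask.array as da
import numpy as np

@dask.delayed
def read_image(i):
    # Stand-in for reading one Yokogawa image from disk
    return np.zeros((2160, 2560), dtype=np.uint16)

# 100 lazy images arranged into a 10x10 well grid
lazy = [
    da.from_delayed(read_image(i), shape=(2160, 2560), dtype=np.uint16)
    for i in range(100)
]
canvas = da.block([lazy[i : i + 10] for i in range(0, 100, 10)])

# dask builds one global graph; its size scales with the number of images
print(len(canvas.__dask_graph__()))
```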

@jluethi
Collaborator

jluethi commented Sep 16, 2022

Hmm, I mostly had this intuition from accessing ROIs sequentially, i.e. using indexing to load parts of the data. While we also have a large zarr array to fill lazily here, I'm not sure we'll hit the same issue when every chunk is loaded individually from disk.

In the end, both ways should work, and it shouldn't be that different to refactor from one to the other. Let's test whichever way you implement first and see whether there is a large memory overhead for 3D or a large dask overhead (and hence also in memory) from 2D. I can see pros & cons for either (e.g. maybe writing 3D arrays is faster? But maybe 2D is more memory-efficient? Or the difference is actually negligible? 🤷‍♂️)
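For the memory side of such a test, a quick in-process check could look like this (a sketch assuming Linux, where ru_maxrss is reported in KiB):

```python
import resource

def report_peak_memory(label: str) -> None:
    # Peak resident set size of the current process so far
    peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"{label}: peak RSS = {peak_kib / 1024:.0f} MiB")

# e.g. call once after the task finishes:
# report_peak_memory("yokogawa_to_zarr, 2D variant")
```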

@tcompa
Collaborator Author

tcompa commented Sep 16, 2022

Quick comment from a first test (we will add a full report later): with the new task, writing the first 11 GB of the zarr file took ~250 seconds (with four processes, for the first four wells), and memory stayed under control (3 GB max, in total). Notice that this task uses about 100% CPU per process (i.e. much less than is available on the node, obviously), which is consistent with fractal-analytics-platform/fractal-server#34 (comment).

EDIT: I need to check that I was using the right version of the tasks! I think it was right, but let's re-discuss this on Monday.

@jluethi
Collaborator

jluethi commented Sep 16, 2022

Makes sense. We could then run such tasks in a cpu-intermediate setting (or even cpu-low, though probably not once FOVs have larger Z stacks).

Sounds promising, looking forward to the further tests on Monday! :)
