
Detect original chunks in imread (improve RAM usage) #356

Closed
quentinblampey opened this issue Mar 15, 2024 · 2 comments
@quentinblampey

Hello,

When using imread, I get an image with only one chunk, even though the .tif image I'm reading is chunked. This can lead to very high memory usage, since we sometimes work on images up to 2 TB in size (as mentioned in these issues: 1, 2).

To reproduce the examples below, you can download one of these large .tif files. With this 20 GB image, even accessing a small portion of the data requires loading the full image into memory with dask_image, whereas rioxarray uses 40x less RAM and is 4500x faster (see examples below).

Case 1: using dask_image

```python
from dask_image.imread import imread

image = imread("mosaic_DAPI_z3.tif")

image[:, :1024, :1024].compute()
```

Max RAM usage: 34.244 GB
Time: 22.4 s ± 247 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Case 2: using rioxarray

```python
import rioxarray

array = rioxarray.open_rasterio("mosaic_DAPI_z3.tif", chunks=(1, 1024, 1024))
image = array.data  # get dask array from DataArray

image[:, :1024, :1024].compute()
```

Max RAM usage: 0.765 GB
Time: 5.04 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
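The effect of chunk layout on a small crop can be sketched with plain dask arrays (an illustrative sketch, not from the thread; the shapes and chunk sizes are stand-ins for the files above):

```python
import numpy as np
import dask.array as da

# Stand-in for a 3D image (small here; the real files are far larger)
data = np.zeros((2, 4096, 4096), dtype=np.uint16)

# Chunked only along the first axis, as dask_image.imread produces: any
# crop of a plane still materializes the full (4096, 4096) source chunk.
per_plane = da.from_array(data, chunks=(1, 4096, 4096))

# Tiled chunks, as rioxarray exposes for a tiled TIFF: a 1024x1024 crop
# only touches the tiles that overlap it.
tiled = da.from_array(data, chunks=(1, 1024, 1024))

crop = tiled[:, :1024, :1024]
print(crop.chunks)  # ((1, 1), (1024,), (1024,)) -> one tile per plane
```

With the tiled layout, computing the crop pulls just two 1024x1024 tiles; with per-plane chunks, the same crop forces two full 4096x4096 planes to be read first.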

@m-albert
Collaborator

Hi @quentinblampey,

From your code it seems that your file is chunked with chunk sizes of (1, 1024, 1024); however, dask_image.imread only supports chunking along the first axis for now.

Thanks for posting the issue you ran into; we should probably document this better. Since there are many readers that return properly chunked dask arrays (and loading depends strongly on the file format and reader), this functionality currently has rather low priority in dask-image.

Also check out these readers:

@quentinblampey
Author

Thanks @m-albert for your answer, I definitely understand!
I'll check the readers you mentioned, thanks for the notice 👍
