
Detect original chunks in imread (improve RAM usage) #356

Closed
quentinblampey opened this issue Mar 15, 2024 · 2 comments
@quentinblampey

Hello,

When using imread, I get an image with only one chunk, even though the .tif image I'm reading is chunked. This can lead to very high memory usage, since we sometimes work on images up to 2 TB in size (as mentioned in these issues: 1, 2).

To reproduce the examples below, you can download one of these large .tif files. With this 20 GB image, even accessing a small portion of the data requires loading the full image into memory with dask_image, whereas rioxarray uses 40x less RAM and is 4500x faster (see examples below).

Case 1: using dask_image

```python
from dask_image.imread import imread

image = imread("mosaic_DAPI_z3.tif")

image[:, :1024, :1024].compute()
```

Max RAM usage: 34.244 GB
Time: 22.4 s ± 247 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Case 2: using rioxarray

```python
import rioxarray

array = rioxarray.open_rasterio("mosaic_DAPI_z3.tif", chunks=(1, 1024, 1024))
image = array.data  # get dask array from DataArray

image[:, :1024, :1024].compute()
```

Max RAM usage: 0.765 GB
Time: 5.04 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
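The effect of chunk layout on a small crop can be sketched with plain dask arrays (an illustrative sketch, not from the thread; the shapes and chunk sizes are stand-ins for the files above):

```python
import numpy as np
import dask.array as da

# Stand-in for a 3D image (small here; the real files are far larger)
data = np.zeros((2, 4096, 4096), dtype=np.uint16)

# Chunked only along the first axis, as dask_image.imread produces: any
# crop of a plane still materializes the full (4096, 4096) source chunk.
per_plane = da.from_array(data, chunks=(1, 4096, 4096))

# Tiled chunks, as rioxarray exposes for a tiled TIFF: a 1024x1024 crop
# only touches the tiles that overlap it.
tiled = da.from_array(data, chunks=(1, 1024, 1024))

crop = tiled[:, :1024, :1024]
print(crop.chunks)  # ((1, 1), (1024,), (1024,)) -> one tile per plane
```

With the tiled layout, computing the crop pulls just two 1024x1024 tiles; with per-plane chunks, the same crop forces two full 4096x4096 planes to be read first.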

@m-albert
Collaborator

Hi @quentinblampey,

From your code it seems that your file is chunked with chunk sizes of (1, 1024, 1024); however, dask_image.imread only supports chunking along the first axis for now.

Thanks for posting the issue you ran into; we should probably document this better. Since there are many readers that return properly chunked dask arrays (and loading depends strongly on the file format and reader), this functionality currently has rather low priority in dask-image.

Also check out these readers:

@quentinblampey
Author

Thanks @m-albert for your answer, I definitely understand!
I'll check the readers you mentioned, thanks for the notice 👍
