Imread performance: reduced overhead of pim.open calls when reading from image sequence #182

m-albert · 2021-01-20T18:29:50Z

As reported in #181, dask_image.imread performs poorly when reading from many input files. #121 and #161 are also related.

One problem I noticed in the current implementation: when dask_image.imread.imread is called with a file pattern such as im_*.tif, pims.open is called on the entire file pattern for each tile that is loaded. This instantiates a new pims.ImageSequenceND for every tile, which produces a large overhead.
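The cost pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the dask_image or pims code: `FakeSequence`, `read_frame_reopen`, `read_frame_single`, and `count_scans` are hypothetical names, and the dummy files hold a single byte each. It counts how many filenames are touched when every frame reopens the whole pattern, compared with opening only the file that is needed.

```python
import glob
import os
import tempfile


class FakeSequence:
    """Hypothetical stand-in for a pattern-based sequence reader (not pims)."""

    files_scanned = 0  # class-wide counter of filenames touched while opening

    def __init__(self, pattern):
        self.filenames = sorted(glob.glob(pattern))
        # Assumption: opening a sequence costs work proportional to its matches.
        FakeSequence.files_scanned += len(self.filenames)

    def __getitem__(self, i):
        with open(self.filenames[i], "rb") as f:
            return f.read()


def read_frame_reopen(pattern, i):
    """Reopen the whole pattern to fetch one frame (the overhead described above)."""
    return FakeSequence(pattern)[i]


def read_frame_single(filename):
    """Open only the single file that holds the requested frame."""
    return FakeSequence(glob.escape(filename))[0]


def count_scans(n_files):
    """Return (reopen, single): filenames touched when reading every frame once."""
    with tempfile.TemporaryDirectory() as tmp:
        # Dummy one-byte files; the .tif suffix is only there to mimic the pattern.
        for z in range(n_files):
            with open(os.path.join(tmp, "im_%05d.tif" % z), "wb") as f:
                f.write(bytes([z % 256]))
        pattern = os.path.join(glob.escape(tmp), "im_*.tif")

        FakeSequence.files_scanned = 0
        frames_a = [read_frame_reopen(pattern, i) for i in range(n_files)]
        reopen = FakeSequence.files_scanned

        FakeSequence.files_scanned = 0
        frames_b = [read_frame_single(fn) for fn in sorted(glob.glob(pattern))]
        single = FakeSequence.files_scanned

    # Both strategies must return the same data.
    assert frames_a == frames_b
    return reopen, single
```

With 20 files, `count_scans(20)` returns `(400, 20)`: reopening the pattern per frame grows quadratically with the number of files, while opening single files grows linearly.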

In this proposed fix I use glob to resolve the file pattern into individual filenames, so that pims.open is called only on the files that are actually being loaded. The resolved filenames are encoded into a dask array that is used as input to the `map_blocks` reading step.
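A rough sketch of that idea, under stated assumptions: this is not the code merged in the PR. `lazy_read`, `_read_block`, and `write_demo_frames` are hypothetical names, and `np.load` on `.npy` files stands in for the pims-based reader so the example needs only numpy and dask. The filenames are resolved once with glob, stored in a dask array with one filename per chunk, and each block is then read independently through `map_blocks`.

```python
import glob
import os

import dask.array as da
import numpy as np


def _read_block(filename_block):
    """Read the one file named in a single-element block of filenames."""
    # Assumption: np.load is a stand-in reader; the PR itself goes through pims.open.
    frame = np.load(str(filename_block[0]))
    return frame[np.newaxis]  # add the leading frame axis


def lazy_read(pattern):
    """Build a lazy (n_files, ...) array with one chunk per matched file."""
    filenames = sorted(glob.glob(pattern))  # resolve the pattern exactly once
    if not filenames:
        raise ValueError("no files match %r" % pattern)
    sample = np.load(filenames[0])  # one eager read to learn shape and dtype
    names = da.from_array(np.array(filenames), chunks=1)  # one filename per chunk
    return names.map_blocks(
        _read_block,
        chunks=(1,) + sample.shape,                # each block is one full frame
        new_axis=list(range(1, 1 + sample.ndim)),  # frame axes follow the file axis
        dtype=sample.dtype,
        meta=np.empty((0,) * (1 + sample.ndim), dtype=sample.dtype),
    )


def write_demo_frames(directory, n_frames=5, shape=(4, 3)):
    """Write small .npy frames into `directory` and return the full stack."""
    stack = np.arange(n_frames * shape[0] * shape[1], dtype=np.uint16)
    stack = stack.reshape((n_frames,) + shape)
    for z, frame in enumerate(stack):
        np.save(os.path.join(directory, "im_%05d.npy" % z), frame)
    return stack
```

Because every chunk depends on a single filename, computing a slice such as `lazy_read(pattern)[3]` should only need to read the one file behind that chunk.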

m-albert · 2021-01-20T18:31:05Z

Here are some performance tests (inspired by #181):

Data prep:

import glob
import os

import numpy as np
import skimage.io
import dask.array as da
import dask_image.imread
from dask import delayed

Nz, Ny, Nx = 1000, 100, 100
im = np.random.randint(0, 1000, (Nz, Ny, Nx))

# The output directory must exist before the frames are saved.
os.makedirs('data', exist_ok=True)
for z in range(Nz):
    skimage.io.imsave('data/im_%05d.tif' % z, im[z])

Timings:

%%time
all_images = sorted(glob.glob("data/im_*.tif"))
# Read every file eagerly and stack the frames into one array.
imgs = np.array([skimage.io.imread(image) for image in all_images])

Serial read without dask: 420ms

%%time
lazy_imread = delayed(skimage.io.imread)  # lazy reader
lazy_arrays = [lazy_imread(image) for image in all_images]
# The frames were saved with im's integer dtype (randint does not default to uint16),
# so declare that dtype here.
dask_arrays = [
    da.from_delayed(delayed_reader, shape=(Ny, Nx), dtype=im.dtype)
    for delayed_reader in lazy_arrays
]
using_dask = da.stack(dask_arrays, axis=0).compute()

Using dask delayed: 1.05s

%%time
# Read the same sequence through dask_image.imread, passing the file pattern directly.
using_dask_image = dask_image.imread.imread("data/im_*.tif").compute()

Using dask_image.imread:
Master: 16s
This PR: 1.04s

GenevieveBuckley · 2021-02-19T03:21:41Z

Genuinely sorry, not sure how this managed to fall off my radar. Adding it back to the to-do list now!

GenevieveBuckley · 2021-02-19T03:33:57Z

Looks good to me too.

At some point we should consider adding performance tests, but that's a conversation for another day.

m-albert · 2021-02-19T14:33:13Z

@GenevieveBuckley Great, thanks for reviewing and merging!

imread: Reduce overhead of pim.open calls when reading from many input files. Solved by resolving input filenames using glob and encoding them into a dask array used as input to the `map_blocks` reading step.

d00410e

m-albert changed the title ~~Imread: reduced overhead of pim.open calls when reading from many files~~ Imread: reduced overhead of pim.open calls when reading from image sequence Jan 20, 2021

m-albert changed the title ~~Imread: reduced overhead of pim.open calls when reading from image sequence~~ Imread performance: reduced overhead of pim.open calls when reading from image sequence Jan 20, 2021

m-albert mentioned this pull request Jan 21, 2021

Improve dask-image imread #179

Closed

jmontoyam mentioned this pull request Jan 22, 2021

dask_image imread performance issue #181

Open

Base automatically changed from master to main February 2, 2021 01:18

m-albert mentioned this pull request Feb 18, 2021

imread : passing a list of filenames instead of '.../*.tif' #191

Closed

GenevieveBuckley merged commit 91fe6e1 into dask:main Feb 19, 2021

GenevieveBuckley mentioned this pull request Feb 19, 2021

Loading stack of images slow #121

Open

GenevieveBuckley mentioned this pull request Feb 25, 2021

dask-image imread v0.5.0 not working with dask distributed Client & napari #194

Closed
