Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature request: provide exisiting output array to ndinterp.affine #217

Open
VolkerH opened this issue May 7, 2021 · 3 comments
Open

feature request: provide exisiting output array to ndinterp.affine #217

VolkerH opened this issue May 7, 2021 · 3 comments

Comments

@VolkerH
Copy link
Contributor

VolkerH commented May 7, 2021

With the recent activity around ndinterp.affine and many eyes on that functionality I will open an issue for the following feature request:

It would be great if one could provide an existing dask array as an output array through an optional argument.

I will describe the use case briefly.

Given a number of tiles (2D or 3D) each with a corresponding affine transformations that map these tiles into a joint coordinate system, it would be great if one could provide a pre-allocated output array that represents this joint coordinate system. In the output array only the chunks that map into the source array would be processed. Existing data in each output chunk would be overwritten, but only if the output pixel maps to a valid input pixel.

An application for this would for example be implementing something roughly equivalent to "fusion" step in Big Stitcher or other panorama stitching software.

I had discussed this use case before @GenevieveBuckley via email but at the time I didn't have time to look into details.
I believe @m-albert has worked on similar applications.

@m-albert
Copy link
Collaborator

m-albert commented May 7, 2021

Hey @VolkerH :)

Definitely affine_transform can be used for stitching transformed values!

Hmm I was thinking about whether it would make sense to be able to provide an output dask.array to affine_transform and I'm not sure about it. If I understand it correctly the main reason for having this functionality in the ndimage implementation is that you avoid allocating memory for a new array. However, when working with dask arrays this is not a concern.

Regarding other aspects you mention:

In the output array only the chunks that map into the source array would be processed.

This is already the case.

Existing data in each output chunk would be overwritten, but only if the output pixel maps to a valid input pixel.

I'm not sure how well-defined this behaviour would be. For example whether or not a chunk would be changed would depend on the interpolation order right? Actually, ndimage.affine_transform writes the full output array.

Also, currently you could already achieve what you want by doing this:

import dask.array as da
import numpy as np
import dask_image.ndinterp as da_ndinterp

im = da.from_array(np.ones((6,)), chunks=(2,))
preexisting = da.from_array(np.ones((6,))*3, chunks=(2,))

out = da_ndinterp.affine_transform(im, np.eye(1), offset=[1], cval=np.nan, mode='constant', output_chunks=(2,))

mask = ~da.isnan(out)
preexisting[mask] = out

preexisting.compute(): array([1., 1., 1., 1., 1., 3.])

Does that make sense?

@VolkerH
Copy link
Contributor Author

VolkerH commented May 9, 2021

Hi @m-albert

thanks for your reply. I think I get the idea, and I think (need to confirm) that it is somewhat similar to something I had tried previously. So previously I basically created an output dask array of the full required output size for every input tile. I then stacked these and did a .max along the stacked dimension. The problem was that while this worked it didn't scale nicely to large output arrays and many tiles.

I will have a play with your idea in the coming week, but am I wrong thinking that the mask creation step with the isnan would have to go through all chunks and all pixels of the potentially very large output array? (And for adding many tiles to the output, it would have to be performed after adding each tile).

@GenevieveBuckley
Copy link
Collaborator

Here's a link to the gist with the example code Volker is talking about: https://gist.github.com/VolkerH/bb51cfe35405b6cf30f74e8ec5e8bb26

Volker's main problem is that has has many small image image tiles he needs to fuse together. Repeated calling of a function that generates a large array output where almost all of the values are zero isn't very efficient.

He doesn't want to use something like da.block, because each tile overlaps slightly with the others and he wants to be able to fuse the edges together in a nice way.

@VolkerH I think you might have to keep track of the expected output bounding box positions, and then try to update only those chunks of the Dask array. This should be knowable, if you have the transformation matrix, and the input array corner locations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants