
Use xarray.map_blocks to speed up pyramid_reproject #10

Open
jhamman opened this issue Nov 23, 2021 · 5 comments
Comments


jhamman commented Nov 23, 2021

@norlandrhagen and I have been experimenting with approaches for speeding up pyramid generation when using rioxarray's reproject functionality. We have this rough prototype to share:

import dask.array
import datatree as dt
import numpy as np
import xarray as xr


def pyramid_reproject(
    ds, levels: int, pixels_per_tile=128, resampling="average", extra_dim=None
) -> dt.DataTree:
    # make_grid_ds, _multiscales_template, and _get_version are helpers
    # from this package
    from rasterio.transform import Affine
    from rasterio.warp import Resampling

    # multiscales spec
    save_kwargs = {"levels": levels, "pixels_per_tile": pixels_per_tile}
    attrs = {
        "multiscales": _multiscales_template(
            datasets=[{"path": str(i)} for i in range(levels)],
            type="reduce",
            method="pyramid_reproject",
            version=_get_version(),
            kwargs=save_kwargs,
        )
    }

    # set up pyramid
    root = xr.Dataset(attrs=attrs)
    pyramid = dt.DataTree(data_objects={"root": root})

    def make_template(da, dim, dst_transform, shape=None):
        template = xr.DataArray(
            data=dask.array.empty(shape, chunks=shape), dims=("y", "x"), attrs=da.attrs
        )
        template = make_grid_ds(template, dim, dst_transform)
        template.coords["spatial_ref"] = xr.DataArray(np.array(1.0))
        return template

    def reproject(da, shape=None, dst_transform=None, resampling="average"):
        return da.rio.reproject(
            "EPSG:3857",
            resampling=Resampling[resampling],
            shape=shape,
            transform=dst_transform,
        )

    for level in range(levels):
        lkey = str(level)
        dim = 2 ** level * pixels_per_tile

        dst_transform = Affine.translation(-20026376.39, 20048966.10) * Affine.scale(
            (20026376.39 * 2) / dim, -(20048966.10 * 2) / dim
        )

        pyramid[lkey] = xr.Dataset(attrs=ds.attrs)
        for k, da in ds.items():
            template = make_template(da, dim, dst_transform, (dim, dim))
            pyramid[lkey].ds[k] = xr.map_blocks(
                reproject,
                da,
                kwargs=dict(shape=(dim, dim), dst_transform=dst_transform),
                template=template,
            )

    return pyramid
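The core trick above is `xr.map_blocks` with a `template`: the template is a lazy dask-backed array that declares the output's dims, shape, dtype, and chunking, so xarray can build the task graph without computing anything. Here is a minimal, self-contained sketch of that pattern, with a toy pixel-repeating upsample standing in for `da.rio.reproject` (all names here are illustrative, not ndpyramid's API):

```python
import dask.array
import numpy as np
import xarray as xr


def make_template(shape):
    # Lazy placeholder describing the result's dims/shape/dtype;
    # map_blocks uses it to build the graph without computing.
    return xr.DataArray(
        dask.array.empty(shape, chunks=shape, dtype="float64"),
        dims=("y", "x"),
    )


def upsample(da, shape=None):
    # Toy stand-in for da.rio.reproject: repeat each pixel until the
    # block matches the requested output shape.
    factor = shape[0] // da.sizes["y"]
    data = np.repeat(np.repeat(da.data, factor, axis=0), factor, axis=1)
    return xr.DataArray(data, dims=("y", "x"))


da = xr.DataArray(np.arange(16.0).reshape(4, 4), dims=("y", "x")).chunk(-1)
out = xr.map_blocks(
    upsample, da, kwargs={"shape": (8, 8)}, template=make_template((8, 8))
)
result = out.compute()  # each 4x4 pixel block becomes 8x8
```

Because the function runs once per chunk, the reproject work for each pyramid level parallelizes across whatever chunking the input has.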
@dcherian

I thought @djhoese was working hard on dask-aware reprojection in pyresample?


djhoese commented Jul 29, 2022

"working hard" has mostly been in my head as I haven't had time for any of the "real" work. Luckily, I'm not the only one worried about this. @mraspaud did the work on that resample_blocks function and it looks like it might be a game changer for some of our algorithms. The basic idea is:

  1. Get the bounds of each target/output chunk.
  2. Slice the input array that covers this output chunk.
  3. Resample that input slice to the output chunk.

I initially wasn't a fan of this strategy as it requires slicing and rechunking of the input data, but @mraspaud's experience shows that it performs much better than resampling all overlapping input chunks and then merging/reducing them later.
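For illustration, here is a toy 1-D sketch of the three steps above, assuming axis-aligned grids that share an origin and using block-averaging in place of real resampling. None of these names correspond to pyresample's actual `resample_blocks` API:

```python
import numpy as np


def input_slice_for_output_chunk(out_slice, in_res, out_res):
    # Step 1-2: map an output chunk's pixel range back to the input grid.
    # Toy stand-in for the geolocation math done with real projections.
    scale = out_res / in_res
    start = int(np.floor(out_slice.start * scale))
    stop = int(np.ceil(out_slice.stop * scale))
    return slice(start, stop)


def resample_chunk(src, out_slice, in_res, out_res):
    # Step 3: cut out the covering input slice and reduce it to the
    # output chunk by block-averaging ("average" resampling stand-in).
    isl = input_slice_for_output_chunk(out_slice, in_res, out_res)
    block = src[isl]
    factor = int(out_res / in_res)
    n_out = out_slice.stop - out_slice.start
    return block[: n_out * factor].reshape(n_out, factor).mean(axis=1)


src = np.arange(8.0)  # 1-D "input" at resolution 1
first = resample_chunk(src, slice(0, 2), in_res=1.0, out_res=2.0)
second = resample_chunk(src, slice(2, 4), in_res=1.0, out_res=2.0)
```

The point of the strategy is that each output chunk only ever touches the input pixels that cover it, at the cost of slicing/rechunking the input.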

@dcherian

> resampling all overlapping input chunks and then merging/reducing them later.

I think @gjoseph92 had something clever in stackstac for doing something like this while avoiding shuffling a bunch of NaNs (or other useless data) around. I can't find it now, though, and I might have totally misinterpreted it.


djhoese commented Jul 29, 2022

I have another very hacky implementation in pyresample for the "EWA" resampling algorithm (very specific to the VIIRS and MODIS instruments) where I do a dask reduction but pass tuples of values between functions. If the data is destined for the output chunk, the tuple contains arrays; if not, it contains Nones. It isn't how dask intends the functions to be used (array functions should return arrays), but it lets me skip unnecessary processing of chunks that I know would be all NaNs/fill values.
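A rough, dask-free sketch of that tuple trick (hypothetical names; the real implementation lives in pyresample's EWA code and runs inside a dask reduction):

```python
import numpy as np


def map_chunk(chunk, target_bounds, chunk_bounds):
    # Return (array,) only if this chunk can contribute to the target;
    # otherwise pass (None,) so downstream steps can skip it cheaply.
    lo, hi = chunk_bounds
    t_lo, t_hi = target_bounds
    if hi <= t_lo or lo >= t_hi:
        return (None,)
    return (chunk,)


def combine(parts):
    # Reduce step: ignore the None placeholders, merge the real arrays.
    arrays = [p[0] for p in parts if p[0] is not None]
    if not arrays:
        return (None,)
    return (np.sum(arrays, axis=0),)


chunks = [np.ones(4), np.ones(4) * 2, np.ones(4) * 3]
bounds = [(0, 4), (4, 8), (8, 12)]
parts = [map_chunk(c, (0, 6), b) for c, b in zip(chunks, bounds)]
result = combine(parts)[0]  # only the first two chunks contribute
```

The third chunk never allocates a NaN-filled array, which is the whole point: non-contributing chunks cost almost nothing in the reduction.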
