Status of dask support in pyresample? #206
Comments
Sadly, the old interfaces do not support dask. We've been playing around with support for Xarray DataArray objects backed by dask when using Satpy and have had a lot of success, but not everything we want lives in pyresample at the moment. That said, I think you could do what you want if you are willing to wrap things in a DataArray first and are willing to deal with some less-than-perfect interfaces.

import numpy as np
import dask.array as da
import xarray as xr
from pyresample import geometry
from pyresample.kd_tree import XArrayResamplerNN
area_def = geometry.AreaDefinition('areaD', 'Europe (3km, HRV, VTC)', 'areaD',
                                   {'a': '6378144.0', 'b': '6356759.0',
                                    'lat_0': '50.00', 'lat_ts': '50.00',
                                    'lon_0': '8.00', 'proj': 'stere'},
                                   800, 800,
                                   [-1370912.72, -909968.64,
                                    1029087.28, 1490031.36])
msg_area = geometry.AreaDefinition('msg_full', 'Full globe MSG image 0 degrees',
                                   'msg_full',
                                   {'a': '6378169.0', 'b': '6356584.0',
                                    'h': '35785831.0', 'lon_0': '0',
                                    'proj': 'geos'},
                                   3712, 3712,
                                   [-5568742.4, -5568742.4,
                                    5568742.4, 5568742.4])
# Here I have 10 "timesteps" (pyresample calls these "channels")
# The channels are the final axis (axis=2) of the array
# works if I use numpy
# data = np.random.rand(3712, 3712, 10)
data = da.random.random((3712, 3712, 10), chunks=(3712, 3712, 1))
data = xr.DataArray(data, dims=('y', 'x', 'time'))
resampler = XArrayResamplerNN(msg_area, area_def, 50000)
resampler.get_neighbour_info()
result = resampler.get_sample_from_neighbour_info(data)

Note that nearest neighbor in pyresample uses the KDTree from

Edit: The point of the last paragraph was to say: I've never tested these dask interfaces on a cluster or multiprocess scheduler. The threaded scheduler will probably perform best.
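The scheduler remark above can be made concrete with plain dask. This is a minimal sketch, not a tested pyresample workflow: the small `lazy` array is a stand-in (an assumption) for the dask-backed result of `get_sample_from_neighbour_info`, and the threaded scheduler is requested either per call or through `dask.config.set`.

```python
import dask
import dask.array as da

# Stand-in for a lazy resampled result (assumption: no pyresample call here).
# One chunk per "timestep" on the last axis, as in the example above.
lazy = da.random.random((200, 200, 10), chunks=(200, 200, 1))
lazy_mean = lazy.mean(axis=(0, 1))  # still lazy, nothing is computed yet

# Option 1: request the threaded scheduler for a single compute call.
per_call = lazy_mean.compute(scheduler="threads")

# Option 2: request it for every compute inside a context block.
with dask.config.set(scheduler="threads"):
    in_context = lazy_mean.compute()

print(per_call.shape, in_context.shape)
```

Either form only selects where the graph runs; the graph itself is unchanged, so switching to another scheduler for comparison is a one-word change.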
And...because I have you here, keep an eye out for https://github.com/geoxarray/geoxarray which should simplify some of this in the future (taking a Dataset from anywhere and remapping it with pyresample).
@rabernat I don't know if your question is still relevant, but I just want to mention the recent work on the
@djhoese @mraspaud are there any examples of using the
@maxrjones I'm just coming back from paternity leave and also don't have much experience with customizing calls to
Sounds good, thanks!
We've been working with Currently I've implemented a wrapper that uses
@maxrjones thanks for the feedback! We haven't ported the bucket resampling to
The latest pyresample docs state:
However, there is no further mention of xarray / dask support in the rest of the docs. There seem to be a few dask issues (e.g. #148), but I could not ascertain the status of xarray / dask support based on browsing the docs and the repo.
Could you clarify where things stand? My use case is that I would like to use pyresample lazily on dask arrays, where the data is chunked contiguously in space but has many samples in time. Here's what I have tried, representing each timestep as a different channel:
I would expect this to lazily return a dask array with the same chunk structure as the input array, with resampling performed on demand as the chunks are loaded. Instead I hit an error when creating the ImageContainerNearest object:
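As a sketch of the lazy behaviour described in this report (not pyresample's implementation, and not the code that raised the error), the idea can be illustrated with plain dask: a nearest-neighbour index is computed once, then applied to each time chunk on demand with `da.map_blocks`. The helper `resample_block` and the toy `index` array are hypothetical names introduced only for this illustration.

```python
import numpy as np
import dask.array as da


def resample_block(block, index):
    """Apply a precomputed nearest-neighbour index to one (y, x, time) block."""
    ny, nx, nt = block.shape
    flat = block.reshape(ny * nx, nt)  # flatten the two spatial axes
    return flat[index]                 # (out_y, out_x) index -> (out_y, out_x, nt)


# Toy grids: a 6x6 source and a 3x3 target that takes every other pixel.
src_shape, dst_shape = (6, 6), (3, 3)
rows, cols = np.meshgrid(np.arange(0, 6, 2), np.arange(0, 6, 2), indexing="ij")
index = rows * src_shape[1] + cols  # flat source index for each target pixel

# Contiguous in space, one chunk per timestep (deterministic values for clarity).
values = np.arange(6 * 6 * 10, dtype=float).reshape(6, 6, 10)
data = da.from_array(values, chunks=(6, 6, 1))

# Lazy result: spatial shape changes, time chunking is kept from the input.
result = da.map_blocks(
    resample_block,
    data,
    index=index,
    dtype=data.dtype,
    meta=np.empty((0, 0, 0), dtype=data.dtype),
    chunks=((dst_shape[0],), (dst_shape[1],), data.chunks[2]),
)

print(result.shape, result.chunks)
```

Nothing is resampled until `result.compute()` (or a slice of it) is requested, so a single timestep can be remapped without loading the others.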