Description
Code Sample, a copy-pastable example if possible
```python
import numpy as np
import pandas as pd
import xarray as xr

# Hourly time axis from 2018-01-01 to 2018-02-01 inclusive (745 steps)
t = pd.date_range(start='2018-01-01', end='2018-02-01', freq='H')
foo = np.sin(np.arange(len(t)))
bar = np.cos(np.arange(len(t)))
# Insert one missing value into each variable
foo[1] = np.nan
bar[2] = np.nan
# .chunk() without arguments turns each variable into a single dask chunk
ds_test = xr.Dataset(data_vars={'foo': ('time', foo),
                                'bar': ('time', bar)},
                     coords={'time': t}).chunk()
print(ds_test)

print("\n\n### After `.interpolate_na(dim='time')`\n")
print(ds_test.interpolate_na(dim='time'))

print("\n\n### After `.interpolate_na(dim='time', limit=5)`\n")
print(ds_test.interpolate_na(dim='time', limit=5))

print("\n\n### After `.interpolate_na(dim='time', limit=20)`\n")
print(ds_test.interpolate_na(dim='time', limit=20))
```
Output of the above code. Note the different chunk sizes, depending on the value of `limit`:

```
<xarray.Dataset>
Dimensions:  (time: 745)
Coordinates:
  * time     (time) datetime64[ns] 2018-01-01 2018-01-01T01:00:00 ... 2018-02-01
Data variables:
    foo      (time) float64 dask.array<shape=(745,), chunksize=(745,)>
    bar      (time) float64 dask.array<shape=(745,), chunksize=(745,)>


### After `.interpolate_na(dim='time')`

<xarray.Dataset>
Dimensions:  (time: 745)
Coordinates:
  * time     (time) datetime64[ns] 2018-01-01 2018-01-01T01:00:00 ... 2018-02-01
Data variables:
    foo      (time) float64 dask.array<shape=(745,), chunksize=(745,)>
    bar      (time) float64 dask.array<shape=(745,), chunksize=(745,)>


### After `.interpolate_na(dim='time', limit=5)`

<xarray.Dataset>
Dimensions:  (time: 745)
Coordinates:
  * time     (time) datetime64[ns] 2018-01-01 2018-01-01T01:00:00 ... 2018-02-01
Data variables:
    foo      (time) float64 dask.array<shape=(745,), chunksize=(3,)>
    bar      (time) float64 dask.array<shape=(745,), chunksize=(3,)>


### After `.interpolate_na(dim='time', limit=20)`

<xarray.Dataset>
Dimensions:  (time: 745)
Coordinates:
  * time     (time) datetime64[ns] 2018-01-01 2018-01-01T01:00:00 ... 2018-02-01
Data variables:
    foo      (time) float64 dask.array<shape=(745,), chunksize=(10,)>
    bar      (time) float64 dask.array<shape=(745,), chunksize=(10,)>
```
Problem description
When `xarray.DataArray.interpolate_na()` is called with the `limit` kwarg, the chunk size of the resulting dask arrays changes.
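To check whether a particular xarray/dask installation shows this behavior, a sketch like the one below compares the chunk sizes along `time` before and after the call for several `limit` values. It uses a small synthetic dataset with an integer `time` coordinate instead of `ds_test`, and `chunk_report` is a hypothetical helper written for this check, not part of xarray.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for ds_test: one variable with a single NaN,
# a plain integer "time" coordinate, and one dask chunk along "time".
n = 100
values = np.sin(np.arange(n, dtype=float))
values[1] = np.nan
ds = xr.Dataset({"foo": ("time", values)},
                coords={"time": np.arange(n)}).chunk({"time": n})


def chunk_report(ds, dim, limits):
    """Map each `limit` value to the chunk sizes along `dim` after interpolate_na."""
    report = {}
    for limit in limits:
        result = ds.interpolate_na(dim=dim, limit=limit)
        report[limit] = result.chunks[dim]
    return report


print("input chunks:", ds.chunks["time"])
report = chunk_report(ds, "time", [None, 5, 20])
print(report)
```

On an affected version, the entries for `5` and `20` would show many small chunks instead of a single chunk of length `n`.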
Expected Output
The chunk size should not change. The very small chunks that result from typical small values of `limit` produce many tiny dask tasks, which is bad for performance. Also, operations like `.rolling()` will fail if the chunk size is smaller than the window length of the rolling window.
Output of `xr.show_versions()`

```
xarray: 0.10.9
pandas: 0.23.3
numpy: 1.13.3
scipy: 1.0.0
netCDF4: 1.4.1
h5netcdf: 0.5.0
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: 1.2.1
cyordereddict: 1.0.0
dask: 0.19.4
distributed: 1.23.3
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: 0.8.1
setuptools: 38.5.2
pip: 9.0.1
conda: 4.5.11
pytest: 3.4.2
IPython: 5.5.0
sphinx: None
```