Surprising behaviour of Dataset/DataArray.interp() with NaN entries #5852

Open
johnomotani opened this issue Oct 11, 2021 · 0 comments


@johnomotani
Contributor

I think this is due to the documented 'undefined behaviour' of scipy.interpolate.interp1d, so it is not really a bug, but it would be more user-friendly if xarray raised an error in this case rather than producing an 'incorrect' result.

What happened:

If a DataArray containing a NaN value is interpolated, output values that do not depend on the NaN entry may still be NaN.

What you expected to happen:

The docs for scipy.interpolate.interp1d say:

Calling interp1d with NaNs present in input values results in undefined behaviour.

This explains the output below and presumably means it is not fixable on the xarray side (short of some ugly workaround). Still, I think it would be good to check for NaNs in DataArray/Dataset.interp() and, if any are present, raise an exception (or possibly a warning?) about 'undefined behaviour'.

scipy.interpolate.interp2d has a similar note, while scipy.interpolate.interpn does not mention NaNs at all (though its documentation is fairly brief).
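As a rough illustration of the check proposed here, below is a minimal sketch of a user-side wrapper that could be used today. `checked_interp` and its `on_nan` parameter are hypothetical names for this sketch, not part of xarray's API; the only xarray calls it relies on are `DataArray.isnull()`, `.any()` and `DataArray.interp()`. It checks the whole array, which is stricter than necessary (a NaN that no output point depends on would still trigger it).

```python
import warnings

import numpy as np
import xarray as xr


def checked_interp(da, coords, method="linear", on_nan="raise"):
    """Hypothetical wrapper around DataArray.interp() that checks for NaN input.

    on_nan="raise" raises ValueError; on_nan="warn" emits a RuntimeWarning
    and then interpolates anyway.
    """
    if bool(da.isnull().any()):
        msg = (
            "input contains NaN; scipy.interpolate.interp1d documents "
            "undefined behaviour in this case"
        )
        if on_nan == "raise":
            raise ValueError(msg)
        warnings.warn(msg, RuntimeWarning)
    return da.interp(coords, method=method)


# NaN-free input: interpolates as usual
da = xr.DataArray(np.ones([3, 4]), dims=("x", "y"))
print(checked_interp(da, {"x": [0.5, 1.5]}).shape)

# Input with a NaN: refused by default
da[0, 0] = np.nan
try:
    checked_interp(da, {"x": [0.5, 1.5]})
except ValueError as err:
    print("refused:", err)
```

Raising by default and warning only on request mirrors the "exception (or possibly a warning?)" question above; either choice is a judgment call.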

What I initially expected was that the output would be valid at locations that should not depend on the NaN input: when interpolating a 2D DataArray (with dims x and y) along the x-dimension, if only one y-index in the input has a NaN value, that y-index in the output might contain NaNs, but the others should be OK.
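For method="linear", one rough way to compute which output points should depend on the NaN input is to interpolate the NaN mask itself: an output point gets a positive interpolated mask value only if it gives non-zero weight to a NaN input. This is a sketch of that idea under the assumption of linear interpolation, not something xarray provides; `nan_mask` and `footprint` are illustrative names. Points outside the input range come back as NaN, and `NaN > 0` is False, so those points are not flagged by this check.

```python
import numpy as np
import xarray as xr

# Same setup as the example in this issue
da = xr.DataArray(np.ones([3, 4]), dims=("x", "y"))
da[0, 0] = np.nan
newx = np.linspace(0.0, 3.0, 5)

# 1.0 where the input is NaN, 0.0 elsewhere
nan_mask = da.isnull().astype(float)

# Linear interpolation of the mask: positive only where an output point
# takes some weight from a NaN input (out-of-range points give NaN -> False)
footprint = nan_mask.interp(x=newx, method="linear") > 0
print(footprint.values)
```

Here the footprint should cover only y=0 at x=0.0 and x=0.75, which matches the expectation described above.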

Minimal Complete Verifiable Example:

import numpy as np
import xarray as xr

da = xr.DataArray(np.ones([3, 4]), dims=("x", "y"))
da[0, 0] = float("nan")

newx = np.linspace(0., 3., 5)
interp_da = da.interp(x=newx)
print(interp_da)

On my system, this gives the following output:

<xarray.DataArray (x: 5, y: 4)>
array([[nan,  1.,  1.,  1.],
       [nan,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [nan, nan, nan, nan],
       [nan, nan, nan, nan]])
Coordinates:
  * x        (x) float64 0.0 0.75 1.5 2.25 3.0
Dimensions without coordinates: y

[Surprisingly, I get the same output even using method="nearest".]

You might expect at least the following, with NaN only at y=0:

<xarray.DataArray (x: 5, y: 4)>
array([[nan, 1., 1., 1.],
       [nan, 1., 1., 1.],
       [ 1., 1., 1., 1.],
       [nan, 1., 1., 1.],
       [nan, 1., 1., 1.]])
Coordinates:
  * x        (x) float64 0.0 0.75 1.5 2.25 3.0
Dimensions without coordinates: y
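Part of the NaN pattern in the actual output may have a different cause. Because x has no coordinate, xarray interpolates against the positions 0, 1, 2, and by default interp() returns NaN for new points outside the original range (here x=2.25 and x=3.0) unless extrapolation is requested, e.g. with kwargs={"fill_value": "extrapolate"}. A minimal sketch to separate the two causes is to repeat the interpolation on a NaN-free copy and compare; the replacement value 1.0 is an arbitrary assumption, and `reference`, `nan_anyway` and `nan_from_input` are illustrative names.

```python
import numpy as np
import xarray as xr

# Same setup as the example in this issue
da = xr.DataArray(np.ones([3, 4]), dims=("x", "y"))
da[0, 0] = np.nan
newx = np.linspace(0.0, 3.0, 5)

with_nan = da.interp(x=newx)

# Same interpolation after replacing the NaN with a finite placeholder
# (1.0, chosen arbitrarily): any NaN left here cannot come from the NaN entry
reference = da.fillna(1.0).interp(x=newx)

nan_anyway = reference.isnull()                      # e.g. out-of-range points
nan_from_input = with_nan.isnull() & ~nan_anyway     # attributable to the NaN entry
print("NaN even without NaN input:\n", nan_anyway.values)
print("NaN attributable to the NaN entry:\n", nan_from_input.values)
```

If the out-of-range explanation holds, `nan_anyway` should mark the last two rows and `nan_from_input` only y=0 at x=0.0 and x=0.75.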

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48)
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.11.0-37-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: ('en_GB', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.8.0

xarray: 0.19.0
pandas: 1.3.1
numpy: 1.21.1
scipy: 1.7.1
netCDF4: 1.5.7
pydap: None
h5netcdf: None
h5py: 3.3.0
Nio: None
zarr: None
cftime: 1.5.0
nc_time_axis: 1.3.1
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.07.2
distributed: 2021.07.2
matplotlib: 3.4.2
cartopy: None
seaborn: 0.11.1
numbagg: None
pint: 0.17
setuptools: 49.6.0.post20210108
pip: 21.2.4
conda: 4.10.3
pytest: 6.2.4
IPython: 7.26.0
sphinx: 4.1.2
