Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linear interpolation gives negative output values with non-negative inputs #9404

Closed
5 tasks done
jameswilburlewis opened this issue Aug 27, 2024 · 5 comments
Closed
5 tasks done

Comments

@jameswilburlewis
Copy link

What happened?

I have some time-series data that contains non-negative values (with a few zeroes). When I call the interp() method with method='linear', and pass a time corresponding exactly to one of the zero values, the result is a (tiny) negative value. This causes problems later on when I pass the interpolated data to another routine that requires non-negative inputs.

What did you expect to happen?

I expected the interpolated data to contain zeroes rather than negative values where the input data had zeroes.

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np

time_strings_input = np.array(['2018-07-01T13:02:16.892474880',
                               '2018-07-01T13:02:16.922475008',
                               '2018-07-01T13:02:16.952474880'])
values_input = np.array([0.028584518, 0., 0.013626526],dtype=np.float32)

input_times_npdt64 = np.array([np.datetime64(t) for t in time_strings_input])
interp_to_times_npdt64 = np.array(input_times_npdt64[1])

input_times_float64 = input_times_npdt64.astype(np.float64)
interp_to_time_float64 = interp_to_times_npdt64.astype(np.float64)

data_array = xr.DataArray(values_input,dims=['time'],coords={'time':('time',input_times_float64)})

result = data_array.interp({"time": interp_to_time_float64},method='linear')
print(result.values)
# Expected output: 0.0
# Actual output: -3.469446951953614e-18

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

None

Anything else we need to know?

This is actually an issue with scipy.interpolate.interp1d. See: scipy/scipy#21459

A comment on that issue suggests using make_interp_spline() with k=1, which does give the desired output. numpy.interp() also gives the desired output.

It seems unlikely that scipy will fix the issue in interp1d, due to it being considered legacy code. Is there any chance that xarray might support the make_interp_spline way of doing it?

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.18 (main, Sep 11 2023, 08:25:10)
[Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2024.7.0
pandas: 2.1.0
numpy: 1.25.2
scipy: 1.13.1
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: 3.10.0
zarr: None
cftime: 1.6.2
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.7.3
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.1
pip: 22.3.1
conda: None
pytest: None
mypy: None
IPython: 8.18.1
sphinx: 7.3.7

@jameswilburlewis jameswilburlewis added bug needs triage Issue that has not been reviewed by xarray team member labels Aug 27, 2024
Copy link

welcome bot commented Aug 27, 2024

Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
If you have an idea for a solution, we would really welcome a Pull Request with proposed changes.
See the Contributing Guide for more.
It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better.
Thank you!

@hollymandel
Copy link
Contributor

I will work on this in conjunction with #7704

@hollymandel
Copy link
Contributor

@jameswilburlewis thanks for suggesting the workaround. Passing method='slinear' will call make_interp_spline via scipy.interpolate.interp1d.

import xarray as xr
import numpy as np

time_strings_input = np.array(['2018-07-01T13:02:16.892474880',
                               '2018-07-01T13:02:16.922475008',
                               '2018-07-01T13:02:16.952474880'])
values_input = np.array([0.028584518, 0., 0.013626526],dtype=np.float32)

input_times_npdt64 = np.array([np.datetime64(t) for t in time_strings_input])
interp_to_times_npdt64 = np.array(input_times_npdt64[1])

input_times_float64 = input_times_npdt64.astype(np.float64)
interp_to_time_float64 = interp_to_times_npdt64.astype(np.float64)

data_array = xr.DataArray(values_input,dims=['time'],coords={'time':('time',input_times_float64)})

bad_result = data_array.interp({"time": interp_to_time_float64},method='linear')
good_result = data_array.interp({"time": interp_to_time_float64},method='slinear')

print(bad_result.values) # -3.469446951953614e-18
print(good_result.values) # 0.0

Another observation is that your example works as intended if you drop upgrade values_input to float64--I'm not sure if the lower precision is required by your use case. Having the values in float32 causes scipy.interpolate.interp1d to call its own linear interpolation method (not numpy.interpolate) which claims to be slower and has the issue you have described.

I'm inclined to leave the behavior of linear as it is and resolve this with a warning in the code and/or documentation.

As far as I can tell the question of deprecating interp1d is orthogonal to this issue. It might be a good idea on its own merits. I think it will require adding some nan handling prior to the call of make_interp_spline and implementing nearest/next/previous ourselves.

@dcherian
Copy link
Contributor

dcherian commented Sep 27, 2024

Amazing.

I'm inclined to leave the behavior of linear as it is and resolve this with a warning in the code and/or documentation.

👍 An addition to the docstring and the narrative docs would be massively helpful

@jameswilburlewis
Copy link
Author

@hollymandel : The example code was extracted from a somewhat messier use case, where the data type involved was indeed np.float32. I did notice that the problem didn't seem to occur with 64-bit inputs, but doing that promotion would have incurred an undesirable time/memory penalty for our use case. But thanks very much for pointing out that I can specify 'slinear' interpolation in the xarray.interp() call to get the behavior I'm looking for....somehow I missed that option when I was looking at the xarray and scipy docs. I think that suggestion resolves this issue, so I'll go ahead and close it. Thanks again for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants