Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rolling argmin() or argmax() gives the wrong result (are they inverted?) #8541

Closed
5 tasks done
JulianGiles opened this issue Dec 11, 2023 · 10 comments · Fixed by #8552
Closed
5 tasks done

rolling argmin() or argmax() gives the wrong result (are they inverted?) #8541

JulianGiles opened this issue Dec 11, 2023 · 10 comments · Fixed by #8552

Comments

@JulianGiles
Copy link

What happened?

I was trying to compute a rolling argmax and argmin on a DataArray, something like:
data.rolling(time=3).argmax()

What did you expect to happen?

The results of the operations look flipped, the result of rolling argmax looks like what a rolling argmin should give, and viceversa.

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np

# Create a sample DataArray
data = xr.DataArray(np.arange(10), dims='time', coords={'time': np.arange(10)})

# Apply rolling argmax
data.rolling(time=3, center=False).argmax()

# The result should be 2 for every index starting from the third time step. However, the result is always 0 (like argmin?)

# If we try applying argmin
data.rolling(time=3, center=False).argmin()

# Then, the result is always 2, which should be the result for argmax. Are these methods switched?

# By doing it using the reduce() method and passing np.argmax, it works as expected
data.rolling(time=3, center=False).reduce(np.argmax)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.0 | packaged by conda-forge | (main, Oct 3 2023, 08:43:22) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.21-150400.24.81-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2

xarray: 2023.11.0
pandas: 2.1.3
numpy: 1.26.2
scipy: 1.11.4
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.10.0
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: 1.3.7
dask: 2023.12.0
distributed: 2023.12.0
matplotlib: 3.8.2
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.12.1
cupy: None
pint: 0.22
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: 8.18.1
sphinx: 7.2.6

@JulianGiles JulianGiles added bug needs triage Issue that has not been reviewed by xarray team member labels Dec 11, 2023
@max-sixty
Copy link
Collaborator

max-sixty commented Dec 11, 2023

This seems to be something to do with our implementation of bottleneck:

[nav] In [9]: with xr.set_options(use_bottleneck=False):
         ...:     print(data.rolling(time=3).argmax())
<xarray.DataArray (time: 10)>
array([nan, nan,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.])
Coordinates:
  * time     (time) int64 0 1 2 3 4 5 6 7 8 9

[nav] In [10]: with xr.set_options(use_bottleneck=True):
          ...:     print(data.rolling(time=3).argmax())
<xarray.DataArray (time: 10)>
array([nan, nan,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
Coordinates:
  * time     (time) int64 0 1 2 3 4 5 6 7 8 9

...but not bottleneck itself — .reduce(bn.nanargmax) returns the correct result:


[ins] In [12]: data.rolling(time=3).reduce(bn.nanargmax)
Out[12]:
<xarray.DataArray (time: 10)>
array([nan, nan,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.])
Coordinates:
  * time     (time) int64 0 1 2 3 4 5 6 7 8 9

FWIW numbagg doesn't have these functions.

@JulianGiles do you know whether this previously worked differently?

@JulianGiles
Copy link
Author

@max-sixty thanks for looking into it. I do not know if this used to worked correctly, is the first time I try to combine rolling with argmax/argmin

@max-sixty max-sixty removed the needs triage Issue that has not been reviewed by xarray team member label Dec 11, 2023
@max-sixty
Copy link
Collaborator

max-sixty commented Dec 11, 2023

OK I think this is actually a bottleneck issue:

import bottleneck as bn

data = xr.DataArray(np.arange(10), dims='time', coords={'time': np.arange(10)})
bn.move_argmax(data.data, window=3, axis=0)
array([nan, nan,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

I'm a bit surprised this didn't get spotted earlier. It's quite possible I'm missing something / misunderstanding what these should be returning. (But the cases above are different whether bottleneck is enabled, so I'm really confident something is wrong somewhere!)

Do you want to put an issue in upstream @JulianGiles ? That would be helpful.

We could also remove them from xarray if others confirm this is wrong and a fix in bottleneck isn't forthcoming. (Unfortunately writing these in numbagg isn't easy, since numba doesn't support the data structures that are required for implementing these; or I'd get onto that...)

@kmuehlbauer
Copy link
Contributor

@max-sixty @JulianGiles It looks like that this is bottleneck's standard behaviour.

Index 0 is at the rightmost edge of the window

See https://kwgoodman.github.io/bottleneck-doc/reference.html#bottleneck.move_argmax

Or did I get this totally wrong?

@max-sixty
Copy link
Collaborator

Ah! OK. Thanks for finding that. I do find that very surprising!

We should really strive to have consistent numerical output regardless of the xarray settings & dependencies. So if we can align the cases with & without bottleneck, that would be better.

We could do result = window - 1 - result to the result from the bottleneck function? Or just not use bottleneck for those two methods? WDYT?

@JulianGiles
Copy link
Author

I guess just doing result = window - 1 - result should do the trick

@max-sixty
Copy link
Collaborator

Yup. Contributions welcome...

@JulianGiles
Copy link
Author

I looked into the code but I am completely lost on how xarray works internally 😢 . I see no direct call to bn.move_argmax or something similar.

@kmuehlbauer
Copy link
Contributor

@JulianGiles I think I've found the right location. You might check #8552. This should fix the issue for bottleneck.

@JulianGiles
Copy link
Author

Thanks a lot @kmuehlbauer ! Looks good to me (with my limited knowledge) 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants