doctest failure with numpy 1.20 #4858

Closed
mathause opened this issue Feb 3, 2021 · 4 comments · Fixed by #4865

Comments

@mathause
Collaborator

mathause commented Feb 3, 2021

What happened:

Our doctests fail since numpy 1.20 came out:

https://github.com/pydata/xarray/pull/4760/checks?check_run_id=1818512841#step:8:69

What you expected to happen:

They don't ;-)

Minimal Complete Verifiable Example:

The following fails with numpy 1.20, while older numpy versions silently converted np.nan to an integer (the failing doctest is in xarray.DataArray.pad, at the bottom of the linked log):

import numpy as np

x = np.arange(10)
x = np.pad(x, 1, "constant", constant_values=np.nan)
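# with numpy >= 1.20 this raises: ValueError: cannot convert float NaN to integer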

requires numpy 1.20

Anything else we need to know?:

@mathause
Collaborator Author

mathause commented Feb 4, 2021

@mark-boer @dcherian @max-sixty

  1. Should we be more clever about incompatible dtypes? (numpy is not)
    a. By raising an error?
    b. By casting arr?
    c. By adding an argument that controls the behavior on incompatible dtypes?
  2. If not, is the numpy error explicit enough?
  3. Should we just remove the example, or should we replace it with something like arr.pad(x=1, constant_values=1.5) and mention that 1.5 is cast to 1?

Currently there are three different behaviors, depending on constant_values:

import xarray as xr
import numpy as np
arr = xr.DataArray([5, 6, 7], coords=[("x", [0, 1, 2])])

arr.pad(x=1, constant_values=np.nan)
# ValueError: cannot convert float NaN to integer

arr.pad(x=1, constant_values=None)
# casts arr to float

arr.pad(x=1, constant_values=1.5)
# casts constant_values to int

For comparison, this is how the old doctest output looked before numpy 1.20, with np.nan silently converted to the minimum int64 value:

>>> da.pad(x=1, constant_values=np.nan)
<xarray.DataArray (x: 4, y: 4)>
array([[-9223372036854775808, -9223372036854775808, -9223372036854775808,
        -9223372036854775808],
       [                   0,                    1,                    2,
                           3],
       [                  10,                   11,                   12,
                          13],
       [-9223372036854775808, -9223372036854775808, -9223372036854775808,
        -9223372036854775808]])
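To make options 1a/1b above more concrete, here is a rough sketch of the dtype handling they would imply; resolve_fill and on_mismatch are hypothetical names, not existing xarray API:

import numpy as np

def resolve_fill(dtype, constant_values, on_mismatch="raise"):
    # dtype of the requested fill value, e.g. float64 for np.nan or 1.5
    fill_dtype = np.asarray(constant_values).dtype
    if np.can_cast(fill_dtype, dtype, casting="same_kind"):
        return dtype  # representable as-is, keep the current behaviour
    if on_mismatch == "raise":
        # option 1a: fail early with a clear message instead of numpy's error
        raise ValueError(
            f"constant_values={constant_values!r} cannot be represented in dtype {dtype}"
        )
    # option 1b: promote the array's dtype, e.g. int64 -> float64
    return np.result_type(dtype, fill_dtype)

resolve_fill(np.dtype("int64"), np.nan)                      # raises ValueError
resolve_fill(np.dtype("int64"), np.nan, on_mismatch="cast")  # dtype('float64')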

@mark-boer
Contributor

mark-boer commented Feb 4, 2021

Hi, we had a similar discussion in #3596: xarray makes a distinction between np.nan and xarray.dtypes.NA. The current behaviour is consistent with that of other xarray functions such as shift. I am personally not a big fan of this distinction, though.

Check e.g. this comment: #3596 (comment)

The example I posted in that comment:

>>> da = xr.DataArray(np.arange(9).reshape(3,3), dims=("x", "y"))
>>> da.shift(x=1, fill_value=np.nan)
<xarray.DataArray (x: 3, y: 3)>
array([[-9223372036854775808, -9223372036854775808, -9223372036854775808],
       [                   0,                    1,                    2],
       [                   3,                    4,                    5]])
Dimensions without coordinates: x, y

>>> da.rolling(x=3).construct("new_axis", stride=3, fill_value=np.nan)
<xarray.DataArray (x: 1, y: 3, new_axis: 3)>
array([[[-9223372036854775808, -9223372036854775808, 0],
        [-9223372036854775808, -9223372036854775808, 1],
        [-9223372036854775808, -9223372036854775808, 2]]])
Dimensions without coordinates: x, y, new_axis

Hmm, so numpy changed its behaviour? Then this example should probably also fail with numpy 1.20.

On a side note: I am not a big fan of the example in the doctest; it displays an edge case that is not unique to pad.

I think the nicest solution would be to make xarray.dtypes.NA and np.nan equivalent, but this would require changes in all xarray functions that take some kind of fill_value.

@mathause
Collaborator Author

mathause commented Feb 4, 2021

Thanks for the other examples. Yes, these now also raise an error with numpy 1.20. What still does not raise is the following:

arr[0, 0] = np.nan

(because this gets converted to, approximately, arr.variable._data[0:1, 0:1] = np.array([np.nan]))
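A quick plain-numpy illustration of why the slice-assignment path does not raise under numpy 1.20 (newer numpy versions may additionally emit a casting warning):

import numpy as np

a = np.arange(3)
# a[0] = np.nan              # scalar assignment raises: cannot convert float NaN to integer
a[0:1] = np.array([np.nan])  # slice assignment casts unsafely instead of raising
# a[0] is now an arbitrary integer (the minimum int64 value)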


My suggestion is:

  1. replace the example with arr.pad(x=1, constant_values=1.23456789) and mention that the float is cast to int (or would you leave the example out entirely?); see the sketch after this list
  2. open a new issue to discuss assigning float to int
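For reference, the replacement example would look roughly like this (the exact repr formatting may differ between xarray versions; the point is that 1.23456789 is silently cast to 1):

>>> arr = xr.DataArray([5, 6, 7], coords=[("x", [0, 1, 2])])
>>> arr.pad(x=1, constant_values=1.23456789)
<xarray.DataArray (x: 5)>
array([1, 5, 6, 7, 1])
Coordinates:
  * x        (x) float64 nan 0.0 1.0 2.0 nan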

@mark-boer
Contributor

mark-boer commented Feb 4, 2021

My suggestion is:

  1. replace the example with arr.pad(x=1, constant_values=1.23456789) and mention that the float is cast to int (or would you leave the example out entirely?)
  2. open a new issue to discuss assigning float to int

I agree, I think that would be a good solution for now; replacing the example is fine. Maybe we could even open a new issue to discuss how xarray functions handle np.nan.

Is dask.array.pad going to handle casting the same way? It would be strange if whether the cast to float happens depended on the underlying array type. But that discussion should probably happen in the newly opened issue ;-)
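One quick way to check, assuming dask is installed (whether an error surfaces at graph construction or only at compute time depends on how dask.array.pad builds the padded chunks):

import dask.array as da
import numpy as np

x = da.arange(10, chunks=5)               # integer dask array
y = da.pad(x, 1, constant_values=np.nan)  # does this raise, or silently cast like old numpy?
print(y.compute())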
