doctest failure with numpy 1.20 #4858

Closed
mathause opened this issue Feb 3, 2021 · 4 comments · Fixed by #4865

Comments

@mathause
Collaborator

mathause commented Feb 3, 2021

What happened:

Our doctests fail since numpy 1.20 came out:

https://github.com/pydata/xarray/pull/4760/checks?check_run_id=1818512841#step:8:69

What you expected to happen:

They don't ;-)

Minimal Complete Verifiable Example:

The following fails with numpy 1.20, while older numpy versions silently converted np.nan to an integer (the failing doctest is in xarray.DataArray.pad, at the bottom of the linked log):

import numpy as np

x = np.arange(10)
x = np.pad(x, 1, "constant", constant_values=np.nan)
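# with numpy >= 1.20 this raises: ValueError: cannot convert float NaN to integer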

requires numpy 1.20

Anything else we need to know?:

@mathause
Collaborator Author

mathause commented Feb 4, 2021

@mark-boer @dcherian @max-sixty

  1. Should we be more clever about incompatible dtypes? (numpy is not)
    a. By raising an error?
    b. By casting arr?
    c. By adding an argument that controls the behavior on incompatible dtypes?
  2. If not, is the numpy error explicit enough?
  3. Should we just remove the example, or should we replace it with something like arr.pad(x=1, constant_values=1.5) and mention that 1.5 is cast to 1?

Currently there are three different behaviors, depending on constant_values:

import xarray as xr
import numpy as np
arr = xr.DataArray([5, 6, 7], coords=[("x", [0, 1, 2])])

arr.pad(x=1, constant_values=np.nan)
# ValueError: cannot convert float NaN to integer

arr.pad(x=1, constant_values=None)
# casts arr to float

arr.pad(x=1, constant_values=1.5)
# casts constant_values to int

For comparison, this is how the old doctest output looked before numpy 1.20, with np.nan silently converted to the minimum int64 value:

>>> da.pad(x=1, constant_values=np.nan)
<xarray.DataArray (x: 4, y: 4)>
array([[-9223372036854775808, -9223372036854775808, -9223372036854775808,
        -9223372036854775808],
       [                   0,                    1,                    2,
                           3],
       [                  10,                   11,                   12,
                          13],
       [-9223372036854775808, -9223372036854775808, -9223372036854775808,
        -9223372036854775808]])
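To make options 1a/1b above more concrete, here is a rough sketch of the dtype handling they would imply; resolve_fill and on_mismatch are hypothetical names, not existing xarray API:

import numpy as np

def resolve_fill(dtype, constant_values, on_mismatch="raise"):
    # dtype of the requested fill value, e.g. float64 for np.nan or 1.5
    fill_dtype = np.asarray(constant_values).dtype
    if np.can_cast(fill_dtype, dtype, casting="same_kind"):
        return dtype  # representable as-is, keep the current behaviour
    if on_mismatch == "raise":
        # option 1a: fail early with a clear message instead of numpy's error
        raise ValueError(
            f"constant_values={constant_values!r} cannot be represented in dtype {dtype}"
        )
    # option 1b: promote the array's dtype, e.g. int64 -> float64
    return np.result_type(dtype, fill_dtype)

resolve_fill(np.dtype("int64"), np.nan)                      # raises ValueError
resolve_fill(np.dtype("int64"), np.nan, on_mismatch="cast")  # dtype('float64')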

@mark-boer
Contributor

mark-boer commented Feb 4, 2021

Hi, we had a similar discussion in #3596: xarray makes a distinction between np.nan and xarray.dtypes.NA. The current behaviour is consistent with that of other xarray functions such as shift. I am personally not a big fan of this distinction, though.

Check e.g. this comment: #3596 (comment)

The example I posted in that comment:

>>> da = xr.DataArray(np.arange(9).reshape(3,3), dims=("x", "y"))
>>> da.shift(x=1, fill_value=np.nan)
<xarray.DataArray (x: 3, y: 3)>
array([[-9223372036854775808, -9223372036854775808, -9223372036854775808],
       [                   0,                    1,                    2],
       [                   3,                    4,                    5]])
Dimensions without coordinates: x, y

>>> da.rolling(x=3).construct("new_axis", stride=3, fill_value=np.nan)
<xarray.DataArray (x: 1, y: 3, new_axis: 3)>
array([[[-9223372036854775808, -9223372036854775808, 0],
        [-9223372036854775808, -9223372036854775808, 1],
        [-9223372036854775808, -9223372036854775808, 2]]])
Dimensions without coordinates: x, y, new_axis

Hmm, so numpy changed its behaviour? Then this example should probably also fail with numpy 1.20.

On a side note: I am not a big fan of the example in the doctest; it displays an edge case that is not unique to pad.

I think the nicest solution would be to make xarray.dtypes.NA and np.nan equivalent, but this would require changes in all xarray functions that take some kind of fill_value.

@mathause
Collaborator Author

mathause commented Feb 4, 2021

Thanks for the other examples. Yes, these now also raise an error with numpy 1.20. What still does not raise is the following:

arr[0, 0] = np.nan

(because this gets converted to, approximately, arr.variable._data[0:1, 0:1] = np.array([np.nan]))
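A quick plain-numpy illustration of why the slice-assignment path does not raise under numpy 1.20 (newer numpy versions may additionally emit a casting warning):

import numpy as np

a = np.arange(3)
# a[0] = np.nan              # scalar assignment raises: cannot convert float NaN to integer
a[0:1] = np.array([np.nan])  # slice assignment casts unsafely instead of raising
# a[0] is now an arbitrary integer (the minimum int64 value)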


My suggestion is:

  1. replace the example with arr.pad(x=1, constant_values=1.23456789) and mention that the float is cast to int (or would you leave the example out entirely?); see the sketch after this list
  2. open a new issue to discuss assigning float to int
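For reference, the replacement example would look roughly like this (the exact repr formatting may differ between xarray versions; the point is that 1.23456789 is silently cast to 1):

>>> arr = xr.DataArray([5, 6, 7], coords=[("x", [0, 1, 2])])
>>> arr.pad(x=1, constant_values=1.23456789)
<xarray.DataArray (x: 5)>
array([1, 5, 6, 7, 1])
Coordinates:
  * x        (x) float64 nan 0.0 1.0 2.0 nan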

@mark-boer
Contributor

mark-boer commented Feb 4, 2021

My suggestion is:

  1. replace the example with arr.pad(x=1, constant_values=1.23456789) and mention that the float is cast to int (or would you leave the example out entirely?)
  2. open a new issue to discuss assigning float to int

I agree, I think that would be a good solution for now; replacing the example is fine. Maybe we could even open a new issue to discuss how xarray functions handle np.nan.

Is dask.array.pad going to handle casting the same way? It would be strange if whether the cast to float happens depended on the underlying array type. But that discussion should probably happen in the newly opened issue ;-)
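One quick way to check, assuming dask is installed (whether an error surfaces at graph construction or only at compute time depends on how dask.array.pad builds the padded chunks):

import dask.array as da
import numpy as np

x = da.arange(10, chunks=5)               # integer dask array
y = da.pad(x, 1, constant_values=np.nan)  # does this raise, or silently cast like old numpy?
print(y.compute())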
