iris.cube.Cube.lazy_data method results in wrong chunk array type for masked arrays #5800
This bug will affect the functions …

This issue was introduced in #4135 to speed up loading NetCDF files.
It looks like arrays loaded from NetCDF files are always masked arrays since netCDF4 1.4 (released almost 6 years ago):
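If the source always yields masked arrays, the lazy wrapper can declare a masked chunk type up front. The sketch below is a minimal illustration under stated assumptions: `MaskedSourceProxy` is a hypothetical stand-in for a file-backed variable that, like a netCDF4 variable with auto-masking enabled, returns masked arrays when indexed. It is not Iris's actual proxy class.

```python
import numpy as np
import dask.array as da
from dask.array.utils import meta_from_array


class MaskedSourceProxy:
    """Hypothetical stand-in for a file-backed variable.

    Indexing it always returns a numpy masked array, mimicking a
    netCDF4 variable with auto-masking switched on.
    """

    def __init__(self, values, fill_value):
        self._values = np.asarray(values)
        self._fill_value = fill_value
        # Dask's from_array needs shape, dtype and ndim on the wrapped object.
        self.shape = self._values.shape
        self.dtype = self._values.dtype
        self.ndim = self._values.ndim

    def __getitem__(self, keys):
        section = self._values[keys]
        # Mask every point equal to the fill value, as auto-masking would.
        return np.ma.masked_equal(section, self._fill_value)


proxy = MaskedSourceProxy([1.0, 2.0, -999.0, 4.0], fill_value=-999.0)

# Declare a masked chunk type explicitly. asarray=False stops Dask from
# calling np.asarray on each chunk, which would discard the mask.
lazy = da.from_array(
    proxy,
    chunks=2,
    asarray=False,
    meta=np.ma.masked_array(np.empty((0,), dtype=proxy.dtype)),
)

print(type(meta_from_array(lazy)).__name__)  # MaskedArray
print(repr(lazy.compute()))
```

Passing `asarray=False` matters for non-NumPy sources: by default Dask converts such chunks with `np.asarray`, which returns the bare data and silently drops the mask.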
We're keen to get this fixed; it will be discussed during refinement of the two remaining Iris releases this year.
See notes on #5801.

One thing that still bothers me, though, is what … From experiment, Dask gives a stacked "mixture" of unmasked and masked arrays a meta of "masked" type:

```python
>>> import numpy as np
>>> import dask.array as da
# A stack formed of normal+masked arrays is masked-type
>>> lazy_nomask = da.from_array([1., 2, 3, 4], meta=np.ndarray)
>>> lazy_masked = da.from_array(
...     np.ma.masked_array([1., 2, 3, 4], [0, 0, 1, 1]),
...     meta=np.ma.MaskedArray((), dtype=float)
... )
>>> combined = da.stack([lazy_nomask, lazy_masked])
>>> print(' ', combined)
dask.array<stack, shape=(2, 4), dtype=float64, chunksize=(1, 4), chunktype=numpy.MaskedArray>
# Sections are of different type depending on the source array they derive from
>>> sec1 = combined[0, 1:3]
>>> sec2 = combined[1, 1:3]
>>> print(' section1 [0, 1:3] =', sec1)
section1 [0, 1:3] = dask.array<getitem, shape=(2,), dtype=float64, chunksize=(2,), chunktype=numpy.MaskedArray>
>>> print(' section2 [1, 1:3] =', sec2)
section2 [1, 1:3] = dask.array<getitem, shape=(2,), dtype=float64, chunksize=(2,), chunktype=numpy.MaskedArray>
>>> print(' section1 compute=', repr(sec1.compute()), ' type=', type(sec1.compute()))
section1 compute= array([2., 3.]) type= <class 'numpy.ndarray'>
>>> print(' section2 compute=', repr(sec2.compute()), ' type=', type(sec2.compute()))
section2 compute= masked_array(data=[2.0, --],
             mask=[False, True],
       fill_value=1e+20) type= <class 'numpy.ma.core.MaskedArray'>
# "Mixed" sections are unified as masked-type
>>> print('[:, 2:3] compute :\n', repr(combined[:, 2:3].compute()))
[:, 2:3] compute :
masked_array(
  data=[[3.0],
        [--]],
  mask=[[False],
        [ True]],
  fill_value=1e+20)
# An unmasked portion of masked origin is still masked-type
>>> print('[1, :2] compute :\n', repr(combined[1, :2].compute()))
[1, :2] compute :
masked_array(data=[1.0, 2.0],
             mask=[False, False],
       fill_value=1e+20)
```

So, this is a potential problem: it shows that the 'meta' of a lazy array cannot quite be "trusted" in terms of what it will return.

@bouweandela, what is your view on this? I assume it is still valuable to do our best to preserve a correct …
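One way to see how far the declared meta can be trusted is to compare it with what each block actually computes to. `mismatched_blocks` below is a hypothetical diagnostic helper, not part of Iris or Dask; because it computes every block, it is only suitable for small arrays or tests. It rebuilds the stacked example from the session above.

```python
import numpy as np
import dask.array as da
from dask.array.utils import meta_from_array


def mismatched_blocks(lazy):
    """Return indices of blocks whose computed type differs from the declared meta.

    Hypothetical diagnostic helper: it computes every block, so keep it to
    small arrays or tests.
    """
    declared = type(meta_from_array(lazy))
    mismatches = []
    for index in np.ndindex(*lazy.numblocks):
        # lazy.blocks[index] is a one-block dask array; compute it alone.
        block = lazy.blocks[index].compute()
        if type(block) is not declared:
            mismatches.append(index)
    return mismatches


# Rebuild the mixed stack: one plain row, one masked row.
lazy_nomask = da.from_array(np.array([1.0, 2.0, 3.0, 4.0]), meta=np.ndarray)
lazy_masked = da.from_array(
    np.ma.masked_array([1.0, 2.0, 3.0, 4.0], mask=[0, 0, 1, 1]),
    meta=np.ma.MaskedArray((), dtype=float),
)
combined = da.stack([lazy_nomask, lazy_masked])

print(mismatched_blocks(combined))
```

An empty list means every block matched its declared type. For the stack, block `(0, 0)` is reported, because it derives from the plain row while the stack as a whole declares a masked chunk type.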
Yes, I think so. If the inconsistency is limited to …
@SciTools/peloton We are happy to move forward with this, on the understanding that if it breaks Iris in other places we can roll it back. This brings us more in line with Dask, which is a good thing.
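For comparison, when Dask wraps an in-memory array and no `meta` is passed, it takes the declared chunk type from the array itself, so masked input gives a masked chunk type. A minimal check with illustrative arrays:

```python
import numpy as np
import dask.array as da
from dask.array.utils import meta_from_array

masked = np.ma.masked_array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])
plain = np.array([1.0, 2.0, 3.0, 4.0])

# No meta argument: Dask derives the declared chunk type from the wrapped array.
lazy_masked = da.from_array(masked, chunks=2)
lazy_plain = da.from_array(plain, chunks=2)

print(type(meta_from_array(lazy_masked)).__name__)  # MaskedArray
print(type(meta_from_array(lazy_plain)).__name__)  # ndarray
```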
🐛 Bug Report
How To Reproduce
Steps to reproduce the behaviour:
Run the following code
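The following is not the original reproduction script but a hedged, Dask-only sketch of the same symptom. It assumes the loader wraps its data with `meta=np.ndarray` regardless of what the source returns; the data values are illustrative.

```python
import numpy as np
import dask.array as da
from dask.array.utils import meta_from_array

# Illustrative masked source data.
source = np.ma.masked_array([1.0, 2.0, 3.0, 4.0], mask=[False, False, True, True])

# Assumption: the loader always declares plain-ndarray chunks via meta=np.ndarray,
# whatever the source actually returns.
lazy = da.from_array(source, chunks=2, meta=np.ndarray)

print(lazy)  # the repr reports chunktype=numpy.ndarray
print(type(meta_from_array(lazy)).__name__)  # ndarray: the declared chunk type
print(type(lazy.compute()).__name__)  # MaskedArray: what the chunks really are
```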
Note that chunktype is `numpy.ndarray`, while the data type is actually a masked array. This causes problems when inspecting the content of the Dask array with `dask.array.utils.meta_from_array`, because that will return the wrong chunk type shown above.

Expected behaviour
Environment