Rolling mean on dask array does not preserve dtype #7062

Closed
4 tasks done
slevang opened this issue Sep 21, 2022 · 2 comments · Fixed by #7063
Labels
bug needs triage Issue that has not been reviewed by xarray team member

Comments

@slevang
Contributor

slevang commented Sep 21, 2022

What happened?

Calling rolling().mean() on a dask-backed array can return a different dtype than on a numpy-backed array, for example with float32 input. This is due to the optimized _mean function introduced in #4915.
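
For readers unfamiliar with the mechanism, here is a minimal NumPy-only sketch of one plausible way the upcast can arise. It assumes the optimized path computes the mean as a windowed sum divided by a windowed count; the function name `rolling_mean_sum_over_count` is hypothetical and is not xarray's code. Under NumPy's promotion rules, dividing a float32 array by an int64 array yields float64.

```python
import numpy as np


def rolling_mean_sum_over_count(values, window, min_periods=1):
    """Trailing rolling mean as windowed sum / windowed count (illustrative only)."""
    values = np.asarray(values)
    valid = ~np.isnan(values)
    n = values.shape[0]
    sums = np.zeros(n, dtype=values.dtype)  # keeps the input dtype (float32 here)
    counts = np.zeros(n, dtype=np.int64)    # counts are naturally integers
    for i in range(n):
        start = max(0, i - window + 1)
        in_window = valid[start : i + 1]
        sums[i] = values[start : i + 1][in_window].sum()
        counts[i] = in_window.sum()
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = sums / counts                # float32 / int64 promotes to float64
    # Mask positions that have fewer valid points than min_periods.
    return np.where(counts >= min_periods, mean, np.nan)


x = np.array([1, 2, 3], dtype="float32")
result = rolling_mean_sum_over_count(x, window=3)
print(x.dtype, "->", result.dtype)
```

The values are still correct; only the dtype changes, which is why this is easy to miss until memory use or downstream dtype checks are affected.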

What did you expect to happen?

This is a simple enough operation that if you start with float32 I would expect to get float32 back.

Minimal Complete Verifiable Example

>>> import xarray as xr
>>> da = xr.DataArray([1,2,3], coords={'x':[1,2,3]}).astype('float32')
>>> da.rolling(x=3, min_periods=1).mean().dtype
dtype('float32')
>>> da.chunk({'x':1}).rolling(x=3, min_periods=1).mean().dtype
dtype('float64')

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

#5877 is somewhat related.

Environment

INSTALLED VERSIONS
------------------
commit: e679185
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.6.1.dev63+ge6791852.d20220921
pandas: 1.4.2
numpy: 1.21.6
scipy: 1.8.1
netCDF4: 1.5.8
pydap: installed
h5netcdf: 1.0.2
h5py: 3.6.0
Nio: None
zarr: 2.12.0
cftime: 1.6.0
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
rasterio: 1.2.10
cfgrib: 0.9.10.1
iris: 3.2.1
bottleneck: 1.3.4
dask: 2022.04.1
distributed: 2022.4.1
matplotlib: 3.5.2
cartopy: 0.20.2
seaborn: 0.11.2
numbagg: 0.2.1
fsspec: 2022.8.2
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 62.0.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: None
sphinx: None

@slevang slevang added bug needs triage Issue that has not been reviewed by xarray team member labels Sep 21, 2022
@dcherian
Contributor

Is this also true with xr.set_options(use_bottleneck=False)?

@slevang
Contributor Author

slevang commented Sep 21, 2022

Without bottleneck, I guess both options from the MVCE above get routed through _mean:

>>> xr.set_options(use_bottleneck=False)
>>> da = xr.DataArray([1,2,3], coords={'x':[1,2,3]}).astype('float32')
>>> da.rolling(x=3, min_periods=1).mean().dtype
dtype('float64')
>>> da.chunk({'x':1}).rolling(x=3, min_periods=1).mean().dtype
dtype('float64')

And after #7063:

>>> xr.set_options(use_bottleneck=False)
>>> da = xr.DataArray([1,2,3], coords={'x':[1,2,3]}).astype('float32')
>>> da.rolling(x=3, min_periods=1).mean().dtype
dtype('float32')
>>> da.chunk({'x':1}).rolling(x=3, min_periods=1).mean().dtype
dtype('float32')
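
As a rough illustration of how a sum-over-count mean can keep the input dtype, the sketch below casts the quotient back to the dtype of the sums when that dtype is floating point. This is an assumption-laden NumPy example, not necessarily what #7063 does; `mean_preserving_dtype` is a hypothetical helper name.

```python
import numpy as np


def mean_preserving_dtype(sums, counts):
    """Divide sums by counts, keeping a floating input dtype (illustrative only)."""
    sums = np.asarray(sums)
    counts = np.asarray(counts)
    # Integer inputs still need a float result, so fall back to float64 for them.
    out_dtype = sums.dtype if np.issubdtype(sums.dtype, np.floating) else np.float64
    with np.errstate(invalid="ignore", divide="ignore"):
        quotient = sums / counts            # may be float64 after promotion
    return quotient.astype(out_dtype, copy=False)


sums32 = np.array([1, 3, 6], dtype="float32")
counts = np.array([1, 2, 3], dtype="int64")
mean32 = mean_preserving_dtype(sums32, counts)
print(mean32.dtype)
```

Casting after the division means the intermediate is computed at higher precision and only the final result is narrowed, at the cost of a temporary float64 array.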
