nansum vs nanmean for all-nan vectors #2889

Closed
mathause opened this issue Apr 11, 2019 · 3 comments

Comments

@mathause
Collaborator

import xarray as xr
import numpy as np

ds = xr.DataArray([np.NaN, np.NaN])

ds.mean()  # nan
ds.sum()   # 0.0

Problem description

ds.mean() returns NaN, while ds.sum() returns 0. This comes from numpy (cf. np.nanmean vs. np.nansum), so it might have to be discussed upstream, but I wanted to ask the xarray community for their opinion first. This is also relevant for #422 (what happens if all the weights are NaN or sum to 0).
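The same asymmetry is visible in plain numpy; a minimal illustration of the upstream behaviour (output comments assume numpy of this era):

import numpy as np

arr = np.array([np.nan, np.nan])

np.nansum(arr)   # 0.0 -- the sum over an empty set of valid values
np.nanmean(arr)  # nan, with a "Mean of empty slice" RuntimeWarning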

Expected Output

I would expect both to return np.nan.

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.3 | packaged by conda-forge | (default, Mar 27 2019, 23:01:00)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.4.176-96-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.1
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.2.1
netCDF4: 1.5.0.1
pydap: None
h5netcdf: 0.7.1
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: 1.2.0
PseudonetCDF: None
rasterio: 1.0.22
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 1.1.5
distributed: 1.26.1
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 41.0.0
pip: 19.0.3
conda: None
pytest: 4.4.0
IPython: 7.4.0
sphinx: 2.0.1

@dcherian
Contributor

This is correct though, isn't it?

Mean = nansum(non-NaN values) / count(non-NaN values) = 0/0, which is NaN.
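Spelled out on the example above (a sketch of the reasoning, not xarray's internal code path):

import numpy as np

arr = np.array([np.nan, np.nan])
valid = arr[~np.isnan(arr)]   # empty array -- no valid values remain
total = valid.sum()           # 0.0, which is what nansum reports
count = valid.size            # 0
with np.errstate(divide="ignore", invalid="ignore"):
    mean = total / count      # 0.0 / 0 -> nan, which is what nanmean reports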

@fujiisoup
Member

fujiisoup commented Apr 11, 2019

Thanks @mathause
I also think the current behavior is not perfect, but it is the best option available.

I would expect both to return np.nan

I expect np.nansum(ds) to be equivalent to np.sum over the non-NaN values, and thus 0, while the mean should be NaN, as @dcherian pointed out.

To me, a future average function (cf. #422, taking weights) should also return np.nan for all-NaN slices, as sketched below.
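A hypothetical sketch of such a function (the name weighted_mean and its signature are illustrative only, not xarray's API); for an all-NaN slice both the weighted sum and the weight sum are 0, so the division yields NaN:

import numpy as np
import xarray as xr

def weighted_mean(da, weights, dim):
    # Ignore weights wherever the data is NaN.
    masked_weights = weights.where(da.notnull())
    with np.errstate(divide="ignore", invalid="ignore"):
        # For an all-NaN slice both sums are 0, so 0 / 0 -> nan.
        return (da * masked_weights).sum(dim) / masked_weights.sum(dim)

da = xr.DataArray([np.NaN, np.NaN], dims="x")
w = xr.DataArray([0.5, 0.5], dims="x")
weighted_mean(da, w, "x")  # nan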

@mathause
Collaborator Author

Thanks for the feedback. I think I see the reasoning behind it now.
