Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

polyfit with weights alters the DataArray in place #5644

Closed
scottstanie opened this issue Jul 28, 2021 · 6 comments · Fixed by #7900
Closed

polyfit with weights alters the DataArray in place #5644

scottstanie opened this issue Jul 28, 2021 · 6 comments · Fixed by #7900
Labels

Comments

@scottstanie
Copy link
Contributor

scottstanie commented Jul 28, 2021

What happened:

After running da.polyfit on a DataArray with weights, the data has been overwritten

What you expected to happen:

I didn't see this documented anywhere, but I did not expect that creating a polyfit dataset would clobber the original data that I'm fitting to. The data isn't altered in the case of unweighted fitting, only weighted.

Minimal Complete Verifiable Example:

In [2]: import xarray as xr; import numpy as np
In [3]: nz, ny, nx = (10, 20, 30)
In [4]: da = xr.DataArray(np.random.rand(nz, ny ,nz), dims=['z','y','x'])
In [6]: da.mean(), da.max()
Out[6]:
(<xarray.DataArray ()>
 array(0.4963857),
 <xarray.DataArray ()>
 array(0.99996494))

In [7]: pf = da.polyfit("z", deg=2) # This will not alter the data
In [9]: da.mean(), da.max()
Out[9]:
(<xarray.DataArray ()>
 array(0.4963857),
 <xarray.DataArray ()>
 array(0.99996494))

# Non-zero `w` argument alters the data
In [11]: pf = da.polyfit("z", deg=2, w=np.arange(nz))

In [12]: da.mean(), da.max()
Out[12]:
(<xarray.DataArray ()>
 array(2.24317611),
 <xarray.DataArray ()>
 array(8.95963569))

Anything else we need to know?:

I assume it's happening here

lhs *= w[:, np.newaxis]

My question is whether this is supposed to be the case to avoid copies? Or if it's accidental?

Environment:

Output of xr.show_versions()

xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 18:42:56)
[Clang 10.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.19.0
pandas: 1.1.2
numpy: 1.20.2
scipy: 1.5.2
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 3.2.1
Nio: None
zarr: 2.6.1
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.1
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.14.0
distributed: 2.20.0
matplotlib: 3.3.0
cartopy: 0.18.0
seaborn: 0.10.1
numbagg: None
pint: 0.16.1
setuptools: 49.6.0.post20200814
pip: 21.1.2
conda: 4.8.4
pytest: 6.2.4
IPython: 7.18.1
sphinx: 3.5.1

@scottstanie
Copy link
Contributor Author

As a temporary workout for my case, I'm just going to do

In [3]: pf = (da.copy(True)).polyfit("z", deg=2, w=np.arange(nz))
In [5]: da.max(), da.mean()
Out[5]:
(<xarray.DataArray ()>
 array(0.99878237),
 <xarray.DataArray ()>
 array(0.50869358))

The thing I don't understand is that _to_temp_dataset seems to be trying to do a deep copy (based on the argument names)

def _to_temp_dataset(self) -> Dataset:
return self._to_dataset_whole(name=_THIS_ARRAY, shallow_copy=False)

But it is acting like a shallow copy:

In [6]: pf = (da.copy(False)).polyfit("z", deg=2, w=np.arange(nz))
In [7]: da.max(), da.mean()
Out[7]:
(<xarray.DataArray ()>
 array(8.92217147),
 <xarray.DataArray ()>
 array(2.29014397))

@dcherian
Copy link
Contributor

yikes thanks for the nice report!

@malmans2
Copy link
Contributor

Hi!
I was about to open a new issue about this, but looks like it's a known issue and there's a stale PR...
Let me know if I can help to get this fixed!

@headtr1ck
Copy link
Collaborator

You can always reach out to the creator of the original PR and comment in the PR.

But it looks like this particular PR was reaching a dead end and should be completely rewritten.
But anyway the reviewers left helpful remarks on how to proceed.

@dcherian
Copy link
Contributor

Do you know where the in-place modification is happening? We could just copy there and fix this particular issue.

@malmans2
Copy link
Contributor

Do you know where the in-place modification is happening? We could just copy there and fix this particular issue.

Not sure, but I'll take a look!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants