as_numpy changes MultiIndex #8001
Comments
Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
Note that for computing
That said, this seems to be a bug in the implementation of
Thanks for the report @cgahr.
The docstrings of
>>> da.as_numpy()._coords["z"]._data  # not a numpy array!
PandasIndexingAdapter(array=Int64Index([0, 1, 2, 3, 4], dtype='int64'), dtype=dtype('int64'))
If we want to actually coerce all the coordinates into numpy arrays, then we should remove their index (if any). When the same underlying data is shared between the coordinates and their index (as is the case for Xarray's
Alternatively, we could fix the multi-index issue reported here and clarify in the documentation that
Or we could add an option for choosing between these two behaviors.
I wanted to convert the DataArray's underlying data type to numpy. And as far as I understood, the proper way to do this is using
For me, it was actually not clear at all that
I mostly agree, but I'm not sure considering that:
The behavior that would make the most sense to me in terms of clarity and predictability would be to either convert all the coordinates or not touch them at all. Practically, however, the best option is perhaps to convert all "pure" array data (i.e., data variables and unindexed coordinates, but not the indexed coordinates).
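The "convert only pure array data" option from the comment above could be sketched roughly as follows. This is not xarray's implementation and not a proposed patch: `as_numpy_keep_indexes` is a hypothetical helper name, and it assumes that `np.asarray` is an acceptable way to coerce the data (fine for NumPy and dask arrays, not necessarily for other duck arrays).

```python
import numpy as np
import pandas as pd
import xarray as xr


def as_numpy_keep_indexes(da: xr.DataArray) -> xr.DataArray:
    """Hypothetical helper: coerce the data and the unindexed coordinates
    to NumPy, leaving every indexed coordinate untouched."""
    # Shallow copy so that coordinates and indexes are reused as they are;
    # only the main data array is swapped for a NumPy array.
    out = da.copy(deep=False, data=np.asarray(da.data))
    # Coordinates without an index are "pure" array data and get converted.
    unindexed = {
        name: (coord.dims, np.asarray(coord.data), coord.attrs)
        for name, coord in da.coords.items()
        if name not in da.xindexes
    }
    return out.assign_coords(unindexed)


# Example input: a stacked array (MultiIndex on "z") plus one unindexed
# coordinate "w" along the same dimension.
example = (
    xr.DataArray(
        np.arange(6).reshape(2, 3),
        dims=("x", "y"),
        coords={"x": ["a", "b"], "y": [0, 1, 2]},
    )
    .stack(z=("x", "y"))
    .assign_coords(w=("z", np.linspace(0.0, 1.0, 6)))
)

converted = as_numpy_keep_indexes(example)
print(converted.coords)
```

Because the indexed coordinates are never rebuilt, the MultiIndex on `z` is carried over unchanged, which is the property the original report asks for.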
What happened?
I have a DataArray with a MultiIndex. In some cases, I use dask, so I call .as_numpy() to compute the array and store it in memory. I would expect that this call does NOT change the MultiIndex. This is the case for .persist(); however, it is not the case for .as_numpy().
In the following MWE the original coordinates are:
After .persist() we get the same result. After .as_numpy() we get
which is not the same and can lead to issues down the line.
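The original MWE did not survive on this page, so here is a minimal sketch of how the comparison described above could be checked. It is an illustrative stand-in, not the reporter's code: the array contents and the dimension names `x`, `y`, `z` are assumptions, and the array is NumPy-backed rather than dask-backed, so `.persist()` is effectively a no-op here.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the issue's MWE: stack two dimensions so that
# the new "z" dimension is backed by a pandas MultiIndex.
da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": ["a", "b"], "y": [0, 1, 2]},
).stack(z=("x", "y"))

# Sanity check: the stacked dimension really carries a MultiIndex.
assert isinstance(da.indexes["z"], pd.MultiIndex)

before = repr(da.coords)

# Record, for each method, whether the coordinates still print identically.
report = {
    "persist": repr(da.persist().coords) == before,
    "as_numpy": repr(da.as_numpy().coords) == before,
}
print(before)
print(report)
```

Whether `report["as_numpy"]` comes out `True` or `False` depends on the installed xarray version. The report above describes the coordinates changing on 2023.6.0.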
What did you expect to happen?
No response
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
No response
Anything else we need to know?
No response
Environment
xarray: 2023.6.0
pandas: 1.5.3
numpy: 1.25.0
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: 1.2.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.4.1
distributed: 2023.4.1
matplotlib: 3.7.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.4.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.7.2
pip: 23.1.2
conda: None
pytest: 7.4.0
mypy: 1.4.1
IPython: 8.13.2
sphinx: None