Alignment fails when multiple indexes are set along one common dimension #7695
Comments
Thanks for the report @dmkriete and sorry for the wait. This one would be tricky to solve, I'm afraid.

We currently don't support the alignment of xarray objects when any of them has more than one index along common dimension(s), like the "time" and "time_relative" indexed coordinates in your example:

    xr.align(ds.signal, da)  # fails with the same error

The reason is that we would need to either somehow combine the results from all those indexes or just pick one of them in order to reindex the coordinates and variables along the common dimension(s). It wouldn't be easy to choose a default behavior that would fit a possibly wide range of use cases. The same issue happens with data selection, actually (see #8028 (comment)).

The part of the error message that is relevant here is

Although probably not the most convenient, you could still drop one of the indexes before using the dataset in an operation that performs automatic alignment:

    ds.drop_indexes("time_relative").signal - da  # should work
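Since the original example is not reproduced here, the workaround can be sketched on a small made-up dataset. The names and values below are assumptions chosen only to mirror the "time" and "time_relative" coordinates mentioned above, and whether the direct subtraction raises may depend on the xarray version, so the sketch catches the error instead of relying on it.

```python
import numpy as np
import xarray as xr

# Hypothetical data: "time" is the dimension coordinate and
# "time_relative" is a second indexed coordinate along the same dimension.
time = np.arange(5.0)
ds = xr.Dataset(
    {"signal": ("time", np.arange(5.0) * 2)},
    coords={"time": time, "time_relative": ("time", time - 2.0)},
).set_xindex("time_relative")

da = xr.DataArray(np.ones(5), dims="time", coords={"time": time})

# Direct arithmetic triggers automatic alignment, which may raise
# because "time" carries two indexes.
try:
    direct = ds.signal - da
except ValueError as err:
    direct = None
    print("alignment failed:", str(err).splitlines()[0])

# Workaround: drop the extra index first. "time_relative" stays on the
# object as a plain (non-indexed) coordinate.
result = ds.drop_indexes("time_relative").signal - da
print(result)
```

The extra index can be restored afterwards with `set_xindex("time_relative")` if it is still needed.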
Here's a really simple example:

    import numpy as np
    import xarray as xr

    a = xr.DataArray(
        np.ones(5), dims="x", coords={"x": np.arange(5), "y": ("x", np.arange(5))}
    ).set_xindex("y")
    b = xr.DataArray(np.ones(5), dims="z", coords={"z": np.arange(5)})
    xr.align(a, b)

It requires no reindexing since they are already aligned, so it seems like this could be made to work. I think we'd want
Yeah, we should try aligning those objects and raise later, during reindexing, if conflicting indexers are found along common dimensions, as suggested in #8236 (comment). That won't be the case in your example @dcherian, so alignment will work (with any method for
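The idea of deferring the conflict check can be illustrated with a toy model in plain Python. This is only a sketch under simplifying assumptions and is not how xarray implements alignment: each object is represented as a dict mapping an index name to a (dimension, labels) pair, and the helper name `dims_needing_reindex` is made up for this illustration. An error is raised only when a dimension actually needs reindexing and one of the objects has more than one index along it.

```python
from collections import defaultdict
from itertools import combinations


def dims_needing_reindex(objects):
    """Toy deferred conflict check (hypothetical helper, not xarray API).

    Each object is a dict: index name -> (dimension, labels).
    Returns the sorted dimensions that would need reindexing and raises
    ValueError only if such a dimension carries more than one index.
    """
    # dim -> one {index name: labels} mapping per object that has the dim
    per_dim = defaultdict(list)
    for obj in objects:
        grouped = defaultdict(dict)
        for name, (dim, labels) in obj.items():
            grouped[dim][name] = tuple(labels)
        for dim, indexes in grouped.items():
            per_dim[dim].append(indexes)

    to_reindex = []
    for dim, groups in per_dim.items():
        # Reindexing is needed when two objects share an index name
        # along this dimension but disagree on its labels.
        mismatch = any(
            g1[name] != g2[name]
            for g1, g2 in combinations(groups, 2)
            for name in g1.keys() & g2.keys()
        )
        if not mismatch:
            continue
        if any(len(g) > 1 for g in groups):
            raise ValueError(f"conflicting indexes along dimension {dim!r}")
        to_reindex.append(dim)
    return sorted(to_reindex)


# Mirrors the example above: "a" has two indexes along "x", "b" lives on "z".
a = {"x": ("x", range(5)), "y": ("x", range(5))}
b = {"z": ("z", range(5))}
print(dims_needing_reindex([a, b]))  # no common dimension, nothing to do
```

In this model the pair `a`, `b` passes because nothing has to be reindexed, while an object whose "x" labels differ from those of `a` would still raise.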
This also affects the creation of datasets using the

In [1]: from xarray import Dataset, Variable, Coordinates
   ...: from xarray.core.indexes import PandasIndex
   ...:
   ...: u = Variable("x", [0, 1, 2])
   ...: x = Variable("x", [-1, 0, 1])
   ...: coords = Coordinates(
   ...:     coords={"u": u, "x": x},
   ...:     indexes={
   ...:         "u": PandasIndex.from_variables({"u": u}, options={}),
   ...:         "x": PandasIndex.from_variables({"x": x}, options={}),
   ...:     },
   ...: )
   ...: Dataset({"a": ("x", [3, 4, 5])}, coords=coords)
<traceback>
ValueError: cannot re-index or align objects with conflicting indexes found for the following dimensions: 'x' (2 conflicting indexes)
Conflicting indexes may occur when
- they relate to different sets of coordinate and/or dimension names
- they don't have the same type
- they may be used to reindex data along common dimensions

In this case I'd argue that the coordinates are already guaranteed to be internally consistent, so it might be fine to simply align the data variables with the dimension coordinates (I might be missing something, though).
@keewis it might not be that simple, as I think there are still some edge cases where a data variable can be promoted to coordinate(s) (e.g., a pandas multi-index). Otherwise, there should be no re-indexing in that case either, so it should be supported too if we look for potential conflicts (along common dimensions) when re-indexing the objects instead of when extracting all the indexes from the objects.
this probably won't cover all the possible edge cases (I didn't care about

    def merge_data_and_coords(data_vars: DataVars, coords) -> _MergeResult:
        """Used in Dataset.__init__."""
        if isinstance(coords, Coordinates):
            coords = coords.copy()
            # full alignment in case there are data variables that should be
            # promoted to dimension coordinates
            if any((name,) == as_variable(var).dims for name, var in data_vars.items()):
                indexes = coords.xindexes
            else:
                # align using just the dimension coordinates
                xindexes = coords.xindexes
                indexes_ = {
                    name: index
                    for name, index in coords.xindexes.items()
                    if name in coords.dims
                }
                variables_ = {name: xindexes.variables[name] for name in indexes_}
                indexes = type(xindexes)(indexes_, variables_)
        else:
            coords = create_coords_with_default_indexes(coords, data_vars)
            indexes = coords.xindexes

        # exclude coords from alignment (all variables in a Coordinates object should
        # already be aligned together) and use coordinates' indexes to align data_vars
        return merge_core(
            [data_vars, coords],
            compat="broadcast_equals",
            join="outer",
            explicit_coords=tuple(coords),
            indexes=indexes,
            priority_arg=1,
            skip_align_args=[1],
        )

(and I'm sure there's a cleaner implementation)
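The key step in the function above is the filter that keeps only the indexes of dimension coordinates. A minimal sketch of that step in isolation, reusing the `u` and `x` coordinates from the earlier example and assuming an xarray version that exports `Coordinates` at the top level; the name `dim_indexes` is made up for this illustration:

```python
from xarray import Coordinates, Variable
from xarray.core.indexes import PandasIndex

u = Variable("x", [0, 1, 2])
x = Variable("x", [-1, 0, 1])
coords = Coordinates(
    coords={"u": u, "x": x},
    indexes={
        "u": PandasIndex.from_variables({"u": u}, options={}),
        "x": PandasIndex.from_variables({"x": x}, options={}),
    },
)

# Both coordinates are indexed along the single dimension "x".
print(sorted(coords.xindexes))

# Keep only indexes whose coordinate name is also a dimension name,
# i.e. the dimension coordinates; the index on "u" is filtered out.
dim_indexes = {
    name: index for name, index in coords.xindexes.items() if name in coords.dims
}
print(sorted(dim_indexes))
```

With a single index left per dimension, the conflict that triggers the error in the `Dataset` constructor no longer arises for the data variables, which is the point of the proposal.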
What happened?
I am trying to use xarray's smart broadcasting to do computation with two DataArrays, one of which has two indexes set for one of its dimensions. When taking the difference between these two DataArrays, I receive an error saying there are conflicting indexes.
What did you expect to happen?
I expected the computation to return a DataArray with all the original dimensions, coordinates, and indexes
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:14:58) [MSC v.1929 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 85 Stepping 0, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: ('de_DE', 'cp1252')
libhdf5: 1.12.2
libnetcdf: 4.9.1
xarray: 2023.3.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.0
netCDF4: 1.6.2
pydap: None
h5netcdf: None
h5py: 3.8.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.7.0
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.6.1
pip: 23.0.1
conda: None
pytest: None
mypy: None
IPython: 8.11.0
sphinx: None