Alignment fails when multiple indexes are set along one common dimension #7695

Open
4 tasks done
dmkriete opened this issue Mar 29, 2023 · 6 comments

Comments

@dmkriete

What happened?

I am trying to use xarray's smart broadcasting to do computation with two DataArrays, one of which has two indexes set for one of its dimensions. When taking the difference between these two DataArrays, I receive an error saying there are conflicting indexes.

What did you expect to happen?

I expected the computation to return a DataArray with all the original dimensions, coordinates, and indexes.

Minimal Complete Verifiable Example

import numpy as np
import xarray as xr

ds = xr.Dataset(data_vars={"signal": xr.DataArray(np.zeros((3, 2)), dims=("time", "position"), coords={"time": [12, 13, 14]})})
ds = ds.assign_coords(time_relative=(ds.time - 12))
ds = ds.set_xindex("time_relative")

da = xr.DataArray(np.zeros(2), dims="position")

ds.signal - da  # Error occurs here

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

ValueError: cannot re-index or align objects with conflicting indexes found for the following dimensions: 'time' (2 conflicting indexes)
Conflicting indexes may occur when
- they relate to different sets of coordinate and/or dimension names
- they don't have the same type
- they may be used to reindex data along common dimensions

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:14:58) [MSC v.1929 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 85 Stepping 0, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: ('de_DE', 'cp1252')
libhdf5: 1.12.2
libnetcdf: 4.9.1

xarray: 2023.3.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.0
netCDF4: 1.6.2
pydap: None
h5netcdf: None
h5py: 3.8.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.7.0
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.6.1
pip: 23.0.1
conda: None
pytest: None
mypy: None
IPython: 8.11.0
sphinx: None

@dmkriete dmkriete added the bug and needs triage labels Mar 29, 2023
@dcherian dcherian added the topic-indexing label and removed the needs triage label Mar 29, 2023
@benbovy
Member

benbovy commented Aug 22, 2023

Thanks for the report @dmkriete and sorry for the wait.

This one would be tricky to solve, I'm afraid. We currently don't support aligning xarray objects when any of them has more than one index along common dimension(s), like the "time" and "time_relative" indexed coordinates of ds.signal in your example, which are both along the "time" dimension:

xr.align(ds.signal, da)   # fails with the same error

The reason is that we would need to either somehow combine the results from all those indexes or just pick one of them in order to reindex the coordinates and variables along the common dimension(s). It wouldn't be easy to choose a default behavior that fits a potentially wide range of use cases. The same issue actually happens with data selection (see #8028 (comment)).
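To make the ambiguity concrete, here is a toy model in plain Python (an illustration only, not xarray's internals): each index is a list of labels, and the hypothetical helper `inner_join_positions` returns which positions of the first object survive an inner join. When two objects both carry "time" and "time_relative" indexes, the answer depends on which index drives the join.

```python
def inner_join_positions(labels_a, labels_b):
    """Positions in labels_a whose label also appears in labels_b (toy inner join)."""
    shared = set(labels_b)
    return [i for i, label in enumerate(labels_a) if label in shared]


# Two objects sharing the "time" dimension, each with two indexes along it
# (hypothetical data, modelled on the example in this issue).
obj1 = {"time": [12, 13, 14], "time_relative": [0, 1, 2]}
obj2 = {"time": [13, 14, 15], "time_relative": [0, 1, 2]}

by_time = inner_join_positions(obj1["time"], obj2["time"])
by_relative = inner_join_positions(obj1["time_relative"], obj2["time_relative"])

print(by_time)      # [1, 2]
print(by_relative)  # [0, 1, 2]
```

Reindexing obj1 along "time" would keep two rows under one index and three under the other, so without a rule for combining or picking indexes there is no single obvious result.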

The part of the error message that is relevant here is "they may be used to reindex data along common dimensions". That said, this error generally seems more confusing than helpful to users. It would be nice to improve that error message, but it won't be easy because conflicting indexes may occur in many different contexts.

Although probably not the most convenient, you could still drop one of the indexes before using the dataset in an operation that performs automatic alignment:

ds.drop_indexes("time_relative").signal - da    # should work
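If a dataset carries several such indexes, a small helper can work out which ones to drop so that at most one index remains per dimension. This is a sketch under stated assumptions. The name `indexes_to_drop` is hypothetical. The helper prefers the dimension coordinate (e.g. "time") when one is present, and it assumes each index is backed by a single 1-D coordinate, so it does not handle a pandas MultiIndex.

```python
def indexes_to_drop(index_dims):
    """Return index names to drop so at most one index remains per dimension.

    index_dims maps each indexed coordinate name to its dims tuple. Assumes
    single-coordinate 1-D indexes (no pandas MultiIndex); others are ignored.
    """
    keep_for_dim = {}
    for name, dims in index_dims.items():
        if len(dims) != 1:
            continue  # multi-dimensional indexes are out of scope for this sketch
        (dim,) = dims
        # Keep the first index seen, but let the dimension coordinate win.
        if dim not in keep_for_dim or name == dim:
            keep_for_dim[dim] = name
    kept = set(keep_for_dim.values())
    return [
        name
        for name, dims in index_dims.items()
        if len(dims) == 1 and name not in kept
    ]


print(indexes_to_drop({"time": ("time",), "time_relative": ("time",)}))
# ['time_relative']
```

Assuming `ds.xindexes.variables` maps each indexed coordinate name to its variable, the helper could be fed with `{name: var.dims for name, var in ds.xindexes.variables.items()}` and its result passed to `ds.drop_indexes(...)`.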

@benbovy benbovy changed the title "Broadcasting by dimension name fails when multiple indexes are set to one dimension" to "Alignment fails when multiple indexes are set along one common dimension" Aug 22, 2023
@dcherian
Contributor

Here's a really simple example:

import numpy as np
import xarray as xr

a = xr.DataArray(
    np.ones(5), dims="x", coords={"x": np.arange(5), "y": ("x", np.arange(5))}
).set_xindex("y")
b = xr.DataArray(np.ones(5), dims="z", coords={"z": np.arange(5)})
xr.align(a, b)

It requires no reindexing since they are already aligned, so it seems like this could be made to work. I think we'd want join="exact" to work for this case.
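The point above can be phrased as a simple check. Having several indexes along one dimension only matters if that dimension is shared by more than one of the objects being aligned. The toy model below uses a hypothetical helper, `dims_with_conflicts`, written in plain Python and not taken from xarray's alignment code. It represents each object as a pair: its dimension names, and a mapping from each index name to the dimension that index is along.

```python
from collections import Counter


def dims_with_conflicts(objects):
    """Return dims shared by several objects AND carrying more than one index
    in at least one of them (toy model, not xarray's implementation).

    objects: list of (dims, indexes) pairs; indexes maps index name -> dim.
    """
    # Dimensions present in more than one object are the ones alignment touches.
    dim_counts = Counter(dim for dims, _ in objects for dim in set(dims))
    common = {dim for dim, n in dim_counts.items() if n > 1}

    # Dimensions along which a single object has more than one index.
    crowded = set()
    for _, indexes in objects:
        per_dim = Counter(indexes.values())
        crowded |= {dim for dim, n in per_dim.items() if n > 1}

    return common & crowded


# The example above: `a` has indexes "x" and "y" along "x", `b` only has "z".
a = (("x",), {"x": "x", "y": "x"})
b = (("z",), {"z": "z"})
print(dims_with_conflicts([a, b]))  # set()

# Both objects share "time", and one of them has two indexes along it.
c = (("time",), {"time": "time", "time_relative": "time"})
d = (("time",), {"time": "time"})
print(dims_with_conflicts([c, d]))  # {'time'}
```

An empty result means the extra index is never consulted during alignment, which is why this case could be made to work.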

@benbovy
Member

benbovy commented Nov 2, 2023

Yeah, we should try aligning those objects and only raise later, during reindexing, if conflicting indexers are found along common dimensions, as suggested in #8236 (comment). That won't be the case in your example @dcherian, so alignment will work (with any join method, I guess).
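Here is a minimal sketch of that deferred check. It assumes a toy representation in which every index along a common dimension has already produced its own list of reindexing positions. The check raises only when those positions actually disagree. The helper name `resolve_indexer` is hypothetical and does not describe xarray's actual code path.

```python
def resolve_indexer(dim, indexers):
    """Combine the reindexing positions produced by each index along `dim`.

    indexers maps index name -> list of positions (toy representation).
    Returns the agreed positions, None if there are no indexers, and raises
    only when two indexes disagree.
    """
    distinct = {tuple(positions) for positions in indexers.values()}
    if len(distinct) > 1:
        raise ValueError(
            f"conflicting indexers found along dimension {dim!r}: "
            f"{sorted(indexers)}"
        )
    return list(distinct.pop()) if distinct else None


# Already aligned: both indexes request the identity reindexing, so no error.
print(resolve_indexer("time", {"time": [0, 1, 2], "time_relative": [0, 1, 2]}))
# [0, 1, 2]

# The two indexes disagree, so this is where an error would be raised.
try:
    resolve_indexer("time", {"time": [1, 2], "time_relative": [0, 1, 2]})
except ValueError as err:
    print(err)
```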

@keewis
Collaborator

keewis commented Dec 26, 2023

this also affects the creation of datasets using the Coordinates object (encountered when working on the PintIndex):

In [1]: from xarray import Dataset, Variable, Coordinates
   ...: from xarray.core.indexes import PandasIndex
   ...: 
   ...: u = Variable("x", [0, 1, 2])
   ...: x = Variable("x", [-1, 0, 1])
   ...: coords = Coordinates(
   ...:     coords={"u": u, "x": x},
   ...:     indexes={
   ...:         "u": PandasIndex.from_variables({"u": u}, options={}),
   ...:         "x": PandasIndex.from_variables({"x": x}, options={}),
   ...:     },
   ...: )
   ...: Dataset({"a": ("x", [3, 4, 5])}, coords=coords)
<traceback>
ValueError: cannot re-index or align objects with conflicting indexes found for the following dimensions: 'x' (2 conflicting indexes)
Conflicting indexes may occur when
- they relate to different sets of coordinate and/or dimension names
- they don't have the same type
- they may be used to reindex data along common dimensions

In this case I'd argue that the coordinates are already guaranteed to be internally consistent, so it might be fine to simply align the data variables with the dimension coordinates (I might be missing something, though).

@benbovy
Member

benbovy commented Dec 26, 2023

@keewis it might not be that simple, as I think there are still some edge cases where a data variable can be promoted to coordinate(s) (e.g., pandas multi-index). Otherwise, there should be no re-indexing in that case either, so it should be supported too if we look for potential conflicts (along common dimensions) when re-indexing the objects rather than when extracting all the indexes from the objects.

@keewis
Collaborator

keewis commented Jan 2, 2024

this probably won't cover all the possible edge cases (I didn't care about MultiIndex, for example), but replacing xarray.core.dataset.merge_data_and_coords with the following has the desired effect:

def merge_data_and_coords(data_vars: DataVars, coords) -> _MergeResult:
    """Used in Dataset.__init__."""
    if isinstance(coords, Coordinates):
        coords = coords.copy()
        # full alignment in case there are data variables that should be promoted to dimension coordinates,
        # i.e. a data variable whose only dimension has the same name as the variable
        # (compare against a tuple: `dims` is a tuple, so comparing it to the name string is always False)
        if any(as_variable(var, name=name).dims == (name,) for name, var in data_vars.items()):
            indexes = coords.xindexes
        else:
            # align using just the dimension coordinates
            xindexes = coords.xindexes
            indexes_ = {name: index for name, index in xindexes.items() if name in coords.dims}
            variables_ = {name: xindexes.variables[name] for name in indexes_}
            indexes = type(xindexes)(indexes_, variables_)
    else:
        coords = create_coords_with_default_indexes(coords, data_vars)
        indexes = coords.xindexes

    # exclude coords from alignment (all variables in a Coordinates object should
    # already be aligned together) and use coordinates' indexes to align data_vars
    return merge_core(
        [data_vars, coords],
        compat="broadcast_equals",
        join="outer",
        explicit_coords=tuple(coords),
        indexes=indexes,
        priority_arg=1,
        skip_align_args=[1],
    )

(and I'm sure there's a cleaner implementation)


4 participants