Alignment: allow flexible index coordinate order #8111

Open
wants to merge 5 commits into
base: main

benbovy
Member

@benbovy benbovy commented Aug 24, 2023

This PR relaxes some of the rules used in alignment for finding the indexes to compare or join together. Those indexes must still be of the same type and must relate to the same set of coordinates (and dimensions), but the order of coordinates is now ignored.

It is up to the index to implement the equal / join logic if it needs to care about that order.
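As a rough illustration of the relaxed rule (not the actual xarray implementation), indexes could be matched on a key built from an unordered set of (coordinate name, dimensions) pairs plus the index type. The names `matching_key` and `FakeIndex` below are hypothetical and exist only for this sketch.

```python
from typing import Hashable


class FakeIndex:
    """Hypothetical stand-in for an index class (assumption, not an xarray type)."""


def matching_key(
    coord_dims: dict[Hashable, tuple[Hashable, ...]], index_type: type
) -> tuple:
    # A frozenset makes the key independent of coordinate order and avoids
    # sorting arbitrary hashable names, which may not be orderable.
    return (frozenset(coord_dims.items()), index_type)


key_a = matching_key({"one": ("x",), "two": ("x",)}, FakeIndex)
key_b = matching_key({"two": ("x",), "one": ("x",)}, FakeIndex)

# Same coordinates, same dimensions, same index type: the keys match
# regardless of the order in which the coordinates were listed.
print(key_a == key_b)
```

With a key like this, two indexes are candidates for comparison or joining whenever they cover the same coordinates and dimensions; whether coordinate order matters beyond that is left to the index itself.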

Regarding pandas.MultiIndex, it seems that the level names are ignored when comparing indexes:

import pandas as pd

midx = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=("one", "two"))
midx2 = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=("two", "one"))

midx.equals(midx2)  # True

However, in Xarray the names of the multi-index levels (and their order) matter since each level has its own xarray coordinate. In this PR, PandasMultiIndex.equals() and PandasMultiIndex.join() thus check that the level names match.
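A minimal sketch of what such a name-aware check could look like at the pandas level, assuming it is enough to compare the level names (in order) before delegating to `pd.MultiIndex.equals()`. The helper name `multiindex_equal_with_names` is hypothetical and is not the code in this PR.

```python
import pandas as pd


def multiindex_equal_with_names(a: pd.MultiIndex, b: pd.MultiIndex) -> bool:
    # Compare the level names (and their order) first, then let pandas
    # compare the index values, which on its own ignores the names.
    return list(a.names) == list(b.names) and a.equals(b)


midx = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=("one", "two"))
midx2 = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=("two", "one"))

same_names = multiindex_equal_with_names(midx, midx.copy())
swapped_names = multiindex_equal_with_names(midx, midx2)
```

Here `same_names` is expected to be true and `swapped_names` false, since the two indexes hold the same values but label their levels differently.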

@benbovy
Member Author

benbovy commented Aug 24, 2023

Considering

import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=("one", "two"))
midx_coords = xr.Coordinates.from_pandas_multiindex(midx, "x")
ds = xr.Dataset(coords=midx_coords)
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) object MultiIndex
#   * one      (x) object 'a' 'a' 'b' 'b'
#   * two      (x) int64 0 1 0 1
# Data variables:
#     *empty*

midx2 = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=("two", "one"))
midx2_coords = xr.Coordinates.from_pandas_multiindex(midx2, "x")
ds2 = xr.Dataset(coords=midx2_coords)
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) object MultiIndex
#   * two      (x) object 'a' 'a' 'b' 'b'
#   * one      (x) int64 0 1 0 1
# Data variables:
#     *empty*

Even with this PR, we still get:

ds.equals(ds2)   # True

This doesn't make sense, since the two datasets assign different names to the multi-index levels. It happens because Dataset.equals() never calls Index.equals() and still relies on pd.MultiIndex.equals() here. We'll need to change that.

@benbovy
Member Author

benbovy commented Aug 25, 2023

Hmm, pd.MultiIndex.equals ignoring the level names actually seems to be a bug in pandas: pandas-dev/pandas#32569.

Hashable is too generic for a sort key.
@benbovy benbovy requested a review from shoyer August 25, 2023 14:46
@dcherian
Contributor

> It is up to the index to implement the equal / join logic if it needs to care about that order.

I think this is the right decision.

ping @shoyer for input

Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

Custom indexes and coordinate (re)ordering
2 participants