-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add MultiPandasIndex helper class #7182
base: main
Are you sure you want to change the base?
Conversation
|
||
return type(self)(new_indexes) | ||
|
||
def copy(self: T_MultiPandasIndex, deep: bool = True) -> T_MultiPandasIndex: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be updated following #7140.
I wonder if it is possible to create a generic ds.set_xindex(
["a", "b"],
MultiIndex([("a", PandasIndex), ("b", PandasIndex), (["a", "b"], BallTreeIndex)),
) (but I'm sure we can find a better syntax... maybe create a hashable sequence that is not a tuple, since that is already taken? and change the list of tuples to a dict). For that concept to work, the return value of What do you think? |
|
||
seen = set() | ||
dup_dims = [] | ||
for d in dims: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since dims
is a dict, shouldn't the keys be unique? I don't think you will have repetitions here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, sure 🙂. But we should probably check for duplicate dimensions so a dict is not what we want for dims
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just do
dim_names = [idx.dim for idk in indexes]
if len(dim_names) != len(set(dim_names)): raise...
|
||
__slots__ = ("indexes", "dims") | ||
|
||
def __init__(self, indexes: Mapping[Hashable, PandasIndex]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always better to use Any for Mapping inputs, due to covariance.
Use Mapping[Any, PandasIndex]
def _get_unmatched_names( | ||
self: T_MultiPandasIndex, other: T_MultiPandasIndex | ||
) -> set: | ||
return set(self.indexes).symmetric_difference(other.indexes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always find the operator notation easier to read: set(self.indexes) ^ set(other.indexes)
|
||
def _get_unmatched_names( | ||
self: T_MultiPandasIndex, other: T_MultiPandasIndex | ||
) -> set: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use set[Hashable]
|
||
return type(self)(new_indexes) | ||
|
||
def copy(self: T_MultiPandasIndex, deep: bool = True) -> T_MultiPandasIndex: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well implement the correct deep-copy behavior which passes the memo dict.
else: | ||
return type(self)(new_indexes) | ||
|
||
def sel(self, labels: dict[Any, Any], **kwargs) -> IndexSelResult: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Mapping[Any, Any]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method should be called internally in Xarray (always with a dict), but maybe there are some cases where one wants to call it directly so Mapping[Any, Any]
would be better..
Hmm that could be possible but it think there are just too many possible edge cases for something generic like that. In your specific example ds.set_xindex(
["a", "b"],
MultiIndex([("a", PandasIndex), ("b", PandasIndex), (["a", "b"], BallTreeIndex)),
) we could probably use the BallTreeIndex for point-wise indexing (i.e., with I guess your suggestion is a way around the constraint in the Xarray data model that a coordinate cannot have multiple indexes? I'm afraid there's no easy solution that is generic enough. Maybe some cache to avoid rebuilding the indexes? I.e., |
whats-new.rst
api.rst
This PR adds a
xarray.indexes.MultiPandasIndex
helper class for building custom, meta-indexes that encapsulate multiplePandasIndex
instances. UnlikePandasMultiIndex
, the meta-index classes inheriting from this helper class may encapsulate loosely coupled (pandas) indexes, with coordinates of arbitrary dimensions (each coordinate must be 1-dimensional but an Xarray index may be created from coordinates with differing dimensions).Early prototype in this notebook
TODO / TO FIX:
__init__
options in subclasses be passed to all thetype(self)(new_indexes)
calls inside theMultiPandasIndex
"base" class? This could be done via**kwargs
passed through... However, mypy will certainly complain (Liskov Substitution Principle).MultiPandasIndex
a good name for this helper class?