Explicit indexes: next steps #6293
Following thoughts and discussions in various issues (e.g., #6836), I'd like to suggest another section to the ones in the top comment: Deprecate

Yes yes -- the sooner we can get rid of MultiIndex special cases the better!

Any progress on this? I'd love to see #2233 get resolved.

#5692 is now merged and we can start thinking about the next steps. I'm opening this issue to list and track the remaining tasks. @pydata/xarray, do not hesitate to add a comment below if you think of something that is missing here.

Continue the refactoring of the internals
Although in #5692 everything seems to work with the current pandas index wrappers for dimension coordinates, not all of Xarray's internals have been refactored yet to fully support (or at least be compatible with) custom indexes. Here is a list of `Dataset` / `DataArray` methods that still need to be checked / updated (this list may be incomplete):

- `as_numpy` (`as_numpy` changes MultiIndex #8001)
- `broadcast` (Bug in broadcasting with multi-indexes #6430, refactor broadcast for flexible indexes #6481)
- `drop_sel` (DataArray.drop_isel / .drop_sel with duplicated initial time stamp - InvalidIndexError #6605, drop_sel returns KeyError for abbreviated dates #7699)
- `drop_isel`
- `drop_dims`
- `drop_duplicates` ('drop_duplicates' behaves differently when using 1 vs many coordinates for an index #8499)
- `transpose`
- `interpolate_na`
- `ffill`
- `bfill`
- `reduce`
- `map`
- `apply`
- `quantile`
- `rank`
- `integrate`
- `cumulative_integrate`
- `filter_by_attrs`
- `idxmin`
- `idxmax`
- `argmin`
- `argmax`
- `concat` (partially refactored, may not fully work with multi-dimension indexes)
- `polyfit`
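
One way to check a method from the list above is to build a small object that carries both kinds of pandas-backed indexes, apply the method, and compare `.xindexes` before and after. This is a minimal sketch assuming a recent xarray release; `index_summary` is a hypothetical helper and `transpose` merely stands in for whichever method is under review:

```python
import numpy as np
import xarray as xr


def index_summary(obj):
    """Return ({coordinate name: index class name}, number of distinct index objects)."""
    types = {name: type(index).__name__ for name, index in obj.xindexes.items()}
    # get_unique() returns each index object once, even when it is shared by
    # several coordinates (as is the case for a PandasMultiIndex).
    return types, len(obj.xindexes.get_unique())


# "x" is backed by a PandasMultiIndex built from the "one" and "two"
# coordinates; "y" is backed by a plain PandasIndex.
ds = xr.Dataset(
    {"foo": (("x", "y"), np.arange(8.0).reshape(4, 2))},
    coords={
        "one": ("x", ["a", "a", "b", "b"]),
        "two": ("x", [0, 1, 0, 1]),
        "y": [10, 20],
    },
).set_index(x=["one", "two"])

before = index_summary(ds)
after = index_summary(ds.transpose("y", "x"))  # method under review

if before != after:
    print("indexes changed:", before, "->", after)
```

A method that silently splits the multi-index into separate single-coordinate indexes, or drops one of them, would show up as a difference in either the class names or the count.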

I ended up following a common pattern in #5692 when adding explicit / flexible index support for various features (it is quite generic, though; the actual procedure may vary from one case to another and many steps may be skipped):

- … `Index` base class. There may be several motivations:
- … `PandasIndex` or `PandasMultiIndex` wrapper classes for clarity and also if eventually we want to make Xarray less dependent on Pandas
- … `Variable`'s corresponding method for speed-up or for other reasons, e.g., `IndexVariable.concat` exists to avoid unnecessary Pandas/Numpy conversions; in Explicit indexes #5692, `PandasIndex.concat` has the same logic and will fully replace the former if/once we get rid of `IndexVariable`
- `PandasIndex.roll` reuses `pandas.Index` indexing and `append` capabilities
- `Index` API closely follows DataArray, Dataset and Variable API (i.e., same method names) for consistency
- … `Index` API (if it exists) to create new indexes
- The `Indexes` class (i.e., the `.xindexes` property returns an instance of this class) provides convenient API for iterating through indexes (e.g., get a list of unique indexes, get all coordinates or dimensions for a given index, etc.)
- … `Index` API, either raise an error or fall back to calling the `Variable` API (below) depending on the case
- … `Index.create_variables`
- … `Index.create_variables`; it is used to propagate variable metadata (`dtype`, `attrs` and `encoding`)
- … `Variable` API (if it exists)
- … `filter_indexes_from_coords` and `assert_no_index_corrupted`
- … `_replace`, `_replace_with_new_dims` or `_overwrite_indexes` methods

Relax all constraints related to “dimension (index) coordinates” in Xarray

Indexes repr
- … `Indexes` section to Dataset and DataArray reprs
- … `Indexes` (i.e., `.xindexes` property) consistent with the repr of `Coordinates` (`.coords` property)
- … `Index._repr_inline_` for tweaking the inline representation of each index shown in the reprs above
- … `_repr_inline_` for indexes that define it #7183

Public API for assigning and (re)setting indexes
There is no public API yet for creating and/or assigning existing indexes to Dataset and DataArray objects.

- … `indexes` parameter in Dataset and DataArray constructors
- … `data`, `data_vars` or `coords` arguments in favor of a more explicit way to pass it.
- … `set_xindex` and `drop_indexes` methods
- … `set_index` and `reset_index`? See Improve naming and standardize functionality of .reset_index() and .reset_coords() methods #4366 (comment)

We still need to figure out how best we can (1) assign existing indexes (possibly with their coordinates) and (2) pass index build options.
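
For reference, here is a minimal sketch of how the methods named above can be used, assuming a recent xarray release where `Dataset.set_xindex`, `Dataset.drop_indexes` and `xr.Coordinates.from_pandas_multiindex` are available. The last part shows one way to pass an existing `pandas.MultiIndex` together with its coordinates explicitly, rather than relying on implicit unpacking:

```python
import numpy as np
import pandas as pd
import xarray as xr
from xarray.indexes import PandasIndex, PandasMultiIndex

# "lon" is a non-dimension coordinate along "x", without any index yet.
ds = xr.Dataset(
    {"temp": ("x", [10.0, 11.5, 12.0])},
    coords={"lon": ("x", [-10.0, 0.0, 10.0])},
)
assert "lon" not in ds.xindexes

# set_xindex builds an index from existing coordinate(s); with no index class
# given, a single coordinate gets a default PandasIndex.
indexed = ds.set_xindex("lon")
assert isinstance(indexed.xindexes["lon"], PandasIndex)
print(float(indexed.sel(lon=0.0)["temp"]))  # 11.5

# drop_indexes removes the index but keeps the coordinate variable.
unindexed = indexed.drop_indexes("lon")
assert "lon" in unindexed.coords and "lon" not in unindexed.xindexes

# A Coordinates object bundles coordinate variables with their index, so an
# existing pandas.MultiIndex can be passed explicitly to the constructor.
midx = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=("one", "two"))
coords = xr.Coordinates.from_pandas_multiindex(midx, "x")
ds_multi = xr.Dataset({"foo": ("x", np.arange(4))}, coords=coords)
assert isinstance(ds_multi.xindexes["one"], PandasMultiIndex)
```

Index build options, point (2) above, would go through the `**options` keyword arguments of `set_xindex`, which are forwarded to the index class.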
Other public API for index-based operations
To fully leverage the power and flexibility of custom indexes, we might want to update some parts of Xarray’s public API in order to allow passing arbitrary options per index. For example:

- `sel`: the current `method` and `tolerance` may not be relevant for all indexes; pass extra arguments to Scipy's cKDTree.query, etc. (Pass arbitrary options to sel() #7099)
- `align`: tolerance for alignment #2217

Also:

- … `Indexes` API as it provides convenient methods that might be useful for end-users
- … `Index` base class into Xarray's main namespace (i.e., `xr.Index`)? Also `PandasIndex` and `PandasMultiIndex`? The latter may be useful if we deprecate `set_index(append=True)` and/or if we deprecate “unpacking” `pandas.MultiIndex` objects to coordinates when given as `coords` in the Dataset / DataArray constructors.

Documentation
- … `Indexes` API
- … `Index` API: Add documentation on custom indexes #6975

Index types and helper classes built in Xarray
- … `Index` abstract subclass that would basically dispatch the given arguments to the corresponding, encapsulated `PandasIndex` instances and then merge the results
- … `PandasMultiIndex` dimension coordinate?

3rd party indexes
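
As an illustration of how a third-party index can plug into the flexible index machinery, here is a minimal sketch assuming a recent xarray release. `NearestIndex` is a hypothetical class: for brevity it subclasses `PandasIndex` rather than implementing the full `xarray.Index` interface, and only overrides `sel` so that label-based selection defaults to nearest-neighbor lookup instead of exact matching:

```python
import xarray as xr
from xarray.indexes import PandasIndex


class NearestIndex(PandasIndex):
    """Hypothetical index: a PandasIndex whose label-based selection uses
    nearest-neighbor lookup when no ``method`` is passed to ``.sel()``."""

    def sel(self, labels, method=None, tolerance=None):
        if method is None:
            method = "nearest"
        return super().sel(labels, method=method, tolerance=tolerance)


base = xr.Dataset(
    {"temp": ("x", [10.0, 11.5, 12.0])},
    coords={"lon": ("x", [-10.0, 0.0, 10.0])},
)

# Default PandasIndex: 1.0 is not an exact label, so selection fails.
plain = base.set_xindex("lon")
try:
    plain.sel(lon=1.0)
    exact_match_failed = False
except KeyError:
    exact_match_failed = True

# Custom index class attached to the same coordinate: the closest label (0.0)
# is selected instead.
nearest = base.set_xindex("lon", NearestIndex)
value = float(nearest.sel(lon=1.0)["temp"])  # 11.5
```

Since `from_variables` and the other `Index` methods are inherited from `PandasIndex`, `set_xindex` can build the custom index exactly like the default one.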