Add documentation on custom indexes #6975
Maybe an inefficient "numpy" index with basic lookup (like in #3925 (comment)) would be a good example?
Yes! I used this recently to describe what an index does. I think most people are familiar with the argmin way.
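For reference, the "argmin way" mentioned above could look roughly like the sketch below. It is a toy class, not xarray's `Index` API: the name `NaiveNumpyIndex` and its `sel` signature are made up for illustration, and the lookup is a deliberately inefficient linear scan.

```python
import numpy as np


class NaiveNumpyIndex:
    """Toy nearest-neighbour index over a 1-D coordinate (hypothetical, not xarray's API)."""

    def __init__(self, values, dim):
        self.values = np.asarray(values, dtype=float)
        self.dim = dim

    def sel(self, label):
        # O(n) lookup: position of the coordinate value closest to `label`.
        position = int(np.argmin(np.abs(self.values - label)))
        return {self.dim: position}


index = NaiveNumpyIndex([10.0, 20.0, 30.0, 40.0], dim="x")
result = index.sel(27.0)  # 30.0 is the closest value, at position 2
```

The point of such an example would be that an index only has to translate labels into positions; how it does so (linear scan, hash table, tree) is an implementation detail.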
Nice work!
I have just minor suggestions
@@ -461,6 +461,21 @@
CFTimeIndex.values
CFTimeIndex.year

Index.from_variables
let's just make it public?
        raise NotImplementedError(
            f"{self!r} doesn't support alignment with inner/outer join method"
        )

    def reindex_like(self: T_Index, other: T_Index) -> dict[Hashable, Any]:
        """Query the index with another index of the same type.

        Implementation is optional but required in order to support alignment.
Would be good to be specific about types of alignment here. Do we also need to mention reindexing?
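To make the question concrete, here is a minimal sketch of what a `reindex_like`-style query could return, matching the `dict[Hashable, Any]` return type in the signature above. `ToyLabelIndex` is hypothetical, and using -1 for labels missing from the queried index is an assumption borrowed from pandas' `Index.get_indexer` convention, not something this PR specifies.

```python
class ToyLabelIndex:
    """Hypothetical label index over one dimension (illustration only)."""

    def __init__(self, labels, dim):
        self.labels = list(labels)
        self.dim = dim
        # Map each label to its integer position for O(1) lookups.
        self._positions = {label: i for i, label in enumerate(self.labels)}

    def reindex_like(self, other):
        # For each label of `other`, find its position in `self`.
        # -1 marks a label absent from `self` (assumed convention).
        indexer = [self._positions.get(label, -1) for label in other.labels]
        return {self.dim: indexer}


left = ToyLabelIndex(["a", "b", "c"], dim="x")
right = ToyLabelIndex(["b", "c", "d"], dim="x")
indexers = left.reindex_like(right)  # positions in `left` of the labels of `right`
```

With such a per-dimension positional indexer, the data attached to `left` could be rearranged to follow the labels of `right`, which is the step that both reindexing and alignment would need.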
I find the way in which the Index objects are involved in a method call (e.g.
Perhaps these questions are too specific to xarray's internals but I do think there should be some kind of mental model given as to the role the Index objects play. (This could be a white lie, similar to how our page on data structures says that I also think we need multiple simple examples. How about
I find it helps to think of documentation using this 4-part system. This PR should cover "Explanation" pretty well, but we should still aim for other content to better cover "Tutorial", "How-to Guides", and "Reference". "Tutorial" could be like a notebook walking through creating a simple index (e.g.
That all said, this is already a great start!
Thanks @dcherian and @TomNicholas for your feedback! @dcherian I will reply to your inline comments when I integrate your suggestions in this PR. @TomNicholas I reply to your comments below.
That's exactly why your feedback is valuable!
I agree this could be detailed more in the Index API docstrings in a consistent way. For some methods like
We should clarify that the aim of Some Index API like
I've tried to explain it in the "Index base class" section and the sections below, but maybe it should be emphasized more?
I guess you mean it is shown through
I agree, although Overall, I think that the whole "Xarray Internals" section could be streamlined beyond a bunch of loosely-coupled document pages.
I agree that we need more examples, but I also think that too many examples may make things more confusing. One thing that I like very much in https://fastapi.tiangolo.com/ is how a small example is picked for each tutorial and then shown again with the relevant code highlighted in every subsection. Is it possible to do that with Sphinx / RST? It's hard to show all features through one succinct example, though. Like @dcherian says in #6975 (comment), we could invite people to look into the
I think this should be one of the first things said. It defines what all the following discussion of Indexes does and does not affect.
Yeah I think you do actually have that one covered, I just included it as another example of a naive question that everyone will have that is worth heading off very explicitly.
I meant like when did these indexes get automatically built? (Presumably on coordinate assignment)
1000% yes we need a page that explains what
Probably, but having a loosely coupled page for each aspect of the internals would be a good initial aim.
That's why I like the "Explanation" vs "How-to" vs "Tutorials" distinction: use minimal code in the "Explanation" section (this PR) but put multiple more complex examples under "How to create a functionally-derived index", "how-to create a lazy index" etc.
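As a rough idea of what a "functionally-derived index" how-to could start from, the sketch below derives coordinate labels from a function of the position and inverts that function on lookup, so no coordinate array is stored. Everything here (`ToyFunctionalIndex`, its parameters, the `sel` return value) is hypothetical and independent of xarray's actual `Index` API.

```python
class ToyFunctionalIndex:
    """Hypothetical index whose labels are origin + step * position."""

    def __init__(self, origin, step, size, dim):
        self.origin = origin
        self.step = step
        self.size = size
        self.dim = dim

    def label_at(self, position):
        # Forward function: position -> coordinate label.
        return self.origin + self.step * position

    def sel(self, label):
        # Inverse function: label -> nearest position, computed without
        # materializing the coordinate values.
        position = round((label - self.origin) / self.step)
        if not 0 <= position < self.size:
            raise KeyError(label)
        return {self.dim: position}


index = ToyFunctionalIndex(origin=100.0, step=0.5, size=1000, dim="x")
exact = index.sel(102.5)    # lands exactly on position 5
nearest = index.sel(102.4)  # closest label is 102.5, also position 5
```

A real how-to would then wire this kind of logic into the `Index` methods documented in this PR; the sketch only shows the label-to-position inversion.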
No idea, but that does look cool!
    @classmethod
    def from_variables(cls, variables: Mapping[Any, Variable]) -> Index:
    def from_variables(
Tangential but couldn't this be decorated to make it an abstract (class) method? Then an error would be raised if subclasses don't implement it. i.e. this:
from abc import ABC, abstractmethod

class Index(ABC):
    @classmethod
    @abstractmethod
    def from_variables(cls, vars):
        ...

    def sel(self, indexers):
        raise NotImplementedError()
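A small self-contained check of how that suggestion would behave, using only the standard library. The `Index` class below is a stand-in for the suggestion, not xarray's class, and `IncompleteIndex` / `CompleteIndex` are hypothetical names. Note that `abc` raises when an incomplete subclass is instantiated, not when it is defined, and that calling the inherited abstract classmethod on the class itself does not raise.

```python
from abc import ABC, abstractmethod


class Index(ABC):  # stand-in for the suggested base class
    @classmethod
    @abstractmethod
    def from_variables(cls, variables):
        ...


class IncompleteIndex(Index):
    pass  # does not override from_variables


class CompleteIndex(Index):
    def __init__(self, variables):
        self.variables = dict(variables)

    @classmethod
    def from_variables(cls, variables):
        return cls(variables)


# Instantiating a subclass that lacks the abstract method fails.
try:
    IncompleteIndex()
    error = None
except TypeError as exc:
    error = exc

# Calling the inherited abstract classmethod directly is not blocked:
# it runs the base-class body and returns None.
silent = IncompleteIndex.from_variables({})

built = CompleteIndex.from_variables({"x": [1, 2, 3]})
```

So the decorator would catch incomplete subclasses only if the index is constructed through the class constructor at some point.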
for more information, see https://pre-commit.ci
Just a note that there are still some unmerged suggestions that could easily be incorporated before merge.
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
@benbovy do you mind if we merge this and leave the comments to address in a follow-up PR? At SciPy we met some advanced developers from other fields (core devs of napari and astropy) who are very interested in creating functional indexes for their use cases, so it would be nice to merge this.
@TomNicholas I don't mind at all! Let's merge this even though it is not perfect. I'll improve it and address the comments made here in a follow-up PR. I'll also add
There is an example of a functional index in this discussion (last item): #7041 (comment)
This PR documents the API of the `Index` base class and adds a guide for creating custom indexes (reworked from https://hackmd.io/Zxw_zCa7Rbynx_iJu6Y3LA). Hopefully it will help anyone experimenting with this feature.
@pydata/xarray your feedback would be very much appreciated! I've been into this for quite some time, so there may be things that seem obvious to me but that you can still find very confusing or non-intuitive. Those would deserve some extra or better explanation.
More specifically, I'm open to any suggestion on how to better illustrate this with clear and succinct examples.
There are other parts of the documentation that still need to be updated regarding the indexes refactor (e.g., "dimension" coordinates, the `xindexes` property, set/drop indexes, etc.), but I suggest doing that in separate PRs and focusing here on creating custom indexes.