Pass arbitrary options to sel() #7099

Open
benbovy opened this issue Sep 28, 2022 · 4 comments

Comments

@benbovy
Member

benbovy commented Sep 28, 2022

Is your feature request related to a problem?

Currently, .sel() accepts two options: method and tolerance. These are relevant for the default (pandas) indexes but not necessarily for other, custom indexes.

It would also be useful for custom indexes to expose their own selection options, e.g.,

  • index query optimization like the dualtree flag of sklearn.neighbors.KDTree.query
  • k-nearest neighbors selection with the creation of a new "k" dimension (+ coordinate / index) with user-defined name and size.
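To make the second bullet concrete, here is a minimal pure-Python sketch of a k-nearest-neighbors lookup on a 1-D coordinate. The function name `k_nearest` and the brute-force search are assumptions for illustration only; a real custom index would presumably delegate to a tree structure such as `sklearn.neighbors.KDTree`.

```python
def k_nearest(coords, queries, k):
    """Return, for each query label, the positions of its k nearest coordinate values."""
    result = []
    for q in queries:
        # Sort positions by distance to the query; ties are broken by position.
        order = sorted(range(len(coords)), key=lambda i: (abs(coords[i] - q), i))
        result.append(order[:k])
    return result


x = [0.0, 1.0, 2.5, 4.0]
positions = k_nearest(x, queries=[0.9, 3.0], k=2)
# ``positions`` has one row per query and k entries per row:
# the second axis is what would become the new "k" dimension.
```

The point of the sketch is the output shape: selecting k neighbors per label adds an axis that the plain `method="nearest"` selection never produces, which is why such an index would need options for the name and size of that dimension.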

Following #3223, it would be nice if we could also pass distinct option values per index.

What would be a good API for that?

Describe the solution you'd like

Some ideas:

A. Allow passing a tuple (labels, options_dict) as indexer value

ds.sel(x=([0, 2], {"method": "nearest"}), y=3)

B. Expose an options kwarg that would accept a nested dict

ds.sel(x=[0, 2], y=3, options={"x": {"method": "nearest"}})

Option A does not look very readable. Option B is slightly better, although the nested dictionary is not great.
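As a rough comparison of the two spellings, the sketch below normalizes both into the same `{name: (labels, options)}` structure. `normalize_indexers` is a hypothetical helper, not part of xarray, and the rule that the nested `options` dict overrides inline options is an assumption.

```python
from __future__ import annotations

from typing import Any, Mapping


def normalize_indexers(
    indexers: Mapping[str, Any],
    options: Mapping[str, Mapping[str, Any]] | None = None,
) -> dict[str, tuple[Any, dict[str, Any]]]:
    """Return {coord_name: (labels, options_dict)} for both proposed spellings."""
    options = options or {}
    unknown = set(options) - set(indexers)
    if unknown:
        raise KeyError(f"options given for non-indexed coordinates: {sorted(unknown)}")

    normalized = {}
    for name, value in indexers.items():
        # Option A: a 2-tuple whose second item is a dict carries inline options.
        # This is ambiguous: a genuine 2-tuple of labels ending in a dict is misread.
        if isinstance(value, tuple) and len(value) == 2 and isinstance(value[1], dict):
            labels, inline = value
        else:
            labels, inline = value, {}
        # Option B: the nested ``options`` dict is merged on top of inline options.
        normalized[name] = (labels, {**inline, **options.get(name, {})})
    return normalized


option_a = normalize_indexers({"x": ([0, 2], {"method": "nearest"}), "y": 3})
option_b = normalize_indexers({"x": [0, 2], "y": 3}, options={"x": {"method": "nearest"}})
```

Both calls yield the same normalized mapping, so the choice between A and B is mostly about readability. The type check needed for Option A also shows one of its costs: tuples already have a meaning as label values.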

Any other ideas? Some sort of context manager? Some Index specific API?

Describe alternatives you've considered

The API proposed in #3223 would look great if method and tolerance were the only accepted options, but less so for arbitrary options.

Additional context

No response

@benbovy
Member Author

benbovy commented Sep 28, 2022

Another difficulty regarding multi-coordinate indexes: ideally options should be set per index, not per coordinate.
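One way to picture the per-index requirement is to regroup per-coordinate options by the index that owns each coordinate and reject conflicting values. This is only a sketch: `group_by_index`, the `coord_to_index` mapping, and the index names "geo" and "time" are all hypothetical.

```python
def group_by_index(per_coord, coord_to_index):
    """Regroup {coord: (labels, options)} into {index: (labels_by_coord, options)}."""
    grouped = {}
    for coord, (labels, opts) in per_coord.items():
        index_id = coord_to_index[coord]
        coord_labels, index_opts = grouped.setdefault(index_id, ({}, {}))
        coord_labels[coord] = labels
        for key, value in opts.items():
            # Two coordinates of the same index must not disagree on an option.
            if key in index_opts and index_opts[key] != value:
                raise ValueError(
                    f"conflicting values for option {key!r} on index {index_id!r}"
                )
            index_opts[key] = value
    return grouped


# Assumed layout: "lon" and "lat" share one index, "time" has its own.
coord_to_index = {"lon": "geo", "lat": "geo", "time": "time"}
per_coord = {
    "lon": ([2, 15], {"foo": "bar"}),
    "lat": ([45, 48], {}),
    "time": ("2022-01-01", {}),
}
grouped = group_by_index(per_coord, coord_to_index)
```

The conflict check is the awkward part: with per-coordinate options, the user can express a state that has no meaning for a multi-coordinate index.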

@benbovy
Member Author

benbovy commented Sep 28, 2022

Or we could simply decide that .sel() should not accept arbitrary options and handle special cases, e.g., via accessors.

It would actually make sense to have something like .my_accessor.sel_k_neighbors(). Not so great to have a separate method just for an optimization option, though.

@keewis
Collaborator

keewis commented Sep 28, 2022

Another option would be to allow passing a custom object, like

class Indexer:
    def __init__(self, indexer, **options):
        ...

ds.sel(x=Indexer([0, 2], method="nearest"))

I think we wanted to have something like that, anyways, to be able to specify other behaviors of a slice, like right-exclusive?
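A filled-in version of this idea might look like the sketch below. The attribute names, the `unwrap` and `split_indexers` helpers, and the `closed="left"` option used to mark a right-exclusive slice are all assumptions, not an existing xarray API.

```python
class Indexer:
    """Hypothetical wrapper bundling labels with selection options."""

    def __init__(self, indexer, **options):
        self.indexer = indexer
        self.options = options

    def __repr__(self):
        return f"Indexer({self.indexer!r}, **{self.options!r})"


def unwrap(value):
    """Return (labels, options) for either a plain value or an Indexer."""
    if isinstance(value, Indexer):
        return value.indexer, dict(value.options)
    return value, {}


def split_indexers(**indexers):
    """Stand-in for the first step of a ``sel`` call: separate labels from options."""
    return {name: unwrap(value) for name, value in indexers.items()}


selection = split_indexers(x=Indexer([0, 2], method="nearest"), y=3)
# "closed" is a made-up option name for a right-exclusive slice.
right_open = unwrap(Indexer(slice(0, 10), closed="left"))
```

Compared with the tuple form in Option A, an explicit wrapper type is unambiguous: plain values pass through unchanged, and only `Indexer` instances carry options.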

@benbovy
Member Author

benbovy commented Sep 28, 2022

Or use Indexer objects to group labels + options? This is slightly different than what you suggest:

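from __future__ import annotations  # lets the annotations refer to Indexer before it is defined

from typing import Any, Iterable, Mapping


class Dataset:

    def sel(
        self,
        indexers: Mapping[Any, Any] | Indexer | Iterable[Indexer],
        **indexers_kwargs: Any,
    ):
        ...


class Indexer:
    def __init__(self, labels=None, options=None, **label_kwargs):
        ...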

Let's assume a Dataset with lat / lon coordinates both sharing the same geographic index + another time dimension coordinate, then we could write:

indexers = [
    Indexer(lon=[2, 15], lat=[45, 48], options={"foo": "bar"}),
    Indexer(time="2022-01-01"),
]

ds.sel(indexers)

This could also be used to avoid code duplication when using common selection options for different indexes.
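The sketch below fills in the proposal enough to run. It is one possible reading, with several assumptions: `normalize` stands in for the first step of `Dataset.sel`, `indexers` is made optional so that the keyword form keeps working, and labels may not be passed both as a mapping and as keyword arguments.

```python
from __future__ import annotations

from collections.abc import Mapping


class Indexer:
    """Hypothetical container grouping the labels and options meant for one index."""

    def __init__(self, labels=None, options=None, **label_kwargs):
        if labels is not None and label_kwargs:
            raise ValueError("pass labels as a mapping or as keyword arguments, not both")
        self.labels = dict(labels if labels is not None else label_kwargs)
        # Copy so that Indexer objects built from a shared dict stay independent.
        self.options = dict(options or {})


def normalize(indexers=None, **indexers_kwargs) -> list[Indexer]:
    """Turn any accepted ``sel`` argument form into a list of Indexer objects."""
    if indexers is None:
        return [Indexer(indexers_kwargs)]
    if indexers_kwargs:
        raise ValueError("cannot combine positional indexers with keyword indexers")
    if isinstance(indexers, Indexer):
        return [indexers]
    if isinstance(indexers, Mapping):
        return [Indexer(indexers)]
    return list(indexers)


# Common options defined once and reused for different indexes.
common = {"method": "nearest"}
groups = normalize(
    [
        Indexer(lon=[2, 15], lat=[45, 48], options=common),
        Indexer(time="2022-01-01", options=common),
    ]
)
```

One side effect of this signature is that coordinates named `labels` or `options` cannot be passed as keyword arguments to `Indexer`; they would have to go through the `labels` mapping instead.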
