Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Make _pick_to_idx public #11913

Open
larsoner opened this issue Aug 22, 2023 · 6 comments
Open

ENH: Make _pick_to_idx public #11913

larsoner opened this issue Aug 22, 2023 · 6 comments

Comments

@larsoner
Copy link
Member

@drammock other than mne.io.constants.FIFF (which is pretty painless to keep) this is the list of what we have to keep around to make sibling packages happy. Do you think we should make some variant of _picks_to_idx public?

Originally posted by @larsoner in #11903 (comment)

We'll have to think about the right API and we might not want to expose all options but it should be doable

@mscheltienne
Copy link
Member

I would like to rework the channel selection/pick API with the following goal in mind:

  1. Reduce code duplication
  2. Reduce the number of functions/methods in the public API
  3. Harmonize the function names/arguments
  4. Propose a public API for _pick_to_idx
  5. Propose an API with input validation and a faster version skipping the validation

After going through mne._fiff.pick, I propose:

  • Keep the functions returning a channel type as is, get_channel_type_constants, channel_type, channel_indices_by_type. IMO, the naming schema is not consistent but those functions are not the priority.
  • Transform functions which operates on an object and return the same object into methods.
    • pick_channels_cov: move to mne.Covariance.pick
    • pick_channels_forward and pick_types_forward moved to mne.Forward.pick
    • pick_info moved to mne.Info.pick: this one is debatable as we need to prevent picking on an Info object attached to another object, e.g. Raw. IMO, we have 2 options:
      • (1) Keep pick_info as is and don't add a channel selection method to an Info object
      • (2) Remove pick_info, add mne.Info._pick (private) which modifies the instance in place and mne.Info.pick (public) which always returns a copy, prevent assignment to object.info, e.g. prevent raw.info = "foo" (which might already be the case?)
  • Merge _picks_str_to_idx, _picks_to_idx, pick_channels, pick_channels_regexp, pick_types into:
    • Functions selecting on a list-like of channel names
    • Functions selecting on an Info
    • Private functions with additional with_ref_meg: bool and return_kind: bool argument, if needed.
def pick_ch_names_to_idx(
    ch_names: list[str] | tuple[str] | set[str] | NDArray[str],
    picks: list[str | int], tuple[str, int], set[str | int], NDArray[str], NDArray[int], str, int,
    exclude: list[str | int], tuple[str, int], set[str | int], NDArray[str], NDArray[int], str, int,
) -> NDArray[np.int32]:
    pass
def pick_info_to_idx(  # or pick_to_idx?
    info: Info,
    picks: list[str | int], tuple[str, int], set[str | int], NDArray[str], NDArray[int], str, int,
    exclude: list[str | int], tuple[str, int], set[str | int], NDArray[str], NDArray[int], str, int,
) -> NDArray[np.int32]:
    pass

Both picks or exclude support the same types, with str being either:

  • a channel name
  • a channel type
  • 'all' or 'data'
  • a regex

I'm not sure if allow_empty: bool should be part of the public API. IMO, this argument is linked to a second one validation: bool which controls the presence or absence of validation of the input, to e.g. speed-up the selection if you know your inputs are valid and/or if you don't 'care'. IMO, 2 options:

  • (1): include 2 arguments, allow_empty: bool = False and validation: bool = True, to the public API. Thus, you can now chose to skip input-validation and/or have some control on it.
  • (2) Keep the signature of the public API as above (pros: very simple) and add private functions _pick_ch_names_to_idx and _pick_info_to_idx which skip input validation. allow_empty: bool = False should likely be kept in those private functions.

Simultaneously, I would also deprecate for good the legacy pick_channels methods to tend towards a simpler and intuitive API. I believe those changes cover the majority of use-cases both in mne API and in external code. Those changes can be implemented piece by piece:

  • the current mne._fiff.pick module can become private e.g. mne._fiff._pick. Exisitng imports are fixed to keep using the existing function.
  • the new mne._fiff.pick module is implemented and MNE code-base can be gradually updated, one module at a time, to transition from mne._fiff._pick to mne._fiff.pick

The removal of the existing public pick methods in mne._fiff.pick is backward incompatible. The impact can be mitigated by either (1) a long deprecation cycle or (2) keeping those methods as legacy for now (especially as the documentation of those methods is at the mne-level, e.g. mne.pick_info, mne.pick_types).


I believe I have the time to work on this change before 1.7 is out, but let's take the time to think about the API first ;)

@larsoner
Copy link
Member Author

larsoner commented Jan 8, 2024

Without getting into too much detail, I would add to the priority list above:

  1. @deprecateding as few things as possible in favor of @legacying them

In theory any new functionality like cov.pick should just extend existing functionality, so hopefully that function becomes a one-line wrapper like return cov.copy().pick(...), in which case keeping pick_channels_cov in the namespace carries minimal maintenance burden. We could even probably remove it from the API page to help with your point (2) above. I like the idea of modernizing the interfaces but leaving the (decade-?)old functions around for backward compat when the cost is minimal like this.

Beyond that I would suggest breaking this up as much as possible, otherwise the diff and review will be unmanageable. Hardest starting point (but maybe most satisfying?) might be to try to simplify the internal code to be more DRY. Simplest might be with something like implementing a cov.pick. A sort of in between would probably be to add the two new public functions you propose that directly address the _pick_to_idx-to-public issue in the title. But maybe you have some other plan...

There are a lot of details to work out with the above proposal, so I suggest @drammock and others who are interested first comment as well on the general idea, then @mscheltienne you propose what a first PR could include, then once we converge on that you can open it. That way we don't have to discuss everything all at once but can make incremental progress with a shared vision.

For me I'm overall +1 on these ideas subject to my suggestion to prefer @legacy over @deprecated.

@drammock
Copy link
Member

drammock commented Jan 8, 2024

I'll take a deeper look at the proposal soon, but meanwhile I'm adding a crossref to #11531 and especially #11531 (comment)

@mscheltienne
Copy link
Member

With the proposed approach, we can keep most, if nit all functions with the legacy decorator.

I agree we should break things apart. I propose to get a first PR with some minor changes and the structure of the new functions, i. e. function names, arguments and type-hints for the arguments and returns. We can iterate on the structure before coding the logic.

@drammock
Copy link
Member

drammock commented Jan 8, 2024

regarding pick_info

  • pick_info moved to mne.Info.pick: this one is debatable as we need to prevent picking on an Info object attached to another object, e.g. Raw.

another possibility that occurs to me here is that raw.pick() and raw.info.pick() could do the same thing (i.e. both could modify raw inplace). On its face that seems like a good API to me, but it opens up the possibility of

my_info = raw.info
# ...many lines later:
my_info.pick()  # modifies `raw` (possibly unwittingly)

So I think we should avoid that. And obviously we can't modify an attached info without modifying the instance it's attached to, so a public raw.info.pick() must return a copy, which is at odds with how .pick() works on all the instances; so I feel like it's an unintuitive API in that sense. To my mind that leaves only two possibilities: your option (1):

Keep pick_info as is and don't add a channel selection method to an Info object

or an option where info becomes aware of whether it's attached to an instance, and if so, refuses to execute its .pick() method. That feels a bit overengineered and possibly fragile, so I'm voting for your option(1).

regarding functions selecting on a list-like of channel names

You'll see in #11531 (comment) that I advocated for @legacying pick_channels and pick_channels_regexp, but that idea was not popular with @agramfort and @jasmainak at least. Now, you're at least proposing to keep the functionality in a new, more flexible function, so maybe the idea will be more popular, but I worry a little bit about a single function being "too magic" in deciding when to interpret its input as a regexp and when to treat it like a plain channel name. Perhaps we take inspiration from Pandas here? https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.filter.html#pandas.DataFrame.filter has items, like, and regex params which are mutually exclusive.

regarding pick and exclude

Maybe I'm misunderstanding but you seem to suggest that it will be possible to pass exclude="all" or exclude="data"; to me this seems pointless to support. Are there use cases you're imagining/encountering here?

@mscheltienne
Copy link
Member

mscheltienne commented Jan 9, 2024

For pick_info, I agree, let's keep the existing function as it would be counter-intuitive to have a method not modify the instance in-place.


For pick_channels and pick_channels_regexp, you wrote:

hey should also be marked @legacy or even deprecated/removed

I believe @jasmainak and @agramfort were not advocating against @legacy, but against deprecation. I would prefer to mark them as @legacy as well along the rest of the functions mentioned. IMO, @legacy functions should also be removed from the documentation build.

I worry a little bit about a single function being "too magic"

I usually agree and prefer to split functionalities, but here it doesn't seem that difficult. If a string is provided, is it a channel name? is it a channel type? is it a valid regex? We could even drop this last point by supporting re.Pattern directly for regex.


For pick and exclude, they should accept the same arguments.. beside those 2 non-sense cases and beside exclude=None since the default is () in other part of the API 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants