RFC: indexing with multi-dimensional integer arrays #669
Comments
What's an "indexer"? Each element in the tuple? Or the entire tuple? |
Do you suggest a dedicated function for this? Or is the |
Clarified -- each element in the tuple should be an integer or integer array. |
I think we should standardize on a subset of vectorized indexing which is common to both
|
Sorry, I was wondering if any of the existing array libs chose to use |
Was there ever an analysis done of what integer array indexing the different array libraries support? I don't think it would show up in the existing API comparison data because that data only looks at function definitions. |
No, not in explicit detail. |
Based on the history of NEP 21 (https://numpy.org/neps/nep-0021-advanced-indexing.html) and the following related discussions
I wonder if we could standardize functional APIs that cater to the different indexing "flavors" instead of mandating a special form of advanced square bracket indexing. While bracket syntax is convenient in end-user code (e.g., in scripting and the REPL), functional APIs could be more readily leveraged in library code, where typing extra characters is, IMO, less of an ergonomic concern. Functional APIs for retrieving elements would also avoid the mutability discussion (ref: #177). There's precedent for such functional APIs in other languages (e.g., Julia), and one can argue that functional APIs would make intent explicit and avoid ambiguity across array libraries, which should be allowed to, e.g., support "orthogonal indexing" (as in MATLAB) or some variant of NumPy's advanced vectorized indexing. Because these semantics conflict, standardizing even a minimal version of NumPy's vectorized indexing via bracket syntax might preclude otherwise compliant array libraries from supporting MATLAB/Julia-style indexing semantics.
Instead, I could imagine something like
def coordinate_index(x: array, *index_args: Union[int, Sequence[int], array]) -> array
where
I think it is also worth including
def orthogonal_index(x: array, *index_args: Union[int, Sequence[int], array]) -> array
where
The biggest omission in the above is the absence of
def coordinate_index(x: array, *index_args: Union[int, Sequence[int], array], axes: Optional[Sequence[int]] = None) -> array
def orthogonal_index(x: array, *index_args: Union[int, Sequence[int], array], axes: Optional[Sequence[int]] = None) -> array
When
Optionally, instead of variadic interfaces, one could do something like
def coordinate_index(x: array, index_args: List[Union[int, Sequence[int], array]], /, *, axes: Optional[Sequence[int]] = None) -> array
def orthogonal_index(x: array, index_args: List[Union[int, Sequence[int], array]], /, *, axes: Optional[Sequence[int]] = None) -> array
where
While the
>>> y = xp.orthogonal_index(x, [0], [0,1], [1,1], [2], axes=(1,3,4,6))
>>> z = y[::-1,...,xp.newaxis,::-2,:]
Another possible future extension is support for integer index arrays having more than one dimension. As in Julia, the effect could be the creation of new dimensions (i.e., the rank of the output array would be the sum of the ranks of the index arguments minus any reduced dimensions).
In short, my sense is that standardizing indexing semantics in functional APIs gives us more flexibility to delineate behavior and to evolve standardized behavior incrementally, and it avoids some of the roadblocks encountered with the adoption of NEP 21 and the discussions around backward compatibility.
Addendum
There are also various |
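To check that the signatures above are implementable, here is a minimal NumPy-backed sketch. It is not a proposed implementation: it assumes `axes` defaults to the leading axes, assumes coordinate indexing places the broadcast index dimensions first (the comment doesn't pin this down), and uses `np.take` for the orthogonal case.

```python
import numpy as np


def coordinate_index(x, *index_args, axes=None):
    """Hypothetical sketch: zip-style indexing along `axes` (default: leading axes)."""
    x = np.asarray(x)
    if axes is None:
        axes = tuple(range(len(index_args)))
    if len(axes) != len(index_args):
        raise ValueError("expected one index argument per axis")
    # Assumption: move the indexed axes to the front so the broadcast index
    # dimensions always come first, whatever `axes` is.
    x = np.moveaxis(x, axes, tuple(range(len(axes))))
    return x[tuple(np.asarray(i) for i in index_args)]


def orthogonal_index(x, *index_args, axes=None):
    """Hypothetical sketch: each index argument selects independently along its axis."""
    x = np.asarray(x)
    if axes is None:
        axes = tuple(range(len(index_args)))
    if len(axes) != len(index_args):
        raise ValueError("expected one index argument per axis")
    # Visit axes from highest to lowest so that an integer index, which drops
    # its axis, never renumbers an axis that still has to be indexed.
    pairs = sorted(
        zip((a % x.ndim for a in axes), index_args),
        key=lambda p: p[0],
        reverse=True,
    )
    for axis, idx in pairs:
        x = np.take(x, idx, axis=axis)
    return x


# The example from the comment, with NumPy standing in for `xp`:
y = orthogonal_index(np.zeros((2, 3, 4, 5, 6, 7, 8)), [0], [0, 1], [1, 1], [2], axes=(1, 3, 4, 6))
z = y[::-1, ..., np.newaxis, ::-2, :]
```

Because `np.take` accepts index arrays of any rank, the same sketch also follows the "future extension" rank rule. For example, `orthogonal_index(np.zeros((3, 4)), [[0, 1], [2, 0]], [1, 2, 3, 0, 1])` has shape `(2, 2, 5)`, i.e. rank 2 + 1.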
Sorry, maybe this is answered elsewhere, but why not making
As a property:
x.coordinate_index[[1, 3, 4], :, [7, 1, 2], ...]
As a function:
xp.coordinate_index(x)[[1, 3, 4], :, [7, 1, 2], ...] |
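The indexer-object pattern suggested here (a function or property that returns something you then subscript) takes only a few lines. The sketch below is hypothetical: the class name is made up, and it implements outer (orthogonal) semantics because those stay well-defined when `:` and `...` are mixed in, as in the example key. NumPy stands in for the array library.

```python
import numpy as np


class OrthogonalIndexer:
    """Hypothetical indexer object: OrthogonalIndexer(x)[key] applies outer indexing."""

    def __init__(self, x):
        self._x = np.asarray(x)

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        x = self._x
        # Expand a single Ellipsis into full slices for the unmentioned axes.
        if any(k is Ellipsis for k in key):
            i = next(j for j, k in enumerate(key) if k is Ellipsis)
            fill = (slice(None),) * (x.ndim - len(key) + 1)
            key = key[:i] + fill + key[i + 1:]
        key = key + (slice(None),) * (x.ndim - len(key))
        # Turn every entry into a 1-D integer array; remember which axes were scalars.
        arrays, drop = [], []
        for axis, k in enumerate(key):
            if isinstance(k, slice):
                arrays.append(np.arange(x.shape[axis])[k])
            elif np.ndim(k) == 0:
                arrays.append(np.array([k]))
                drop.append(axis)
            else:
                arrays.append(np.asarray(k))
        # np.ix_ builds an open mesh, so each index varies along its own axis.
        out = x[np.ix_(*arrays)]
        return out.squeeze(axis=tuple(drop)) if drop else out
```

With this, `OrthogonalIndexer(x)[[1, 3, 4], :, [7, 1, 2], ...]` on an array of shape `(5, 4, 9, 2)` returns shape `(3, 4, 3, 2)`. Keeping the indexing logic in a separate object leaves the array object itself minimal.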
@vnmabus That is essentially NEP 21. Not opposed, but that proposal was written in 2015 and is still a draft. There the naming conventions were
In general, in the standard, we've tried to keep the array object minimal and to move most logic into functions. |
I will note I liked |
Just a brief note: NEP 21 was never implemented in NumPy, but I still think the minimal option of adding support for all integer arrays like NumPy in
Even libraries that don't currently support array-based indexing (like TensorFlow) could add this pretty easily, as long as they have an underlying primitive for coordinate-based indexing. There are lots of variations of vectorized/orthogonal indexing, but ultimately if zip/coordinate-based indexing is available, that's enough to express pretty much all desired operations -- everything else is just sugar. |
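The "everything else is just sugar" point can be made concrete by rebuilding outer indexing, and slices along with it, from coordinate indexing alone. Below is a sketch with NumPy standing in for any library that has coordinate indexing. The helper name is hypothetical, and the reshaping trick is essentially what `np.ix_` does.

```python
import numpy as np


def outer_via_coordinates(x, *index_arrays):
    """Express outer (orthogonal) indexing with plain coordinate indexing.

    The k-th 1-D index array is reshaped so it varies only along result axis k;
    broadcasting then enumerates every combination, i.e. the outer product.
    """
    n = len(index_arrays)
    reshaped = []
    for k, idx in enumerate(index_arrays):
        idx = np.asarray(idx)
        shape = [1] * n
        shape[k] = idx.size
        reshaped.append(idx.reshape(shape))
    return x[tuple(reshaped)]
```

For `x` of shape `(2, 3, 4)`, `outer_via_coordinates(x, [1, 0], [2, 0, 1])` has shape `(2, 3, 4)`, while the zip-style `x[[1, 0], [2, 0]]` has shape `(2, 4)`. A slice such as `x[:, 1:3]` is the same as passing `np.arange(2), np.arange(1, 3)`.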
The only comment I have is that I do not like the idea of having such functions in the main namespace if we think that
Now, I don't mind having the functions. I just think NumPy should have one obvious solution, and that should probably be
EDIT: To be clear, I'm happy to be convinced that this is useful to NumPy on its own, beyond just adding another way to do the same thing. |
What do you mean by "the subclass problem"? I think the functional suggestion was made primarily for two reasons:
My personal view is that it's fine to make
In fact, my general impression has been that standardizing |
We discussed this topic in a call today, and concluded that given the number of incoming links and demand for it, it's time to move ahead with allowing indexing with a combination of integers and integer arrays. It seems like there is enough demand and consensus that this is useful, and it looks like there are no serious problems as long as it's integer-only (combining with slices, ellipses, and boolean arrays is still not supported). A few more outcomes and follow-up actions:
|
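A checker for the integer-only subset agreed on above could look roughly like the hypothetical helper below. It also assumes, as in the opening proposal, that the index tuple has one entry per dimension. Slices, ellipses, `None`, booleans, and plain lists are rejected.

```python
import numpy as np


def check_integer_index(x, key):
    """Hypothetical checker: accept only integers and integer arrays, one per axis."""
    if not isinstance(key, tuple):
        key = (key,)
    if len(key) != x.ndim:
        raise IndexError(f"expected {x.ndim} index entries, got {len(key)}")
    for k in key:
        # bool is a subclass of int in Python, so test for booleans first.
        if isinstance(k, (bool, np.bool_)):
            raise IndexError("boolean indices are outside this subset")
        if isinstance(k, (int, np.integer)):
            continue
        if isinstance(k, np.ndarray) and np.issubdtype(k.dtype, np.integer):
            continue
        raise IndexError(f"unsupported index entry: {k!r}")
    return key
```

This is roughly the kind of check that array-api-strict performs when it errors on unspecified indexing behavior, though its actual rules may differ.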
Some other details from the meeting:
We can always implement the minimal semantics now and expand what else is supported later. For instance, in the future we could potentially allow ellipses, slices, and newaxes to be mixed with integer arrays, as long as they all appear either at the start or at the end of the index (the confusing case in NumPy is when integer arrays are on either side of slices, which I think everyone agrees we should never try to standardize). Slices in particular might be important to support, since they allow inserting empty slices in the index to select over a particular axis (like
@shoyer pointed out that a downside to implementing only a subset of NumPy behavior is that it isn't obvious to users which indexing semantics are and aren't part of the standard, but 1) this is already the case (for instance, the standard only allows a single boolean array index, and it leaves out-of-bounds integer and slice indices unspecified), and 2) users can check this using array-api-strict, which errors on unspecified indexing behavior. |
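NumPy's current behavior shows why the position of slices relative to integer arrays matters. Where the broadcast index dimension ends up depends on whether the integer (array) indices are adjacent:

```python
import numpy as np

x = np.zeros((5, 6, 7))

# Integer arrays next to each other: the broadcast index dimension stays in place.
adjacent = x[:, [0, 1], [0, 1]]     # shape (5, 2)

# Integer arrays separated by a slice: NumPy moves the broadcast dimension to the front.
separated = x[[0, 1], :, [0, 1]]    # shape (2, 6)

# A plain integer counts as an advanced index here too, so this is also "separated".
mixed = x[0, :, [0, 1]]             # shape (2, 6), not (6, 2)

# Arrays confined to the end (or start) of the index keep an unsurprising layout.
trailing = x[:, :, [0, 1]]          # shape (5, 6, 2)
```

Only the last form fits the "arrays at the start or end" rule floated above for a possible future extension.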
Also it's probably worth pointing out: if any behavior isn't implemented in a library, we can't easily work around it in the compat library (the best we could do would be to add a helper function). So we should take that into consideration if anything actually isn't implemented somewhere. |
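For a library with no multi-array indexing but with a 1-D gather, a compat-layer helper could still provide the integer-only semantics by linearizing the indices. Below is a sketch with NumPy standing in for the target library. The helper name is hypothetical, and it assumes one integer index per dimension.

```python
import numpy as np


def coordinate_take(x, *indices):
    """Hypothetical compat-style helper: coordinate indexing built from a flat take.

    Assumes one integer index (scalar or array) per dimension of `x`, matching the
    integer-only proposal; negative indices count from the end.
    """
    x = np.asarray(x)
    if len(indices) != x.ndim:
        raise IndexError("expected one index per dimension")
    # Integer indices broadcast against each other, as in bracket indexing.
    idx = np.broadcast_arrays(*(np.asarray(i) for i in indices))
    # Wrap negative indices once; anything still out of range makes
    # ravel_multi_index raise (its default mode is "raise").
    idx = [np.where(i < 0, i + n, i) for i, n in zip(idx, x.shape)]
    flat = np.ravel_multi_index(tuple(idx), x.shape)
    return np.take(x.reshape(-1), flat)
```

As noted above, this would have to be a helper function: it can't make the library's own `x[...]` accept the new index types.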
👍 This is important for a bunch of my use cases (e.g. |
Another question to consider for
>>> import numpy as np
>>> a = np.array([0])
>>> b = np.array([1], dtype=np.int32)
>>> a[np.array([0])] = b

>>> import torch
>>> a = torch.tensor([0])
>>> b = torch.tensor([1], dtype=torch.int32)
>>> a[torch.tensor([0])] = b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Index put requires the source and destination dtypes match, got Long for the destination and Int for the source.
(curiously, PyTorch does allow this sort of thing in other indexing contexts. For instance, |
NumPy allows assignment of an array with any dtype, which is clearly broken:
This results in an array with garbage values, and a warning:
If we allow casting, it should definitely be restricted to "safe" casting (e.g., assigning from a strictly smaller dtype). The conservative choice would be to allow only matching dtypes or compatible Python scalar types. |
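Either policy is easy to express with `np.can_cast`. The hypothetical helper below refuses an `a[key] = value` assignment unless the value's dtype can be cast under the chosen rule: `"no"` for the conservative matching-dtypes option, `"safe"` for lossless casts. It is a sketch only. Python scalars go through `np.asarray`, so they are judged by NumPy's default dtype, whereas a real policy would probably special-case them.

```python
import numpy as np


def checked_index_put(a, key, value, *, casting="safe"):
    """Hypothetical helper: a[key] = value, refusing dtype conversions beyond `casting`.

    casting="no" accepts only matching dtypes (the conservative option);
    casting="safe" also accepts lossless casts such as int32 -> int64.
    """
    value = np.asarray(value)
    if not np.can_cast(value.dtype, a.dtype, casting=casting):
        raise TypeError(f"cannot assign {value.dtype} into {a.dtype} with casting={casting!r}")
    a[key] = value
```

Under `"safe"`, the NumPy example above (int32 into an int64 array) is accepted. Under `"no"`, it raises, which matches PyTorch's current `index_put` behavior.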
Is anything in particular blocking this from moving forward? This is probably the highest priority array API feature for Xarray users. |
@shoyer No known blockers. This should make it into v2024 and be included, ideally, sometime in January (after the holidays). |
I'd like to revisit the discussion from #177 about adding support for indexing with arrays of indices, because this is by far the largest gap in functionality required for Xarray.
My specific proposal would be to standardize indexing with a tuple of length equal to the number of array dimensions, where each element in the indexing tuple is either (1) an integer or (2) an integer array. This avoids all the messy edge cases in NumPy related to mixing slices & arrays.
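For reference, here is how that subset behaves in NumPy today. NumPy is used only as a stand-in for the proposed semantics, and the array names are arbitrary.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# Every position in the index tuple holds an integer or an integer array,
# and the tuple covers all x.ndim dimensions (no slices, ellipses, or booleans).
rows = np.array([0, 2, 2])
cols = np.array([1, 3, 0])

# Coordinate ("zip") semantics: element k is x[rows[k], cols[k]].
picked = x[rows, cols]                    # shape (3,)

# Integer arrays broadcast against each other (and against plain integers),
# so the result shape is the broadcast shape of the index arrays.
grid = x[rows[:, None], cols[None, :]]    # shape (3, 3)
row1 = x[1, cols]                         # shape (3,)
```

Because every axis is indexed and there are no slices, the question of where NumPy places the broadcast dimensions never comes up.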
The last time we talked about it, integer indexing via __getitem__ somehow got held up on discussions of mutability with __setitem__. Mutability would be worth resolving at some point, for sure, but IMO it is a rather different topic.